Considerable investments have been made in Europe and worldwide in research data infrastructures. Rather than a general lack of data about data, it has become apparent that the pivotal factor drastically constraining the use of data is the absence of contextual knowledge about how data was created and how it has been used. This applies especially to many branches of SSH research where data is highly heterogeneous, both in kind (e.g. qualitative, quantitative, naturalistic, purposefully created) and in origin (e.g. historical or contemporary, from different contexts and geographical places). The problem is that there may be enough metadata (data about data) but there is too little paradata (data on the processes of its creation and use).
In contrast to the rather straightforward problem of describing the data, the high-risk/high-gain problem that no one has managed to solve is the lack of a comprehensive understanding of what information about the creation and use of research data is needed, and of how to capture enough of that information to make the data reusable and to avoid the risk that the vast amounts of research data currently being collected become useless in the future. The wickedness of the problem lies in the practical impossibility of documenting and keeping everything and the difficulty of determining optimal procedures for capturing just enough.
With an empirical focus on archaeological and cultural heritage data, which stands out for its extreme heterogeneity and rapid accumulation due to the scale of ongoing development-led archaeological fieldwork, CAPTURE develops an in-depth understanding of how paradata is currently #1 created and #2 used, #3 elicits methods for capturing paradata on the basis of the findings of #1-2, #4 tests the new methods in field trials, and #5 synthesises the findings in a reference model to inform the capturing of paradata and enable data-intensive research using heterogeneous research data stemming from diverse origins.