Policy increasingly demands that analysis be delivered in a more timely and comprehensible manner. At the same time, data comes from increasingly diverse sources and needs to be combined across disciplines and organisations. COVID-19 has thrown both pressures into stark relief.
To meet these challenges, all parts of the research process will need to be scaled and connected so that analysts can be assured of the provenance and meaning of the underlying data. Simply throwing people or compute at the problem is not a solution.
Structured metadata describes the data lifecycle in a form that machines can act on and analysts can consult, assuring them that the data they are looking at is trustworthy and can be combined in a meaningful way.
However, we don't live in a world where data providers or data collectors are well versed in producing good-quality structured metadata, or incentivised to do so.
We believe a new approach can assist data providers, collectors and managers in enhancing data with this vital information.
We are in the enviable position of having a large resource of questionnaires that can be used as a training dataset for a range of machine learning approaches. Our initial work will focus on understanding the strengths and weaknesses of both the different approaches and our underlying datasets, to inform a discussion of how we can move towards higher levels of automation in metadata capture.