Project Overview

The project was conceived as a collaboration between Jon Johnson at CLOSER and Dr Suparna De at the University of Surrey to investigate the intersection of metadata annotation in the social sciences and machine learning.

We aim to disseminate our findings to the Computer Science, Social Science and Metadata communities at conferences and in journals, so that these communities can gain a better understanding of the subjects under consideration and the benefits of cross-disciplinary collaboration.

Pilot projects

We secured a small grant from ESRC in 2021 to look at extraction of metadata from social science questionnaires. This was supplemented by a grant from DIRAC to look at predicting concepts from question text, which is related to an adjacent problem in astronomy: the field produces a large amount of unstructured text that would benefit from conceptual classification.

DIRAC supported further work to look at more sophisticated models for concept prediction.

METACURATE-ML

This project is funded by UKRI through the ESRC Future Data Service Program and brings together CLOSER, UK Data Service (UKDS), the Computer Science Department at the University of Surrey, and the Scottish Centre for Social Research (ScotCen) to generate metadata that is FAIR-ready and can be utilised by these emerging data services.

We will be building on the pilot projects' successes and learning to overcome the barriers. More details at: the project page

Metadata Automation Project

Grants

  • Extraction and Utilisation of Metadata from Non-machine-actionable Documents to Improve Data Curation and Discovery. (2024). ESRC. ES/Z502935/1
  • Understanding the multiple dimensions of prediction of concepts in social and biomedical science questionnaires. (2022). STFC. ST/S003916/1
  • Machine learning to enhance metadata in cohort studies. (2021). STFC. ST/S003916/1
  • Automating capturing structured content from questionnaires. (2021). ESRC. ES/K000357/1

Our Funders