Outputs

Publications

Li, W.Y., Wang, Z., Johnson, J., De, S. (2025). Are Information Retrieval Approaches Good at Harmonising Longitudinal Surveys in Social Science? arXiv preprint arXiv:2504.20679

De, S., Jangra, S., Agarwal, V., Johnson, J., Sastry, N. (2023). Biases and Ethical Considerations for Machine Learning Pipelines in the Computational Social Sciences. In Ethics in Artificial Intelligence: Bias, Fairness and Beyond (pp. 99-113). Singapore: Springer Nature Singapore

Sharifian-Attar, V., De, S., Jabbari, S., Li, J., Moss, H. (2022). Analysing Longitudinal Social Science Questionnaires: Topic modelling with BERT-based Embeddings. IEEE International Conference on Big Data Special session MLBD 2022

Johnson, J., Lukose, E., De, S. (2022). Privacy Pitfalls of Online Service Terms and Conditions: a Hybrid Approach for Classification and Summarization. NLLP 2022 Workshop at EMNLP

De, S., Moss, H., Johnson, J., Li, J., Pereira, H., Jabbari, S. (2022). Engineering a machine learning pipeline for automating metadata extraction from longitudinal survey questionnaires. IASSIST Quarterly. 46, 1 (Mar. 2022)

De, S., Johnson, J., Li, J., Moss, H., Jabbari, S. (2022). Understanding the multiple dimensions of prediction of concepts in social and biomedical science questionnaires. DiRAC Federation Project, D-FED 3.1.7

Conference Presentations (forthcoming)

Li, W.Y., Wang, Z., Johnson, J., De, S. (2025). Are Information Retrieval Approaches Good at Harmonising Longitudinal Surveys in Social Science? SIGIR 2025 (48th International ACM SIGIR Conference on Research and Development in Information Retrieval), Padua, Italy

Johnson, J., De, S., Li, W.Y., Pravin, C., Bradshaw, P. (2025). What should FAIR Question Banks look like and how do we get there? European Survey Research Association Conference, Utrecht, Netherlands

Lungley, D. (2025). Towards High Performance Data Curation Statistical Disclosure Control Tooling. ADR UK Conference, Cardiff, UK

Johnson, J. & Bradshaw, P. (2025). Investing in metadata – improving survey data management and data sharing. Society for Longitudinal and Lifecourse Studies Conference

Lungley, D. (2025). Metacurate-ML. SciDataCon'25, Brisbane, Australia

Conference Presentations

Lungley, D., De, S., Pravin, C., Johnson, J., Bradshaw, P. (June 2025) Back to the rough ground! Retrieving concepts in survey research and its potential uses. IASSIST 2025, Bristol, UK

De, S., Pravin, C., Li, W.Y., Johnson, J. (June 2025) Back to the rough ground! Retrieving concepts in survey research and its potential uses: AI modelling for Social Science data. IASSIST 2025, Bristol, UK

Lungley, D., Joy, J., Evodokimov, I. (June 2025) Back to the rough ground! Retrieving concepts in survey research and its potential uses: UKDS Curation Tooling. IASSIST 2025, Bristol, UK

Li, W.Y., De, S., Wang, Z., Lungley, D., Bradshaw, P., Johnson, J. (December 2024) Metacurate-ML: Conceptual Comparison. European DDI Users Conference 2024, Chur, Switzerland

Pravin, C., De, S., Wang, Z., Lungley, D., Bradshaw, P., Johnson, J. (December 2024) Metacurate-ML: Metadata Extraction from CAIs. European DDI Users Conference 2024, Chur, Switzerland

Lungley, D., Joy, J., Tayal, S., Evodokimov, I., Bolton, S., Smy, L., Afkhami, R., Bradshaw, P., Johnson, J. & De, S. (December 2024) Metacurate-ML: Enhanced Data Curation - Automation of Disclosure Control Assessment. European DDI Users Conference 2024, Chur, Switzerland

Johnson, J. (2024) Extraction and Utilisation of Metadata from Non-machine-actionable Documents to Improve Data Curation and Discovery. NCRM MethodsCon: Futures, September 2024, Manchester, United Kingdom

Johnson, J., De, S. (2023) Initial findings from the automation of extraction of metadata from questionnaires and its classification. European Survey Research Association Conference, Milan, Italy

De, S., Moss, H., Jabbari, S., Pereira, H., Johnson, J., Li, J. (2021) Engineering a Machine Learning Pipeline for Automating Metadata Extraction from Longitudinal Survey Questionnaires. European DDI Users Conference 2021, Paris, France

Workshops

Li, W.Y., Pravin, C. (June 2025) AI-Enabled Data Practices for Metadata Discovery and Access: Best Practices for Developing Training Data. IASSIST 2025, Bristol, UK

Johnson, J. (November 2023) Metadata Uplift and Machine Learning - European Perspectives. European DDI Users Conference, Ljubljana, Slovenia