20 December 2019

New grant to research synthetic health data

A new research project will work towards developing a method that can generate synthetic data based on original data. The project is funded by the Novo Nordisk Foundation.

A new research project based at the University of Copenhagen will research synthetic health data. The researchers aim to develop and refine a method that can use original data to generate synthetic data sets. The Novo Nordisk Foundation is supporting the project with a grant of DKK 7.5 million.

“Data must be used for the purpose for which they are collected. This is a good starting-point, because we need to know what our data is used for. Further, the healthcare sector urgently needs to develop new solutions, but this requires sharing data more flexibly. Synthetic data can help meet this need because they are based on original data cleared of any details that could be traced back to the original data and thereby the people who provided them,” says Henning Langberg, Professor, Department of Public Health, University of Copenhagen, the recipient of the grant from the Foundation.

Open-source access will ensure quality

The project, called Synthetic Health and Research Data (SHARED), is a proof-of-concept project intended to show that original data can be transformed into synthetic data in a way that makes it impossible to trace the data back to their sources. The synthetic data are created by running an original data set through a mathematical program that adds noise to the data set, so that the synthetic data cannot be attributed to specific individuals while still maintaining a dispersion and context that make them statistically valid. This enables data to be shared without compromising data security.
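
As a rough illustration of the idea of adding noise while keeping the dispersion, the sketch below perturbs a single numeric column and then rescales it so that its mean and standard deviation match the original. It is only a toy example under stated assumptions, not the SHARED method, which the article does not describe in detail: the function name `synthesize_column`, the `noise_scale` parameter and the sample ages are all hypothetical. Noise of this kind does not by itself guarantee that records cannot be traced back, and a single-column sketch does not preserve the relationships between variables.

```python
import random
import statistics


def synthesize_column(values, noise_scale=0.5, seed=0):
    """Return a noisy copy of `values` with the same mean and standard deviation.

    Hypothetical helper for illustration only; not the project's method.
    """
    rng = random.Random(seed)
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    # Perturb each value with Gaussian noise proportional to the column's spread.
    noisy = [v + rng.gauss(0.0, noise_scale * spread) for v in values]
    # Re-centre and rescale so the noisy column keeps the original mean and spread.
    noisy_mean = statistics.fmean(noisy)
    noisy_spread = statistics.pstdev(noisy)
    if noisy_spread == 0:
        return [mean] * len(values)
    return [mean + (x - noisy_mean) * spread / noisy_spread for x in noisy]


# Made-up example values, not real health data.
original_ages = [34, 51, 47, 62, 29, 58, 41, 70]
synthetic_ages = synthesize_column(original_ages)
```

The individual values in `synthetic_ages` differ from those in `original_ages`, while the column-level mean and standard deviation are the same up to floating-point rounding.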

“An elaborate and secure model capable of generating synthetic data can help to harness the great potential inherent in deriving new contexts from our common health data in a safe and secure way. The results of the project can influence both disease prevention and treatment, not only in Denmark’s healthcare sector but throughout the Nordic countries,” says Niels-Henrik von Holstein-Rathlou, Head of Biomed, Novo Nordisk Foundation.

Together with the Finnish partners Turku University Hospital and the Institute for Molecular Medicine Finland (FIMM), Henning Langberg will work to develop a mathematical method that can transform the original data into synthetic data. The methods and models developed will then be evaluated in a test battery that measures how closely the synthetic data resemble the original data.
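
One simple way to picture such a test battery is to compare summary statistics of the two data sets. The sketch below is an assumption about what one check could look like, not the project's actual tests: the names `pearson` and `fidelity_report` and the dictionary-of-columns layout are hypothetical. It reports the absolute gap in each column's mean and standard deviation, and in the correlation between each pair of columns, which is one way to read the "context" between variables.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def fidelity_report(original, synthetic):
    """Compare two data sets given as {column_name: list_of_numbers}.

    Returns absolute gaps in mean, standard deviation and pairwise correlation.
    Smaller values mean the synthetic data resemble the original more closely.
    """
    columns = sorted(original)
    report = {}
    for name in columns:
        report[f"mean:{name}"] = abs(
            statistics.fmean(original[name]) - statistics.fmean(synthetic[name])
        )
        report[f"stdev:{name}"] = abs(
            statistics.pstdev(original[name]) - statistics.pstdev(synthetic[name])
        )
    # Pairwise correlations capture how the columns relate to each other.
    for i, a in enumerate(columns):
        for b in columns[i + 1:]:
            report[f"corr:{a}~{b}"] = abs(
                pearson(original[a], original[b]) - pearson(synthetic[a], synthetic[b])
            )
    return report
```

A report of all zeros means the compared statistics are identical. A real evaluation would also need checks that the synthetic records cannot be linked back to individuals, which this sketch does not attempt.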

“Our major challenge is to include as many parameters as possible in the synthetic dataset without losing the contexts between data. In addition, it is important for us to have an open-source approach to developing the method so that the academic community can ask relevant questions about the method during the project. This is essential when working in such a sensitive and regulated area as health data,” explains Henning Langberg.