Healthcare Industry Grapples With Synthetic Data Obstacles


Healthcare companies have been intrigued by the potential of “synthetic data,” artificial data that AI algorithms generate to mimic real datasets. However, technical challenges have hindered its widespread adoption in the industry.

Researchers in the medical field have experimented with synthetic data to analyze the effects of drugs on specific subpopulations without the privacy and regulatory obstacles that come with real patient data. One forecast predicted that 60% of data used for AI and analytics projects would be synthetically generated by the following year, but that projection has not been realized. Synthetic data has succeeded in certain domains, such as training self-driving cars, but its adoption in health and drug research, where it could be used to generate medical records, remains limited.

Obstacles to adoption include high costs, a limited number of vendors, and the difficulty of ensuring that synthetic data accurately reflects the target population. The complexity and variability of healthcare make this hard to solve: patients differ across many variables, such as medications, lifestyle factors, and medical conditions, and synthetic data must preserve the right mix of those variables to support accurate analysis.
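To make the "right mix of variables" concern concrete, here is a minimal sketch, not any vendor's actual method. It compares how often each combination of patient attributes appears in a real cohort and in a synthetic one, using total variation distance. The records, the field names (`on_statin`, `smoker`), and both helper functions are hypothetical and exist only for illustration.

```python
from collections import Counter

def distribution(records, keys):
    """Share of records for each combination of values of the given keys."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.items()}

def total_variation_distance(p, q):
    """Half the L1 distance between two discrete distributions
    (0.0 means identical, 1.0 means no overlap)."""
    combos = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in combos)

# Hypothetical toy cohorts: four real patients and four synthetic ones.
real = [
    {"on_statin": True, "smoker": True},
    {"on_statin": True, "smoker": False},
    {"on_statin": False, "smoker": True},
    {"on_statin": False, "smoker": False},
]
synthetic = [
    {"on_statin": True, "smoker": True},
    {"on_statin": True, "smoker": True},
    {"on_statin": False, "smoker": False},
    {"on_statin": False, "smoker": False},
]

# Each variable on its own looks fine: half the patients take a statin in both sets.
marginal_gap = total_variation_distance(
    distribution(real, ["on_statin"]), distribution(synthetic, ["on_statin"])
)

# The combination of variables is off: the synthetic set has no
# statin-taking non-smokers and no smokers who are off statins.
joint_gap = total_variation_distance(
    distribution(real, ["on_statin", "smoker"]),
    distribution(synthetic, ["on_statin", "smoker"]),
)

print(f"marginal gap: {marginal_gap:.2f}, joint gap: {joint_gap:.2f}")
```

In this toy case the synthetic cohort matches the real one on each variable separately yet misrepresents how the variables occur together, which is the kind of mismatch that could distort a subpopulation analysis.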

There is a trade-off between accuracy and privacy when creating synthetic data. Closer resemblance to the original data enhances accuracy but raises the risk of leaking sensitive information. Developing techniques to preserve original data privacy without compromising synthetic data accuracy is a potential avenue for improvement.
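The trade-off can be illustrated with a deliberately simple sketch. It assumes a toy "generator" that copies real values and adds Gaussian noise, which is a stand-in for illustration rather than how production synthetic-data tools work. The blood-pressure readings, function names, and the two proxy metrics are all hypothetical.

```python
import random
from statistics import mean

def perturb(real_values, noise_scale, seed=0):
    """Toy 'generator' (assumption): copy each real value and add Gaussian noise."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_scale) for v in real_values]

def utility_error(real_values, synthetic_values):
    """Accuracy proxy: how far the synthetic mean drifts from the real mean."""
    return abs(mean(synthetic_values) - mean(real_values))

def nearest_real_distance(real_values, synthetic_values):
    """Privacy proxy: average distance from each synthetic value to the
    closest real value. Values near zero suggest real records are exposed."""
    return mean([min(abs(s - r) for r in real_values) for s in synthetic_values])

# Hypothetical systolic blood pressure readings (mmHg).
real_bp = [118.0, 124.0, 131.0, 139.0, 145.0, 152.0]

results = {}
for scale in (0.0, 2.0, 8.0):
    synthetic_bp = perturb(real_bp, scale)
    results[scale] = (
        utility_error(real_bp, synthetic_bp),
        nearest_real_distance(real_bp, synthetic_bp),
    )
    print(f"noise={scale}: mean error={results[scale][0]:.2f}, "
          f"nearest-real distance={results[scale][1]:.2f}")
```

With no noise, the synthetic values reproduce the real ones exactly: the accuracy error is zero, but so is the distance to the nearest real record, meaning the data is effectively leaked. Adding noise moves the synthetic values away from the originals at the cost of a larger error in the estimated mean.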

The healthcare industry remains cautious because of uncertainty about how representative synthetic data is. While some companies have shown interest and conducted experiments, many are holding back. Illumina, a genomics company, published promising findings on synthetic data but later deprioritized the work in favor of using real data.

Another challenge is the still-emerging vendor market, which consists mainly of startups. The established cloud providers that businesses tend to prefer have not entered the market in a significant way, creating a supply-and-demand conundrum.

The healthcare sector’s reluctance to adopt synthetic data might change as the technology matures and its challenges are addressed.