The data scientists trying to unlock our medical future

Precision medicine needs experts in Raspberry Pi.

Peter Lovegrove

Media Relations and Video Production Manager, DNV GL - Group

If Soojin Ji were to slip a flyer into your post box advertising her many employable credentials, it might be signed off with, “no job too big or too small.” She is not, though, seeking to landscape your garden or walk your chihuahuas; she is a Cornell graduate with a toolbox shaped by the internet of things. Her education in biomedical engineering has not limited her professional life to one field.

Raspberry Pi

Vibeke Binz Vallevik

Principal Researcher, precision medicine

“It uses sensors and Raspberry Pi.” She is trying to rectify a problem familiar to anyone who has spent time in Singapore: overactive air-conditioning. The simple computer (a Raspberry Pi) sends humidity and temperature data to a dashboard created by Soojin, which also receives feedback from users about how comfortable they were in the room.
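A minimal sketch of how such a setup might package one sensor sample for a dashboard. The room name, comfort threshold, and payload fields below are illustrative assumptions, not Soojin's actual design; on a real Pi the values would come from an attached humidity/temperature sensor rather than being passed in directly.

```python
import json
import time

def make_reading(temp_c, humidity, room="meeting-room-2"):
    """Package one sensor sample as a dashboard payload.

    On a real Raspberry Pi the two values would be read from a
    sensor; here they are parameters so the sketch runs anywhere.
    """
    return {
        "room": room,
        "timestamp": time.time(),
        "temperature_c": round(temp_c, 1),
        "humidity_pct": round(humidity, 1),
        "too_cold": temp_c < 22.0,  # flag over-chilled rooms
    }

reading = make_reading(18.4, 65.2)
print(json.dumps(reading))  # on the Pi this JSON would be sent to the dashboard
```

The user-comfort feedback the article mentions would arrive through a separate channel and be joined to these samples on the dashboard side.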

Next? Rectangular boxes move carefully up and down her screen in the rudimentary style of a computer game from the 1980s.

“They’re data twins of elevators,” explains Soojin, who spends her days exploring data as part of DNV GL’s Digital Solutions team in Singapore. She has developed a program that measures various elevator parameters and alerts the maintenance crews when one of their assets requires routine or emergency attention.
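The alerting logic behind such a data twin can be imagined as a triage over each elevator's measured parameters. This is a crude rule-based sketch; the parameter names and thresholds are invented for illustration and are not DNV GL's actual model.

```python
def maintenance_status(door_cycles, vibration_mm_s, months_since_service):
    """Triage one elevator 'twin' into ok / routine / emergency.

    Thresholds are illustrative assumptions, not real engineering limits.
    """
    if vibration_mm_s > 8.0:
        return "emergency"  # abnormal vibration: dispatch a crew now
    if door_cycles > 100_000 or months_since_service >= 6:
        return "routine"    # schedule a normal service visit
    return "ok"

# A fleet dashboard would run this over every asset's latest readings.
fleet = {
    "lift-A": (120_000, 2.1, 3),
    "lift-B": (40_000, 9.5, 1),
    "lift-C": (10_000, 1.0, 2),
}
statuses = {name: maintenance_status(*params) for name, params in fleet.items()}
print(statuses)
```

A production system would replace the fixed thresholds with a model trained on historical failure data, but the alert-routing shape stays the same.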

Soojin is taking this data-centric approach to her latest challenge that fits into the “too big” category – treating aggressive brain cancer.  Glioblastoma has an earth-shattering prognosis.  The insidious tumor cells have a tendency to infiltrate the surrounding brain tissue, meaning that surgery and supplementary irradiation and chemotherapy only have a moderate impact on the overall outcome.  Median survival of patients is less than a year.  

The Oslo Health Hackathon, which lists Soojin amongst its hundred participants, seeks a relatively modest outcome over its two-month timeframe. Many patients undergo multiple resections of the tumor, but the benefits of the procedure remain unclear; by combining different datasets, the teams aim to establish a best practice for doctors.

Running a virtual Hackathon - where individuals or groups of domain experts, normally programmers, try to solve a problem over a limited period of time - might seem like an unlikely approach considering the very human nature of the problem, but like so many industries, healthcare is being transformed by big data.

“We’re going to write an AI program to read the MRIs. It could spot tumors that doctors miss,” says Soojin. MRIs are those semi-translucent scans of the brain that actors in hospital dramas scroll through on their computers, and they are rarely the harbinger of good news. The introductory literature offers advice on Python (the programming language) packages that can read the files, while suggested parameters, such as days since diagnosis, are picked out from the more general datasets.
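Before an MRI volume reaches a model, its raw intensities are typically rescaled. The sketch below shows one common preprocessing step, min-max normalization of a single 2D slice; in practice the slice would be loaded with a package such as nibabel (e.g. `nibabel.load(path).get_fdata()`), but here a plain nested list stands in so the example runs anywhere, and nothing below is specific to the Hackathon's actual pipeline.

```python
def normalize_slice(slice2d):
    """Min-max normalize one 2D MRI slice to the range [0, 1].

    Models train more reliably when inputs share a common scale;
    raw MRI intensities vary between scanners and sessions.
    """
    flat = [v for row in slice2d for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat slice
    return [[(v - lo) / span for v in row] for row in slice2d]

# A toy 2x2 "slice" of raw scanner intensities.
raw = [[0, 128], [255, 64]]
norm = normalize_slice(raw)
print(norm)
```

Real pipelines add further steps (skull stripping, resampling to a common voxel grid) before the volumes are fed to a tumor-detection model.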

Building trust through doing

“We’re going to write an AI program to read the MRIs. It could spot tumors that doctors miss.”

Soojin Ji

The diverse backgrounds of the Hackathon’s organizers - Acando, BigMed, DNV GL and the Cancer Society of Norway - are another vivid example of how the promised land of precision medicine is blurring the boundaries between industries. For these players, it is the process that is paramount. From practical considerations, such as the secure distribution of huge datasets, to ethical questions around the transfer of health data - these different partners can only overcome such boundaries by doing.

The much-vaunted Moore’s Law is often used to predict the cheapening of technology as capacity doubles - most famously in the case of computer chips - but the cost of mapping the genome has fallen at an even more rapid rate. By the time the mapping of the first genome was complete at the beginning of the millennium, fifteen years had ticked by since the project’s inception and more than two billion dollars had been spent. Now, companies are battling it out to deliver the same service for hundreds of dollars per test. And whilst the dream of producing tailored treatments based on such data is easy to conjure, the reality of dealing with genetic information is much more complex - each genome consists of 23 pairs of chromosomes built from around three billion DNA base pairs.
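A back-of-envelope calculation gives a feel for that scale. With four possible bases, each position can be encoded in two bits, so the raw information content of one genome works out to under a gigabyte; actual sequencing output files are far larger because each position is read many times over with quality metadata.

```python
# Back-of-envelope: raw information content of one human genome.
base_pairs = 3_000_000_000  # ~3 billion DNA base pairs
bits_per_base = 2           # four bases (A, C, G, T) -> 2 bits each

raw_gigabytes = base_pairs * bits_per_base / 8 / 1e9
print(f"~{raw_gigabytes:.2f} GB raw")
```

The difficulty, as the article notes, lies less in storage than in making clinical sense of those three billion positions.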

Even the relatively narrow set of data that the Hackathon deals with can cause problems, as Soojin discovered. The 400GB of data is accessed from DNV GL’s Veracity platform, but when she downloads and opens it on her regular work laptop, her computer says ‘no’. The average computer does not take kindly to the unzipping of such large files; rather, this is the domain of the Azure-powered Veracity platform, a cloud-based ecosystem specifically designed to deal with such big data challenges. Veracity is described by its makers as “secure and scalable,” and with access to Microsoft Azure’s fifty-plus datacenters globally, its cloud-based solution is the most viable option for partners collaborating across international borders.

Jo Øvstaas

Lead Data Scientist at DNV GL

Selecting the right platform does not just have technical implications; it is also central to addressing issues around trust and data confidentiality. Although the data container is on Veracity, neither the platform itself, nor DNV GL, can ‘see’ the contents. Rather, the data managers (in this case Acando) have reserved what can be compared to a highly secure parking space in the data fabric, and only Acando can distribute the keys to access it. “We have no interest in harvesting the data on Veracity, which is why the ‘keys’ to the datasets always remain in the owner’s hands,” says Jo Øvstaas, who is a Lead Data Scientist at DNV GL and one of the Hackathon’s organizers. “Veracity is an ecosystem where data managers can securely share data and access services to analyze it, and it is potentially an ideal location for sensitive information such as healthcare data. The Hackathon has allowed us to explore issues around GDPR compliance and Data Processing Impact Assessment (DPIA), and although these matters are mundane for the general user, they are vital to building a trustworthy process to share medical data.”

As is the case with most new technologies, the key stakeholders move cautiously when personal wellbeing is involved. Scandinavia has great potential to exploit precision medicine - with its centralized system of recordkeeping and high trust in government - but authorities have so far been watchful, and it is illuminating that the dataset in the Oslo Health Hackathon originates from the US.  

“The American TCGA cancer datasets are unique because there are not many open datasets in the world. It is very sad that politicians and hospital authorities do not fully understand how important open data is for research and development of new cancer treatments,” says Biljana Stangeland, Managing Consultant in Data Science at Acando Norway and one of the organizers.  “As a former scientist in the area of cancer research at Oslo University Hospital I tried to identify new targets for glioblastoma treatment. My published bioinformatic work consisted predominantly (up to 80%) of the analysis performed on the open data sets. Only 20% of the work was related to data that we generated from local (Norwegian) patients. In other words, if our team didn’t have access to the open cancer data we would have published much less, and more importantly, the impact of the work would have been much lower.”

Convincing policymakers to think and act beyond traditional healthcare boundaries will continue to be a gradual process and the Hackathon will be a demonstration of how partners from different industries can collaborate to improve patient wellbeing.

The winners of the Hackathon will be announced at a Hackathon seminar on the value of health data on 25th April 2019.