By Mike Miliard, HealthcareIT News | October 2, 2018
A 16-year-old high school student built a website around public data from the NIH ClinVar archive, helping clinicians advance genomic treatments, and says similar innovations depend on more freely available data.
When Justin Aronson takes the stage at the HIMSS Big Data and Healthcare Analytics Forum on October 23, he’ll be the youngest-ever speaker at one of our events — a feat that only makes the 16-year-old high school junior’s accomplishments all the more impressive.
As a student at Boston University Academy High School, Aronson has a knack for computer science and a curiosity about genomics that dates back to grade school.
So he spent about $60 in hosting fees, and with the help of his father’s Partners HealthCare colleagues built a website – fueled by public data from NIH’s ClinVar archive – that helps laboratories doing genetic testing check whether their own genetic variant classifications conflict with those of other labs.
The site, VariantExplorer, helps drive quality and safety improvements for precision medicine by enabling healthcare organizations to more easily identify the various interpretation discrepancies in ClinVar, which comprises reports on the relationships among different genomic variants and phenotypes submitted by labs, researchers and other clinicians.
“Given the large number of submitters to ClinVar, many variants have interpretations from multiple submitters and those interpretations may not always agree,” the site explains.
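The kind of check the site describes can be sketched in a few lines of Python. This is a minimal illustration, not VariantExplorer's actual code: the records, the lab names and the find_conflicts function are hypothetical, and real ClinVar submissions carry far more detail.

```python
from collections import defaultdict

# Hypothetical records: (variant, submitting lab, classification).
# These values are invented for illustration and are not real ClinVar entries.
submissions = [
    ("variant_A", "lab_1", "Pathogenic"),
    ("variant_A", "lab_2", "Benign"),
    ("variant_B", "lab_1", "Likely benign"),
    ("variant_B", "lab_3", "Likely benign"),
]


def find_conflicts(records):
    """Return variants whose submitters report more than one distinct classification."""
    by_variant = defaultdict(dict)
    for variant, lab, classification in records:
        by_variant[variant][lab] = classification
    # Keep only the variants where the labs disagree with one another.
    return {
        variant: labs
        for variant, labs in by_variant.items()
        if len(set(labs.values())) > 1
    }


conflicts = find_conflicts(submissions)
print(conflicts)
```

Here only variant_A is flagged, because its two labs disagree, while variant_B has matching classifications and is left out.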
“I was interested in biology and genetics since I was young, but I never got an opportunity to do work with the topics until I was in seventh grade,” said Aronson. “My dad works at Partners HealthCare and two of his coworkers needed a website to process the ClinVar file on conflicting interpretations of genetic variants.”
At his father’s suggestion, Aronson enlisted the assistance of those colleagues and sought advice from programmers and geneticists at Geisinger, the Broad Institute and elsewhere who work on ClinVar (and its affiliated archive, ClinGen) to “help me make the site more useful and applicable for the intended users,” he said.
Variant Explorer has been in the works for several years. “I started developing the site in the seventh grade, and in the beginning the site had one function: a dropdown box containing all the variants in the ClinVar conflicting interpretations file,” said Aronson.
Eventually, it gained functionality, improved its load time, and got an aesthetic facelift too. Along the way, its development has proven to be a “long and meaningful experience,” he said. “Whether it was parsing a genomic data file or trying to adequately test a site with such massive amounts of data, these challenges require a lot of effort.”
Those challenges have only grown as ClinVar’s data set has increased, Aronson said – but so has his thinking about how that data could be leveraged for other use cases – including some powered, perhaps, by artificial intelligence.
“I am currently starting to work on formatting the ClinVar FullRelease XML into a two-dimensional matrix,” he explained. “Once I reorganize the file, it should be easier to use ClinVar data for machine learning.
“I have thoughts about some machine learning projects that could be done with this data, such as estimating how laboratories might analyze different variants (given current classifications from other labs),” he added. “However, after I reformat this file, I will open source it – and hopefully people who are better at machine learning than me will use it.”
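As a rough sketch of the reshaping Aronson describes, the Python below arranges classifications into a grid with one row per variant and one column per lab. It assumes the XML has already been parsed into simple records. The records, the names and the to_matrix function are hypothetical, and Aronson's own layout may differ.

```python
# Hypothetical records: (variant, submitting lab, classification).
# Invented for illustration; parsing the real ClinVar XML is not shown here.
records = [
    ("variant_A", "lab_1", "Pathogenic"),
    ("variant_A", "lab_2", "Benign"),
    ("variant_B", "lab_1", "Likely benign"),
    ("variant_B", "lab_3", "Likely benign"),
]


def to_matrix(rows):
    """Arrange records as a grid: one row per variant, one column per lab."""
    variants = sorted({variant for variant, _, _ in rows})
    labs = sorted({lab for _, lab, _ in rows})
    row_index = {variant: i for i, variant in enumerate(variants)}
    col_index = {lab: j for j, lab in enumerate(labs)}
    # None marks a lab that has not submitted a classification for that variant.
    matrix = [[None] * len(labs) for _ in variants]
    for variant, lab, classification in rows:
        matrix[row_index[variant]][col_index[lab]] = classification
    return variants, labs, matrix


variants, labs, matrix = to_matrix(records)
for variant, row in zip(variants, matrix):
    print(variant, row)
```

A grid like this leaves gaps wherever a lab has not classified a variant, and those gaps are the sort of values a machine learning model might be asked to estimate from the classifications other labs have given.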
More data, more freely available, means more innovation
The wealth of publicly available genomic data housed in ClinVar, which made the development of Variant Explorer possible, has given Aronson a keen appreciation of the importance of data democratization, which he says is essential for enabling innovation to happen efficiently, effectively and equitably.
Those needs will only increase as others his age, raised with 21st-century technology and the norms and expectations it has created, enter the workforce and continue working toward new AI-driven advancements in healthcare. It’s why his talk at the BDHA Forum is titled “Data Democratization & My Generation’s Future.”
“I would not have been able to make Variant Explorer without publicly available data,” said Aronson. “As a 16-year-old, I do not have access to the data of large companies or organizations. Without ClinVar providing free and available data, I would not have any data to organize. If the people who built ClinVar had been less open and generous, the data would likely be securely locked away from the public, and laboratories that benefit from Variant Explorer might not have had as much use out of the data as they do today.”
Data democratization is essential to the ongoing maturation of artificial intelligence and machine learning, he said. “Without publicly available data, the group of people who can use the extremely valuable resource is very limited.
“Even those who have access to data will be limited in how they can use other data sets in conjunction with their own to produce much more useful and accurate algorithms,” he added. “Only by democratizing data can we broadly provide the people who can truly use data for good with the information that they need.”
While he’s generally of the mindset that more data leads to more innovation, he also recognizes that there are challenges: “Privacy is a large concern when discussing the level at which to free up data. While data is necessary to the development of beneficial technologies, it also has the potential for ill. For example, data can be used to manipulate people, as it was used in Cambridge Analytica.”
And while he’s generally optimistic about the future of data-driven medicine – “with the rise of machine learning and related technology, healthcare will become cheaper and more effective in the coming years,” he predicts – he still thinks things could be better.
“The U.S. is doing a lot to make its healthcare data accessible to the public,” said Aronson. “Many organizations within the U.S. government and at HHS have started to open their healthcare data up to the public. However, this is not to say that the U.S. could not do more to make data available. If the U.S. opened scientific literature to the public, many projects and analyses could be done with that data to improve variant classifications.
“There are measures that need to be taken in order to prevent privacy concerns from open data,” he added. “But I think that it is the government’s job to create restrictions and an environment in which data can be openly shared, to some extent.”
As he works to further hone the VariantExplorer site – and, of course, toward his high school diploma – what are Aronson’s own plans for college and beyond?
“I would like to go into computer science and machine learning at college,” he said. “I think that with education in machine learning I could play a small part in the innovation and leaps society is about to make.”