The field of genetic research has never been larger or more involved. Millions of dollars have been spent to study the human genome and the many implications of what happens when things go wrong. Another field seeing increased interest in recent years is artificial intelligence (AI).
Given the complexity and sheer volume of data in each field, it was inevitable that AI and genetics would become intertwined. AI is an important asset for genetics researchers because it spots patterns and processes large datasets faster than traditional methods.
Now, researchers are using AI to study DNA in a new way. As spotted by The Next Web, a team from Estonia’s University of Tartu and France’s Paris-Saclay University is using AI to generate entire genomes for people who don’t really exist.
Although the research sounds like it could be used for nefarious purposes, the team actually has something very wholesome in mind. The researchers behind the project note that artificial DNA could be a valuable resource to future genetic research while simultaneously protecting human privacy. If researchers can study accurately generated artificial DNA rather than the genome of a real person, future genetic research may be much more accessible.
Even so, it may be some time before the team’s algorithms are ready for any sort of real-world application. There is simply too much we still don’t understand, both about AI and genetics.
Today’s artificial intelligence algorithms are getting good at generating meaningful outputs from “noise,” or random bits of data. This is most often accomplished using a generative adversarial network (GAN). A GAN works by having one part of the system generate something while another part compares it with samples from a training dataset and judges whether it is real or fake.
The framework uses two neural networks trained against each other. First is the “generator,” which uses noise to try to create a convincing output. Its output then goes to the “discriminator,” a second neural network that tries to tell generated samples apart from real training data. When the discriminator correctly flags an output as generated, that verdict is fed back to the generator as a signal to adjust.
From there, the generator tries again, applying what it learned to make a more convincing output. Over time, the generator can become so good at creating fake outputs that the discriminator can no longer tell them apart from the training data.
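The back-and-forth described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the researchers' genome model: the "real data" are single numbers drawn from a bell curve, the generator is a one-line formula, and the discriminator is a single logistic unit rather than a neural network. All names here (train, REAL_MEAN, REAL_STD, the decay setting) are hypothetical, and the small weight-decay term on the discriminator is an added assumption meant to damp the oscillation that simple GANs are prone to.

```python
import math
import random

REAL_MEAN, REAL_STD = 4.0, 1.0  # assumed toy "training data": numbers near 4


def sigmoid(t):
    """Numerically stable logistic function, returns a value in [0, 1]."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train(steps=2000, batch=64, lr=0.05, decay=0.1, seed=0):
    """Alternate discriminator and generator updates; return (a, b, w, c)."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator: G(z) = a*z + b, with z drawn as noise
    w, c = 0.0, 0.0  # discriminator: D(x) = sigmoid(w*x + c), "probability real"

    for _ in range(steps):
        # Discriminator step: raise D on real samples, lower it on fakes.
        real = [rng.gauss(REAL_MEAN, REAL_STD) for _ in range(batch)]
        fake = [a * rng.gauss(0.0, 1.0) + b for _ in range(batch)]
        grad_w, grad_c = decay * w, 0.0  # weight decay is an added assumption
        for x in real:  # gradient of -log D(x)
            d = sigmoid(w * x + c)
            grad_w -= (1.0 - d) * x / batch
            grad_c -= (1.0 - d) / batch
        for x in fake:  # gradient of -log(1 - D(x))
            d = sigmoid(w * x + c)
            grad_w += d * x / batch
            grad_c += d / batch
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: adjust a and b so fresh fakes score higher with D.
        grad_a = grad_b = 0.0
        for _ in range(batch):
            z = rng.gauss(0.0, 1.0)
            d = sigmoid(w * (a * z + b) + c)
            dx = -(1.0 - d) * w  # gradient of -log D(G(z)) w.r.t. G(z)
            grad_a += dx * z / batch
            grad_b += dx / batch
        a -= lr * grad_a
        b -= lr * grad_b

    return a, b, w, c


if __name__ == "__main__":
    a, b, w, c = train()
    print(f"generator: G(z) = {a:.2f}*z + {b:.2f}")
    print(f"discriminator score at the real mean: {sigmoid(w * REAL_MEAN + c):.2f}")
```

In this setup the generator starts out producing numbers near 0, and the discriminator's feedback initially pushes its offset b upward, toward the real data. Because this discriminator can only draw a single threshold, it is a far weaker judge than the networks used in practice, and the parameters may oscillate rather than settle.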
This method has been applied to a number of complex scenarios. For instance, a site called ThisPersonDoesNotExist.com uses a GAN to generate lifelike photos of people who, as the name suggests, don’t really exist. A program called BigSleep generates images from a user’s text input.
In this case, researchers want to use the technology to create entire genomes. Burak Yelmen, one of the scientists from the University of Tartu’s Institute of Genomics, told Digital Trends, “One of the main problems here is assessing the quality of artificial genomes. You can look at an image and decide if it looks real, but this is not possible for genomes.”
The technology could theoretically be used to create the genetic blueprint of a person who doesn’t exist, one that would be indistinguishable from a real person’s genetic code. However, some remain skeptical that the AI could compile an entire genome successfully and accurately.
Genetics research is highly restricted due to the sensitive, extremely personal nature of DNA. Any genome that currently sits in a database comes from a real person and is unique to them. In a sense, there is nothing more personal than your DNA.
Companies and organizations that handle genetic data are tasked with guarding the privacy of those to whom the DNA belongs. This makes research in the field difficult. Many organizations aren’t allowed to release genetic data for research purposes—even if it has been anonymized. That’s because there is simply no way to study a complete genome without being able to link it back to the person it came from—even if that isn’t the intention of the research.
An artificial genome could help researchers overcome the obstacles associated with obtaining genetic datasets to study. Since the AI-generated DNA can’t be traced back to a real person, it wouldn’t need to be handled with the same level of caution.
Moreover, it could help speed up the research process by giving researchers larger and more diverse datasets to study. In a paper published in the journal PLOS Genetics, the researchers say, “Generative models and [artificial genomes] AGs have the potential to become valuable assets in genetic studies by providing a rich yet compact representation of existing genomes and high-quality, easy-access and anonymous alternatives for private databases.”
More Work to Do
The researchers’ work on generating artificial genomes is still in its early stages. Much more work will be needed before AI-generated DNA can be used in real-world studies. Understandably, before artificial genomes can be used in studies that affect humans, researchers need to be certain that they are accurate.
As mentioned earlier, doing so is not an easy task. This has led to plenty of scrutiny around the approach and whether or not it is viable for real-world use.
Deanna Church, a geneticist not associated with the project, told Futurism, “My initial take is that it is interesting, but I’m not sure I see real practical implications.”
“Just because you can’t computationally distinguish these generated genomes from real genomes doesn’t mean they’ve really preserved functional motifs and domains that are important—there is much of this we still don’t understand,” she adds.
Indeed, Church highlights the biggest flaw with the AI-based approach. Researchers still have limited insight into how AI models arrive at their outputs. Likewise, much of genomics remains poorly understood. Putting the two together creates a number of complications simply because we don’t understand either field well enough.
While artificial genomes may be very helpful in the future, it doesn’t appear that they will be ready for real-world applications anytime soon.
Still, the research is promising. By investing time to start understanding these technologies today, we will be better prepared to use them in the future. We’ll also be better equipped to use them ethically and stop those who would do otherwise.