On the Horizon in Machine Learning: Identifying Natural Selection At Work in the Human Genome

Written by: Janet Wilson

Machine learning (ML) is a rapidly evolving branch of artificial intelligence in which a program improves at a task by learning from data and experience, rather than by following rules written out in advance. It can be used to make predictions, classify items, estimate probabilities and more. For example, ML is used in Apple’s face recognition, Google’s search result algorithms and the fraud detection methods used by credit card companies.

Most recently, ML systems have been trained to identify the evolutionary pressures acting on the human genome and how natural selection is shaping it over time, and they have had some early success.

Because the human genome comprises more than 20,000 genes and roughly 3 billion base pairs, ML has the potential to outperform humans by a large margin. Analyzing the data and searching and comparing DNA sequences by hand would be tedious and error-prone, would take ages, and wouldn’t be nearly as accurate as computer-based methods have the potential to be.

Usually ML involves teaching a program how to perform a task using a method called supervised learning. The programmer provides the machine with example inputs together with the expected output for each one, and from these the machine works out how to generate the right output for inputs it has not seen. This is called the training phase. This is challenging in the case of genome analysis, because the expected result of the computation we want the program to perform simply isn’t known.
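To make the idea of a training phase concrete, here is a minimal sketch of supervised learning in Python. It is a toy nearest-centroid classifier, chosen only because it is short; the function names, labels and numbers are all made up for illustration and have nothing to do with real genome data.

```python
# Toy supervised learning: a nearest-centroid classifier.
# All names and data here are hypothetical, for illustration only.

def train(examples):
    """Average the feature vectors seen for each label."""
    sums, counts = {}, {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total]
            for label, total in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest to the input."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Training phase: inputs paired with their expected outputs.
training_examples = [
    ((1.0, 1.2), "A"),
    ((0.8, 1.0), "A"),
    ((3.0, 3.1), "B"),
    ((3.2, 2.9), "B"),
]
model = train(training_examples)

# The trained model is then asked about an input it has not seen.
print(predict(model, (2.9, 3.0)))  # prints "B"
```

The key point is that the labels "A" and "B" have to be supplied by someone who already knows the answers, which is exactly what is missing for most of the genome.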

To get around this, researchers are currently training ML systems on simulated examples of natural selection, allowing the machine to create and internalize a definition of what natural selection looks like from its own statistical, computerized point of view.
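The sketch below shows one way simulated examples can stand in for missing answers. It assumes a very simplified population model (a haploid Wright-Fisher model following a single variant) and a single summary number, the variant's final frequency. This is not the researchers' actual simulator or method; every function name and parameter value here is a hypothetical choice made for illustration.

```python
import random

# Hypothetical toy model, not the researchers' actual simulator:
# a haploid Wright-Fisher population tracking one variant's frequency.

def simulate_final_frequency(selection, rng, pop_size=100,
                             generations=40, start_freq=0.1):
    """Simulate one population and return the variant's final frequency."""
    freq = start_freq
    for _ in range(generations):
        # Selection shifts the expected frequency (selection=0 means neutral).
        expected = freq * (1 + selection) / (1 + freq * selection)
        # Drift: the next generation is a random sample of pop_size copies.
        copies = sum(rng.random() < expected for _ in range(pop_size))
        freq = copies / pop_size
    return freq


def make_labelled_examples(n_per_class, rng, selection=0.1):
    """Build (final_frequency, label) pairs from simulated populations."""
    examples = []
    for _ in range(n_per_class):
        examples.append((simulate_final_frequency(0.0, rng), "neutral"))
        examples.append((simulate_final_frequency(selection, rng), "selection"))
    return examples


def learn_threshold(examples):
    """Place a cut-off halfway between the two class averages."""
    def mean_for(label):
        values = [freq for freq, lbl in examples if lbl == label]
        return sum(values) / len(values)
    return (mean_for("neutral") + mean_for("selection")) / 2


rng = random.Random(0)  # fixed seed so the run is repeatable
examples = make_labelled_examples(100, rng)
threshold = learn_threshold(examples)
print(f"learned frequency cut-off: {threshold:.2f}")
```

Because the simulation knows which populations were under selection, every example comes with a label for free. Real methods use far richer simulations and many more statistics than one frequency, but the principle of learning from simulated, labelled genomes is the same.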

The second phase of ML involves testing the program on data that it did not encounter during its training phase. ML algorithms have been tested on genome data and have successfully identified the evolution of lactase persistence in populations of European descent. This is a clear, known example of evolution in the human genome: a variant near the lactase gene allows the individuals who carry it to keep digesting cow’s milk into adulthood.
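As a rough illustration of this testing phase, the Python sketch below holds back part of a labelled data set, trains on the rest, and then measures accuracy only on the held-back examples. The data are random numbers invented for the example, and the simple cut-off "model" is an assumption chosen for brevity, not the method used in the research.

```python
import random

# Hypothetical toy data: one numeric score per example, with a known label.
rng = random.Random(1)
data = [(rng.gauss(0.0, 1.0), "neutral") for _ in range(100)]
data += [(rng.gauss(3.0, 1.0), "selection") for _ in range(100)]


def split(examples, test_fraction, rng):
    """Shuffle a copy of the examples and hold some back for testing."""
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train_set):
    """Training phase: cut-off halfway between the two class averages."""
    def mean_for(label):
        values = [x for x, lbl in train_set if lbl == label]
        return sum(values) / len(values)
    return (mean_for("neutral") + mean_for("selection")) / 2


def accuracy(threshold, test_set):
    """Testing phase: fraction of held-back examples labelled correctly."""
    correct = sum(
        ("selection" if x > threshold else "neutral") == label
        for x, label in test_set
    )
    return correct / len(test_set)


train_set, test_set = split(data, 0.3, rng)
threshold = fit_threshold(train_set)
print(f"accuracy on unseen examples: {accuracy(threshold, test_set):.2f}")
```

Scoring the program on examples it never saw during training is what gives some confidence that it has learned a general pattern rather than memorized its training data.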

Based on massive human genome sequencing, researchers have identified over 20,000 mutations that they want to understand further through ML. Their task now is to continue training and refining their programs so that they can identify and visualize the evolution of this massive, and still growing, number of mutations.

Hopefully soon, these machines will be able to trace the evolutionary roots and propagation of all mutations found in the human population. Their refined skills will allow us to understand our evolutionary history and perhaps even predict the future evolutionary patterns of the human race. Although it is unfortunate that this is yet another example of computers outperforming humans, the potential applications of ML are vast and exciting. On top of genetic analysis and evolution-modelling, ML methods are being employed in many other fields and have the potential to drastically improve medicine, technology, marketing and much, much more. So we will have to sit back and see where ML leads!


