On the Horizon in Machine Learning: Identifying Natural Selection At Work in the Human Genome

Written by: Janet Wilson

Machine Learning (ML) is a rapidly evolving branch of artificial intelligence in which programs improve on their own by learning from data and experience. ML can be used to make predictions, classify items, estimate probabilities, and more. For example, it powers Apple’s face recognition, Google’s search result algorithms, and the fraud detection methods used by credit card companies.

Most recently, ML systems have been trained to identify the evolutionary pressures the human genome is under and how natural selection is shaping it over time, with promising success.

Because the human genome comprises more than 20,000 genes and more than 3 billion base pairs, ML has the potential to outperform humans by a large margin. The error-prone, tedious data analysis and DNA sequence comparison that humans would have to perform by hand would take ages and would never be as accurate as computer-based methods can be.

Usually, ML involves teaching a program to perform a task using a method called supervised learning: the programmer provides the machine with examples of the expected output, and from these the machine determines how to generate that output itself. This is called the training phase. Genome analysis poses a challenge here, because the expected result of the computation we want the program to perform simply isn’t known.
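
The train-then-predict loop described above can be sketched with a toy supervised learner. The nearest-neighbour rule and the made-up "selection" labels below are purely illustrative; they are not the model or data the genomics researchers actually use.

```python
# Minimal sketch of supervised learning: a 1-nearest-neighbour classifier
# "learns" from labelled examples, then labels data it has never seen.

def train(examples):
    """Training phase: memorize the labelled examples."""
    return list(examples)

def predict(model, point):
    """Label a new point with the label of its closest training example."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    features, label = min(model, key=lambda ex: distance(ex[0], point))
    return label

# Labelled training data: (features, expected output)
training_data = [((0.1, 0.2), "neutral"), ((0.9, 0.8), "under selection")]
model = train(training_data)
print(predict(model, (0.85, 0.9)))  # prints "under selection"
```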

Currently researchers are trying to train ML systems to identify evolution based on simulated examples of natural selection, allowing the machine to create and internalize a definition of what natural selection looks like from its own statistical, computerized point of view.

The second phase of ML involves testing the program on data other than what it encountered during training. ML algorithms have been tested on genome data and have successfully identified the selection of lactase persistence in European populations. This is a clear, known example of evolution in the human genome, one that allows individuals carrying the persistent lactase gene to keep digesting cow’s milk into adulthood.

Massive human genome sequencing efforts have identified over 20,000 mutations that researchers want to understand further through ML. Their task now is to keep training and refining their programs to identify and visualize the evolution of this massive, ever-growing number of mutations.

Hopefully soon, these machines will be able to trace the evolutionary roots and propagation of all mutations found in the human population. Their refined skills will allow us to understand our evolutionary history and perhaps even predict the future evolutionary patterns of the human race. Although it is unfortunate that this is yet another example of computers outperforming humans, the potential applications of ML are vast and exciting. On top of genetic analysis and evolution-modelling, ML methods are being employed in many other fields and have the potential to drastically improve medicine, technology, marketing and much, much more. So we will have to sit back and see where ML leads!


Evidence that New Doctors Cause an Increase in the Mortality Rate in the UK

In England, there is a commonly held belief that it is unsafe to be admitted to hospital on “Black Wednesday”, the first Wednesday of August. Each year, this is the day when newly certified doctors begin working in National Health Service (NHS) hospitals. One study compared the likelihood of death for patients admitted on the last Wednesday of July with that for patients admitted on the first Wednesday of August, and found a 6% higher mortality rate for patients admitted on Black Wednesday.

There are 1600 hospitals and specialist care centres operating under the NHS, each of which routinely collects administrative data when admitting patients. Researchers conducted a retrospective study using archived hospital admissions data from 2000 to 2008; over 14 million records are collected each year. Two cohorts of patients were tracked: the first comprised patients admitted as emergency (unplanned, non-elective) patients on the last Wednesday of July, and the second comprised patients admitted as emergency patients on the first Wednesday of August. Transferred patients were taken into account to avoid double counting.

Each cohort was then tracked for one week: a patient still alive by the following Tuesday was counted as a survivor; otherwise, their death was counted. The study tracked patients for only one week because this was deemed the best way to “capture errors caused by failure of training or inadequate supervision” on the part of the junior doctors. Keeping the study short-term also avoided biases from seasonal effects that would have complicated the analyses.

The study only analyzed emergency admissions to ensure randomness in the data. They wanted to avoid bias that could have resulted from differences in planned admissions due to administrative pressures.

After considering both cohorts, the study analyzed 299,741 hospital admissions: 151,844 in the last week of July and 147,897 in the first week of August. There were 4409 deaths in total, with 2182 in the last week of July and 2227 in the first week of August.

The study found small, non-significant differences in the crude odds ratio of death between the two cohorts. However, after adjusting for year, gender, age group, socio-economic deprivation, and co-morbidity, patients admitted on Black Wednesday were found to have a 6% higher risk of mortality. The 95% confidence interval ranged from 1.00 to 1.15, and the p value was 0.05.
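
For readers curious about the arithmetic, the crude (unadjusted) odds ratio and a Wald 95% confidence interval can be computed directly from the cohort counts reported above. The August death count is obtained by subtracting the July deaths from the 4409 total; note that the published 6% figure comes from the adjusted model, not from this simple calculation.

```python
import math

# Cohort counts from the article: 2182 deaths among 151,844 July
# admissions; the remaining (4409 - 2182) deaths among 147,897 August
# admissions. This reproduces only the crude odds ratio, which the study
# found to be non-significant before adjustment.
deaths_aug, total_aug = 4409 - 2182, 147897
deaths_jul, total_jul = 2182, 151844

odds_aug = deaths_aug / (total_aug - deaths_aug)
odds_jul = deaths_jul / (total_jul - deaths_jul)
odds_ratio = odds_aug / odds_jul

# Wald 95% confidence interval, computed on the log-odds scale
se = math.sqrt(1 / deaths_aug + 1 / (total_aug - deaths_aug)
               + 1 / deaths_jul + 1 / (total_jul - deaths_jul))
low = math.exp(math.log(odds_ratio) - 1.96 * se)
high = math.exp(math.log(odds_ratio) + 1.96 * se)

print(f"crude OR = {odds_ratio:.2f}, 95% CI ({low:.2f}, {high:.2f})")
```

The interval straddles 1.00, consistent with the article's statement that the crude difference was not significant on its own.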

In short, for hospitals in the NHS from 2000 to 2008, it was found that there was a small, but still statistically significant, increase in the risk of death for patients who were admitted on Black Wednesday, over patients who were admitted the week prior.


Source: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0007103

Soup and Science: Bringing students closer to research

Feature Photo: From McGill University’s Facebook Page

At the beginning of each semester, the Faculty of Science organizes Soup and Science, a week during which professors from different departments discuss their current research. Each day features four or five professors, in fields such as (but definitely not limited to) biochemistry, mathematics, management, psychology, and geography. Each professor is given the opportunity to summarize their research in three minutes.

This is an opportunity for undergraduate students, especially those in U0 and U1, to understand what “research in a research-intensive university is all about”. Listening to talks on the cutting-edge research conducted at McGill allows students to bridge the gap between the foundational material they learn in class and the research shaping the future of their respective fields.

This semester, Soup and Science runs from January 15 to 19, 11:30-12:30 each day, at the Redpath Museum. Students should come early, since spaces fill up quickly.

On Wednesday, January 17th, Suzanne Fortier, the Principal of McGill University, was a special guest at this series of mini-talks. The talks opened with the perspective of a student, followed by five McGill professors, and concluded with a question period. After the presentations, students were offered free soup and sandwiches.

Sasha McDowell (Final year Honours Biology student)

Sasha McDowell is an international student with a strong interest in understanding more about her field. In her second year at McGill, she began working in a molecular biology laboratory during the school year. After taking BIOL 306, Neural Basis of Behaviour, she found the topic so interesting that she spent the summer working in the Watt Lab on a SURE scholarship. Over those sixteen weeks, she worked with mice to test potential therapies for human ataxia diseases. Wanting insight into all aspects of research, she also took a field course with work conducted at Mont Saint-Hilaire. McDowell described the value she found in discovering all the avenues that research encompasses.

Professor Nii Addy (Desautels Faculty of Management)

Professor Nii Addy completed an undergraduate degree in engineering before beginning his work in management. His work focuses on cross-sector partnerships between organizations to solve complex societal problems. He showed the group an example of one such problem: the rise in obesity rates among US adults from 1990 to 2006. Solving a systemic problem like this requires considering a “multiplicity of perspectives”. He described the impact of minor factors, such as proximity, on major outcomes, such as the prevalence of obesity in North America. Professor Addy currently works with a “multidimensional proximity framework” to help address complex societal issues.

Professor Kevin Manaugh (Dept. of Geography, McGill School of Environment, Associate of the School of Urban Planning)

Professor Manaugh’s work deals primarily with the design of sustainable cities. He showed the group images of cities from before city planning became a profession, in which industry was situated next to homes, child labour was prevalent, and cities were commonly plagued with social, economic, and environmental problems. Ebenezer Howard blazed the trail for urban planning when he wrote a book on the idea of a garden city, in which cities were designed around the concept of “separation of uses”; in fact, most of North America has developed around this idea. Dr. Manaugh’s work asks how best to design the urban environment in a way that reduces environmental impact, increases biodiversity, and includes the voices of marginalized people. In his own words, the vision of his research is to improve human well-being while making cities more resilient, more socially inclusive, and less environmentally damaging.

Professor Eric McCalla (Dept. of Chemistry)

Dr. McCalla is a new professor at McGill who researches advanced batteries. He described the usefulness of lithium-ion batteries in our mobile devices and electric vehicles. However, the current state of research has not yet made these batteries practical for renewable energy storage: their lifetime needs to increase fivefold, and they need a higher energy density. In his lab, Professor McCalla studies the effects of different compositions for the positive electrode, and he hopes to study replacing the current liquid electrolyte with a more stable solid one.

Professor Thibault Mesplède (Dept. of Microbiology and Immunology)

Dr. Mesplède’s lab currently studies HIV, a virus that antiretroviral therapy cannot cure. He hopes to discover whether viral reservoirs are latent or persistently replicating in hidden anatomical sites, much as microbial organisms can be found within the extreme conditions of hot springs or freezing tundra. His lab uses deep sequencing to reconstruct viral evolution and the phylogeny of HIV.

Professor Jackie Vogel (Dept. of Biology, Associate professor in Computer Science)

By training, Dr. Vogel is a chemist and a biologist, but her lab is truly interdisciplinary, using techniques from mathematics and computer science to mine data from biological systems. She focuses on the “gaps of knowledge that are particularly interesting”; more specifically, she wants to find the mechanism at work between prophase and prometaphase in mitosis. Spindle pole bodies need to be aligned along a particular axis for the cell to divide properly. She uses a basic projection from linear algebra to determine whether or not cells have aligned their spindles, and by studying a mutant that fails to do so, she is working on quantitatively analyzing and visualizing this step in mitosis.
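
The projection idea can be sketched in a few lines: project the spindle's pole-to-pole vector onto the intended division axis, and use the cosine of the angle between them (a normalized dot product) as an alignment score. The vectors below are made up for illustration; they are not the Vogel lab's actual data or code.

```python
import math

def alignment_score(spindle, axis):
    """Cosine of the angle between the spindle vector and the target axis,
    computed from the dot product -- a basic projection from linear algebra.
    Returns 1.0 for perfect alignment and 0.0 for perpendicular vectors."""
    dot = sum(s * a for s, a in zip(spindle, axis))
    norm_s = math.sqrt(sum(x * x for x in spindle))
    norm_a = math.sqrt(sum(x * x for x in axis))
    return abs(dot) / (norm_s * norm_a)

print(alignment_score([1.0, 0.05, 0.0], [1.0, 0.0, 0.0]))  # nearly 1: well aligned
print(alignment_score([0.3, 1.0, 0.2], [1.0, 0.0, 0.0]))   # far from 1: misaligned
```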

Gene-erating Advancements in Genomics

Over the past 20 years, the field of biology has experienced a phenomenal series of advancements pertaining to genomic sequencing. Today, the scientific community has generated large databases of genomic sequences. Genomic sequencing has advanced to a stage where biotechnology companies are thriving, allowing a previously complex and expensive procedure to be commercialized and accessible to the public.

DNA is composed of four nucleotides, each with a shorthand letter widely recognized in science: adenine (A), guanine (G), cytosine (C), and thymine (T). DNA is commonly said to contain all the information required for life; that is, the body can interpret the linear order of DNA and translate it into the proteins that carry out life’s functions. Each human cell nucleus contains approximately 6 billion base pairs of DNA.
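
That translation of linear order into protein can be sketched as a lookup: read the coding sequence three bases (one codon) at a time and map each codon to an amino acid. The table below contains only four real codon assignments out of the 64 in the standard genetic code, just enough for a toy example.

```python
# Tiny subset of the standard genetic code (codon -> amino acid)
CODON_TABLE = {"ATG": "Met", "GCT": "Ala", "TGG": "Trp", "TAA": "STOP"}

def translate(dna):
    """Read codons left to right, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE.get(dna[i:i + 3], "?")
        if amino == "STOP":
            break
        protein.append(amino)
    return "-".join(protein)

print(translate("ATGGCTTGGTAA"))  # prints Met-Ala-Trp
```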

Sequencing a genome refers to the process of finding the order of the nucleotides within a person’s cells. However, the sequence allows biologists to learn far more about DNA: it lets them map the locations of genes and build linkage maps that can be traced over generations.

Genomic sequencing first began in the 1970s, and the proposal for the Human Genome Project (HGP) was first articulated in 1988. The project was an immense international research collaboration that has been deemed the “culmination of history of genetics research”. Its primary goal was to sequence the entire human genome, an effort that drew on 20 elite genomics institutions across six countries.

By 2001, 90% of the human genome had been sequenced for the first time in human history. The project was long and intensive. Its first portion involved mapping the human genome, which generated a reference sequence with a low estimated error rate of roughly 1 in 10,000 base pairs; this reference is analogous to a “draft”. The mapping phase cost at least tens of millions of dollars, and in reality likely hundreds of millions.

The enormous length of the human genome prevents it from being sequenced in one read. Instead, current methods use a technique called shotgun sequencing: the DNA is broken into shorter fragments, each fragment is sequenced, and computational methods piece the entire genome back together. Through this method, the complete genome was published in 2003, and the Human Genome Project finished ahead of schedule and under budget.
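
The computational step can be illustrated with a toy greedy assembler that repeatedly merges the two reads with the longest overlap. Real assemblers are vastly more sophisticated and must handle errors, repeats, and billions of reads; the short made-up "genome" below only shows the idea.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def assemble(reads):
    """Greedily merge the pair of reads with the largest overlap until
    a single sequence remains."""
    reads = list(reads)
    while len(reads) > 1:
        n, a, b = max(((overlap(a, b), a, b)
                       for a in reads for b in reads if a is not b),
                      key=lambda t: t[0])
        reads.remove(a)
        reads.remove(b)
        reads.append(a + b[n:])  # join, keeping the overlap once
    return reads[0]

genome = "ATGGCGTACGGATTACAGGC"
reads = [genome[i:i + 8] for i in range(0, 13, 4)]  # 8-base reads, 4-base overlaps
print(assemble(reads) == genome)  # prints True
```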

In 2006, sequencing an individual’s entire genome was estimated to cost $20 million. Since then, companies have developed faster and cheaper methods, known as “next-generation sequencing”. Between mid and late 2015 alone, the cost of generating a draft dropped from $4000 to $1500, and today the human genome can be sequenced for under $1000. Illumina, a leading company in commercial genomic sequencing, hopes to bring the cost under $100.

Genome sequencing allows us to detect differences or abnormalities in a person’s genomic composition: scientists can compare mutations against the expected reference sequence, find single nucleotide polymorphisms, and identify translocations. It also lets scientists characterize the functions of genes, for instance by distinguishing coding from non-coding regions, and enables the study of epigenetics, a growing field that analyzes the chemical modifications made to genes. These modifications, including but not limited to phosphorylation and methylation, have been shown to be critical for cells to regulate gene expression.
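
Spotting a single nucleotide polymorphism is, at its simplest, a position-by-position comparison of a sample against the reference. The sequences below are invented for illustration; real SNP calling also accounts for sequencing errors, alignment, and read depth.

```python
def find_snps(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch
    between two aligned sequences of equal length."""
    return [(i, r, s)
            for i, (r, s) in enumerate(zip(reference, sample))
            if r != s]

reference = "ATGGCGTACGGA"
sample    = "ATGGCTTACGGA"
print(find_snps(reference, sample))  # prints [(5, 'G', 'T')]
```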

These capabilities can be used to understand the aberrant phenotypes that arise in disease, since they give us a better picture of growth, development, and disease progression. In the near future, people will be able to have their genome sequenced as part of medical procedures.

Sequencing the genome greatly expands our capability to understand the genomic code and enables advancements in the medical field. Its applications are widespread across different areas of biology and continue to show great potential and merit in genetics research.






Photo: https://www.frontiersin.org/journals/genetics