How Machine Learning Is Transforming Bioscience Research

The relationship between biology and machine learning is not new; it has existed for decades, since before data science and machine learning became fashionable. Fields like protein structure prediction, homology modeling and cheminformatics frequently employ tools from machine learning. Dimensionality reduction techniques such as principal component analysis (PCA), support vector machines (SVMs), clustering and random forest classifiers are all a fundamental part of the bioinformatics literature.

So, what is new?

For a long time, success in machine learning depended on the ability to choose effective features, a process that is often (a) labor intensive and (b) dependent on already having some idea of what the solution looks like, which limited where machine learning could be applied. It is also important to keep in mind that biological data derived from experiments are prone to error, so domain-specific knowledge is almost always required, and that biological or -omics data tend to be high dimensional and sparse.

Figure 1: The four stages of a traditional machine learning workflow: (a) preprocessing data, (b) identifying features, (c) developing a model and (d) evaluating results.
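To make the four stages concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the synthetic data (a hypothetical stand-in for expression-like measurements in which only features 0 and 1 carry signal), the function names, the class-mean-difference feature score and the nearest-centroid model. It is not a recommended pipeline for real biological data, just one simple instance of each stage.

```python
import random
import statistics


def make_synthetic_data(n_samples=200, n_features=10, seed=0):
    """Hypothetical data: two classes, only features 0 and 1 are informative."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 3.0 * label  # shifted up in class 1
        row[1] -= 3.0 * label  # shifted down in class 1
        X.append(row)
        y.append(label)
    return X, y


def standardize(X_train, X_test):
    """(a) Preprocessing: z-score each feature using training statistics only."""
    columns = list(zip(*X_train))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]

    def scale(X):
        return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

    return scale(X_train), scale(X_test)


def select_features(X, y, k):
    """(b) Feature identification: keep the k features whose class means differ most."""
    scores = []
    for j in range(len(X[0])):
        mean0 = statistics.fmean(row[j] for row, lab in zip(X, y) if lab == 0)
        mean1 = statistics.fmean(row[j] for row, lab in zip(X, y) if lab == 1)
        scores.append((abs(mean1 - mean0), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])


def fit_centroids(X, y, features):
    """(c) Model: a nearest-centroid classifier on the selected features."""
    centroids = {}
    for label in set(y):
        rows = [[row[j] for j in features] for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, row, features):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    point = [row[j] for j in features]
    return min(
        centroids,
        key=lambda lab: sum((p - c) ** 2 for p, c in zip(point, centroids[lab])),
    )


def accuracy(centroids, X, y, features):
    """(d) Evaluation: fraction of held-out samples classified correctly."""
    hits = sum(predict(centroids, row, features) == lab for row, lab in zip(X, y))
    return hits / len(y)


def run_workflow(k=2, seed=0):
    X, y = make_synthetic_data(seed=seed)
    split = int(0.75 * len(X))  # simple 75/25 train/test split
    X_train, X_test = standardize(X[:split], X[split:])
    y_train, y_test = y[:split], y[split:]
    features = select_features(X_train, y_train, k)
    model = fit_centroids(X_train, y_train, features)
    return features, accuracy(model, X_test, y_test, features)


if __name__ == "__main__":
    selected, test_accuracy = run_workflow()
    print("selected features:", selected)
    print("held-out accuracy:", test_accuracy)
```

Note that stage (b) is the hand-crafted step: someone has to decide how to score and choose features, and on real experimental data that choice typically relies on domain knowledge.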

That being said, even today many of these traditional concepts are still applied to build useful predictive models from vast sets of experimental data. What really changed was the introduction of deep learning, along with (a) access to newer information and technology, (b) an exponential decrease in computing costs, (c) an exponential decrease in the cost of genome sequencing, (d) advances in lab instrumentation and (e) a generation of trained scientists who understand the complexities of biology and biological systems and can also go deep into computer science.

Where is deep learning making inroads?

Life science research is vast, and it is almost impossible to provide a comprehensive answer to this question. A lot of interesting work, ranging from biomedicine to gene regulation, has been published in the last few years. To me, one of the more interesting areas of application is drug discovery, including drug repurposing and the prediction of molecule toxicity and reactivity, which is often a huge burden on the drug discovery pipeline.

In my personal opinion, this is the best time to be a computational biologist: we have access to a vast amount of resources and information, and biology is filled with unanswered questions.