to classify compounds using relationship criteria between their chemical structure and toxicity profiles [111]. The comparative study of ML algorithms shows that non-linear and ensemble-based classification algorithms are more successful at classifying compounds by their ADMET properties. Random Forest algorithms can also be used for ligand pose prediction, for identifying receptor–ligand interactions, and for predicting the efficiency of docking simulations [112]. Deep Learning (DL) methods are now achieving remarkable success across pharmaceutical research, from biological image analysis, de novo molecule design, and ligand–receptor interaction to biological activity prediction [113]. Continued improvements in machine learning and deep learning algorithms should therefore help achieve higher prediction accuracy in drug design.
Multiple descriptors represent molecular data in terms of structural and physicochemical features. The features these descriptors capture underlie the diverse bioactivity of compounds [114]. Apart from descriptor-based bioactivity prediction, substructure mining is also an established technique in drug discovery. Substructure mining is a data-driven approach that uses a combination of algorithms to detect the most frequently occurring substructures in a large subset of known ligands [115]. It can be used in two ways. The first uses a predefined list of candidate scaffolds: the algorithm identifies and extracts all candidate scaffolds present in the known compounds of a given database. The second learns the substructures adaptively from the compounds themselves. Both approaches can retrieve all significant 2D substructures from a chemical database [116]. Substructure mining has become popular because it has helped establish a consensus among medicinal chemists, who have come to treat chemical compounds as collections of their substructural parts. Applying the approach to establish structure–activity relationships will build further confidence that the biological properties of molecules depend on their structural properties.
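The two routes to substructure mining described above can be illustrated with a minimal Python sketch. It is a deliberate simplification and not the method of any cited work: molecules are toy graphs of element labels without bond orders, and only linear atom paths of up to `max_atoms` atoms are enumerated instead of general 2D subgraphs. The function names, the `min_support` frequency threshold, and the three example molecules are all assumptions made for illustration.

```python
from collections import defaultdict

def path_fragments(atoms, bonds, max_atoms=3):
    """Return the canonical linear fragments (atom-label paths) of one molecule."""
    neighbours = defaultdict(set)
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    fragments = set()

    def extend(path):
        labels = tuple(atoms[i] for i in path)
        # A path and its reverse are the same fragment: keep the smaller tuple.
        fragments.add(min(labels, labels[::-1]))
        if len(path) < max_atoms:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in atoms:
        extend([start])
    return fragments

def support_counts(molecules, max_atoms=3):
    """Count, for each fragment, the number of molecules that contain it."""
    counts = defaultdict(int)
    for atoms, bonds in molecules:
        for frag in path_fragments(atoms, bonds, max_atoms):
            counts[frag] += 1
    return counts

def mine_frequent(molecules, min_support, max_atoms=3):
    """Adaptive route: learn every fragment found in at least min_support molecules."""
    counts = support_counts(molecules, max_atoms)
    return {frag: n for frag, n in counts.items() if n >= min_support}

def screen_predefined(molecules, candidates, max_atoms=3):
    """Predefined route: count only the candidate fragments supplied by the user.

    Candidates must already be in canonical form (see path_fragments).
    """
    counts = support_counts(molecules, max_atoms)
    return {cand: counts.get(cand, 0) for cand in candidates}

# Hypothetical toy data: (atom index -> element, list of bonds), hydrogens omitted.
MOLECULES = [
    ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]),                  # ethanol-like
    ({0: "C", 1: "C", 2: "O", 3: "N"}, [(0, 1), (1, 2), (1, 3)]),  # acetamide-like
    ({0: "C", 1: "N"}, [(0, 1)]),                                  # methylamine-like
]

frequent = mine_frequent(MOLECULES, min_support=2)
screened = screen_predefined(MOLECULES, [("C", "C", "O"), ("C", "C", "N")])
```

With this toy data, the adaptive route retains the C-C-O fragment because it occurs in two of the three molecules, while C-C-N occurs in only one and is dropped at `min_support=2`. A practical workflow would more likely rely on a cheminformatics toolkit and a dedicated subgraph mining algorithm than on this path enumeration.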
Since then, several substructure mining algorithms have been developed to meet the needs of an ever-changing drug discovery process [117]. Subgraph mining is distinctive in that, unlike other approaches, it makes no arbitrary assumptions. In other words, current subgraph mining techniques can retrieve all frequently occurring subgraphs, that is, those meeting a minimum support threshold, from a database of chemical compounds in significantly less time [118]. Furthermore, as described above, the idea behind these techniques is to find the most significant subgraphs among all possible subgraphs. In the near future, the use of artificial intelligence-based techniques in medicinal chemistry will become more complex because of the increasing availability of huge repositories of chemical, biological, genetic, and structural data. Running complex algorithms on ever-increasing data volumes to search for new, safer, and more effective drug candidates is driving the use of quantum computing and high-performance computing. In summary, we believe that these techniques will become a much more significant part of drug discovery endeavours within a very short time.
2.8 Conclusions
AI and ML have emerged as powerful tools for structure prediction, but they rely to a great extent on phenotype data rather than genomic data, which may be a disadvantage. Genome researchers have learned that much of the variation between individuals results from discrete single-base changes in the human genome, known as single nucleotide polymorphisms (SNPs), which affect the phenotype. ML can be applied to SNP data in much the same way as to microarray data: supervised learning can identify differences in SNP patterns between people who respond well to a particular drug and those who respond poorly. Where possible, supervised learning can also be used to identify SNP patterns predictive of disease. If highly predictive SNPs lie within genes, this may indicate that those genes are important for conferring disease resistance or susceptibility, or that the proteins they encode are potential drug targets, an important finding for both clinicians and researchers. Constructing in silico models of biological pathways, or even of an entire cell, is a goal of systems biology that may become achievable with advanced computational techniques.
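As a minimal sketch of the supervised-learning idea for SNP data, the Python example below ranks SNPs by how strongly they separate good responders from poor responders and then classifies a patient with a nearest-centroid rule on the top-ranked SNPs. Everything in it is an assumption made for illustration: the genotype coding (0, 1, or 2 copies of the minor allele), the six hypothetical patients, the function names, and the choice of a nearest-centroid classifier, which stands in for whatever learner a real study would use.

```python
def mean_genotype(rows):
    """Column-wise mean allele count over a list of genotype vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def rank_snps(genotypes, labels):
    """Order SNP indices by the absolute difference in mean allele count
    between good responders (label 1) and poor responders (label 0)."""
    good = [g for g, y in zip(genotypes, labels) if y == 1]
    poor = [g for g, y in zip(genotypes, labels) if y == 0]
    diffs = [abs(a - b) for a, b in zip(mean_genotype(good), mean_genotype(poor))]
    return sorted(range(len(diffs)), key=lambda j: diffs[j], reverse=True)

def predict(sample, genotypes, labels, snps):
    """Nearest-centroid prediction using only the selected SNP indices."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(labels)):
        rows = [g for g, y in zip(genotypes, labels) if y == label]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in snps]
        dist = sum((sample[j] - c) ** 2 for j, c in zip(snps, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical toy data: rows are patients, columns are four SNPs.
GENOTYPES = [
    [2, 0, 1, 0],
    [2, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 2],
    [0, 1, 1, 2],
    [0, 1, 1, 1],
]
LABELS = [1, 1, 1, 0, 0, 0]  # 1 = responds well, 0 = responds poorly

top_snps = rank_snps(GENOTYPES, LABELS)[:2]
prediction = predict([2, 1, 1, 0], GENOTYPES, LABELS, top_snps)
```

In this toy data the first and fourth SNPs differ most between the two groups, so they are the ones selected. The example predicts on a genotype that also appears in the training set, which only demonstrates the mechanics; a real analysis would need held-out patients, far more samples than SNPs allow here, and correction for testing many SNPs at once.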
Machine learning has revolutionized biology and medicine; for example, researchers have employed it to make gene chips more practical and useful. Data that might once have taken years to collect now takes a week. Biologists are aided greatly by the supervised and unsupervised learning methods that many use to make sense of the large amounts of data now available to them. As a result, biologists are understanding ever more rapidly the molecular processes that underlie and govern the function of biological systems, and this knowledge can be used for important medical applications such as diagnosis, prognosis, and drug response. As the volume of genomic and similar data continues to grow, the role of computational techniques, especially machine learning, will grow with it. These algorithms will enable us to analyze this data and gain valuable insight into the biological systems that surround us and the diseases that affect us.
References
1. Lancet, T., Artificial intelligence in healthcare: Within touching distance. Lancet, 390, 10114, 2739, 2018.
2. Kantarjian, H. and Yu, P.P., Artificial Intelligence, Big Data, and Cancer. JAMA Oncol., 1, 5, 573–574, 2015.
3. Topol, E.J., High-performance medicine: The convergence of human and artificial intelligence. Nat. Med., 25, 1, 44–56, 2019.
4. Kanasi, E., Ayilavarapu, S., Jones, J., The aging population: Demographics and the biology of aging. Periodontol. 2000, 72, 1, 13–18, 2016.
5. Naughton, M.J., Brunner, R.L., Hogan, P.E., Danhauer, S.C., Brenes, G.A., Bowen, D.J. et al., Global quality of life among WHI women aged 80 years and older. J. Gerontol. A Biol. Sci. Med. Sci., 71 Suppl. 1, S72–8, 2016.
6. Cohen, C., Kampel, T., Verloo, H., Acceptability among community health-care nurses of intelligent wireless sensor-system technology for the rapid detection of health issues in home-dwelling older adults. Open Nurs. J., 11, 54–63, 2017.
7. Labovitz, D.L., Shafner, L., Reyes, G.M., Virmani, D., Hanina, A., Using artificial intelligence to reduce the risk of nonadherence in patients on anticoagulation therapy. Stroke, 48, 5, 1416–1419, 2017.
8. Ching, T., Himmelstein, D.S., Beaulieu-Jones, B.K. et al., Opportunities and obstacles for deep learning in biology and medicine. J. R. Soc. Interface, 15, 141, pii:20170387, 2018.
9. Goh, G.B., Hodas, N.O., Vishnu, A., Deep learning for computational chemistry. J. Comput. Chem., 38, 16, 1291–1307, 2017.
10. Ramsundar, B., Liu, B., Wu, Z. et al., Is multi task deep learning practical for pharma? J. Chem. Inf. Model., 57, 8, 2068–2076, 2017.
11. So, H.C. and Sham, P.C., Improving polygenic risk prediction from summary statistics by an empirical Bayes approach. Sci. Rep., 7, 41262, 2017.
12. English, A.C., Salerno, W.J., Hampton, O.A., Gonzaga-Jauregui, C., Ambreth, S., Ritter, D.I., Beck, C.R., Davis, C.F., Dahdouli, M., Ma, S. et al., Assessing structural variation in a personal genome—Towards a human reference diploid genome. BMC Genomics, 16, 286, 2015.
13. Angermueller, C., Parnamaa, T., Parts, L., Stegle, O., Deep learning for computational biology. Mol. Syst. Biol., 12, 878, 2016.
14. Meuwissen, T. and Goddard, M., Accurate Prediction of Genetic Values for Complex Traits by Whole-Genome Resequencing. Genetics, 185, 623–631, 2010.
15. Pérez-Enciso, M., Rincón, J.C., Legarra, A., Sequence- vs. chip-assisted genomic selection: Accurate Biological information is advised. Genet. Sel. Evol., 47, 1–14, 2015.
16. Heidaritabar, M., Calus, M.P.L., Megens, H.-J., Vereijken, A., Groenen, M.A.M., Bastiaansen, J.W.M., Accuracy of genomic prediction using imputed whole-genome sequence data in white layers. J. Anim. Breed. Genet., 133, 167–179,