Drugs are usually small organic molecules that bind to disease-causing proteins in order to neutralize their harmful effects. These proteins can be described at the atomic level, and the resulting information can be used to generate mathematical models able to guide the design of better drug candidates.
Until recently, these models employed classical linear regression, which limited their accuracy. As the relevant datasets grow, algorithms able to learn from them are producing models with higher accuracy and broader applicability. Devising and applying such algorithms is the focus of a research area known as “machine learning” (ML), one of the most successful branches of artificial intelligence. This approach is particularly timely given the amount of relevant data now available and the maturity of ML algorithms.
In a recent publication in WIREs Computational Molecular Science, Hongjian Li, Kam-Heung Sze, and Gang Lu from the Chinese University of Hong Kong, in collaboration with Pedro Ballester from INSERM in France, review the state of the art in applying ML to this problem.
The performance gap between classical and ML-based models, which was already large, has now widened owing to further methodological improvements. This is also the case for target-specific models, although there are a few exceptions that may reflect insufficient data for the targets concerned. Also, against the expectations of many experts, deep-learning algorithms have not always been more predictive than those based on more-established ML techniques.
Instead, the most successful strategies for generating predictive models have been those that identify better combinations of ML algorithms and numerical descriptions of the protein–drug interaction. More of these models are also now freely available for others to use, which is important for promoting their application to real-world problems.
For example, it is often the case that initial drug candidates do not bind sufficiently tightly to the protein and thus are unable to neutralize its effects completely.
One way to increase the potency of a candidate is to make and experimentally test a wide range of its chemical derivatives. Often, so many derivatives need to be considered that time and cost constraints make it impossible to test all of them experimentally. In this case, a predictive ML model would greatly reduce the time and cost of identifying potent drug candidates, because only the derivatives predicted to bind most tightly to the protein would need to be tested.
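The prioritization idea described above can be illustrated with a minimal Python sketch. It is not the method of the reviewed article: the two-number "descriptors", the `true_affinity` function, and the helper names are all hypothetical, the data are synthetic, and a simple k-nearest-neighbours regressor stands in for whatever ML model would be used in practice. The sketch only shows the workflow: learn from derivatives that have already been measured, predict the rest, and send the top-ranked ones to the lab.

```python
import math
import random


def knn_predict(train_X, train_y, x, k=3):
    """Predict affinity as the mean affinity of the k nearest measured compounds."""
    nearest = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def select_for_testing(train_X, train_y, candidates, n_select, k=3):
    """Rank untested derivatives by predicted affinity and keep the top n_select."""
    preds = [knn_predict(train_X, train_y, x, k) for x in candidates]
    ranked = sorted(range(len(candidates)), key=lambda i: preds[i], reverse=True)
    return ranked[:n_select], preds


def true_affinity(x):
    """Hypothetical nonlinear ground truth, used only to generate synthetic data."""
    return 6.0 + 2.0 * math.exp(-((x[0] - 0.5) ** 2 + (x[1] - 0.5) ** 2) / 0.05)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Derivatives that were already made and measured (with some experimental noise).
train_X = [(rng.random(), rng.random()) for _ in range(200)]
train_y = [true_affinity(x) + rng.gauss(0, 0.1) for x in train_X]

# A larger pool of derivatives that has not been tested yet.
candidates = [(rng.random(), rng.random()) for _ in range(1000)]

selected, preds = select_for_testing(train_X, train_y, candidates, n_select=20)

# Compare the true affinity of the selected subset with that of the whole pool.
mean_selected = sum(true_affinity(candidates[i]) for i in selected) / len(selected)
mean_all = sum(true_affinity(x) for x in candidates) / len(candidates)
print(f"mean affinity of selected derivatives: {mean_selected:.2f}")
print(f"mean affinity of all candidates:       {mean_all:.2f}")
```

In this toy setting only 20 of the 1,000 candidates would be tested experimentally. With real data, the descriptors would encode the protein–drug interaction and the model would be validated on held-out measurements before being trusted for selection.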
Studies intended to elucidate which type of numerical description works better for a given target are expected to become more common. Another likely trend is investigating which targets can be better modelled when their datasets are complemented with datasets from other targets, for example by exploiting inter-target similarities.
Kindly contributed by the authors
Research article found at: P. Ballester, et al. WIREs Computational Molecular Science, 2020, doi.org/10.1002/wcms.1465