LEVERAGING MACHINE LEARNING ALGORITHMS FOR PREDICTING DRUG EFFICACY AND TOXICITY IN EARLY DRUG DEVELOPMENT

Shazia Khalid

Allama Iqbal Medical College, Lahore, Punjab, Pakistan

Keywords:

Machine Learning, Drug Discovery, Toxicity Prediction, Drug Efficacy, XGBoost, SHAP Analysis

Abstract

The high attrition rates, prolonged timelines, and escalating costs of traditional drug development call for innovative strategies to predict drug efficacy and toxicity early. This study evaluates nine machine learning algorithms (SVM, Random Forest, XGBoost, DNN, Logistic Regression, Naive Bayes, KNN, AdaBoost, and LightGBM) for predicting drug responses from large-scale datasets such as ChEMBL, PubChem, and Tox21. In quantitative evaluations, XGBoost, LightGBM, and Random Forest performed best across classification and regression tasks, with AUC-ROC scores exceeding 0.90 for classification. These models also predicted drug effects with high accuracy, particularly XGBoost, which reached an RMSE of 0.296 for IC50 and EC50 predictions. In organ-specific toxicity prediction, LightGBM and XGBoost exceeded 85% accuracy for many toxicity classes, including hepatotoxicity and cardiotoxicity. SHAP analysis identified critical molecular features underlying the model predictions, making the models more interpretable and transparent. For some models, hyperparameter tuning substantially improved performance, raising the AUC-ROC score by up to 15%. Scatter plots revealed a relationship between dataset size and model performance, and training-time results supported the feasibility of these models for high-throughput applications. Despite these advances, challenges persist, including data variability, limited model interpretability, and the need for standardized workflows. Nevertheless, this study provides a comprehensive analysis of the substantial potential of machine learning to accelerate early-stage drug development and to reduce costly late-phase failures.
Developing transparent AI models along these lines could strengthen efficacy profiling and safety evaluation in future drug development.
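The abstract reports classification quality as AUC-ROC and accuracy, and potency-regression quality as RMSE. The following is a minimal, dependency-free Python sketch of how these three metrics are computed. It is an illustration only, not the study's evaluation code. It assumes binary toxicity labels (1 = toxic, 0 = non-toxic) with model scores in [0, 1], a 0.5 decision threshold for accuracy, and potency values on a log scale such as pIC50. The abstract does not state any of these choices, and all example data are hypothetical.

```python
import math


def auc_roc(labels, scores):
    """AUC-ROC as the probability that a randomly chosen positive (label 1)
    receives a higher score than a randomly chosen negative (label 0).
    Ties count as half a correct ranking (Mann-Whitney formulation)."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have equal length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of compounds whose thresholded score matches the label.
    The 0.5 threshold is an assumption, not taken from the study."""
    if len(labels) != len(scores) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values,
    e.g. log-scale potencies such as pIC50 (an assumption here)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical toxicity labels and model scores for six compounds.
    tox_labels = [1, 0, 1, 0, 1, 0]
    tox_scores = [0.9, 0.2, 0.7, 0.4, 0.35, 0.5]
    print("AUC-ROC:", auc_roc(tox_labels, tox_scores))
    print("Accuracy:", accuracy(tox_labels, tox_scores))

    # Hypothetical measured vs. predicted pIC50 values for three compounds.
    print("RMSE:", rmse([6.0, 7.5, 5.2], [6.3, 7.1, 5.2]))
```

On this toy data, 7 of the 9 positive-negative pairs are ranked correctly (AUC-ROC = 7/9), while accuracy at the 0.5 threshold is 4/6. This shows that AUC-ROC measures ranking quality rather than thresholded decisions. The pairwise loop is O(n·m) and is chosen here only to make the definition explicit. For real datasets, library implementations such as scikit-learn's `sklearn.metrics.roc_auc_score`, and the square root of `sklearn.metrics.mean_squared_error`, are the practical choice.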