A Hybrid Unsupervised Methodology on Artificial Intelligence Filtering for automatically processing cellular DNA-Encoded Library (DEL) Datasets
Abstract
Results Herein we report an innovative automatic method that enables the most promising hit identification from large quantities of cell-based DEL datasets with improved accuracy and efficiency. This processing workflow is based on a comprehensive unsupervised algorithm incorporating data pre-processing, feature extracting and outlier filtering, descriptor-based classification, similarity score ranking and active compound prediction. We performed methodology development with two DEL selection datasets targeting insulin receptor (INSR) on live cells, from both ห30 million- and 1.033 billion- membered libraries. The automated scheme has demonstrated high consistency with experimental results as well as self-adaptivity to on-cell DEL datasets with varied library scales. Extended methodology application to cellular thrombopoietin receptor (TPOR) further substantiated the algorithmic generalization capability regarding target proteins. Thus, this approach can serve as a widely applicable workflow automatically differentiating hit compounds and thereby facilitates drug development from candidate discovery.
Availability and Implementation The complete datasets, source code, and pre-trained models are made available at https://doi.org/10.5281/zenodo.17452392 and https://doi.org/10.5281/zenodo.17569557.
Issue Section: Original Paper Associate Editor: Jonathan Wren Collection: Bioinformatics Journals