Research Paper (Researchia:202606.18043)

TxBench-PP: Analyzing AI Agent Performance on Small-Molecule Preclinical Pharmacology

Hannah Le


Submitted: June 18, 2026
Subjects: AI; AI in Drug Discovery

Abstract

Artificial intelligence (AI) agents promise to accelerate drug discovery by compressing interpretation and decision-making loops, but practical deployment requires trustworthy evaluation on realistic program decisions. We introduce TherapeuticsBench Preclinical Pharmacology (TxBench-PP), a verifiable benchmark for small-molecule preclinical pharmacology and the first focused slice of a broader TherapeuticsBench effort spanning drug-discovery stages and therapeutic modalities. TxBench-PP tests whether agents can recover accurate conclusions from real-world assay data rather than from memorized facts in the literature. The benchmark contains 100 evaluations indexed by program stage, assay type, and task structure, spanning mechanism-of-action (MoA) and pharmacodynamic (PD) reasoning, compound-target engagement, causal target validation, developability and safety, and translational efficacy. Agents receive realistic workflow snapshots, inspect files in a coding environment, and return structured answers that are graded deterministically. Across 16 model-harness configurations (11 distinct models, 4,800 trajectories in total), no system reliably recovered preclinical pharmacology decisions. The strongest configuration, Claude Opus 4.8 / Pi, passed 59.3% of endpoint attempts (178/300; 95% CI, 51.1-67.6), followed by GPT-5.5 / Pi at 55.3% (166/300; 95% CI, 47.0-63.6).
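To make the deterministic grading and pass-rate scoring concrete, here is a minimal Python sketch under stated assumptions. The `Evaluation` record, its field names, the answer schema, and the 10% numeric tolerance are hypothetical illustrations; the abstract does not specify the TxBench-PP answer format or grading rules. Grading here is a field-by-field comparison (exact match for categorical fields, a relative tolerance for numbers), and a configuration's score is the fraction of attempts that pass. The reported counts are consistent with 300 attempts per configuration (16 × 300 = 4,800 trajectories), presumably three per evaluation; that reading is an inference from the numbers, not a stated fact.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Mapping, Sequence, Tuple, Union

# Hypothetical answer value types; the real TxBench-PP schema is not described here.
Value = Union[str, bool, int, float]


@dataclass(frozen=True)
class Evaluation:
    """Hypothetical benchmark item, indexed by the three axes the abstract names."""
    eval_id: str
    program_stage: str
    assay_type: str
    task_structure: str
    expected: Mapping[str, Value]  # reference structured answer
    rel_tol: float = 0.10          # assumed relative tolerance for numeric fields


def _match(ref: Value, got: object, rel_tol: float) -> bool:
    """Compare one field deterministically (no model-based judging)."""
    if isinstance(ref, str):
        # Categorical text: exact match after trimming and lower-casing.
        return isinstance(got, str) and got.strip().lower() == ref.strip().lower()
    if isinstance(ref, bool):
        # Check bool before numbers, since bool is a subclass of int in Python.
        return isinstance(got, bool) and got == ref
    # Numeric: reject bools and non-numbers, then apply a relative tolerance.
    if isinstance(got, bool) or not isinstance(got, (int, float)):
        return False
    return abs(got - ref) <= rel_tol * abs(ref)


def grade(ev: Evaluation, answer: Mapping[str, object]) -> bool:
    """Pass only if every expected field is present and matches."""
    return all(k in answer and _match(v, answer[k], ev.rel_tol)
               for k, v in ev.expected.items())


def pass_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of attempts that passed."""
    return sum(outcomes) / len(outcomes)


def wilson_interval(passes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval, treating the n attempts as independent trials."""
    p = passes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half


if __name__ == "__main__":
    ev = Evaluation(
        eval_id="demo-001",
        program_stage="lead optimization",
        assay_type="biochemical potency",
        task_structure="single endpoint",
        expected={"engages_target": True, "ic50_nM": 120.0, "moa": "allosteric inhibitor"},
    )
    answer = {"engages_target": True, "ic50_nM": 115.0, "moa": "Allosteric Inhibitor "}
    print(grade(ev, answer))  # True: 115 is within 10% of 120

    # Reported counts for the strongest configuration: 178 passes out of 300 attempts.
    outcomes = [True] * 178 + [False] * 122
    print(f"{pass_rate(outcomes):.1%}")  # 59.3%
    lo, hi = wilson_interval(178, 300)
    print(f"{lo:.1%}-{hi:.1%}")
```

The 178/300 count reproduces the reported 59.3% pass rate. The Wilson interval, which treats all 300 attempts as independent, comes out at roughly 53.7%-64.7%, narrower than the reported 51.1-67.6. The authors therefore presumably used a different interval method (for example, one that accounts for repeated attempts on the same evaluation); the abstract does not say which, and the Wilson interval here is a swapped-in technique, not the paper's.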


Source: arXiv:2606.19245v1 (http://arxiv.org/abs/2606.19245v1)
PDF: https://arxiv.org/pdf/2606.19245v1
