The Measurement Gap in the Automation of EU Law: Benchmarking Doctrinal Legal Reasoning under the EU AI Act
Abstract
Large language models now produce legal text of at least median quality, yet no existing benchmark can evaluate whether they perform doctrinal legal reasoning, which forms the interpretive core of legal work, rather than the ancillary, paralegal tasks that most current legal-AI evaluations measure. This measurement gap is not only methodological but legal: the EU AI Act makes "appropriate accuracy" a binding requirement for high-risk AI used in the judicial domain, yet that requirement cannot ac...
Description / Details
Large language models now produce legal text of at least median quality, yet no existing benchmark can evaluate whether they perform doctrinal legal reasoning, which forms the interpretive core of legal work, rather than the ancillary, paralegal tasks that most current legal-AI evaluations measure. This measurement gap is not only methodological but legal: the EU AI Act makes "appropriate accuracy" a binding requirement for high-risk AI used in the judicial domain, yet that requirement cannot acquire operational content without the very doctrinal-reasoning benchmark the field lacks.
Source: arXiv:2606.18158v1 - http://arxiv.org/abs/2606.18158v1 PDF: https://arxiv.org/pdf/2606.18158v1 Original Link: http://arxiv.org/abs/2606.18158v1
Please sign in to join the discussion.
No comments yet. Be the first to share your thoughts!
Jun 17, 2026
Artificial Intelligence
AI
0