Research Paper | Researchia:202606.09008

Causally Evaluating the Learnability of Formal Language Tasks

Vésteinn Snæbjarnarson

Submitted: June 9, 2026
Subjects: NLP; Computational Linguistics

Description / Details

Language models, as multi-task learners, acquire a wide range of abilities during training. A fundamental question is how much task-specific data is needed to learn a given task. Answering this for natural language is difficult: tasks are hard to delineate and can confound one another. To rigorously investigate the relationship between data frequency and learnability, we turn to a controlled setting using formal languages induced from probabilistic finite automata. These serve as a methodological testbed to demonstrate that standard correlational evaluation practices are inherently flawed. To enable causal analysis, we introduce the binning semiring, an algebraic object that lets us control how often a targeted property occurs in a sampled corpus. We formulate the experimental pipeline as a causal graphical model and derive decomposed Kullback-Leibler divergence metrics to measure the learnability of specific sub-tasks. Our experiments show that evaluating learnability without causal intervention leads to incorrect conclusions due to confounders in correlational analysis, and serve as a warning about correlational pitfalls in natural-language settings.
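As a rough illustration of the setting only, and not the paper's method, the sketch below samples strings from a small probabilistic finite automaton and tallies how often one chosen transition fires per string. The automaton `TOY_PFA`, the `TARGET` transition, and both function names are hypothetical, introduced here for illustration. The sketch does not implement the binning semiring; it only shows the quantity (occurrence count of a targeted property in a sampled corpus) that the abstract says the semiring is used to control.

```python
import random
from collections import Counter

# Hypothetical toy PFA: state -> list of (symbol, next_state, probability).
# An arc with symbol None means "stop"; probabilities per state sum to 1.
TOY_PFA = {
    0: [("a", 0, 0.5), ("b", 1, 0.3), (None, None, 0.2)],
    1: [("b", 1, 0.4), ("a", 0, 0.2), (None, None, 0.4)],
}

# Assumed "targeted property": taking the b-transition out of state 0.
TARGET = (0, "b")


def sample_string(pfa, rng, start=0):
    """Sample one string; return it with the number of target-transition uses."""
    state, symbols, hits = start, [], 0
    while True:
        arcs = pfa[state]
        weights = [p for _, _, p in arcs]
        symbol, next_state, _ = rng.choices(arcs, weights=weights)[0]
        if symbol is None:  # stop arc: the string is complete
            return "".join(symbols), hits
        if (state, symbol) == TARGET:
            hits += 1
        symbols.append(symbol)
        state = next_state


def occurrence_histogram(pfa, n, seed=0):
    """Histogram over n sampled strings of target-transition counts per string."""
    rng = random.Random(seed)
    return Counter(sample_string(pfa, rng)[1] for _ in range(n))
```

Calling `occurrence_histogram(TOY_PFA, 1000)` gives the observational distribution of occurrence counts. In this plain sampler that distribution is fixed by the automaton's weights, which is the kind of entanglement an intervention on occurrence frequency would need to break.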


Source: arXiv:2606.09822v1 (http://arxiv.org/abs/2606.09822v1)
PDF: https://arxiv.org/pdf/2606.09822v1
