Research Paper (Researchia:202606.15003)

Flood and Harvest: The Provable Necessity of Trivia for Generating Valuable Mathematics via the Lens of Language Generation in the Limit

Xiaoyu Li


Submitted: June 15, 2026
Subjects: AI; Artificial Intelligence

Description / Details

AI systems coupled to proof assistants now generate formal mathematics at scale, and the gap between what a checker can verify and what a mathematician would value has become the binding constraint. We model the generation of valuable mathematics as nested language generation in the limit: a verifiable formal language $F$, accessed through a membership oracle (the proof checker), contains an unknown valuable language $H \in \mathcal{H}$ revealed only through an adversarial enumeration of a core $C \subseteq H$ of exact density $\alpha$ (the literature). Every output is valuable ($\in H$), trivial ($\in F \setminus H$), or a hallucination ($\notin F$). We settle four questions. First, the verifier is not taste: the collections admitting generation with breadth are exactly those of the oracle-free model, characterized fiber-wise by Angluin's condition. Second, the verifier does buy sound coverage, covering all unseen valuable statements while asserting only valid ones: possible with it, impossible without it; it relocates unavoidable errors from false to trivial. Third, and centrally, a sharp dichotomy on the tight family: generators emitting finitely many trivia achieve optimal coverage $\alpha/2$, while any infinite trivia allowance, even at vanishing rate, jumps the optimum to $1-\alpha/2$ (both tight, for cores presented as the candidate intersection), and one generator attains both ends. The transition is in trivia count, not rate; the gap $1-\alpha$ is the unrecorded mass. Fourth, both regimes instantiate in a compression model of mathematics. A perfect verifier cannot substitute for taste: the unbounded stream of correct-but-worthless statements is not an engineering accident but a provable necessity, since covering unrecorded valuable mathematics requires an infinite, but asymptotically negligible, stream of certified trivia.
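As a rough illustration of the setup described above, the following Python sketch labels outputs as valuable, trivial, or hallucinated, and evaluates the two coverage optima quoted in the abstract. It is not the paper's construction: the toy languages (even integers for $F$, multiples of 4 for $H$), the names `classify`, `coverage_optima`, `in_F`, and `in_H`, and the restriction of $\alpha$ to $[0, 1]$ are all assumptions made here for illustration. In particular, `in_H` is only available to an outside observer; in the model the generator can query the checker for $F$ but never sees $H$ directly.

```python
from enum import Enum
from typing import Callable, Tuple


class Verdict(Enum):
    VALUABLE = "valuable"            # output lies in H
    TRIVIAL = "trivial"              # output lies in F \ H
    HALLUCINATION = "hallucination"  # output lies outside F


def classify(x: int,
             in_F: Callable[[int], bool],
             in_H: Callable[[int], bool]) -> Verdict:
    """Label one output from an outside observer's point of view.

    in_F plays the role of the proof checker (membership oracle for F).
    in_H is a hypothetical stand-in for "taste"; the generator in the
    abstract's model never gets to query it.
    """
    if not in_F(x):
        return Verdict.HALLUCINATION
    return Verdict.VALUABLE if in_H(x) else Verdict.TRIVIAL


def coverage_optima(alpha: float) -> Tuple[float, float, float]:
    """Return (finite-trivia optimum, infinite-trivia optimum, gap).

    The formulas alpha/2 and 1 - alpha/2 are taken from the abstract;
    the range check on alpha is an assumption (a density in [0, 1]).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to be a density in [0, 1]")
    finite_trivia = alpha / 2
    infinite_trivia = 1 - alpha / 2
    return finite_trivia, infinite_trivia, infinite_trivia - finite_trivia


# Toy languages (illustrative assumptions): F = even integers, H = multiples of 4.
in_F = lambda n: n % 2 == 0
in_H = lambda n: n % 4 == 0

print([classify(n, in_F, in_H).value for n in (8, 6, 7)])
# ['valuable', 'trivial', 'hallucination']
print(coverage_optima(0.5))
# (0.25, 0.75, 0.5)
```

The third value returned by `coverage_optima` is simply $(1-\alpha/2) - \alpha/2 = 1-\alpha$, which matches the gap the abstract calls the unrecorded mass.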


Source: arXiv:2606.14688v1 - http://arxiv.org/abs/2606.14688v1
PDF: https://arxiv.org/pdf/2606.14688v1
Original Link: http://arxiv.org/abs/2606.14688v1
