Research Paper (Researchia:202605.25057)

Hierarchical Concept Geometry in Language Models Emerges from Word Co-occurrence

Andres Nava


Submitted: May 25, 2026
Subjects: Machine Learning; Data Science

Description / Details

We propose a distributional theory of how hypernymy, the "is-a" relation between general and specific concepts, is encoded geometrically in language representations. Starting from the empirically verified assumption that words closer on the WordNet hypernym graph co-occur more often, we theoretically characterize the spectrum of the Gram matrix of the resulting word2vec embeddings. Under mild positivity and decay conditions on the co-occurrence kernel, we prove that the leading eigenvectors first separate broad taxonomic branches and then progressively finer sub-branches, producing a hierarchical splitting geometry with a coarse-to-fine spectral organization that mirrors the tree. We confirm these predictions in word2vec embeddings across many sampled WordNet subtrees, and show that the same signature extends strikingly well to Gemma 2B unembeddings. Our results indicate that hierarchical concept geometry in LLMs need not reflect a hierarchy-specific functional mechanism, but emerges from the spectral structure of pairwise word statistics.
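
The coarse-to-fine claim can be illustrated with a toy example. The sketch below is not the paper's construction: it assumes a balanced binary tree with words at the leaves only, and an exponential kernel exp(-alpha * d) over tree distance d standing in for the Gram matrix. The names `leaf_tree_distance`, `toy_kernel`, and `sorted_spectrum`, and the values of `depth` and `alpha`, are hypothetical choices made for illustration.

```python
import numpy as np


def leaf_tree_distance(i: int, j: int) -> int:
    """Path length between leaves i and j of a balanced binary tree."""
    # Two leaves share an ancestor h levels up exactly when their
    # indices agree after dropping the h lowest bits.
    h = 0
    while (i >> h) != (j >> h):
        h += 1
    return 2 * h


def toy_kernel(depth: int, alpha: float) -> np.ndarray:
    """Assumed stand-in for the Gram matrix: similarity decays with tree distance."""
    n = 2 ** depth
    d = np.array([[leaf_tree_distance(i, j) for j in range(n)] for i in range(n)])
    return np.exp(-alpha * d)


def sorted_spectrum(K: np.ndarray):
    """Eigenvalues (descending) and the matching eigenvectors of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(K)  # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


depth, alpha = 3, 0.5  # illustrative values, not taken from the paper
K = toy_kernel(depth, alpha)
vals, vecs = sorted_spectrum(K)

# The leading eigenvector is constant; the second one changes sign
# between the two top-level branches (leaves 0-3 versus leaves 4-7).
second = vecs[:, 1]
print("eigenvalues:", np.round(vals, 3))
print("sign pattern of 2nd eigenvector:", np.sign(second).astype(int))
```

In this toy setting the eigenvalues fall into groups ordered by depth: one constant mode, one mode splitting the two top-level branches, two modes splitting each branch into its sub-branches, and four modes separating sibling leaves. The overall sign of each eigenvector is arbitrary, so only the pattern of sign changes is meaningful.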


Source: arXiv:2605.23821v1 (http://arxiv.org/abs/2605.23821v1)
PDF: https://arxiv.org/pdf/2605.23821v1

