ExplorerData ScienceData Science
Research PaperResearchia:202601.07bb8214

Learning Multinomial Logits in $O(n \log n)$ time

Flavio Chierichetti

Abstract

A Multinomial Logit (MNL) model is composed of a finite universe of items $[n]=\{1,..., n\}$, each assigned a positive weight. A query specifies an admissible subset -- called a slate -- and the model chooses one item from that slate with probability proportional to its weight. This query model is also known as the Plackett-Luce model or conditional sampling oracle in the literature. Although MNLs have been studied extensively, a basic computational question remains open: given query access to s...

Submitted: January 7, 2026Subjects: Data Science; Data Science

Description / Details

A Multinomial Logit (MNL) model is composed of a finite universe of items [n]={1,...,n}[n]=\{1,..., n\}, each assigned a positive weight. A query specifies an admissible subset -- called a slate -- and the model chooses one item from that slate with probability proportional to its weight. This query model is also known as the Plackett-Luce model or conditional sampling oracle in the literature. Although MNLs have been studied extensively, a basic computational question remains open: given query access to slates, how efficiently can we learn weights so that, for every slate, the induced choice distribution is within total variation distance ε\varepsilon of the ground truth? This question is central to MNL learning and has direct implications for modern recommender system interfaces. We provide two algorithms for this task, one with adaptive queries and one with non-adaptive queries. Each algorithm outputs an MNL MM' that induces, for each slate SS, a distribution MSM'_S on SS that is within ε\varepsilon total variation distance of the true distribution. Our adaptive algorithm makes O(nε3logn)O\left(\frac{n}{\varepsilon^{3}}\log n\right) queries, while our non-adaptive algorithm makes O(n2ε3lognlognε)O\left(\frac{n^{2}}{\varepsilon^{3}}\log n \log\frac{n}{\varepsilon}\right) queries. Both algorithms query only slates of size two and run in time proportional to their query complexity. We complement these upper bounds with lower bounds of Ω(nε2logn)Ω\left(\frac{n}{\varepsilon^{2}}\log n\right) for adaptive queries and Ω(n2ε2logn)Ω\left(\frac{n^{2}}{\varepsilon^{2}}\log n\right) for non-adaptive queries, thus proving that our adaptive algorithm is optimal in its dependence on the support size nn, while the non-adaptive one is tight within a logn\log n factor.

Please sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Access Paper
View Source PDF
Submission Info
Date:
Jan 7, 2026
Topic:
Data Science
Area:
Data Science
Comments:
0
Bookmark
Learning Multinomial Logits in $O(n \log n)$ time | Researchia