Research Paper
Researchia:202605.06039

PHALAR: Phasors for Learned Musical Audio Representations

Davide Marincione

Submitted: May 6, 2026
Subjects: Engineering; Chemical Engineering

Description / Details

Stem retrieval, the task of matching missing stems to a given audio submix, is a key challenge currently limited by models that discard temporal information. We introduce PHALAR, a contrastive framework achieving a relative accuracy increase of up to $\approx 70\%$ over the state-of-the-art while requiring $<50\%$ of the parameters and a 7$\times$ training speedup. By utilizing a Learned Spectral Pooling layer and a complex-valued head, PHALAR enforces pitch-equivariant and phase-equivariant biases. PHALAR establishes new retrieval state-of-the-art across MoisesDB, Slakh, and ChocoChorales, correlating significantly higher with human coherence judgment than semantic baselines. Finally, zero-shot beat tracking and linear chord probing confirm that PHALAR captures robust musical structures beyond the retrieval task.
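
As background for the phrase "phase-equivariant", the following is a minimal sketch of the textbook Fourier shift theorem, not of the PHALAR architecture itself, which the abstract does not specify. It shows that delaying a signal rotates each complex Fourier coefficient by a known phasor, while the magnitudes alone carry no trace of the delay. The function names (`dft`, `circular_shift`, `shift_phasors`) and the toy signal are assumptions made purely for illustration.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def circular_shift(x, s):
    """Delay the sequence by s samples, wrapping around at the end."""
    n = len(x)
    return [x[(t - s) % n] for t in range(n)]


def shift_phasors(n, s):
    """Phasors exp(-2*pi*i*k*s/n) that a delay of s samples applies to bin k."""
    return [cmath.exp(-2j * math.pi * k * s / n) for k in range(n)]


# Toy signal (hypothetical data): two sinusoids sampled at 16 points.
n, s = 16, 3
signal = [
    math.sin(2 * math.pi * 2 * t / n) + 0.5 * math.cos(2 * math.pi * 5 * t / n)
    for t in range(n)
]

spectrum = dft(signal)
shifted_spectrum = dft(circular_shift(signal, s))
predicted = [c * p for c, p in zip(spectrum, shift_phasors(n, s))]

# Equivariance: shifting the input equals rotating each coefficient by a phasor.
max_error = max(abs(a - b) for a, b in zip(shifted_spectrum, predicted))

# Magnitudes are unchanged by the shift, so keeping only magnitudes
# discards the timing information that the phases carry.
max_mag_diff = max(abs(abs(a) - abs(b)) for a, b in zip(shifted_spectrum, spectrum))
```

Both `max_error` and `max_mag_diff` come out at floating-point rounding level. This illustrates, in general terms, why a complex-valued representation can keep timing information that a magnitude-only representation loses.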


Source: arXiv:2605.03929v1 - http://arxiv.org/abs/2605.03929v1
PDF: https://arxiv.org/pdf/2605.03929v1
