Research Paper: arXiv:2603.11060

Benchmarking Language Modeling for Lossless Compression of Full-Fidelity Audio

Phillip Long

Abstract

Autoregressive "language" models (LMs) trained on raw waveforms can be repurposed for lossless audio compression, but prior work is limited to 8-bit audio, leaving open whether such approaches work for practical settings (16/24-bit) and can compete with existing codecs. We benchmark LM-based compression on full-fidelity audio across diverse domains (music, speech, bioacoustics), sampling rates (16 kHz to 48 kHz), and bit depths (8, 16, and 24-bit). Standard sample-level tokenization becomes intractable at higher bit depths due to vocabulary size (65K for 16-bit; 16.7M for 24-bit). We propose Trilobyte, a byte-level tokenization schema for full-resolution audio, improving vocabulary scaling from O(2^b) to O(1) and enabling the first tractable 24-bit LM-based lossless compression. While LMs consistently outperform FLAC and yield state-of-the-art compression at 8-bit and 16-bit, we observe that compression gains become more modest as bit depth increases beyond 8-bit.
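The vocabulary-scaling argument can be made concrete: serializing each b-bit sample into b/8 bytes keeps the token vocabulary at 256 symbols regardless of bit depth, which is the O(1) scaling the abstract describes. A minimal, illustrative sketch of such a byte-level mapping (Trilobyte's actual schema may differ; the function names here are hypothetical):

```python
def tokenize_samples(samples, bit_depth=24):
    """Split each signed integer sample into (bit_depth // 8) byte tokens
    (little-endian), so the vocabulary is always 256 symbols regardless
    of bit depth -- O(1) instead of the O(2^b) of sample-level tokens."""
    n_bytes = bit_depth // 8
    tokens = []
    for s in samples:
        u = s & ((1 << bit_depth) - 1)  # two's-complement wrap to unsigned
        for i in range(n_bytes):
            tokens.append((u >> (8 * i)) & 0xFF)
    return tokens

def detokenize_samples(tokens, bit_depth=24):
    """Inverse mapping: reassemble byte tokens into signed samples.
    The round trip is exact, so the representation is lossless."""
    n_bytes = bit_depth // 8
    samples = []
    for j in range(0, len(tokens), n_bytes):
        u = sum(tokens[j + i] << (8 * i) for i in range(n_bytes))
        if u >= 1 << (bit_depth - 1):  # restore the sign bit
            u -= 1 << bit_depth
        samples.append(u)
    return samples
```

The trade-off is sequence length: a 24-bit stream becomes three times as many tokens as samples, which shifts the cost from vocabulary size to context length.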


Source: arXiv:2603.08683v1 - http://arxiv.org/abs/2603.08683v1
PDF: https://arxiv.org/pdf/2603.08683v1
Original Link: http://arxiv.org/abs/2603.08683v1

