Density-Matrix Spectral Embeddings for Categorical Data: Operator Structure and Stability
Abstract
We introduce a supervised dimensionality reduction methodology for categorical (and discretized mixed-type) data based on a density-matrix construction induced by class-conditional frequencies. Given a labeled dataset encoded in a one-hot survey space, we assemble a frequency matrix whose columns aggregate feature occurrences within each class, and define a normalized Gram-type operator that satisfies the axioms of a density matrix. The resulting representation admits an intrinsic rank bound controlled by the number of classes, enabling low-dimensional spectral embeddings via dominant eigenmodes. Classification is performed in the reduced space through class-conditional kernel density estimation and a maximum-likelihood decision rule. We establish structural invariances, provide complexity estimates, and validate the approach on synthetic benchmarks probing high cardinality, sparsity, noise, and class imbalance.
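The pipeline described in the abstract can be sketched in NumPy, as an illustration under stated assumptions rather than the authors' implementation. The abstract does not give the exact normalization, the projection used for the embedding, or the kernel. This sketch assumes the operator is the Gram matrix of the frequency matrix divided by its trace, that samples are embedded by projecting one-hot rows onto the leading eigenvectors, and that the density estimate uses an isotropic Gaussian kernel with a hand-picked bandwidth `h`. All function names and the toy data are hypothetical.

```python
import numpy as np

def density_matrix(X, y, n_classes):
    """X: (n, d) one-hot matrix; y: (n,) integer labels in [0, n_classes)."""
    Y = np.eye(n_classes)[y]      # (n, C) one-hot class indicators
    F = X.T @ Y                   # (d, C): column c counts feature occurrences in class c
    G = F @ F.T                   # (d, d) Gram-type operator: symmetric, PSD, rank <= C
    return G / np.trace(G)        # assumed normalization: unit trace

def spectral_embedding(rho, X, k):
    """Project rows of X onto the k dominant eigenvectors of rho."""
    w, V = np.linalg.eigh(rho)    # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:k]
    return X @ V[:, top], w[top]

def kde_log_likelihood(Z_train, Z_query, h):
    """Log-density of an isotropic Gaussian KDE (assumed kernel), bandwidth h."""
    d2 = ((Z_query[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=-1)
    k = Z_train.shape[1]
    log_k = -0.5 * d2 / h**2 - 0.5 * k * np.log(2 * np.pi * h**2)
    m = log_k.max(axis=1, keepdims=True)          # log-sum-exp for stability
    return m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1)) - np.log(len(Z_train))

def predict(Z_train, y_train, Z_query, n_classes, h=0.5):
    """Maximum-likelihood rule over class-conditional KDEs."""
    scores = np.column_stack([
        kde_log_likelihood(Z_train[y_train == c], Z_query, h)
        for c in range(n_classes)
    ])
    return scores.argmax(axis=1)

# Hypothetical toy data: two categorical variables with three levels each (d = 6).
X = np.array([
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 1],
], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

rho = density_matrix(X, y, n_classes=2)
Z, top_eigs = spectral_embedding(rho, X, k=2)
pred = predict(Z, y, Z, n_classes=2)
```

Because the frequency matrix has one column per class, the operator has at most as many nonzero eigenvalues as there are classes, which is why `k` never needs to exceed the class count in this sketch. Eigenvector signs are arbitrary, but training and query points are projected with the same basis, so the decision rule is unaffected.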
Source: arXiv:2603.01975v1 - http://arxiv.org/abs/2603.01975v1 PDF: https://arxiv.org/pdf/2603.01975v1