ReinPool: Reinforcement Learning Pooling Multi-Vector Embeddings for Retrieval System
Abstract
Multi-vector embedding models have emerged as a powerful paradigm for document retrieval, preserving fine-grained visual and textual details through token-level representations. However, this expressiveness comes at a staggering cost: storing embeddings for every token inflates index sizes by over $1000\times$ compared to single-vector approaches, severely limiting scalability. We introduce \textbf{ReinPool}, a reinforcement learning framework that learns to dynamically filter and pool multi-vec...
Description / Details
Multi-vector embedding models have emerged as a powerful paradigm for document retrieval, preserving fine-grained visual and textual details through token-level representations. However, this expressiveness comes at a staggering cost: storing embeddings for every token inflates index sizes by over $1000\times$ compared to single-vector approaches, severely limiting scalability. We introduce \textbf{ReinPool}, a reinforcement learning framework that learns to dynamically filter and pool multi-vector embeddings into compact, retrieval-optimized representations. By training with an inverse retrieval objective and NDCG-based rewards, ReinPool identifies and retains only the most discriminative vectors without requiring manual importance annotations. On the Vidore V2 benchmark across three vision-language embedding models, ReinPool compresses multi-vector representations into single vectors while recovering 76--81% of full multi-vector retrieval performance. Compared to static mean pooling baselines, ReinPool improves NDCG@3 by 22--33% absolute, demonstrating that learned selection significantly outperforms heuristic aggregation.
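To make the quantities in the abstract concrete, here is a minimal sketch of an NDCG@k score (the kind of signal an NDCG-based reward could use) together with the static mean-pooling baseline and a filter-then-pool variant. The abstract does not specify the policy architecture, the reward shaping, or the pooling operator, so everything here is an assumption for illustration: the function names `ndcg_at_k`, `mean_pool`, and `filtered_pool` are hypothetical, the DCG uses linear gain, and the keep mask is supplied by hand rather than produced by a learned policy.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k entries of a ranked relevance list."""
    # Linear gain (an assumption); position 0 is discounted by log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=3):
    """NDCG@k: DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def mean_pool(vectors):
    """Static baseline: average all token vectors into a single vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def filtered_pool(vectors, keep_mask):
    """Average only the vectors flagged in keep_mask.

    In a learned setting the mask would come from a policy; here it is an input.
    Falls back to plain mean pooling if nothing is kept.
    """
    kept = [v for v, keep in zip(vectors, keep_mask) if keep]
    return mean_pool(kept if kept else vectors)


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0]]
    print(mean_pool(tokens))                       # all tokens averaged
    print(filtered_pool(tokens, [True, False]))    # only the first token kept
    print(ndcg_at_k([0, 0, 1], k=3))               # relevant document ranked third
```

In this reading, a policy would be rewarded when the pooled vector it produces ranks the relevant item higher, that is, when `ndcg_at_k` of the resulting ranking goes up.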
Jan 12, 2026
Computer Science