Research Paper · Researchia:202601.12945434 [Computer Science]

ReinPool: Reinforcement Learning Pooling Multi-Vector Embeddings for Retrieval System

Sungguk Cha

Abstract

Multi-vector embedding models have emerged as a powerful paradigm for document retrieval, preserving fine-grained visual and textual details through token-level representations. However, this expressiveness comes at a staggering cost: storing an embedding for every token inflates index sizes by over 1000× compared to single-vector approaches, severely limiting scalability. We introduce ReinPool, a reinforcement learning framework that learns to dynamically filter and pool multi-vector embeddings into compact, retrieval-optimized representations. By training with an inverse retrieval objective and NDCG-based rewards, ReinPool identifies and retains only the most discriminative vectors without requiring manual importance annotations. On the ViDoRe V2 benchmark, across three vision-language embedding models, ReinPool compresses multi-vector representations by 746–1249× into single vectors while recovering 76–81% of full multi-vector retrieval performance. Compared to static mean pooling baselines, ReinPool achieves absolute NDCG@3 improvements of 22–33%, demonstrating that learned selection significantly outperforms heuristic aggregation.
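To make the contrast between static mean pooling and learned selection concrete, here is a minimal sketch in plain Python. It is not the ReinPool method: the abstract does not specify the policy architecture, the inverse retrieval objective, or the exact reward, so everything below is an assumption. It uses a hypothetical linear keep-policy `p(keep | v) = sigmoid(w · v)`, masked mean pooling of the kept vectors, ordinary query-to-document ranking (the inverse retrieval objective is not modeled), binary-relevance NDCG@3 as the reward, and a REINFORCE update with a moving-average baseline. The helpers (`maxsim`, `masked_pool`, `reinforce_step`) and the toy corpus are made up for illustration. Note also that if the pooled vector has the same dimension as each token vector, replacing n token vectors with one shrinks storage n-fold, which is the sense in which pooling yields the compression ratios quoted above.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: each query vector takes its best-matching doc vector; sum the maxima."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

def mean_pool(vecs):
    """Static baseline: average every token vector into a single vector."""
    n, dim = len(vecs), len(vecs[0])
    return [sum(v[i] for v in vecs) / n for i in range(dim)]

def masked_pool(vecs, keep):
    """Average only the kept vectors; fall back to mean pooling if nothing is kept."""
    kept = [v for v, k in zip(vecs, keep) if k]
    return mean_pool(kept if kept else vecs)

def rank_docs(query, pooled_docs):
    """Rank doc ids by dot product between one query vector and each pooled doc vector."""
    return sorted(pooled_docs, key=lambda d: dot(query, pooled_docs[d]), reverse=True)

def ndcg_at_k(ranked_ids, relevant, k=3):
    """Binary-relevance NDCG@k."""
    dcg = sum(1.0 / math.log2(r + 2) for r, d in enumerate(ranked_ids[:k]) if d in relevant)
    ideal = sum(1.0 / math.log2(r + 2) for r in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def reinforce_step(w, docs, query, relevant, rng, baseline, lr=0.5, k=3):
    """One REINFORCE update for a hypothetical linear keep-policy p(keep | v) = sigmoid(w . v)."""
    grad = [0.0] * len(w)
    pooled = {}
    for doc_id, vecs in docs.items():
        keep = []
        for v in vecs:
            p = sigmoid(dot(w, v))
            a = 1 if rng.random() < p else 0  # sample keep/drop for this token vector
            keep.append(a)
            # Gradient of log Bernoulli(a; sigmoid(w . v)) with respect to w is (a - p) * v.
            for i, x in enumerate(v):
                grad[i] += (a - p) * x
        pooled[doc_id] = masked_pool(vecs, keep)
    reward = ndcg_at_k(rank_docs(query, pooled), relevant, k)  # NDCG@k of the sampled pooling
    new_w = [wi + lr * (reward - baseline) * gi for wi, gi in zip(w, grad)]
    return new_w, reward

# Made-up toy corpus: dimension 0 carries content, dimension 1 is a shared "background" direction.
BACKGROUND = [0.0, 1.0]
docs = {
    "a": [[1.0, 0.0], BACKGROUND, BACKGROUND, BACKGROUND],  # relevant page, mostly background tokens
    "b": [[0.4, 0.0]],
    "c": [[0.3, 0.0]],
    "d": [[0.35, 0.0]],
}
query = [1.0, 0.0]
relevant = {"a"}

# Static mean pooling dilutes the one discriminative vector of "a" (score 0.25 < 0.3).
mean_ndcg = ndcg_at_k(rank_docs(query, {d: mean_pool(v) for d, v in docs.items()}), relevant)
# An oracle keep-mask that drops background vectors shows the ceiling learned selection aims for.
oracle = {d: masked_pool(v, [vec != BACKGROUND for vec in v]) for d, v in docs.items()}
oracle_ndcg = ndcg_at_k(rank_docs(query, oracle), relevant)
# Full multi-vector (late-interaction) score of the relevant page, for reference.
full_multivector_score = maxsim([query], docs["a"])

# Training loop mechanics only; no convergence claim is made for this toy setup.
rng = random.Random(0)
w, running = [0.0, 0.0], 0.0
for _ in range(200):
    w, r = reinforce_step(w, docs, query, relevant, rng, baseline=running)
    running = 0.9 * running + 0.1 * r  # moving-average reward baseline to reduce variance
```

On this toy corpus, mean pooling pushes the relevant page out of the top 3 (NDCG@3 = 0), while the oracle mask restores it to rank 1 (NDCG@3 = 1). The policy-gradient loop only illustrates how an NDCG reward can drive keep/drop decisions without importance labels; a real system would use a learned scorer over embedding-model outputs rather than a two-parameter linear policy.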

Submission: 1/12/2026
Subjects: Computer Science