
Personal Visual Context Learning in Large Multimodal Models

Zihui Xue

Abstract

As wearable devices like smart glasses integrate Large Multimodal Models (LMMs) into the continuous first-person visual streams of individual users, the evolution of these models into true personal assistants hinges on visual personalization: the ability to reason over visual information unique to the wearer. We formalize this capability as Personal Visual Context Learning (Personal VCL): using user-specific visual context at prompt time to resolve personalized queries. To evaluate this systematically, we present Personal-VCL-Bench, a comprehensive benchmark capturing the personal visual world across persons, objects, and behaviors. Our analysis of frontier LMMs identifies a profound context-utilization gap, revealing that the mechanisms for leveraging visual evidence and for aggregating multiple visual observations remain critically understudied. Motivated by these findings, we propose the Agentic Context Bank, a strong inference-time baseline that structures a user's visual context into a self-refining memory bank and employs query-adaptive evidence selection. Our approach consistently improves over standard context-prompting regimes across tasks and evaluated backbones, demonstrating a practical path toward future personalized LMMs.

Submitted: May 12, 2026
Subjects: Computer Vision

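The Agentic Context Bank is described here only at a high level. As a rough, hypothetical illustration of the general idea (not the paper's actual implementation), a memory bank that deduplicates incoming observations and retrieves query-relevant evidence might look like the following toy sketch, using bag-of-words similarity as a stand-in for whatever visual representations the real system uses:

```python
from collections import Counter
import math


class ContextBank:
    """Toy memory bank over a user's visual observations.

    Hypothetical sketch: entries are free-text captions of first-person
    frames. The paper's Agentic Context Bank is not specified here; this
    only illustrates "self-refining storage + query-adaptive selection".
    """

    def __init__(self):
        self.entries = []  # stored caption strings

    def add_observation(self, caption):
        # "Self-refining" step: merge near-duplicate captions instead of
        # appending, keeping the more detailed (longer) one.
        for i, old in enumerate(self.entries):
            if self._sim(old, caption) > 0.8:
                if len(caption) > len(old):
                    self.entries[i] = caption
                return
        self.entries.append(caption)

    def select_evidence(self, query, k=2):
        # Query-adaptive selection: rank stored captions by similarity
        # to the query and return the top-k as evidence for the LMM.
        ranked = sorted(self.entries,
                        key=lambda e: self._sim(e, query),
                        reverse=True)
        return ranked[:k]

    @staticmethod
    def _sim(a, b):
        # Bag-of-words cosine similarity between two captions.
        ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
        dot = sum(ca[w] * cb[w] for w in ca)
        na = math.sqrt(sum(v * v for v in ca.values()))
        nb = math.sqrt(sum(v * v for v in cb.values()))
        return dot / (na * nb) if na and nb else 0.0


bank = ContextBank()
bank.add_observation("red mug on the kitchen counter")
bank.add_observation("red mug on the kitchen counter today")  # near-duplicate, merged
bank.add_observation("keys left on the hallway table")
evidence = bank.select_evidence("where is my mug", k=1)
```

The selected evidence would then be placed in the prompt alongside the query, which is the sense in which Personal VCL is a prompt-time capability.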


Source: arXiv:2605.10936v1 (http://arxiv.org/abs/2605.10936v1)
PDF: https://arxiv.org/pdf/2605.10936v1

