ExplorerComputer VisionComputer Vision
Research PaperResearchia:202602.17004

Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision

Aadarsh Sahoo

Abstract

Conversational image segmentation grounds abstract, intent-driven concepts into pixel-accurate masks. Prior work on referring image grounding focuses on categorical and spatial queries (e.g., "left-most apple") and overlooks functional and physical reasoning (e.g., "where can I safely store the knife?"). We address this gap and introduce Conversational Image Segmentation (CIS) and ConverSeg, a benchmark spanning entities, spatial relations, intent, affordances, functions, safety, and physical re...

Submitted: February 17, 2026Subjects: Computer Vision; Computer Vision

Description / Details

Conversational image segmentation grounds abstract, intent-driven concepts into pixel-accurate masks. Prior work on referring image grounding focuses on categorical and spatial queries (e.g., "left-most apple") and overlooks functional and physical reasoning (e.g., "where can I safely store the knife?"). We address this gap and introduce Conversational Image Segmentation (CIS) and ConverSeg, a benchmark spanning entities, spatial relations, intent, affordances, functions, safety, and physical reasoning. We also present ConverSeg-Net, which fuses strong segmentation priors with language understanding, and an AI-powered data engine that generates prompt-mask pairs without human supervision. We show that current language-guided segmentation models are inadequate for CIS, while ConverSeg-Net trained on our data engine achieves significant gains on ConverSeg and maintains strong performance on existing language-guided segmentation benchmarks. Project webpage: https://glab-caltech.github.io/converseg/


Source: arXiv:2602.13195v1 - http://arxiv.org/abs/2602.13195v1 PDF: https://arxiv.org/pdf/2602.13195v1 Original Link: http://arxiv.org/abs/2602.13195v1

Please sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Access Paper
View Source PDF
Submission Info
Date:
Feb 17, 2026
Topic:
Computer Vision
Area:
Computer Vision
Comments:
0
Bookmark