Research Paper | Researchia:202602.10003

Thinking in Structures: Evaluating Spatial Intelligence through Reasoning on Constrained Manifolds

Chen Yang

Submitted: February 10, 2026
Subjects: Computer Vision

Description / Details

Spatial intelligence is crucial for vision-language models (VLMs) in the physical world, yet many benchmarks evaluate largely unconstrained scenes where models can exploit 2D shortcuts. We introduce SSI-Bench, a VQA benchmark for spatial reasoning on constrained manifolds, built from complex real-world 3D structures whose feasible configurations are tightly governed by geometric, topological, and physical constraints. SSI-Bench contains 1,000 ranking questions spanning geometric and topological reasoning and requiring a diverse repertoire of compositional spatial operations, such as mental rotation, cross-sectional inference, occlusion reasoning, and force-path reasoning. It is created via a fully human-centered pipeline: ten researchers spent over 400 hours curating images, annotating structural components, and designing questions to minimize pixel-level cues. Evaluating 31 widely used VLMs reveals a large gap to humans: the best open-source model achieves 22.2% accuracy and the strongest closed-source model reaches 33.6%, while humans score 91.6%. Encouraging models to think yields only marginal gains, and error analysis points to failures in structural grounding and constraint-consistent 3D reasoning. Project page: https://ssi-bench.github.io.
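As a rough illustration of how accuracy on ranking questions could be computed, the sketch below scores a prediction as correct only when the full predicted ordering matches the reference. This exact-match rule is an assumption: the abstract does not state the benchmark's scoring rule, and the function name `ranking_accuracy` and the toy answers are hypothetical. The last lines only restate the accuracies reported above and compute the gap to human performance in percentage points.

```python
from typing import Sequence


def ranking_accuracy(
    predictions: Sequence[Sequence[str]],
    references: Sequence[Sequence[str]],
) -> float:
    """Fraction of questions whose predicted ordering equals the reference.

    Assumption: exact-match scoring. The abstract does not specify the
    actual scoring rule used by SSI-Bench.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one question is required")
    correct = sum(list(p) == list(r) for p, r in zip(predictions, references))
    return correct / len(references)


# Hypothetical toy data: each answer is an ordering of option labels.
refs = [["B", "A", "C"], ["A", "C", "B"], ["C", "B", "A"], ["A", "B", "C"]]
preds = [["B", "A", "C"], ["A", "B", "C"], ["C", "B", "A"], ["B", "A", "C"]]
toy_accuracy = ranking_accuracy(preds, refs)  # 2 of 4 orderings match

# Accuracies reported in the abstract, in percent.
reported = {
    "best open-source VLM": 22.2,
    "strongest closed-source VLM": 33.6,
    "humans": 91.6,
}

# Gap to human accuracy in percentage points, rounded to one decimal.
gaps = {
    name: round(reported["humans"] - score, 1)
    for name, score in reported.items()
    if name != "humans"
}
print(toy_accuracy, gaps)
```

Under these reported numbers, the strongest closed-source model trails humans by 58.0 percentage points and the best open-source model by 69.4.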


Source: arXiv:2602.07864v1 (http://arxiv.org/abs/2602.07864v1)
PDF: https://arxiv.org/pdf/2602.07864v1
