GENIUS: Generative Fluid Intelligence Evaluation Suite
Abstract
Unified Multimodal Models (UMMs) have shown remarkable progress in visual generation. Yet, existing benchmarks predominantly assess crystallized intelligence, which relies on recalling accumulated knowledge and learned schemas. This focus overlooks fluid intelligence: the capacity to induce patterns, reason through constraints, and adapt to novel scenarios on the fly. To rigorously assess this capability, we introduce GENIUS (Generative Fluid Intelligence Evaluation Suite). We formalize generative fluid intelligence as a synthesis of three primitives: inducing patterns from context (e.g., inferring personalized visual preferences), reasoning over abstract constraints (e.g., visualizing abstract metaphors), and adapting to novel scenarios (e.g., simulating counter-intuitive physics). Collectively, these primitives challenge models to solve problems grounded entirely in the immediate context. Our systematic evaluation of 12 representative models reveals significant performance deficits on these tasks. Crucially, our diagnostic analysis disentangles these failure modes, demonstrating that the deficits stem from limited context comprehension rather than insufficient intrinsic generative capability. To bridge this gap, we propose a training-free attention intervention strategy. Ultimately, GENIUS establishes a rigorous standard for generative fluid intelligence, guiding the field beyond knowledge utilization toward dynamic, general-purpose reasoning. Our dataset and code will be released.
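The abstract names a training-free attention intervention but does not describe its mechanics. The sketch below is purely illustrative of what such an inference-time intervention can look like in general: boosting pre-softmax attention logits on context tokens without touching model weights. The function name `reweight_attention`, the `context_token_mask` argument, and the scaling factor `alpha` are assumptions introduced here, not details from the paper.

```python
import torch


def reweight_attention(attn_logits: torch.Tensor,
                       context_token_mask: torch.Tensor,
                       alpha: float = 1.5) -> torch.Tensor:
    """Boost pre-softmax attention logits at context-token positions.

    attn_logits:        (batch, heads, query_len, key_len) raw scores.
    context_token_mask: (batch, key_len) boolean mask marking the key tokens
                        that carry the in-context constraints to emphasize.
    alpha:              multiplicative boost applied at those positions.
    """
    mask = context_token_mask[:, None, None, :]            # broadcast over heads and queries
    return torch.where(mask, attn_logits * alpha, attn_logits)


# Toy usage: scaled dot-product attention with the intervention applied at inference time.
batch, heads, q_len, k_len, dim = 1, 8, 4, 16, 64
q = torch.randn(batch, heads, q_len, dim)
k = torch.randn(batch, heads, k_len, dim)
v = torch.randn(batch, heads, k_len, dim)

context_mask = torch.zeros(batch, k_len, dtype=torch.bool)
context_mask[:, :6] = True                                  # assume the first 6 key tokens are context

logits = q @ k.transpose(-2, -1) / dim ** 0.5               # standard attention scores
logits = reweight_attention(logits, context_mask, alpha=1.5)
out = torch.softmax(logits, dim=-1) @ v                     # (1, 8, 4, 64)
```

Because the adjustment is applied only to attention scores during generation, no fine-tuning is required, which is the sense in which such an intervention is training-free.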
Source: arXiv:2602.11144v1 - http://arxiv.org/abs/2602.11144v1 PDF: https://arxiv.org/pdf/2602.11144v1