Research Paper (Researchia:202606.30008)

GROW$^2$: Grounding Which and Where for Robot Tool Use

Yuhong Deng

Submitted: June 30, 2026. Subjects: Computer Vision

Description / Details

Can a robot use a plate to cut a cake if no knife is available? Tool use greatly expands robot capabilities, but to use tools creatively beyond their intended functions, the robot faces the challenge of $\textit{open-world affordance grounding}$: selecting an open-category object to act as a tool and localizing its specific region of action. To this end, we introduce GROW$^2$ (GROunding Which and Where), which leverages object parts as a natural abstraction to split the grounding process hierarchically into semantic and geometric levels, thus bypassing the need for data-heavy, end-to-end training. Semantically, GROW$^2$ harnesses the commonsense reasoning of Vision-Language Models (VLMs) to parse a natural-language task instruction, select a suitable object as the tool, and identify task-relevant parts on the tool and the target object. Geometrically, vision foundation models then ground the selected parts into precise 3D regions from a single RGB-D image. Experiments show that GROW$^2$ outperforms state-of-the-art baselines on established affordance prediction benchmarks. Further, it achieves zero-shot generalization over open-category objects and outperforms baselines in both simulated and real-world robot tool use experiments.
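The geometric level described above ends with 3D regions obtained from a single RGB-D image. The abstract does not say how a 2D part mask is lifted to 3D, so the following is only a minimal sketch of one common way to do it: standard pinhole back-projection of the masked depth pixels. The function name `backproject_mask`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the toy depth and mask values are all hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject_mask(
    depth: List[List[float]],
    mask: List[List[bool]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Lift the masked pixels of a depth image to 3D camera-frame points.

    Assumption (not from the paper): a pinhole camera with focal lengths
    (fx, fy) and principal point (cx, cy), and depth given in metres along
    the optical axis. Pixels with non-positive depth are treated as invalid.
    """
    points: List[Point3D] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, selected) in enumerate(zip(depth_row, mask_row)):
            if not selected or z <= 0.0:
                continue  # outside the part mask, or no valid depth reading
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x3 depth image and a hypothetical part mask (e.g. a plate's rim).
depth = [[1.0, 1.0, 0.0],
         [2.0, 2.0, 2.0]]
mask = [[False, True, True],
        [False, False, True]]

region = backproject_mask(depth, mask, fx=1.0, fy=1.0, cx=1.0, cy=0.5)
print(region)  # [(0.0, -0.5, 1.0), (2.0, 1.0, 2.0)]
```

In this toy case, three pixels are selected by the mask, but one has zero depth and is skipped, so the resulting region contains two 3D points. In a real pipeline the mask would come from a segmentation model and the intrinsics from camera calibration.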


Source: arXiv:2606.30632v1 - http://arxiv.org/abs/2606.30632v1 PDF: https://arxiv.org/pdf/2606.30632v1
