GEOPARD: Geometric Pretraining for Articulation Prediction in 3D Shapes.

ICCV 2025

¹UMass Amherst, ²École de technologie supérieure, ³Roblox, ⁴TU Crete

Abstract

We present GEOPARD, a transformer-based architecture for predicting articulation from a single static snapshot of a 3D shape. The key idea of our method is a pretraining strategy that allows our transformer to learn plausible candidate articulations for 3D shapes based on a geometry-driven search without manual articulation annotation. The search automatically discovers physically valid part motions that do not cause detachments or collisions with other shape parts. Our experiments indicate that this geometric pretraining strategy, along with carefully designed choices in our transformer architecture, yields state-of-the-art results in articulation inference on the PartNet-Mobility dataset.

Architecture

GEOPARD overview. First, we learn part feature representations (a) from the part points, along with a shape context representation (b). Second, we enhance the part-level feature representations with the shape context (c). Third, the representations are aggregated into a compact, articulation-aware part feature vector (d), which is used to predict the part articulation through three decoding branches: part pivot prediction (e), motion axis prediction (f), and motion type prediction (g).
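The three decoded quantities (pivot, axis, motion type) together describe a part motion. The following is a minimal sketch of how such parameters could be applied to part points; it is not the GEOPARD implementation. The function names, the `"revolute"`/`"prismatic"` string labels, and the `amount` parameter (radians for revolute, length units for prismatic) are assumptions made for illustration.

```python
import math

def _unit(axis):
    """Return the axis scaled to unit length."""
    n = math.sqrt(sum(a * a for a in axis))
    return [a / n for a in axis]

def rotate_about_axis(point, pivot, axis, angle):
    """Rotate a 3D point about the line through `pivot` along `axis`
    using Rodrigues' rotation formula (angle in radians)."""
    k = _unit(axis)
    v = [p - c for p, c in zip(point, pivot)]
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    k_dot_v = sum(ki * vi for ki, vi in zip(k, v))
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    rotated = [
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
        for i in range(3)
    ]
    return [r + c for r, c in zip(rotated, pivot)]

def translate_along_axis(point, axis, distance):
    """Slide a 3D point by `distance` along the (normalized) axis."""
    k = _unit(axis)
    return [p + distance * ki for p, ki in zip(point, k)]

def articulate(points, motion_type, pivot, axis, amount):
    """Apply a hypothetical articulation to a list of part points.
    The pivot only matters for revolute motion."""
    if motion_type == "revolute":
        return [rotate_about_axis(p, pivot, axis, amount) for p in points]
    if motion_type == "prismatic":
        return [translate_along_axis(p, axis, amount) for p in points]
    raise ValueError(f"unknown motion type: {motion_type}")
```

For example, `articulate([[2, 0, 0]], "revolute", [1, 0, 0], [0, 0, 1], math.pi / 2)` swings the point a quarter turn around a vertical hinge located at `x = 1`.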

Pretraining

To mitigate the challenge of limited annotated data, we propose a method to generate plausible articulations. For a segmented input (left), we compute a set of possible articulations, reject the ones that introduce detachments from or collisions with the rest of the shape (right), and keep the valid candidate articulations (middle) for our pretraining.
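The generate-and-reject idea can be illustrated with a toy sketch. This is not the search used in GEOPARD: the candidate set (three coordinate axes, pivot at the part centroid), the point-distance proxies for collision and detachment, and the thresholds `collide_tol` and `detach_tol` are all assumptions chosen to keep the example short. Sparse point samples can also miss collisions, so a practical test would need denser sampling or mesh-based checks.

```python
import math

AXES = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

def apply_motion(points, kind, pivot, axis, amount):
    """Slide points along a unit axis (prismatic) or rotate them about
    the line through pivot along that axis (revolute, Rodrigues' formula)."""
    moved = []
    for p in points:
        if kind == "prismatic":
            moved.append(tuple(p[i] + amount * axis[i] for i in range(3)))
            continue
        v = [p[i] - pivot[i] for i in range(3)]
        c, s = math.cos(amount), math.sin(amount)
        d = sum(axis[i] * v[i] for i in range(3))
        x = (axis[1] * v[2] - axis[2] * v[1],
             axis[2] * v[0] - axis[0] * v[2],
             axis[0] * v[1] - axis[1] * v[0])
        moved.append(tuple(pivot[i] + v[i] * c + x[i] * s + axis[i] * d * (1 - c)
                           for i in range(3)))
    return moved

def min_gap(part, rest):
    """Smallest point-to-point distance between the part and the rest."""
    return min(math.dist(p, q) for p in part for q in rest)

def is_valid(part, rest, kind, pivot, axis, amounts, collide_tol, detach_tol):
    """Reject a candidate if any sampled pose gets too close to the rest
    (collision proxy) or too far from it (detachment proxy)."""
    for amount in amounts:
        gap = min_gap(apply_motion(part, kind, pivot, axis, amount), rest)
        if gap < collide_tol or gap > detach_tol:
            return False
    return True

def valid_candidates(part, rest, amounts, collide_tol=0.05, detach_tol=0.3):
    """Enumerate hypothetical candidates and keep the ones that pass."""
    pivot = tuple(sum(p[i] for p in part) / len(part) for i in range(3))
    return [(kind, axis)
            for kind in ("prismatic", "revolute")
            for axis in AXES
            if is_valid(part, rest, kind, pivot, axis, amounts,
                        collide_tol, detach_tol)]

# Toy scene: a "drawer" between two rails that run along the x axis.
rails = [(0.1 * i, y, 0.0) for i in range(21) for y in (-0.6, 0.6)]
drawer = [(x, y, 0.0) for x in (0.0, 0.5, 1.0) for y in (-0.5, 0.5)]
kept = valid_candidates(drawer, rails, amounts=[0.1, 0.2, 0.3, 0.4, 0.5])
```

In this toy scene, sliding the drawer along the rails (x axis) is kept, sliding it sideways (y axis) is rejected because it runs into a rail, and lifting it (z axis) is rejected because it drifts too far from the rails.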

Articulation Parameters Prediction on PartNet-Mobility Dataset

The figure color-codes parts predicted or labeled as revolute, parts predicted or labeled as prismatic, and input parts. Predicted axes are shown with arrows. While baselines based on part abstractions struggle to predict plausible articulation parameters, our base model, which uses fine-grained point features, produces articulation parameters that closely match the ground truth. Our pretraining strategy improves these predictions further by supplying geometric and articulation priors that are refined during fine-tuning.
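One way to quantify how closely a predicted axis and pivot match the ground truth is sketched below. These two measures are assumptions for illustration; this page does not state which metrics are used in the evaluation. The axis error ignores the sign of the direction, and the pivot error is the distance from the predicted pivot to the ground-truth axis line, since any point on that line defines the same rotation.

```python
import math

def axis_angle_error_deg(pred_axis, gt_axis):
    """Angle in degrees between two axis directions, ignoring sign."""
    dot = sum(p * g for p, g in zip(pred_axis, gt_axis))
    norm = math.hypot(*pred_axis) * math.hypot(*gt_axis)
    cos_t = min(1.0, abs(dot) / norm)  # clamp against rounding error
    return math.degrees(math.acos(cos_t))

def pivot_to_axis_distance(pred_pivot, gt_pivot, gt_axis):
    """Distance from the predicted pivot to the ground-truth axis line."""
    d = [p - g for p, g in zip(pred_pivot, gt_pivot)]
    n = math.hypot(*gt_axis)
    k = [a / n for a in gt_axis]
    cross = [
        d[1] * k[2] - d[2] * k[1],
        d[2] * k[0] - d[0] * k[2],
        d[0] * k[1] - d[1] * k[0],
    ]
    return math.hypot(*cross)
```

With these definitions, a predicted axis pointing in the exact opposite direction of the ground truth has zero angular error, and a predicted pivot that lies anywhere on the ground-truth axis line has zero pivot error.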

BibTeX

@article{goyal2025geopard,
  title = {GEOPARD: Geometric Pretraining for Articulation Prediction in 3D Shapes},
  author = {Goyal, Pradyumn and Petrov, Dmitry and Andrews, Sheldon and Ben-Shabat, Yizhak and Liu, Hsueh-Ti Derek and Kalogerakis, Evangelos},
  journal = {arXiv preprint arXiv:2504.02747},
  year = {2025},
}