Reconstructing Hand-Object Interactions in the Wild

Zhe Cao*     Ilija Radosavovic*     Angjoo Kanazawa     Jitendra Malik
University of California, Berkeley

Abstract: In this work we explore reconstructing hand-object interactions in the wild. The core challenge of this problem is the lack of appropriate 3D labeled data. To overcome this issue, we propose an optimization-based procedure which does not require direct 3D supervision. The general strategy we adopt is to exploit all available related data (2D bounding boxes, 2D hand keypoints, 2D instance masks, 3D object models, 3D in-the-lab MoCap) to provide constraints for the 3D reconstruction. Rather than optimizing the hand and object individually, we optimize them jointly, which allows us to impose additional constraints based on hand-object contact, collision, and occlusion. Our method produces compelling reconstructions on challenging in-the-wild data from the EPIC Kitchens and 100 Days of Hands datasets, across a range of object categories. Quantitatively, we demonstrate that our approach compares favorably to existing approaches in lab settings where ground truth 3D annotations are available.
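To make the idea of contact and collision constraints concrete, here is a toy sketch of how such terms could be scored for a candidate arrangement. It is not the loss used in the paper: the object is assumed to be a sphere so that the signed distance has a closed form, and the names `signed_distance`, `interaction_loss`, `contact_ids`, `w_contact`, and `w_collision` are hypothetical. The occlusion constraint is not covered.

```python
import math

def signed_distance(point, center, radius):
    """Signed distance from a 3D point to a sphere surface (negative inside)."""
    return math.dist(point, center) - radius

def interaction_loss(hand_points, contact_ids, center, radius,
                     w_contact=1.0, w_collision=10.0):
    """Toy hand-object term: pull contact points onto the surface and
    penalize any hand point that penetrates the (assumed spherical) object."""
    # Collision: squared penetration depth, zero for points outside the object.
    collision = sum(
        min(signed_distance(p, center, radius), 0.0) ** 2 for p in hand_points
    )
    # Contact: squared distance to the surface for points expected to touch.
    contact = sum(
        signed_distance(hand_points[i], center, radius) ** 2 for i in contact_ids
    )
    return w_contact * contact + w_collision * collision

# Illustrative data: a unit sphere and a two-point "hand" whose first point
# is expected to be in contact with the object.
center, radius = (0.0, 0.0, 0.0), 1.0
touching = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
floating = [(1.5, 0.0, 0.0), (2.5, 0.0, 0.0)]
penetrating = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]

for name, pts in [("touching", touching), ("floating", floating),
                  ("penetrating", penetrating)]:
    print(name, interaction_loss(pts, [0], center, radius))
```

With the larger collision weight, a penetrating arrangement scores worse than one that floats at the same distance from the surface, and the touching arrangement scores best. Terms of this kind only make sense when the hand and the object are optimized together.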

Reconstruction Procedure

Reconstruction procedure. To reconstruct hand-object interactions in the wild, we leverage all available related data (2D keypoints, 2D instance masks, 3D object models, 3D in-the-lab MoCap) through an optimization-based procedure that consists of four steps: (a) hand pose estimation by 2D keypoint fitting, (b) object pose estimation via differentiable rendering, (c) joint optimization for spatial arrangement, and (d) pose refinement using 3D contact priors.
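As a rough illustration of step (a), the sketch below fits a projection to 2D keypoints by least squares. It is a simplification under stated assumptions, not the paper's implementation: the 3D joints are held fixed, the camera is assumed to be weak-perspective, and only a scale and a 2D translation are estimated, whereas a full fitting procedure would also update the hand pose itself. The function names `fit_weak_perspective` and `reprojection_error` are hypothetical.

```python
def fit_weak_perspective(joints_3d, keypoints_2d):
    """Least-squares scale s and translation (tx, ty) such that
    s * (X, Y) + (tx, ty) best matches the observed 2D keypoints."""
    n = len(joints_3d)
    # Centroids of the model points (x, y only) and of the observations.
    mx = sum(j[0] for j in joints_3d) / n
    my = sum(j[1] for j in joints_3d) / n
    ux = sum(k[0] for k in keypoints_2d) / n
    uy = sum(k[1] for k in keypoints_2d) / n
    num = den = 0.0
    for (x, y, _z), (u, v) in zip(joints_3d, keypoints_2d):
        dx, dy = x - mx, y - my
        num += dx * (u - ux) + dy * (v - uy)
        den += dx * dx + dy * dy
    s = num / den  # assumes the joints are not all at the same (x, y)
    return s, ux - s * mx, uy - s * my

def reprojection_error(joints_3d, keypoints_2d, s, tx, ty):
    """Sum of squared 2D distances between projected joints and keypoints."""
    return sum(
        (s * x + tx - u) ** 2 + (s * y + ty - v) ** 2
        for (x, y, _z), (u, v) in zip(joints_3d, keypoints_2d)
    )

# Illustrative data: keypoints generated with scale 2 and translation (3, 4).
joints = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0), (1.0, 1.0, 2.0)]
keypoints = [(2.0 * x + 3.0, 2.0 * y + 4.0) for x, y, _ in joints]
s, tx, ty = fit_weak_perspective(joints, keypoints)
print(s, tx, ty, reprojection_error(joints, keypoints, s, tx, ty))
```

The same reprojection error can serve as the objective for an iterative optimizer once pose parameters are added to the set of unknowns.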

Intermediate results. Top row: input images. 2nd row: results from individually optimizing hand and object. 3rd row: results from joint optimization (two viewpoints per example). Bottom row: results after the refinement.

Qualitative Results

Qualitative results on images from the EPIC Kitchens dataset (rows 1-2) and the 100 Days of Hands dataset (rows 3-4). Our method produces reasonably high-quality reconstructions across a range of viewpoints, activities, and objects.

Additional qualitative results. Our procedure produces promising results across a range of scenarios and objects.

Failure Cases

Failure cases. We show representative failure cases of our reconstruction procedure. The failure modes we observe stem from failures of individual steps in the procedure: hand pose estimation (first column), object pose estimation (columns 2-4), and joint optimization (last column).

Collected CAD Models

Collected models. We collected 120 object models with both within-category and across-category variation.


MOW dataset. We collected a 3D dataset of humans Manipulating Objects in-the-Wild (MOW). Follow the instructions in this GitHub repo to download and use the data.


Cao*, Radosavovic*, Kanazawa, Malik.

Reconstructing Hand-Object Interactions in the Wild

ICCV, 2021.

[Paper]     [Bibtex]


We thank members of the BAIR community for helpful discussions and comments. This webpage template was borrowed from some colorful folks.