A Multi-layer Composite Model for Human Pose Estimation

Kun Duan, Dhruv Batra, and David Crandall

We propose a new approach for part-based human pose estimation using multi-layer composite models, in which each layer is a tree-structured pictorial structure that models pose at a different scale and with a different graphical structure. At the highest level, the submodel acts as a person detector, while at the lowest level, the body is decomposed into a collection of many local parts. Edges between adjacent layers of the composite model encode cross-model constraints. This multi-layer composite model relaxes the independence assumptions of traditional tree-structured pictorial structure models while still permitting efficient inference using dual decomposition. We propose an optimization procedure for jointly learning the entire composite model. Our approach outperforms the state of the art on the challenging Parse and UIUC Sport datasets.
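The key idea behind dual decomposition is that a model whose full graph has cycles can be split into tree-structured subproblems that can each be solved exactly. Lagrange multipliers then push those subproblems toward agreeing on a single labeling. The sketch below is illustrative only and is not the authors' implementation. It assumes two chain-structured models (a chain is the simplest tree) defined over the same N part variables, each with K discrete candidate locations. It also assumes a diminishing subgradient step size. The function names `viterbi` and `dual_decomposition` are hypothetical. In the paper, the layers use different graphs linked by cross-layer edges, and that structure is not reproduced here.

```python
import numpy as np

def viterbi(unary, pairwise):
    """Exact MAP on a chain: maximize sum_i unary[i, x_i] + sum_i pairwise[x_i, x_{i+1}].

    unary: (N, K) scores; pairwise: (K, K) scores shared by every edge.
    Returns (labels as a list of ints, best total score).
    """
    n, k = unary.shape
    score = unary[0].astype(float)
    back = np.zeros((n, k), dtype=int)
    for i in range(1, n):
        # cand[a, b]: best score of a prefix ending with x_{i-1} = a, x_i = b
        cand = score[:, None] + pairwise
        back[i] = cand.argmax(axis=0)
        score = cand.max(axis=0) + unary[i]
    labels = [int(score.argmax())]
    for i in range(n - 1, 0, -1):  # follow back-pointers from the last variable
        labels.append(int(back[i, labels[-1]]))
    labels.reverse()
    return labels, float(score.max())

def dual_decomposition(u1, p1, u2, p2, steps=100, step0=1.0):
    """Hypothetical sketch: maximize f1(x) + f2(x) for two chain models sharing x.

    Each subproblem is solved exactly; the multipliers lam are added to
    model 1's unaries and subtracted from model 2's, so lam cancels in the sum.
    Returns (labels, agreed); agreed=True certifies a global optimum.
    """
    lam = np.zeros(u1.shape)
    rows = np.arange(u1.shape[0])
    for t in range(steps):
        x1, _ = viterbi(u1 + lam, p1)
        x2, _ = viterbi(u2 - lam, p2)
        if x1 == x2:
            return x1, True  # primal value equals the dual bound here
        # Subgradient of the dual w.r.t. lam: onehot(x1) - onehot(x2)
        g = np.zeros(u1.shape)
        g[rows, x1] += 1.0
        g[rows, x2] -= 1.0
        lam -= step0 / (t + 1) * g  # diminishing step size (an assumption)
    return x1, False
```

When the two subproblem solutions agree, the labeling is provably optimal for the combined objective. If they never agree within the step budget, the returned labeling is only a heuristic.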


Figure 1: Illustration of our multi-layer composite part-based model.

Papers and presentations

BibTeX entries:

@article{humanpose2015sp,
    author = {Duan, Kun and Batra, Dhruv and Crandall, David},
    title = {Human pose estimation through composite multi-layer models},
    journal = {Signal Processing},
    volume = {110},
    pages = {15--26},
    month = {May},
    year = {2015}
}

@inproceedings{poseest2012bmvc,
    author = {Duan, Kun and Batra, Dhruv and Crandall, David},
    title = {A Multi-layer Composite Model for Human Pose Estimation},
    booktitle = {British Machine Vision Conference (BMVC)},
    year = {2012}
}

The IU Computer Vision Lab's projects and activities have been funded, in part, by grants and contracts from the Air Force Office of Scientific Research (AFOSR), the Defense Threat Reduction Agency (DTRA), Dzyne Technologies, EgoVid, Inc., ETRI, Facebook, Google, Grant Thornton LLP, IARPA, the Indiana Innovation Institute (IN3), the IU Data to Insight Center, the IU Office of the Vice Provost for Research through an Emerging Areas of Research grant, the IU Social Sciences Research Commons, the Lilly Endowment, NASA, National Science Foundation (IIS-1253549, CNS-1834899, CNS-1408730, BCS-1842817, CNS-1744748, IIS-1257141, IIS-1852294), NVidia, ObjectVideo, Office of Naval Research (ONR), Pixm, Inc., and the U.S. Navy. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government, or any sponsor.