Discrete-continuous optimization for large-scale structure from motion

David Crandall, Andrew Owens, Noah Snavely, Dan Huttenlocher

Runner-up best paper at CVPR 2011!


Recent work in structure from motion (SfM) has successfully built 3D models from large unstructured collections of images downloaded from the Internet. Most approaches use incremental algorithms that solve progressively larger bundle adjustment problems. These incremental techniques scale poorly as the number of images grows, and can drift or fall into bad local minima. We present an alternative formulation for SfM based on finding a coarse initial solution using a hybrid discrete-continuous optimization, and then improving that solution using bundle adjustment. The initial optimization step uses a discrete Markov random field (MRF) formulation, coupled with a continuous Levenberg-Marquardt refinement. The formulation naturally incorporates various sources of information about both the cameras and the points, including noisy geotags and vanishing point estimates. We test our method on several large-scale photo collections, including one with measured camera positions, and show that it can produce models that are similar to or better than those produced with incremental bundle adjustment, but more robustly and in a fraction of the time.

For more details, please see our CVPR 2011 paper and slides from our CVPR talk.
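To make the two-stage idea in the abstract concrete, here is a minimal toy sketch, not the formulation from the paper. It assumes each camera has a single unknown heading angle, that pairwise relative headings and a noisy per-camera prior (standing in for geotags or vanishing points) are given, and that the label count, weights, and truncation threshold are arbitrary. The function names `discrete_stage` and `continuous_stage` are hypothetical, and the continuous stage uses fixed damping, a simplification of Levenberg-Marquardt.

```python
import numpy as np

def wrap(a):
    """Wrap angles to [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def discrete_stage(n, edges, rel, prior, K=36, w_prior=0.1, trunc=1.0, iters=20):
    """Coarse solution: min-sum loopy belief propagation over K heading labels."""
    labels = np.arange(K) * 2 * np.pi / K
    # Unary cost: weak, truncated penalty for disagreeing with the noisy prior.
    unary = w_prior * np.minimum(wrap(labels[None, :] - prior[:, None]) ** 2, trunc)
    diff = labels[None, :] - labels[:, None]      # diff[a, b] = labels[b] - labels[a]
    pair = {}
    for (i, j), r in zip(edges, rel):
        C = np.minimum(wrap(diff - r) ** 2, trunc)  # truncated (robust) pairwise cost
        pair[(i, j)] = C                            # indexed [x_i, x_j]
        pair[(j, i)] = C.T                          # indexed [x_j, x_i]
    msgs = {e: np.zeros(K) for e in pair}
    nbrs = {i: [j for (a, j) in pair if a == i] for i in range(n)}
    for _ in range(iters):
        new = {}
        for (i, j) in pair:
            # Cost at i from its prior and all neighbours except j.
            h = unary[i] + sum(msgs[(k, i)] for k in nbrs[i] if k != j)
            m = np.min(h[:, None] + pair[(i, j)], axis=0)
            new[(i, j)] = m - m.min()               # normalise to keep values bounded
        msgs = new
    belief = np.array([unary[i] + sum(msgs[(k, i)] for k in nbrs[i]) for i in range(n)])
    return labels[belief.argmin(axis=1)]

def continuous_stage(theta0, edges, rel, prior, w_prior=0.1, iters=10, lam=1e-3):
    """Refinement: damped Gauss-Newton on the untruncated squared residuals."""
    theta = np.array(theta0, dtype=float)
    n, m = len(theta), len(edges)
    ei = np.array([i for i, _ in edges])
    ej = np.array([j for _, j in edges])
    J = np.zeros((m + n, n))                        # Jacobian is constant here
    J[np.arange(m), ei] = -1.0
    J[np.arange(m), ej] = 1.0
    J[m:, :] = np.sqrt(w_prior) * np.eye(n)
    for _ in range(iters):
        r = np.concatenate([wrap(theta[ej] - theta[ei] - rel),
                            np.sqrt(w_prior) * wrap(theta - prior)])
        step = np.linalg.solve(J.T @ J + lam * np.eye(n), -J.T @ r)
        theta = theta + step
    return wrap(theta)

# Synthetic example: 8 cameras on a ring with extra chords, deterministic "noise".
n = 8
truth = wrap(np.linspace(0.0, 5.5, n))
edges = [(i, (i + 1) % n) for i in range(n)] + [(i, (i + 3) % n) for i in range(n)]
rel = np.array([wrap(truth[j] - truth[i] + 0.02 * np.sin(3 * e))
                for e, (i, j) in enumerate(edges)])
prior = wrap(truth + 0.3 * np.cos(2 * np.arange(n)))

coarse = discrete_stage(n, edges, rel, prior)
refined = continuous_stage(coarse, edges, rel, prior)
print("max error after discrete stage:  ", np.max(np.abs(wrap(coarse - truth))))
print("max error after continuous stage:", np.max(np.abs(wrap(refined - truth))))
```

The discrete stage only needs to land in the right basin; the continuous stage then removes the quantization error introduced by the finite label set.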

 


Sample reconstruction videos


Central Rome
(after multi-view stereo)

Cornell Arts Quad

Cornell Arts Quad
(after multi-view stereo)

 


Downloads

 

  • Quad dataset with ground-truth camera positions: [images & ground truth (8GB tarball)] [results & comparison code (250MB zip)] [bundler tracks file (350MB tarball) *New!*]
    This dataset contains 6,514 images of the Arts Quad at Cornell University. About 5,000 images include geotags recorded by a consumer GPS receiver (an iPhone 3G), while 348 images have very precise GPS coordinates, measured using survey-quality differential GPS (with an accuracy of about 10 cm), that can be used as ground truth. Also available are reconstruction results from both our method and incremental bundle adjustment, code to compare two sets of camera positions that may differ by an unknown similarity transform, and the tracks file needed by Bundler to perform the final bundle adjustment.
  • Rotation/translation MRF code and data: [zip file of code and data] [readme file]
    This download includes the multi-core, single-computer version of rotation and translation MRF inference using belief propagation. It also includes MRF parameters for the Acropolis dataset and the relative pose estimates for the Arts Quad dataset. These files may be useful to researchers interested in inference on MRFs with large multidimensional label spaces.
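Comparing two sets of camera positions that differ by an unknown similarity transform, as mentioned for the Quad dataset above, can be sketched as follows. This is not the comparison code from the download; it is a minimal illustration that assumes known one-to-one correspondences, no outliers, and a closed-form least-squares fit (the Umeyama method). The function names `align_similarity` and `position_errors` and the sample data are hypothetical.

```python
import numpy as np

def align_similarity(src, dst):
    """Least-squares scale s, rotation R, translation t with dst ~ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding camera positions.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                 # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                         # avoid returning a reflection
    R = U @ D @ Vt
    var_s = (xs ** 2).sum() / len(src)         # variance of the source points
    s = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t

def position_errors(est, gt):
    """Per-camera distance to ground truth after the best similarity alignment."""
    est = np.asarray(est, dtype=float)
    gt = np.asarray(gt, dtype=float)
    s, R, t = align_similarity(est, gt)
    aligned = s * est @ R.T + t
    return np.linalg.norm(aligned - gt, axis=1)

# Synthetic check: ground truth is an exact similarity transform of the estimate.
est = np.array([[0, 0, 0], [10, 0, 0], [0, 5, 0], [3, 4, 2], [7, 1, 6]], dtype=float)
c, sn = np.cos(0.7), np.sin(0.7)
R_true = np.array([[c, -sn, 0.0], [sn, c, 0.0], [0.0, 0.0, 1.0]])
s_true, t_true = 2.5, np.array([100.0, -20.0, 3.0])
gt = s_true * est @ R_true.T + t_true

s, R, t = align_similarity(est, gt)
errors = position_errors(est, gt)              # close to zero by construction
print("recovered scale:", s)
print("max position error:", errors.max())
```

With real reconstructions, the fit would typically be made robust to outlier cameras, for example by wrapping it in a RANSAC loop; that step is omitted here.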

 


Papers and presentations

 


CVPR 2011 paper

CVPR talk slides

CVPR talk video

 

BibTeX entry:

@inproceedings{crandall-disco11,
author    = {David Crandall and Andrew Owens and Noah Snavely and Daniel P. Huttenlocher},
booktitle = {Proc. IEEE Conf. on Computer Vision and Pattern Recognition},
title     = {Discrete-Continuous Optimization for Large-Scale Structure from Motion},
year      = {2011}
}

 

Errata:

The reference to “V. Govindu. Lie-algebraic averaging for globally consistent motion estimation. CVPR, 2004” should instead be to “V. Govindu. Combining Two-view Constraints For Motion Estimation. CVPR, 2001”.

 


Acknowledgements:

We would like to thank the Cornell Facilities Team for helping us collect the ground-truth Arts Quad dataset. We also gratefully acknowledge the support of the following:

National Science Foundation, MIT Lincoln Labs, Google, Intel Corporation, Lilly Endowment, IU Data to Insight Center