David Crandall, Andrew Owens, Noah Snavely, Dan Huttenlocher
Runner-up for Best Paper at CVPR 2011!
Recent work in structure from motion (SfM) has successfully built 3D models from large unstructured collections of images downloaded from the Internet. Most approaches use incremental algorithms that solve progressively larger bundle adjustment problems. These incremental techniques scale poorly as the number of images grows, and can drift or fall into bad local minima. We present an alternative formulation for SfM based on finding a coarse initial solution using a hybrid discrete-continuous optimization, and then improving that solution using bundle adjustment. The initial optimization step uses a discrete Markov random field (MRF) formulation, coupled with a continuous Levenberg-Marquardt refinement. The formulation naturally incorporates various sources of information about both the cameras and the points, including noisy geotags and vanishing point estimates. We test our method on several large-scale photo collections, including one with measured camera positions, and show that it can produce models that are similar to or better than those produced with incremental bundle adjustment, but more robustly and in a fraction of the time.
For more details, please see our CVPR 2011 paper and slides from our CVPR talk.
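To give a flavor of the discrete-continuous idea, here is a minimal, self-contained toy in Python (not the authors' code): cameras live on a 1D line rather than in SE(3), the discrete MRF step uses a simple coordinate-descent (ICM) sweep in place of the paper's belief propagation, and the continuous step is a Levenberg-Marquardt refinement of the discrete solution. All names and parameters are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
n = 6                                    # number of toy "cameras" on a line
true_t = np.sort(rng.uniform(0.0, 10.0, n))
pairs = [(i, i + 1) for i in range(n - 1)] + [(0, n - 1)]
meas = {(i, j): true_t[j] - true_t[i] + rng.normal(0.0, 0.2)
        for (i, j) in pairs}             # noisy pairwise offsets t_j - t_i

labels = np.linspace(0.0, 10.0, 101)     # discrete label space, 0.1 spacing

def energy(assign):
    """MRF energy: squared residuals of all pairwise measurements."""
    return sum((labels[assign[j]] - labels[assign[i]] - d) ** 2
               for (i, j), d in meas.items())

# Stage 1: discrete inference. The paper runs loopy belief propagation over a
# large label space; for this toy, iterated conditional modes (ICM) suffices.
assign = np.zeros(n, dtype=int)          # camera 0 stays at label 0 (gauge fix)
for _ in range(20):
    changed = False
    for v in range(1, n):
        costs = []
        for k in range(len(labels)):
            trial = assign.copy()
            trial[v] = k
            costs.append(energy(trial))
        best = int(np.argmin(costs))
        if best != assign[v]:
            assign[v] = best
            changed = True
    if not changed:
        break
t_init = labels[assign]

# Stage 2: continuous Levenberg-Marquardt refinement of the discrete solution.
def residuals(free):
    t = np.concatenate(([0.0], free))    # camera 0 stays pinned at 0
    return [t[j] - t[i] - d for (i, j), d in meas.items()]

fit = least_squares(residuals, t_init[1:], method="lm")
t_refined = np.concatenate(([0.0], fit.x))

print("discrete init :", np.round(t_init, 2))
print("after LM      :", np.round(t_refined, 2))
print("ground truth  :", np.round(true_t - true_t[0], 2))
```

The paper's actual label spaces are over 3D rotations and camera positions, and its unary terms fold in geotags and vanishing-point estimates; the toy keeps only the two-stage structure: a coarse discrete solution that is hard to derail, followed by continuous refinement that recovers precision.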
Sample reconstruction videos
- Central Rome (after multi-view stereo)
- Cornell Arts Quad
- Cornell Arts Quad (after multi-view stereo)
Downloads
- Rotation/translation MRF code and data:
[zip file of code and data (updated 7/20/2014, 66MB zip file)]
[readme file (updated 7/20/2014)]
This download includes the multi-core, single-computer version of rotation and translation MRF inference using belief propagation. It also includes MRF parameters for the Acropolis dataset and the relative pose estimates for the Arts Quad dataset. These files may be useful to researchers interested in inference on MRFs with large multidimensional label spaces. (Files updated 7/20/2014 to clarify file formats.)
- Quad dataset with ground-truth camera positions:
[images & ground truth (updated 7/26/2014 to include pairs.txt, 8GB tarball)]
[results & comparison code (250MB zip)]
[bundler tracks file (350MB tarball)]
[Matlab encoding of pairs file and ground truth, thanks to the Indian Institute of Science Computer Vision Lab]
This dataset contains 6,514 images of the Arts Quad at Cornell University. About 5,000 images include geotags recorded by a consumer GPS receiver (an iPhone 3G), while 348 images have very precise GPS coordinates measured using survey-quality differential GPS (accurate to about 10cm) that can be used as ground truth. Also available are reconstruction results from both our method and incremental bundle adjustment, code to compare two sets of camera positions that may differ by an unknown similarity transform (a sketch of this kind of alignment appears after this list), and the tracks file needed by Bundler to perform the final bundle adjustment.
- San Francisco dataset:
[pairs file, ground truth data, bundler tracks (new 7/20/2014, 280MB tarball)]
This dataset is based on 17,357 Google StreetView images from San Francisco (derived from the same dataset as in Chen et al., “City-scale landmark identification on mobile devices,” CVPR 2011). The dataset includes ground-truth translations and rotations calculated from the StreetView metadata.
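The comparison code in the Quad download aligns two reconstructions that differ by an unknown similarity transform before measuring error. As an illustration of what such an alignment involves (this is a sketch, not the distributed code), the following Python example uses the closed-form method of Umeyama (1991) to recover scale, rotation, and translation between two corresponding point sets:

```python
import numpy as np

def align_similarity(P, Q):
    """P, Q: (n, 3) arrays of corresponding positions. Returns s, R, t
    minimizing sum ||Q_i - (s R P_i + t)||^2 (Umeyama, 1991)."""
    mu_p, mu_q = P.mean(axis=0), Q.mean(axis=0)
    Pc, Qc = P - mu_p, Q - mu_q
    cov = Qc.T @ Pc / len(P)                  # cross-covariance matrix
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                          # guard against reflections
    R = U @ S @ Vt
    var_p = (Pc ** 2).sum() / len(P)
    s = np.trace(np.diag(D) @ S) / var_p
    t = mu_q - s * R @ mu_p
    return s, R, t

# Self-check on synthetic data: recover a known similarity transform.
rng = np.random.default_rng(1)
P = rng.normal(size=(100, 3))
R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R_true) < 0:
    R_true[:, 0] *= -1                        # ensure a proper rotation
Q = 2.5 * P @ R_true.T + np.array([1.0, -2.0, 3.0])
s, R, t = align_similarity(P, Q)
err = np.linalg.norm(Q - (s * P @ R.T + t), axis=1)
print(f"scale={s:.3f}, mean residual={err.mean():.2e}")
```

Only after this alignment does a comparison of camera positions against ground truth become meaningful, since SfM reconstructions are defined only up to a global similarity.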
Errata
In the CVPR 2011 paper, the reference to “V. Govindu. Lie-algebraic averaging for globally consistent motion estimation. CVPR, 2004” should instead be to “V. Govindu. Combining Two-view Constraints For Motion Estimation. CVPR, 2001”.
Acknowledgements
We would like to thank the Cornell Facilities Team for helping us collect the ground truth Arts Quad dataset. We also gratefully acknowledge the support of the following: