Mining Photo-sharing Websites to Study Ecological Phenomena

Haipeng Zhang, Mohammed Korayem, David Crandall, Gretchen LeBuhn

The popularity of social media websites like Flickr and Twitter has created enormous collections of user-generated content online. Latent in these content collections are observations of the world: each photo is a visual snapshot of what the world looked like at a particular point in time and space, for example, while each tweet is a textual expression of the state of a person and his or her environment. Aggregating these observations across millions of social sharing users could lead to new techniques for large-scale monitoring of the state of the world and how it is changing over time. In this paper we step towards that goal, showing that by analyzing the tags and image features of geo-tagged, time-stamped photos we can measure and quantify the occurrence of ecological phenomena including ground snow cover, snowfall and vegetation density. We compare several techniques for dealing with the large degree of noise in the dataset, and show how machine learning can be used to reduce errors caused by misleading tags and ambiguous visual content. We evaluate the accuracy of these techniques by comparing to ground-truth data collected both by surface stations and by Earth-observing satellites. Besides the immediate application to ecology, our study gives insight into how to accurately crowd-source other types of information from large, noisy social sharing datasets.

For more details, please see our WWW 2012 paper and slides from our WWW talk.


Sample videos showing appearance estimates produced by the Flickr photo analysis described in this paper. Green indicates a high probability of appearance; grey and black indicate low-confidence areas (few photos or ambiguous evidence).

Butterfly Leaves Dragonfly

Media coverage

Snow Snaps Give You a Better Weather Picture:


Photos by different people are (almost) independent observations, with uncorrelated noise
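
If each user's photos are treated as an independent noisy observation, the evidence from several users can be combined by adding log-likelihood ratios. The following is a minimal sketch of that idea, not the exact model from the paper: the function names and the probabilities `p_tag_given_snow`, `p_tag_given_no_snow` and `prior_snow` are hypothetical placeholders, not values estimated from Flickr data.

```python
import math


def snow_log_odds(num_snow_users, num_other_users,
                  p_tag_given_snow=0.3, p_tag_given_no_snow=0.02,
                  prior_snow=0.5):
    """Posterior log-odds of snow in one place on one day.

    num_snow_users:  distinct users who posted a snow-tagged photo
    num_other_users: distinct users who posted only non-snow photos
    The probabilities are illustrative assumptions, not fitted values.
    """
    log_odds = math.log(prior_snow / (1.0 - prior_snow))
    # Each user is assumed independent, so their evidence simply adds up.
    log_odds += num_snow_users * math.log(
        p_tag_given_snow / p_tag_given_no_snow)
    log_odds += num_other_users * math.log(
        (1.0 - p_tag_given_snow) / (1.0 - p_tag_given_no_snow))
    return log_odds


def snow_probability(num_snow_users, num_other_users, **kwargs):
    """Convert the log-odds into a probability with a stable sigmoid."""
    z = snow_log_odds(num_snow_users, num_other_users, **kwargs)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Three users tagged snow, one user did not.
    print(snow_probability(3, 1))
```

Counting distinct users rather than photos matters here: many photos from one user are highly correlated, so treating each of them as separate evidence would overstate confidence.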

Estimate daily snow cover for individual cities

Estimate daily snow quantity for individual cities

City              Philadelphia  Boston  New York  Chicago
RMS error (inch)  1.44          1.26    1.15      1.06
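
For reference, the RMS error reported above is the square root of the mean squared difference between estimated and measured values. A minimal sketch of the computation follows; the function name `rms_error` and the sample numbers in the usage are made up for illustration and are not data from the study.

```python
import math


def rms_error(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical daily snow estimates vs. station measurements (inches).
    estimated = [0.0, 2.5, 1.0, 0.0]
    measured = [0.0, 3.0, 0.5, 1.0]
    print(rms_error(estimated, measured))
```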


Estimate snow cover on each day at each place in North America

  • For each geographic bin of size 1° x 1° (~35 million total decisions)
  • Use ground truth data from Terra satellite
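
The per-bin setup above can be sketched as follows: photos are grouped into 1° x 1° cells per day, and distinct users are counted in each cell. This is an illustrative sketch under assumptions; the input tuple layout `(user_id, lat, lon, day)` and the function names are hypothetical and do not describe the paper's actual data pipeline.

```python
import math
from collections import defaultdict


def bin_key(lat, lon, day, bin_size=1.0):
    """Map a coordinate and a day to a (lat_bin, lon_bin, day) cell."""
    return (math.floor(lat / bin_size), math.floor(lon / bin_size), day)


def users_per_bin(photos, bin_size=1.0):
    """Count distinct users per geographic bin per day.

    photos: iterable of (user_id, lat, lon, day) tuples (assumed layout).
    """
    users = defaultdict(set)
    for user_id, lat, lon, day in photos:
        users[bin_key(lat, lon, day, bin_size)].add(user_id)
    return {key: len(ids) for key, ids in users.items()}


if __name__ == "__main__":
    sample = [
        ("a", 40.7, -73.9, "2009-01-01"),
        ("a", 40.9, -73.5, "2009-01-01"),  # same user, same bin
        ("b", 40.1, -73.2, "2009-01-01"),
        ("c", 41.2, -73.9, "2009-01-01"),
    ]
    print(users_per_bin(sample))
```

Each resulting cell corresponds to one decision (snow or no snow on that day), which can then be compared against the satellite ground truth for the same cell.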




We thank Professor Michael Trosset for discussions on the linear regression models. We also gratefully acknowledge the support of the following:

Lilly Endowment, IU Data to Insight Center, National Science Foundation, IBM