Measuring Public Opinion via Digital Footprints

Roberto Cerina

Individuals make a growing number of “digital” choices and decisions every day. We propose a novel MRP (multilevel regression and post-stratification) estimation strategy that combines samples of these digital traces with a population frame containing extensive individual-level socio-economic data, in order to generate area-level forecasts of the outcome variable of interest. In our example, we forecast the two-party vote share for Democrats and Republicans in the 2018 Texas congressional district elections (all 36 districts) and in the Senate election. Our implementation assumes we can observe, and sample, individuals who signal their preference by favoring one virtual location over another; in our case, by visiting a Democratic rather than a Republican Facebook page during the election campaign. We demonstrate that a relatively large virtual sample can be quite representative of the overall population. We then train a random forest to estimate the probability of voting Republican, conditional on individual-level data from the complete voting history and registration records for Texas. Over the eight weeks preceding the midterm elections, we generate vote share forecasts for all 36 congressional contests and for the Senate race. The forecasts do not use any survey results as input. Nevertheless, they are accurate when compared with the actual outcomes.
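To make the post-stratification step concrete, here is a minimal Python sketch of how individual-level predicted probabilities might be aggregated into district-level vote shares. It is an illustration under stated assumptions, not the paper's implementation: the function `poststratify`, the field names (`district`, `primary_history`), the toy frame, and the stand-in `toy_model` are all hypothetical, and the sketch ignores turnout weighting, which a real forecast would likely need.

```python
from collections import defaultdict


def poststratify(frame, predict_proba):
    """Average predicted Republican-vote probabilities over the frame, by district.

    `frame` is an iterable of per-person records (hypothetical schema).
    `predict_proba` maps one record to a probability in [0, 1]; in the
    abstract's setting this role would be played by the fitted random forest.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for person in frame:
        district = person["district"]
        totals[district] += predict_proba(person)
        counts[district] += 1
    return {d: totals[d] / counts[d] for d in totals}


# Hypothetical toy population frame standing in for the Texas voter file.
frame = [
    {"district": "TX-01", "primary_history": "R"},
    {"district": "TX-01", "primary_history": "D"},
    {"district": "TX-02", "primary_history": "R"},
    {"district": "TX-02", "primary_history": "R"},
]


def toy_model(person):
    # Stand-in for a trained classifier (assumption, not the paper's model).
    return 0.8 if person["primary_history"] == "R" else 0.2


shares = poststratify(frame, toy_model)
for district, share in sorted(shares.items()):
    print(f"{district}: predicted Republican share {share:.2f}")
```

The design point is that the model is trained on the digital-trace sample but predictions are made for every record in the population frame, so the district estimate reflects the frame's composition rather than the sample's.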

More info about this project at
