New findings by scientists from the Data Science Lab at Warwick Business School suggest that data generated simply through our use of mobile phones and Twitter might offer surprisingly accurate estimates of crowd sizes.

Federico Botta, Suzy Moat and Tobias Preis, of Warwick Business School, analysed Twitter and mobile phone data from Milan, Italy, and found that they could estimate attendance numbers for football matches at the San Siro stadium, as well as the number of people at Linate Airport at any given time.

Their research, published in Royal Society Open Science today, could be of value in a range of emergency situations, such as evacuations and crowd disasters.

Mr Botta said: “Measuring crowd size is a difficult task, as the hugely varying estimates we see of the number of people at protests underline. Given that most people now carry a mobile phone with them, we wondered if we could measure the number of people in a given location simply by analysing data on usage of these mobile phones. We found that this automatically generated data provides an excellent basis for estimating the size of a crowd. Quick and accurate measurements of crowd size could be of vital use for police and other authorities charged with avoiding crowd disasters.”

In the paper, “Quantifying crowd size with mobile phone and Twitter data”, the scientists analysed two months of Twitter and mobile phone data from Milan, from November 1 to December 31, 2013. The mobile phone activity dataset was provided by Telecom Italia and reflects the volume of outgoing and incoming calls and text messages, as well as the number of active internet connections. Both datasets make it possible for the scientists to determine not only when mobile phones were active, but also where their users were.

Remarkably, they found that the size of spikes in Twitter and mobile phone activity allowed them to estimate the number of attendees at football matches in the San Siro stadium, home of AC Milan and Internazionale. 

Crowded house: Mobile phone and Twitter activity in the San Siro stadium corresponds with the dates of football matches held at the ground, and the heights of the spikes bear a strong similarity to the number of people attending each match.

Dr Preis, Associate Professor of Behavioural Science and Finance, said: “We plotted mobile phone calls, Twitter and SMS activity in the geographical area in which the San Siro is located and in all three we observed 10 distinct spikes. We found that the dates these spikes occurred coincided exactly with the dates on which the 10 football matches took place in the stadium.

“Furthermore, we noted that the relative sizes of the spikes strongly resembled the official attendance figure for each match. By drawing on historic internet activity in the San Siro, we were able to generate estimates of the number of attendees which fell within 13 per cent of the true value.”

Dr Moat, Assistant Professor of Behavioural Science, said: “One of the key challenges we faced was to identify situations for which we had a reliable measurement of the number of people present, against which we could calibrate our method. The football stadium at the San Siro was ideal, as football fans need to buy a ticket to attend a match. We found that data on nine football matches was sufficient for us to generate accurate estimates of the number of people attending a 10th match.

“The relationship between data on internet usage and match attendance was strongest of all – perhaps because smartphones automatically check services such as email, without the need for the user to actively intervene.”

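The calibration Dr Moat describes, using nine matches to estimate attendance at a 10th, can be illustrated with a short sketch. It assumes a simple straight-line relationship between the size of an activity spike and attendance, fitted by ordinary least squares; the release does not say which model the researchers used, so this is an illustrative stand-in rather than their method. The function names and all of the numbers below are invented for the example and are not the study's data.

```python
from statistics import mean, median


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def leave_one_out_errors(activity, attendance):
    """Hold out each match in turn, calibrate on the rest, and return
    the relative error of the estimate for the held-out match."""
    errors = []
    for i in range(len(activity)):
        train_x = activity[:i] + activity[i + 1:]
        train_y = attendance[:i] + attendance[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        estimate = slope * activity[i] + intercept
        errors.append(abs(estimate - attendance[i]) / attendance[i])
    return errors


# Invented illustrative values (NOT the study's data): one activity
# spike size and one official attendance figure per match.
activity = [120.0, 310.0, 150.0, 280.0, 200.0,
            330.0, 170.0, 260.0, 140.0, 300.0]
attendance = [31000, 76000, 38000, 70000, 50000,
              79000, 43000, 64000, 35000, 74000]

errors = leave_one_out_errors(activity, attendance)
print(f"median relative error: {median(errors):.1%}")
```

Holding out each match in turn mirrors the idea of calibrating on nine matches and testing on the remaining one, so every estimate is made for a match the fit has not seen.
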
The researchers also investigated how mobile phone and Twitter usage related to passenger activity at Linate Airport. While exact passenger counts were not available, the researchers estimated the number of people in the airport by assuming passengers arrived two hours before their flight and those landing left an hour after touching down.

Mr Botta said: “Again, we found that the greater phone call and SMS activity corresponded with a larger number of estimated passengers at the airport. Similarly, we discovered that greater internet activity related to a higher estimated number of passengers and the same with Twitter activity.

“The relationships are weaker than those found in the case study at San Siro, but remarkable given the coarse nature of our estimate of the number of passengers.”

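The coarse airport estimate described above, in which departing passengers are assumed to arrive two hours before their flight and arriving passengers to leave an hour after landing, can be sketched as follows. The flight schedule, the per-flight passenger counts and the function name are hypothetical; the release does not describe the flight data the researchers used.

```python
from datetime import datetime, timedelta


def estimated_occupancy(flights, at):
    """Estimate how many passengers are in the airport at time `at`.

    `flights` is a list of (kind, scheduled_time, passengers) tuples,
    where kind is "departure" or "arrival". Departing passengers are
    assumed present for the two hours before take-off; arriving
    passengers for the hour after landing.
    """
    total = 0
    for kind, when, passengers in flights:
        if kind == "departure":
            start, end = when - timedelta(hours=2), when
        elif kind == "arrival":
            start, end = when, when + timedelta(hours=1)
        else:
            raise ValueError(f"unknown flight kind: {kind!r}")
        if start <= at < end:
            total += passengers
    return total


# Hypothetical schedule for illustration only.
day = datetime(2013, 11, 1)
flights = [
    ("departure", day.replace(hour=10, minute=0), 150),
    ("departure", day.replace(hour=11, minute=30), 120),
    ("arrival", day.replace(hour=9, minute=15), 100),
]

print(estimated_occupancy(flights, day.replace(hour=9, minute=45)))
```

Evaluating the function at regular intervals would give a time series of estimated passenger numbers that could then be compared with phone and Twitter activity for the same intervals.
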
Dr Preis added: “Our research provides evidence that accurate estimates of the number of people in a given location at a given time can be extrapolated from mobile phone or Twitter data. This shows that data generated through everyday interactions with our mobile phones could be of clear value for a range of business and policy stakeholders, potentially offering an almost instant measurement of the size of crowds.”

Suzy Moat and Tobias Preis are currently leading a FutureLearn MOOC called "Big Data: Measuring and Predicting Human Behaviour", in which this research is featured.

Tobias Preis and Suzy Moat teach Behavioural Sciences for the Manager on the Executive MBA, which is now being taught part-time from WBS London at The Shard. They also teach Big Data Analytics on the MSc Finance and the suite of MSc Business courses, where they also lecture on Complexity in the Social Sciences.