Prediction of Infectious Disease Spread using Twitter:
A Case of Influenza


Hideo Hirose, Liangliang Wang


The 2012 IEEE 5th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP'12), pp.100-105, December 17-20, 2012, Taipei, Taiwan


Detecting disasters early and predicting their final stage have become very important from a risk-analysis viewpoint.
Statistical methods provide accurate parameter estimates when the data are complete. When the data are incomplete, however, the accuracy of the estimates becomes poor, so statistical methods are weak at predicting future trends.
SIR methods, which model the spread of infectious disease with differential equations, can sometimes provide accurate estimates of the final stage.
These methods, however, require some inspection time, which delays the analysis by at least a week or so when we want to predict future trends. To detect disasters and predict future trends much earlier, we can use social network systems (SNS).
In this paper, we propose a method to predict the future trend of influenza using Twitter. We analyze the feasibility of building a regression model that combines Twitter messages with the CDC's Influenza-Like Illness (ILI) data, and we find that a multiple linear regression model with ridge regularization outperforms both a single linear regression model and other unregularized least-squares methods. Multiple linear regression with ridge regularization notably improves the prediction accuracy.
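
As a rough illustration of the ridge-regularized multiple linear regression mentioned above, the following minimal sketch fits the closed-form ridge solution and compares it with the unregularized least-squares fit. It is not the authors' implementation: the data are synthetic stand-ins (not Twitter or ILI data), and the names fit_ridge, predict, n_weeks, n_terms and the penalty value lam=1.0 are assumptions made only for this example.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Closed-form ridge fit on centered data; returns (weights, intercept).

    lam = 0 reduces to ordinary (unregularized) least squares.
    The intercept is not penalized because the data are centered first.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    n_features = X.shape[1]
    # Solve (Xc^T Xc + lam * I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def predict(X, w, b):
    """Linear prediction for a feature matrix X."""
    return X @ w + b


# Synthetic stand-in data (hypothetical): each column plays the role of a
# weekly keyword frequency, and y plays the role of a weekly ILI rate.
rng = np.random.default_rng(0)
n_weeks, n_terms = 30, 10
base = rng.normal(size=(n_weeks, 1))
# Columns are strongly correlated, the situation where ridge tends to help.
X = base + 0.1 * rng.normal(size=(n_weeks, n_terms))
y = 2.0 * base[:, 0] + 0.1 * rng.normal(size=n_weeks)

w_ols, b_ols = fit_ridge(X, y, lam=0.0)      # unregularized least squares
w_ridge, b_ridge = fit_ridge(X, y, lam=1.0)  # ridge; lam chosen arbitrarily

print("OLS weight norm:  ", np.linalg.norm(w_ols))
print("Ridge weight norm:", np.linalg.norm(w_ridge))
```

The ridge penalty shrinks the weight vector, which stabilizes the fit when the predictors are highly correlated. In practice the penalty would be selected from data, for example by cross-validation, rather than fixed as it is here.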

Key Words
Twitter; early detection; influenza; infectious disease; logistic regression; ridge; ILI; AIC; SNS; truncated data.




Times Cited in Google Scholar: 8
