Detecting Influenza Epidemics by Comparing and Optimizing Models Based on Internet Search Engine Query Data
Received date: 2016-07-19
Revised date: 2016-09-08
Online published: 2016-09-20
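[Purpose/significance] This paper analyzes the correlation between Internet search data in China and influenza activity in China. It explores whether search data can support epidemic surveillance and offers a reference for search engine providers and centers for disease control and prevention. [Method/process] The correlation between search volumes of Chinese-language query terms on Baidu and influenza activity in China was analyzed to select suitable search keywords. Simple linear regression, multiple linear regression, principal component regression, and artificial neural network models were then built and compared to identify the best model. Finally, officially published historical influenza surveillance information was introduced to optimize the models. [Result/conclusion] The multiple linear regression and artificial neural network models achieve better goodness of fit, and multiple linear regression is the more accurate of the two. Principal component regression can in theory reduce collinearity among the variables, but in practice both its fitting performance and its surveillance performance are lower than those of the multiple linear regression model. Historical data and search data carry partly complementary information, and combining the two gives the best surveillance performance.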
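The keyword-screening and model-comparison steps can be illustrated with a minimal NumPy sketch. It is not the authors' code. The query names, the 0.6 correlation threshold, and all data are hypothetical: synthetic weekly series stand in for Baidu Index volumes and ILI%. Ordinary least squares is assumed as the fitting routine. Simple linear regression uses the single best-correlated query, and multiple linear regression uses every query that passes the threshold.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])


def select_keywords(search, ili, threshold=0.6):
    """Keep queries whose weekly volume correlates with ILI% at or above `threshold`.

    `search` maps a hypothetical query name to its weekly volume series.
    Returns the kept names (best-correlated first) and all scores.
    """
    scores = {kw: pearson_r(vol, ili) for kw, vol in search.items()}
    kept = sorted((kw for kw, r in scores.items() if r >= threshold),
                  key=lambda kw: scores[kw], reverse=True)
    return kept, scores


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, in-sample R^2)."""
    y = np.asarray(y, float)
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, float(r2)


# Synthetic data: two years of weekly ILI% with a seasonal peak.
rng = np.random.default_rng(0)
weeks = 104
season = 2 + np.sin(2 * np.pi * np.arange(weeks) / 52)
ili = season + rng.normal(0, 0.1, weeks)
search = {
    "query_a": 100 * season + rng.normal(0, 8, weeks),   # tracks the epidemic
    "query_b": 80 * season + rng.normal(0, 8, weeks),    # tracks the epidemic
    "query_unrelated": rng.normal(100, 10, weeks),       # pure noise
}

kept, scores = select_keywords(search, ili)
_, r2_simple = fit_ols(search[kept[0]], ili)                            # simple LR
_, r2_multi = fit_ols(np.column_stack([search[k] for k in kept]), ili)  # multiple LR
print(kept, round(r2_simple, 3), round(r2_multi, 3))
```

In-sample R² cannot fall when regressors are added, so a fair comparison between the simple and multiple models should also score them on held-out weeks.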
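WANG Ruojia, LI Pei. Detecting influenza epidemics by comparing and optimizing models based on Internet search engine query data[J]. Library and Information Service, 2016, 60(18): 122-132. DOI: 10.13266/j.issn.0252-3116.2016.18.015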
[Purpose/significance] This study explores the possibility of detecting influenza epidemics from Internet search data, in order to provide a reference for search engine providers and centers for disease control and prevention. [Method/process] First, appropriate keywords were selected by investigating the relationship between online information searches and conventional surveillance data in China. Then, four models based on different theoretical approaches (simple linear regression, multiple linear regression, principal component regression, and an artificial neural network) were established and compared. Finally, historical influenza-like illness (ILI) surveillance data were introduced to optimize the models. [Result/conclusion] The results show that (i) the multiple linear regression model and the artificial neural network model achieve better goodness of fit, and the former is more accurate; (ii) principal component regression can in theory reduce collinearity among the variables, but in practice both its fitting performance and its prediction accuracy are lower than those of the multiple linear regression model; and (iii) historical data and search data are complementary in influenza surveillance, and combining the two achieves the best monitoring results.
Key words: influenza; search engine; Baidu Index; early-warning model
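Principal component regression and the historical-data optimization can likewise be sketched in NumPy. This is an illustration under stated assumptions, not the study's specification. The abstract does not say how many components were retained or how the historical surveillance data entered the model. Here, PCR regresses ILI% on the leading component(s) of the standardized query matrix, and the historical information is modeled as a hypothetical extra regressor holding the previous week's ILI%. All data are synthetic.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares coefficients with an intercept prepended."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def ols_predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta


def pcr_fit(X, y, n_components):
    """Principal component regression on the standardized predictor matrix."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    W = Vt[:n_components].T  # loadings of the retained components
    return mu, sd, W, ols_fit(Z @ W, y)


def pcr_predict(model, X):
    mu, sd, W, beta = model
    return ols_predict(beta, ((X - mu) / sd) @ W)


def add_lagged_ili(X, ili, lag=1):
    """Append ILI% from `lag` weeks earlier as an extra column (hypothetical design).

    The first `lag` weeks have no history and are dropped from X and the target.
    """
    assert lag >= 1
    return np.column_stack([X[lag:], ili[:-lag]]), ili[lag:]


def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))


# Synthetic data: three years of weekly ILI% and three strongly collinear queries.
rng = np.random.default_rng(1)
n = 156
ili = 2 + np.sin(2 * np.pi * np.arange(n) / 52) + rng.normal(0, 0.15, n)
base = ili + rng.normal(0, 0.3, n)
X = np.column_stack([50 * base + rng.normal(0, 5, n) for _ in range(3)])
train, test = slice(0, 104), slice(104, n)

mlr = ols_fit(X[train], ili[train])
pcr = pcr_fit(X[train], ili[train], n_components=1)
print("MLR     test RMSE:", rmse(ili[test], ols_predict(mlr, X[test])))
print("PCR(1)  test RMSE:", rmse(ili[test], pcr_predict(pcr, X[test])))

# Same test weeks; training loses week 0, which has no previous ILI% value.
Xl, yl = add_lagged_ili(X, ili)
mlr_lag = ols_fit(Xl[:103], yl[:103])
print("MLR+lag test RMSE:", rmse(yl[103:], ols_predict(mlr_lag, Xl[103:])))
```

With all components retained, PCR spans the same predictor space as multiple regression and reproduces its fitted values. Keeping fewer components removes collinearity but can also discard useful variation, which is one plausible reason PCR need not outperform multiple regression in practice. The lagged term is only one possible way of bringing in historical surveillance data.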