Detecting Influenza Epidemics by Comparing and Optimizing Models Based on Internet Search Engine Query Data
Received date: 2016-07-19
Revised date: 2016-09-08
Online published: 2016-09-20
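[Purpose/significance] This study analyzes the correlation between Internet search data in China and the country's influenza epidemics, and explores whether search data can support epidemic surveillance, as a reference for search engine providers and centers for disease control and prevention. [Method/process] Suitable search keywords are selected by analyzing the correlation between Chinese-language query volumes on Baidu and influenza activity in China. Simple linear regression, multiple linear regression, principal component regression, and artificial neural network models are then built and compared to identify the best model, and officially published historical influenza surveillance data are introduced to optimize it. [Result/conclusion] The multiple linear regression and artificial neural network models show better goodness of fit, with multiple linear regression being the more accurate of the two. Principal component regression can in theory reduce collinearity among variables, but in practice both its fit and its surveillance performance are worse than those of multiple regression. Historical data and search data carry partly complementary information, and using the two together gives the best surveillance performance.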
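Wang Ruojia, Li Pei. Detecting Influenza Epidemics by Comparing and Optimizing Models Based on Internet Search Engine Query Data[J]. Library and Information Service, 2016, 60(18): 122-132. DOI: 10.13266/j.issn.0252-3116.2016.18.015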
[Purpose/significance] This study explores the possibility of detecting influenza epidemics from Internet search data, in order to provide a reference for search engines and disease control and prevention centers. [Method/process] First, appropriate keywords are selected by investigating the relationship between online information searches and conventional surveillance data in China. Then, four models based on different theories are established and compared. Finally, historical ILI cases are introduced to optimize the models. [Result/conclusion] Results show that (i) the multiple linear regression model and the artificial neural network model have better goodness of fit, and the former is more accurate; (ii) the principal component regression model could in theory reduce collinearity among the variables, but in practice both its fit and its prediction accuracy are lower than those of the multiple linear regression model; (iii) historical data and search data are complementary in influenza surveillance, and combining the two achieves the best monitoring results.
Key words: influenza; search engine; Baidu Index; early-warning model
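
The workflow summarized in the abstract (correlation-based keyword selection, multiple linear regression versus principal component regression, then adding historical ILI values) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, and the function names, the 0.5 correlation threshold, the one-week lag, and the single retained component are hypothetical choices, not the authors' settings. The artificial neural network model is omitted.

```python
import numpy as np

def select_keywords(X, y, threshold=0.5):
    """Return column indices whose |Pearson r| with y meets the threshold."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.flatnonzero(np.abs(r) >= threshold)

def fit_ols(X, y):
    """Least-squares coefficients for y ~ [1, X]; the intercept comes first."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_ols(beta, X):
    return beta[0] + X @ beta[1:]

def fit_pcr(X, y, k):
    """Principal component regression: regress y on the first k PCs of X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    V = Vt[:k].T  # loadings, shape (n_features, k)
    beta = fit_ols((X - mean) @ V, y)
    return mean, V, beta

def predict_pcr(model, X):
    mean, V, beta = model
    return predict_ols(beta, (X - mean) @ V)

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def make_synthetic(n_weeks=120, seed=0):
    """Synthetic weekly ILI rate plus five fake keyword series (assumption)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_weeks)
    ili = 3.0 + 2.0 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 0.3, n_weeks)
    # Columns 0-2 track ILI; columns 3-4 are unrelated noise.
    related = np.column_stack(
        [ili * s + rng.normal(0, 0.5, n_weeks) for s in (10, 6, 3)]
    )
    noise = rng.normal(50, 5, (n_weeks, 2))
    return np.column_stack([related, noise]), ili

def compare_models(X, y, k=1):
    """In-sample R^2 for three of the model variants named in the abstract."""
    keep = select_keywords(X, y)
    # Drop week 0 so every row can carry the previous week's ILI value.
    X_search, y_now, y_prev = X[1:, keep], y[1:], y[:-1]
    X_both = np.column_stack([X_search, y_prev])
    mlr = fit_ols(X_search, y_now)
    pcr = fit_pcr(X_search, y_now, k)
    both = fit_ols(X_both, y_now)
    return {
        "mlr_search": r_squared(y_now, predict_ols(mlr, X_search)),
        "pcr_search": r_squared(y_now, predict_pcr(pcr, X_search)),
        "mlr_search_plus_history": r_squared(y_now, predict_ols(both, X_both)),
    }

X, y = make_synthetic()
kept = select_keywords(X, y)
scores = compare_models(X, y)
for name, value in scores.items():
    print(f"{name}: R^2 = {value:.3f}")
```

These scores are in-sample, so the ordering is guaranteed by construction: regression on a subset of principal components cannot fit the training data better than regression on all selected keywords, and adding a lagged ILI column cannot lower the training fit. A comparison of surveillance accuracy in the sense of the abstract would need held-out weeks.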
