Received date: 2013-11-13
Revised date: 2014-01-04
Online published: 2014-01-20
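Funding
This article is one of the research outcomes of the Ministry of Education Humanities and Social Sciences Youth Fund project "Research on Topic Mining and Semantic Classification of Information Content in Social Network Environments" (Project No. 13YJC870008) and the National Natural Science Foundation of China Youth Fund project "Research on Information Recommendation Based on User-Resource Associations in Social Network Environments" (Project No. 71303178).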
Mining and Evolution of Content Topics Based on Dynamic LDA
Hu Jiming (胡吉明), Chen Guo (陈果). Mining and evolution of content topics based on dynamic LDA[J]. Library and Information Service (图书情报工作), 2014, 58(2): 138-142. DOI: 10.13266/j.issn.0252-3116.2014.02.023
Mining text topics and tracing their evolution is important for text modeling, text classification, and recommendation services. Starting from the theory of LDA-based text topic modeling and targeting the dynamic nature of text content in social networking environments, this article constructs a dynamic LDA model for mining text topics. The accuracy of topic mining is then improved through incremental Gibbs sampling and estimation of the model parameters. Furthermore, the evolution of dynamic content topics is analyzed in terms of topic similarity and topic intensity. Experiments show that the proposed methods are feasible and effective, and they lay a foundation for further research on semantic modeling and text classification.
Key words: topic mining; topic evolution; dynamic LDA model
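The abstract names three technical steps: LDA topic mining per time slice, incremental Gibbs sampling, and evolution measured by topic similarity and topic intensity. The Python sketch below shows one common way to realize them and is not the authors' implementation. It uses the standard collapsed Gibbs sampler for LDA (Griffiths and Steyvers), where each token's topic is resampled from p(z_i = k | rest) ∝ (n_dk + α)(n_kw + β_kw) / (n_k + Σ_w β_kw). Assumptions, all hypothetical: the "incremental" step is modeled as a warm start that adds the previous slice's topic-word counts to the Dirichlet prior β of the next slice; topic similarity is cosine similarity between topic-word vectors; topic intensity is the mean document-topic proportion within a slice. The names `gibbs_lda`, `topic_similarity`, `topic_intensity`, and the `prior` parameter are illustrative only.

```python
import math
import random


def gibbs_lda(docs, V, K, alpha=0.1, beta=0.01, iters=200, seed=0, prior=None):
    """Collapsed Gibbs sampling for LDA on one time slice (illustrative sketch).

    docs:  list of documents, each a list of word ids in range(V).
    prior: optional K x V topic-word counts from the previous slice.
           ASSUMPTION: adding them to beta is a hypothetical stand-in for
           the article's "incremental" Gibbs sampling.
    Returns (phi, theta, nkw): topic-word distributions (K x V),
    document-topic distributions (D x K) and this slice's raw counts.
    """
    rng = random.Random(seed)
    # Effective Dirichlet parameters for each topic's word distribution.
    b = [[beta + (prior[k][w] if prior is not None else 0.0) for w in range(V)]
         for k in range(K)]
    b_sum = [sum(row) for row in b]
    ndk = [[0] * K for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(K)]  # word counts per topic
    nk = [0] * K                       # token counts per topic
    z = []                             # current topic of every token
    for d, doc in enumerate(docs):
        zd = [rng.randrange(K) for _ in doc]  # random initial topics
        for w, k in zip(doc, zd):
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Take the token out of the counts before resampling it.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = j | rest), up to a constant.
                p = [(ndk[d][j] + alpha) * (nkw[j][w] + b[j][w]) / (nk[j] + b_sum[j])
                     for j in range(K)]
                k = rng.choices(range(K), weights=p)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    phi = [[(nkw[k][w] + b[k][w]) / (nk[k] + b_sum[k]) for w in range(V)]
           for k in range(K)]
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return phi, theta, nkw


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def topic_similarity(phi_prev, phi_curr):
    """ASSUMPTION: cosine similarity between topic-word vectors of two slices."""
    return [[cosine(p, q) for q in phi_curr] for p in phi_prev]


def topic_intensity(theta):
    """ASSUMPTION: mean document-topic proportion of each topic in one slice."""
    return [sum(row[k] for row in theta) / len(theta) for k in range(len(theta[0]))]


if __name__ == "__main__":
    slice1 = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 5], [0, 1, 2, 0]]
    slice2 = [[0, 1, 1, 2], [3, 4, 5, 5], [3, 3, 4, 5]]
    phi1, theta1, counts1 = gibbs_lda(slice1, V=6, K=2, iters=100, seed=1)
    phi2, theta2, _ = gibbs_lda(slice2, V=6, K=2, iters=100, seed=2, prior=counts1)
    print(topic_similarity(phi1, phi2))  # row k: topic k of slice 1 vs. each topic of slice 2
    print(topic_intensity(theta2))       # one share per topic; the shares sum to 1
```

Carrying the counts forward tends to keep topic k of one slice aligned with topic k of the next, so the diagonal of the similarity matrix can be read as a topic's persistence over time. Without such a warm start, topics from different slices would first have to be matched, for example by pairing each topic with its most similar counterpart.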