Overview on Microblog Topic Detection Methods

  • Liang Xiaohe
  • Tian Ruya
  • Wu Lei
  • Zhang Xuefu
  • Agricultural Information Institute of Chinese Academy of Agricultural Science, Beijing 100081

Received date: 2016-11-29

Revised date: 2017-05-26

Published online: 2017-07-20

Abstract

[Purpose/significance] This paper comprehensively reviews and analyzes the research literature on microblog topic detection in China and abroad, providing a reference for researchers undertaking further study. [Method/process] First, the paper summarizes the basic principles and main research methods of traditional topic discovery. Second, based on the organizational characteristics of microblog text, it reviews microblog topic discovery methods in detail from two aspects: the short-text characteristics of microblogs and their non-text features. Finally, it discusses the future development directions of microblog topic discovery. [Result/conclusion] Research on microblog topic detection is still at an early stage. Future work should deepen theoretical exploration and develop innovative research methods.

Cite this article

Liang Xiaohe, Tian Ruya, Wu Lei, Zhang Xuefu. Overview on Microblog Topic Detection Methods[J]. Library and Information Service, 2017, 61(14): 141-148. DOI: 10.13266/j.issn.0252-3116.2017.14.019
