Received: 2013-10-25
Revised: 2014-01-03
Published online: 2014-01-20
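Funding
This paper is one of the research outcomes of the National Natural Science Foundation of China project "Research on the Formation Mechanism and Evolution Laws of Knowledge Networks" (Grant No. 71173249) and of the State-Sponsored Graduate Study Abroad Program for Building High-Level Universities (Grant No. 留金发[2011]3005).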
Topic Visualization in Texts and Its Application in Risk Identification of Public Companies
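Zhao Yiming, Zhang Jin. Topic Visualization in Texts and Its Application in Risk Identification of Public Companies[J]. Library and Information Service (图书情报工作), 2014, 58(02): 102-108. DOI: 10.13266/j.issn.0252-3116.2014.02.017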
This paper proposes a method for discovering and describing topics at the term level using a visualization tool. It argues that a set of terms linked by cohesion relationships represents a topic in a text collection. The proximity among terms in the transposed vector space is calculated to represent these cohesion relationships. The proximities are then projected onto a visual map by multidimensional scaling (MDS), where terms that have agglomeration relationships and belong to the same topic cluster together. The method is successfully applied to risk identification for public companies in the computer application services sector, where it reveals the specific presentation and semantic content of market risk.
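The pipeline summarized in the abstract (term vectors in the transposed space, pairwise proximity, projection by MDS) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy term-by-document counts, the use of cosine similarity as the proximity measure, the chord distance `2 - 2*cos` as the dissimilarity, and classical (eigenvector-based) MDS computed by power iteration. The paper's own measure and MDS variant may differ.

```python
import math

# Hypothetical toy data: rows are terms, columns are documents (the
# "transposed" view of a document-term matrix). All counts are invented.
TERMS = ["market", "competition", "price", "server", "software", "upgrade"]
TERM_DOC = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [2, 2, 1, 0],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
    [0, 0, 2, 2],
]


def cosine(u, v):
    """Cosine similarity between two term vectors (assumed proximity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def squared_distances(rows):
    """Squared chord distance 2 - 2*cos for every pair of (non-zero) rows."""
    return [[max(0.0, 2.0 - 2.0 * cosine(u, v)) for v in rows] for u in rows]


def double_center(d2):
    """Turn a symmetric squared-distance matrix into a centered Gram matrix."""
    n = len(d2)
    row_mean = [sum(r) / n for r in d2]
    grand = sum(row_mean) / n
    return [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand)
             for j in range(n)] for i in range(n)]


def top_eigenpair(b, iterations=500):
    """Dominant eigenvalue and unit eigenvector of b by power iteration."""
    n = len(b)
    start = [float(i + 1) for i in range(n)]  # fixed, deterministic start
    length = math.sqrt(sum(x * x for x in start))
    v = [x / length for x in start]
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * bv[i] for i in range(n)), v


def classical_mds(d2, dims=2):
    """Embed items in `dims` dimensions from squared distances."""
    b = double_center(d2)
    n = len(b)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        lam, v = top_eigenpair(b)
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next-largest eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


coords = classical_mds(squared_distances(TERM_DOC))
for term, (x, y) in zip(TERMS, coords):
    print(f"{term:12s} {x:8.3f} {y:8.3f}")
```

Plotting the printed coordinates would place terms that co-occur in the same documents near one another, which is the visual cue the abstract describes for reading off topics. The chord distance is used in this sketch because it is a true Euclidean distance between length-normalized vectors, which keeps the centered matrix positive semidefinite; for real data a tested library implementation of MDS would be a more reasonable choice than hand-written power iteration.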