Knowledge Organization

Research on Extraction Methods of Topic Knowledge Tuples in Professional Social Media

  • Lin Jie,
  • Miao Runsheng,
  • Zhang Zhenyu
  • School of Economics and Management, Tongji University, Shanghai 200092
Lin Jie (ORCID: 0000-0002-5421-603X), professor, PhD, doctoral supervisor; Zhang Zhenyu (ORCID: 0000-0002-4888-4023), PhD candidate.

Received date: 2018-08-12

  Revised date: 2019-02-24

  Online published: 2019-07-20

Funding

This work is one of the research outcomes of the National Natural Science Foundation of China General Program project "Research on a Measurement Model of User Innovation Value and Interactive Innovation Management Methods in Social Media" (Grant No. 71672128) and the Tongji University Special Fund for Basic Scientific Research project "Research on the Propagation Mechanism and Models of Social Networks Based on Big Data" (Grant No. 1200219368).

Cite this article

Lin Jie, Miao Runsheng, Zhang Zhenyu. Research on Extraction Methods of Topic Knowledge Tuples in Professional Social Media[J]. Library and Information Service (图书情报工作), 2019, 63(14): 101-110. DOI: 10.13266/j.issn.0252-3116.2019.14.012

Abstract

[Purpose/significance] A topic knowledge tuple is a knowledge unit for operating on and managing knowledge organized around knowledge themes. Accurately extracting topic knowledge tuples facilitates the storage, expression and retrieval of knowledge, and supports knowledge creation and knowledge evaluation as the knowledge is used. This article therefore discusses existing extraction methods and, taking car forums as an example, proposes a method for extracting topic knowledge tuples from professional social media text. [Method/process] First, topics are extracted from the text of car forums with the LDA model and deduplicated to form a topic list. Second, a sentiment analysis model suited to car forum text is built on T-LSTM, a deep learning model that incorporates topic features. Then, sentiment keywords and key sentences are extracted by computing each word's importance in the TextRank graph model and its Word2Vec similarity to the topic; these keywords and sentences explain and supplement the topic and sentiment orientation of the text. Finally, the above steps are integrated into a single method that outputs structured topic knowledge tuples. [Result/conclusion] In the experiments, 69.1% of the extracted topic knowledge tuples were judged qualified. This indicates that the proposed method can extract the elements of knowledge tuples around a knowledge topic with reasonable accuracy and can transform unstructured information into structured knowledge.
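The keyword step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the TextRank importance and the Word2Vec topic similarity are combined, so the weighted sum with weight `alpha` is an assumption, as are the hand-made 2-D vectors that stand in for trained Word2Vec embeddings, the English toy tokens, and the field names of the hypothetical `TopicKnowledgeTuple` record. The sentiment label and key sentence are placeholders for the outputs of the T-LSTM and key-sentence steps, which are not reproduced here.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TopicKnowledgeTuple:
    """Hypothetical record layout; the abstract does not list the fields."""
    topic: str
    sentiment: str
    keywords: list
    key_sentences: list


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Score words with PageRank on an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_keywords(tokens, vectors, topic_vector, alpha=0.5, top_k=3):
    """Assumed combination: alpha * normalized TextRank + (1 - alpha) * cosine."""
    tr = textrank(tokens)
    peak = max(tr.values(), default=1.0)
    combined = {
        w: alpha * s / peak + (1 - alpha) * cosine(vectors[w], topic_vector)
        for w, s in tr.items()
        if w in vectors
    }
    return sorted(combined, key=combined.get, reverse=True)[:top_k]


# Toy data: a tokenized forum post and made-up embeddings (not real Word2Vec).
tokens = "engine noise loud highway engine vibration seat comfortable".split()
vectors = {
    "engine": [1.0, 0.0], "noise": [0.9, 0.1], "vibration": [0.8, 0.2],
    "loud": [0.7, 0.3], "highway": [0.3, 0.7], "seat": [0.0, 1.0],
    "comfortable": [0.1, 0.9],
}
topic_vector = [1.0, 0.0]  # stands in for the embedding of the topic "engine"

keywords = rank_keywords(tokens, vectors, topic_vector)
record = TopicKnowledgeTuple(
    topic="engine",
    sentiment="negative",  # placeholder for the T-LSTM prediction
    keywords=keywords,
    key_sentences=["engine noise loud highway"],  # hand-picked placeholder
)
print(record)
```

Setting `alpha=0.0` reduces the ranking to pure topic similarity and `alpha=1.0` to pure TextRank, which makes it easy to see what each signal contributes before choosing a mix.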
