[Purpose/significance] Existing keyword extraction methods cannot be applied effectively to social Q&A communities, because they are not suited to the characteristics of such communities: short texts, colloquial content, and sparse data. They also rarely consider the influence of user attention on words. To address these problems, this paper presents a novel multi-attribute weighted keyword extraction method for social Q&A communities. [Method/process] The method improves the traditional TF-IDF algorithm by introducing a tuning function and part of speech. In addition, it calculates word weights with a linear weighting formula that fuses four user-attention attributes, derived from the numbers of answers, follows, views, and comments. [Result/conclusion] Experiments show that the method extracts keywords from social Q&A communities more effectively.
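As a rough illustration of the idea in the Method/process statement, the sketch below combines a smoothed TF-IDF score with a part-of-speech weight and a linear fusion of the four attention counts (answers, attention/follows, views, comments). The abstract does not give the tuning function, the part-of-speech weights, or the linear coefficients, so `POS_WEIGHT`, `ATTR_COEFFS`, the log scaling, and the way the factors are multiplied together are all assumptions made for this example, not the authors' formula.

```python
import math
from collections import Counter

# Hypothetical part-of-speech weights (n = noun, v = verb, a = adjective).
# The abstract does not state the actual values.
POS_WEIGHT = {"n": 1.0, "v": 0.6, "a": 0.4}
DEFAULT_POS_WEIGHT = 0.2

# Hypothetical linear coefficients for the four attention attributes.
ATTR_COEFFS = {"answers": 0.4, "follows": 0.3, "views": 0.1, "comments": 0.2}


def attention(stats, max_stats):
    """Linear fusion of log-scaled counts, each normalized to [0, 1]."""
    total = 0.0
    for key, coeff in ATTR_COEFFS.items():
        denom = math.log1p(max_stats[key])
        if denom > 0:  # skip attributes that are zero across the whole corpus
            total += coeff * math.log1p(stats[key]) / denom
    return total


def extract_keywords(posts, top_k=3):
    """Rank the words of each post by TF-IDF x POS weight x (1 + attention).

    Each post is a dict with "tokens" (a list of (word, pos) pairs) and
    "stats" (a dict holding the four attention counts).
    """
    n_posts = len(posts)
    # Document frequency: number of posts containing each word.
    df = Counter(w for p in posts for w in {w for w, _ in p["tokens"]})
    max_stats = {k: max(p["stats"][k] for p in posts) for k in ATTR_COEFFS}

    results = []
    for post in posts:
        counts = Counter(w for w, _ in post["tokens"])
        pos_of = dict(post["tokens"])  # last tag seen for each word
        n_tokens = len(post["tokens"])
        boost = 1.0 + attention(post["stats"], max_stats)
        scores = {}
        for word, count in counts.items():
            tf = count / n_tokens
            idf = math.log(1.0 + n_posts / df[word])  # smoothed IDF (assumption)
            pos_w = POS_WEIGHT.get(pos_of[word], DEFAULT_POS_WEIGHT)
            scores[word] = tf * idf * pos_w * boost
        ranked = sorted(scores, key=scores.get, reverse=True)
        results.append(ranked[:top_k])
    return results
```

In this sketch the attention boost is computed per post, so it raises the scores of all words in a widely followed question by the same factor. A per-word attention weight, if that is what the paper uses, would need statistics aggregated over every post in which the word occurs.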
Yu Bengong, Li Ting, Yang Ying. Keywords Extraction Method for the Social Q&A Community Based on Multi-attributes Weighted[J]. Library and Information Service, 2018, 62(5): 132-139.
DOI: 10.13266/j.issn.0252-3116.2018.05.015