An Authorship Attribution Algorithm Based on Complex Network

  • Li Xiaojun,
  • Liu Huailiang,
  • Du Kun
  • School of Economy and Management, Xidian University, Xi'an 710126

Received date: 2015-08-16

Revised date: 2015-09-02

Online published: 2015-09-20

Abstract

[Purpose/significance] Authorship analysis based on textual features is an important task in text mining and linguistic studies. To address the low efficiency and high cost of traditional authorship attribution methods, complex network theory is applied to this problem. [Method/process] This paper uses measurable quantities of the word co-occurrence complex network of a text to characterize authorship. Drawing on stylistics and these network features, the approach identifies authors by computing the similarity of their style features. [Result/conclusion] The authorship attribution algorithm based on complex networks makes effective use of authors' style features. The experimental results show a high accuracy rate in authorship attribution and support the validity of the method.
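The pipeline outlined in the abstract (word co-occurrence network, network features, style similarity) can be illustrated with a minimal Python sketch. The abstract does not state which network quantities or which similarity measure the paper uses, so everything specific here is an assumption made for illustration: the co-occurrence window of adjacent words, the three features (average degree, average clustering coefficient, density), the use of cosine similarity, and the function names `cooccurrence_network`, `style_features`, `cosine_similarity`, and `attribute`.

```python
import math
from collections import defaultdict


def cooccurrence_network(tokens, window=2):
    """Build an undirected graph whose nodes are words; an edge links two
    distinct words that occur within `window` tokens of each other.
    The window size is an assumption, not taken from the paper."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        graph[word]  # register the node even if it stays isolated
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def style_features(graph):
    """Return an illustrative feature vector:
    [average degree, average clustering coefficient, density]."""
    n = len(graph)
    if n == 0:
        return [0.0, 0.0, 0.0]
    degrees = [len(nbrs) for nbrs in graph.values()]
    avg_degree = sum(degrees) / n

    clustering = []
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            clustering.append(0.0)
            continue
        nbr_list = list(nbrs)
        # Count edges that exist among this node's neighbours.
        links = sum(
            1
            for idx, a in enumerate(nbr_list)
            for b in nbr_list[idx + 1 :]
            if b in graph[a]
        )
        clustering.append(2 * links / (k * (k - 1)))
    avg_clustering = sum(clustering) / n

    density = sum(degrees) / (n * (n - 1)) if n > 1 else 0.0
    return [avg_degree, avg_clustering, density]


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(unknown_tokens, known_texts):
    """Return the candidate author whose style vector is most similar to the
    unknown text, together with all similarity scores."""
    target = style_features(cooccurrence_network(unknown_tokens))
    scores = {
        author: cosine_similarity(target, style_features(cooccurrence_network(toks)))
        for author, toks in known_texts.items()
    }
    return max(scores, key=scores.get), scores
```

In practice the features would likely need to be normalized to comparable scales before computing similarity, and real texts would first require tokenization (word segmentation for Chinese); both steps are left out of this sketch.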

Cite this article

Li Xiaojun, Liu Huailiang, Du Kun. An Authorship Attribution Algorithm Based on Complex Network[J]. Library and Information Service, 2015, 59(18): 102-107. DOI: 10.13266/j.issn.0252-3116.2015.18.016

