Derivation of Similar Web Text and Data Provenance

  • Ni Jing,
  • Meng Xianxue
  • 1. Economic Management School, Beijing Institute of Petrochemical Technology, Beijing 102617;
    2. Agricultural Institute of Information, Chinese Academy of Agricultural Sciences, Beijing 100081

Received date: 2016-03-22

Revised date: 2016-06-18

Online published: 2016-07-05

Abstract

[Purpose/significance] Existing web pages often lack provenance metadata. To address this, we propose a method for annotating it automatically. [Method/process] The method combines a clustering algorithm, automatic semantic annotation, and linked data technology with the PROV-POL data provenance model to detect derivation relationships among web page text entities, building data provenance structures at both the text level and the attribute level. [Result/conclusion] Tests show that using semantic web technology and the PROV model to obtain the data provenance of web page text is feasible. The recall of the clustering algorithm we applied still needs to be improved. The method has promising practical value for web provenance.

Cite this article

Ni Jing, Meng Xianxue. Derivation of Similar Web Text and Data Provenance[J]. Library and Information Service, 2016, 60(13): 134-140,148. DOI: 10.13266/j.issn.0252-3116.2016.13.017
