Entity Based Information Retrieval Technology
Received date: 2014-03-20
Revised date: 2014-04-06
Online published: 2014-04-20
Gao Guangshang. Entity Based Information Retrieval Technology[J]. Library and Information Service, 2014, 58(08): 96-104,35. DOI: 10.13266/j.issn.0252-3116.2014.08.016
This paper aims to match, for a given set of entities, as wide a range of related entities as possible, so that users can quickly find relevant information, especially domain-specific, semantically associated information that must be integrated dynamically. To this end, it analyzes the structure of entities and their relationships in Web documents and, drawing on the semantic classification scheme of Schema.org, proposes a scheme for building an entity database with semantic properties. Based on this database, it then proposes an adaptive entity search framework that analyzes the user's query intent, classifies it into semantic types, and forms a set of queries, each covering entities. The framework then "intelligently" selects and executes the queries that have priority in order to match the related entities stored in the fact database. The study realizes, to some extent, efficient "snowball"-style retrieval, meets the needs of intelligent search technology, promotes information science theory research that takes entities as its object of study, and provides a useful reference for planning applications of intelligent information retrieval techniques.
Key words: entity search; intelligent search; semantic categorization
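The "snowball" retrieval idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's framework: the fact triples, the relation names, the `PRIORITY` table, and the function `snowball_search` are all hypothetical, and the query-intent analysis and semantic classification steps are omitted. The sketch only shows queries being selected by priority and newly matched entities seeding further queries.

```python
import heapq
from collections import defaultdict

# Hypothetical fact database of (subject, relation, object) triples.
FACTS = [
    ("Alice", "worksFor", "Acme"),
    ("Acme", "locatedIn", "Berlin"),
    ("Bob", "worksFor", "Acme"),
    ("Berlin", "containedIn", "Germany"),
]

# Hypothetical priority per relation type (lower values run first).
PRIORITY = {"worksFor": 0, "locatedIn": 1, "containedIn": 2}


def build_index(facts):
    """Index each entity by the relations it takes part in (both directions)."""
    index = defaultdict(list)
    for subject, relation, obj in facts:
        index[subject].append((relation, obj))
        index[obj].append((relation, subject))
    return index


def snowball_search(seeds, facts, max_queries=10):
    """Expand a seed set by running prioritized (entity, relation) queries."""
    index = build_index(facts)
    found = set(seeds)
    queue = []   # heap of (priority, insertion order, entity, relation)
    counter = 0  # tie-breaker that keeps heap ordering deterministic

    def enqueue(entity):
        nonlocal counter
        for relation in sorted({r for r, _ in index[entity]}):
            heapq.heappush(
                queue, (PRIORITY.get(relation, 99), counter, entity, relation)
            )
            counter += 1

    for seed in seeds:
        enqueue(seed)

    executed = 0
    while queue and executed < max_queries:
        _, _, entity, relation = heapq.heappop(queue)
        executed += 1
        for r, other in index[entity]:
            if r == relation and other not in found:
                found.add(other)  # a new match becomes a seed for later queries
                enqueue(other)
    return found - set(seeds)


related = snowball_search(["Alice"], FACTS)
```

Capping the number of executed queries with `max_queries` is one simple way to bound the expansion; a real system would more likely derive priorities from the classified query intent than from a fixed table.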