Web Resource Automatic Discovery Based on Hyperlink Analysis

Chen Dingquan (陈定权)

Library and Information Service (图书情报工作), 2003, Vol. 47, Issue 9: 94-98.

Information Management · Information Industry


Abstract

Traditional automatic discovery of Web resources relies on page content. This paper instead examines automatic Web resource discovery from the perspective of hyperlink analysis. Hyperlink analysis originates in social network analysis and scientific citation analysis; it analyzes only the relations among Web pages and ignores the properties of the pages themselves. Our experiments show that, starting from example pages (URIs) supplied by the user and using hyperlinks alone, Web sites related to a subject can be discovered automatically. The technique supplies Web crawlers with high-quality URIs, reduces needless crawling, improves collection efficiency, lightens the network load, and plays an important role in building subject resource collections.
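The abstract does not spell out the link-analysis procedure, but HITS appears among the key words. As a rough illustration only, the following is a minimal sketch of a HITS-style hub and authority iteration over a small link graph. It is not necessarily the author's method: the function name `hits`, the iteration count, and the page names (`seed1`, `siteA`, and so on) are all hypothetical and chosen for this example.

```python
from math import sqrt


def hits(links, iterations=50):
    """Return (hub, authority) score dicts for a directed link graph.

    `links` maps each source page to the list of pages it links to.
    Only the link structure is used; page content is never inspected.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]

        # Hub score: sum of authority scores of the pages linked to.
        hub = {p: sum(auth[d] for d in links.get(p, ())) for p in pages}

        # Normalize both vectors so the scores stay bounded.
        for scores in (auth, hub):
            norm = sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm

    return hub, auth


# Hypothetical example: two user-supplied seed pages and one unrelated page.
links = {
    "seed1": ["siteA", "siteB"],
    "seed2": ["siteA", "siteB", "siteC"],
    "other": ["siteC"],
}
hub, auth = hits(links)

# Candidate sites ranked by authority score, highest first.
ranked = sorted(auth, key=auth.get, reverse=True)
```

In this toy graph, sites that are linked from both seed pages end up with higher authority scores than a site linked from only one seed, which is the kind of signal a crawler could use to prioritize URIs without reading page content.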

Key words

Web resource automatic discovery / hyperlink analysis / HITS / focused crawling

Cite this article

Chen Dingquan. Web Resource Automatic Discovery Based on Hyperlink Analysis[J]. Library and Information Service, 2003, 47(9): 94-98
Chinese Library Classification (CLC) number: TP393


