Knowledge Organization

Research on the Clustering of Microblog Users Based on Multi-dimensional Attribute Weighting Analysis

  • Zhang Haitao ,
  • Tang Shiman ,
  • Wei Mingzhu ,
  • Li Zezhong
  • 1. The Management College of Jilin University, Changchun 130022;
    2. The Information Resource Research Center of Jilin University, Changchun 130022
Zhang Haitao (ORCID: 0000-0002-9421-8187), professor, doctoral supervisor, E-mail: zhtinfo@126.com; Tang Shiman (ORCID: 0000-0002-4355-7963), master's student; Wei Mingzhu (ORCID: 0000-0001-8430-7461), master's student; Li Zezhong (ORCID: 0000-0002-1970-5815), doctoral student.

Received date: 2018-05-16

  Revised date: 2018-07-23

  Online published: 2018-12-20

Cite this article

Zhang Haitao, Tang Shiman, Wei Mingzhu, Li Zezhong. Research on the Clustering of Microblog Users Based on Multi-dimensional Attribute Weighting Analysis[J]. Library and Information Service (图书情报工作), 2018, 62(24): 124-133. DOI: 10.13266/j.issn.0252-3116.2018.24.016

Abstract

[Purpose/significance] Accurately grasping the interest tendencies of social network users and classifying them into highly cohesive user groups is of great significance for research on social network information ecology and information recommendation. [Method/process] This paper constructs a multi-dimensional hierarchical model for describing user attributes and collects sample user data from Sina Weibo according to the model's data requirements. The second-order variables under three dimensions (user background information, user post information, and user behavior information) are quantified to construct user vector expressions. Clustering results under a single dimension and under multiple dimensions are compared, and the attributes are then assigned different weights for weighted analysis. Once the optimal clustering result is obtained, analysis of variance is performed on it to improve the model. [Result/conclusion] User clustering based on weighted multi-dimensional attributes performs clearly better than clustering based on a single dimension or on unweighted multi-dimensional attributes, and the user post content dimension contributes most to improving clustering performance.
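The weighting step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, so plain k-means with a deterministic farthest-point seeding is assumed here, and the feature values, the `WEIGHTS` table, and the function names are all hypothetical. Each dimension block is scaled by the square root of its weight, so that squared Euclidean distance between two user vectors equals a weighted sum of the per-dimension distances.

```python
import math

# Hypothetical per-dimension weights (assumption; the paper tunes its own values).
WEIGHTS = {"background": 0.2, "content": 0.5, "behavior": 0.3}
DIMENSIONS = ("background", "content", "behavior")

def user_vector(user, weights=WEIGHTS):
    """Concatenate the dimension blocks, scaling each by sqrt(weight)."""
    vec = []
    for dim in DIMENSIONS:
        scale = math.sqrt(weights[dim])
        vec.extend(scale * x for x in user[dim])
    return vec

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centers(vectors, k):
    """Deterministic seeding: start at the first vector, then repeatedly
    add the vector farthest from all centers chosen so far."""
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers)))
    return [list(c) for c in centers]

def kmeans(vectors, k, iters=50):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per vector."""
    centers = init_centers(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(v, centers[c])) for v in vectors]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels

# Toy users (made-up numbers): users 0 and 1 post similar content,
# while users 0 and 2 share a similar background.
users = [
    {"background": [0.1, 0.9], "content": [1.0, 0.0], "behavior": [0.2]},
    {"background": [0.8, 0.2], "content": [0.9, 0.1], "behavior": [0.7]},
    {"background": [0.2, 0.8], "content": [0.0, 1.0], "behavior": [0.3]},
    {"background": [0.9, 0.1], "content": [0.1, 0.9], "behavior": [0.6]},
]

content_heavy = kmeans([user_vector(u) for u in users], k=2)
background_heavy = kmeans(
    [user_vector(u, {"background": 0.8, "content": 0.1, "behavior": 0.1}) for u in users],
    k=2,
)
print(content_heavy)     # [0, 0, 1, 1]
print(background_heavy)  # [0, 1, 0, 1]
```

On this toy data the grouping follows whichever dimension carries the most weight, which is the kind of sensitivity the weighted analysis is meant to exploit. A real study would also need a quality measure for comparing the clusterings, which this sketch omits.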
