Application of Random Forest in the Fragmented Integration of University Information

  • Zhang Wende,
  • Cheng Han,
  • Liu Tian,
  • Zeng Jinjing
  • 1. Institute of Information Management, Fuzhou University, Fuzhou 350108;
    2. Library of Fujian Agriculture and Forestry University, Fuzhou 350002

Received date: 2017-08-26

Revised date: 2018-01-07

Online published: 2018-04-05

Abstract

[Purpose/significance] In response to the growing fragmentation of university information, this paper proposes a process for integrating fragmented university information and applies the random forest algorithm to construct a feature selection model for that integration. [Method/process] The paper reviews the development, research status, and existing problems of university information integration. It then explains the principles and advantages of the random forest algorithm and applies the algorithm to a feature selection model for the integration of fragmented university information. Finally, the model is validated on the example of identifying students in need of financial assistance. [Result/conclusion] The random forest algorithm shows higher accuracy and validity in selecting features for integrating university information, and therefore offers a new approach to the integration of fragmented university information.

Cite this article

Zhang Wende, Cheng Han, Liu Tian, Zeng Jinjing. Application of Random Forest in the Fragmented Integration of University Information[J]. Library and Information Service, 2018, 62(7): 119-124. DOI: 10.13266/j.issn.0252-3116.2018.07.014
