[Purpose/significance] This paper studies data cleaning in the process of identifying a field's research hotspots with the keyword co-occurrence method, with the aim of helping researchers identify those hotspots accurately. [Method/process] Drawing on a literature review, it sets out the definition and objects of data cleaning and analyzes the causes and effects of dirty data. It then formulates the steps and scheme of data cleaning, and uses an empirical study to verify both the effect of the cleaning and the feasibility of the scheme. [Result/conclusion] The results show that the proposed data cleaning scheme improves the accuracy of research hotspot identification, which demonstrates its feasibility.
