[Purpose/significance] Institution names are numerous and complicated. Normalizing institution names brings together the authoritative name of an institution and its informal variants (names used at different times and in different forms of expression), which improves the comprehensiveness and accuracy of searches, promotes interoperability with other systems, and thus enables resource sharing. [Method/process] Based on an analysis of the characteristics of institution names and of the K-means algorithm, this paper uses an edit distance similarity algorithm to normalize institution names. It then uses TF-IDF to calculate the weight of each term, clusters the institution names around cluster centers with the K-means algorithm, and assigns a unique identifier to every cluster. [Result/conclusion] The method normalizes different forms of the same institution name and improves the precision of institution name clustering, but the choice of the K value and of the distance measure still needs to be optimized.
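
The pipeline named in the method (edit distance similarity, TF-IDF weighting, K-means clustering, one identifier per cluster) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say how the edit distance step and the clustering step are combined, so the two are shown side by side. The whitespace tokenization, the smoothed IDF formula, the fixed seed indices, and the `INST-0001` identifier format are all assumptions made for illustration.

```python
import math
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def edit_similarity(a, b):
    """Map the distance to [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def tfidf_vectors(names):
    """L2-normalized TF-IDF vectors over whitespace tokens (assumed tokenizer)."""
    docs = [Counter(name.lower().split()) for name in names]
    vocab = sorted({term for doc in docs for term in doc})
    df = Counter(term for doc in docs for term in doc)  # document frequency
    n = len(docs)
    # Smoothed IDF; this particular formula is an assumption.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vec = [doc[t] / total * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def squared_distance(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(vectors, init_indices, iterations=20):
    """Plain K-means; K equals the number of seed indices (hypothetical seeding)."""
    centers = [list(vectors[i]) for i in init_indices]
    labels = []
    for _ in range(iterations):
        new_labels = [
            min(range(len(centers)), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def normalize(names, init_indices):
    """Assign every name the identifier of its cluster (ID format is illustrative)."""
    labels = kmeans(tfidf_vectors(names), init_indices)
    return {name: f"INST-{label + 1:04d}" for name, label in zip(names, labels)}
```

Calling `normalize(names, init_indices)` on a list of name variants returns a mapping from each variant to its cluster identifier, and `edit_similarity` can be used separately to compare two variants character by character. The seeds are passed in explicitly so that the sketch is deterministic; as the conclusion notes, the choice of K and of the distance measure strongly affects the result and would need tuning on real data.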
