Library and Information Service
Technical Topic Analysis in Patents: SAO-based LDA Modeling
Received date: 2016-10-07
Revised date: 2016-12-12
Online published: 2017-02-05
[Purpose/significance] Technical topic analysis of patents faces three problems: topics are difficult to classify; words and terms are ambiguous (homonymy); and technical problems and their solutions are difficult to identify. [Method/process] We first extract subject-action-object (SAO) structures from patents, then explore and identify the problem & solution (P&S) patterns embodied in these structures. Finally, we build an SAO-based LDA model on a "bag of P&S" assumption, which performs technical topic analysis at the concept level. [Result/conclusion] The case study shows that the proposed method effectively identifies topic distributions and offers clear advantages over the traditional LDA model in topic identification and word disambiguation.
Key words: SAO structures; technical topic analysis; LDA model; P&S pattern; graphene
Yang Chao, Zhu Donghua, Wang Xuefeng, Zhu Fujin, Heng Xiaofan. Technical Topic Analysis in Patents: SAO-based LDA Modeling[J]. Library and Information Service, 2017, 61(3): 86-96. DOI: 10.13266/j.issn.0252-3116.2017.03.012
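The "bag of P&S" idea in the abstract, where each document is modeled as a bag of SAO-derived units rather than a bag of words, can be illustrated with a minimal sketch. This is not the authors' implementation: the SAO triples below are hand-written and hypothetical, the P&S pattern identification step is not reproduced because the abstract does not specify its rules, and the function name `fit_sao_lda` and its hyperparameters are assumptions chosen for illustration. The sketch runs standard LDA with collapsed Gibbs sampling, treating each whole SAO triple as one token.

```python
import random
from collections import Counter

# Hypothetical SAO triples (subject, action, object). A real pipeline would
# extract these from patent text with an NLP parser; these are made up.
docs = [
    [("cvd process", "grow", "graphene film"),
     ("graphene film", "improve", "conductivity"),
     ("graphene film", "improve", "conductivity")],
    [("cvd process", "grow", "graphene film"),
     ("copper substrate", "support", "graphene film")],
    [("chemical reduction", "produce", "graphene oxide"),
     ("graphene oxide", "reduce", "cost")],
    [("graphene oxide", "reduce", "cost"),
     ("chemical reduction", "produce", "graphene oxide"),
     ("graphene oxide", "reduce", "cost")],
]

def fit_sao_lda(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA where each token is a whole SAO triple."""
    rng = random.Random(seed)
    vocab = sorted({sao for doc in docs for sao in doc})
    n_vocab = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]          # counts per document
    topic_term = [Counter() for _ in range(n_topics)]   # counts per topic
    topic_total = [0] * n_topics
    assign = []

    # Random initial topic assignment for every SAO token.
    for d, doc in enumerate(docs):
        zs = []
        for sao in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_term[z][sao] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, sao in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_term[z][sao] -= 1
                topic_total[z] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_term[k][sao] + beta)
                    / (topic_total[k] + n_vocab * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_term[z][sao] += 1
                topic_total[z] += 1

    # Smoothed estimates: document-topic (theta) and topic-SAO (phi).
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    phi = [
        {sao: (topic_term[k][sao] + beta) / (topic_total[k] + n_vocab * beta)
         for sao in vocab}
        for k in range(n_topics)
    ]
    return theta, phi

theta, phi = fit_sao_lda(docs)
for k, dist in enumerate(phi):
    top = sorted(dist, key=dist.get, reverse=True)[:2]
    print(f"topic {k}:", [" | ".join(sao) for sao in top])
```

Because a triple such as ("graphene oxide", "reduce", "cost") is kept intact as a token, an ambiguous word like "reduce" is always read together with its subject and object, which is one plausible reading of how SAO-level modeling helps with word disambiguation.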