Research on Domain Adaptation Technology of Chinese Science and Technology Literatures Segmentation

  • Shi Chongde ,
  • Qiao Xiaodong ,
  • Wang Huilin ,
  • Qu Peng
  • Institute of Scientific and Technical Information of China, Beijing 100038

Received date: 2014-07-24

  Revised date: 2014-09-01

  Online published: 2014-10-05




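Shi Chongde, Qiao Xiaodong, Wang Huilin, Qu Peng. Research on Domain Adaptation Technology of Chinese Science and Technology Literatures Segmentation[J]. Library and Information Service (图书情报工作), 2014, 58(19): 13-18. DOI: 10.13266/j.issn.0252-3116.2014.19.002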


Word segmentation of science and technology (S&T) literature is a basic step in processing S&T documents. Using biomedical literature as a case study, this paper investigates domain adaptation for segmenting Chinese S&T literature. It adapts a sequence-labeling Chinese word segmentation method, originally trained on the news (journalism) domain, to the S&T domain through several techniques: dictionary features, domain character features, sub-word tagging, and a low-quality in-domain training corpus generated by dictionary-based segmentation. Together these techniques yield significant improvements. The results show that exploiting domain-specific features derived from domain knowledge plays an important role in improving segmentation quality for S&T literature.
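Two of the techniques named above can be illustrated in a few lines: building a noisy in-domain training corpus by segmenting raw text with a dictionary, and deriving per-character dictionary features for a sequence labeler. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. Forward maximum matching, the B/M/E/S tag set, the feature names `dict_B`/`dict_M`/`dict_E`, and the toy biomedical lexicon are all hypothetical choices made for illustration; the abstract does not specify the actual segmentation algorithm or feature templates.

```python
from typing import Dict, List, Set


def fmm_segment(text: str, lexicon: Set[str], max_len: int = 6) -> List[str]:
    """Forward maximum matching (assumed dictionary-based segmenter):
    at each position take the longest lexicon word, else a single character."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + n]
            if n == 1 or cand in lexicon:
                words.append(cand)
                i += n
                break
    return words


def bmes_tags(words: List[str]) -> List[str]:
    """Turn a word sequence into per-character B/M/E/S labels,
    the usual target format for character-based sequence labeling."""
    tags: List[str] = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def dict_features(text: str, lexicon: Set[str], max_len: int = 6) -> List[Dict[str, int]]:
    """Hypothetical dictionary features: for each character, flag whether some
    lexicon word of length >= 2 begins at it, passes through it, or ends at it."""
    feats = [{"dict_B": 0, "dict_M": 0, "dict_E": 0} for _ in text]
    for i in range(len(text)):
        for n in range(2, min(max_len, len(text) - i) + 1):
            if text[i:i + n] in lexicon:
                feats[i]["dict_B"] = 1
                feats[i + n - 1]["dict_E"] = 1
                for k in range(i + 1, i + n - 1):
                    feats[k]["dict_M"] = 1
    return feats


if __name__ == "__main__":
    # Toy biomedical lexicon (illustrative only).
    lexicon = {"乳腺癌", "乳腺", "基因", "突变"}
    text = "乳腺癌基因突变"
    words = fmm_segment(text, lexicon)
    print(words)             # ['乳腺癌', '基因', '突变']
    print(bmes_tags(words))  # ['B', 'M', 'E', 'B', 'E', 'B', 'E']
    print(dict_features(text, lexicon)[1])
```

Running `fmm_segment` followed by `bmes_tags` over raw in-domain text gives an automatically labeled, and therefore error-prone, training corpus. The flags from `dict_features` can be appended to each character's feature set so that a sequence labeler such as a CRF can learn how far to trust the dictionary, rather than applying it rigidly.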


