KNOWLEDGE ORGANIZATION
Hao Jie, Mo Zhiqiang, Sun Haixia, Chen Zhenli, Li Jiao
[Purpose/Significance] This study aims to extract structured item information from free-text clinical scales using ChatGPT without annotated training data, thereby advancing the structuring and intelligent processing of medical scale resources.

[Method/Process] An item information extraction framework covering eight attribute types was defined, taking into account structural differences among clinical scale measurement concepts. A dataset was constructed from 59 commonly used clinical psychometric assessment scale documents. Zero-shot prompt templates were designed according to measurement concept level, and experiments were conducted through the official ChatGPT-3.5 and ChatGPT-4 interfaces. The extraction performance of the two ChatGPT versions on different clinical scale texts, together with possible influencing factors, was analyzed from multiple perspectives.

[Result/Conclusion] Extraction performance is best for scale item sources, with Micro-F1 and Macro-F1 scores of at least 98.90% and 97.83%, respectively, followed by response options, instructional guidance, and scoring rules; item numbers and instructions show moderate performance. Clinical explanations perform worst, with Micro-F1 and Macro-F1 scores of 47.73% and 45.51%, respectively. ChatGPT-4 performs better overall, although its recall on some attributes is lower than that of ChatGPT-3.5. Increases in measurement concept level, dimensionality, number of items, and text length are found to reduce model performance. In summary, ChatGPT can efficiently assist in structuring medical scale resources, especially for simple scales.
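To illustrate the zero-shot setup described in the method, the following Python sketch sends a scale text to the OpenAI Chat Completions API and asks for item attributes as JSON. This is a minimal illustration, not the authors' actual templates: the prompt wording, function name, and attribute field names are assumptions (seven of the eight attribute types are named in the abstract; the eighth is not specified there).

    # Minimal sketch of zero-shot item-attribute extraction, assuming the
    # OpenAI Python SDK (v1.x). Prompt wording and attribute names are
    # illustrative, not the paper's actual templates.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    ATTRIBUTES = [
        "item_number", "item_source", "response_options",
        "instructional_guidance", "scoring_rules", "instructions",
        "clinical_explanation",  # seven types named in the abstract;
    ]                            # the eighth is unspecified there

    def extract_items(scale_text: str, model: str = "gpt-4") -> str:
        """Ask the model to return each item's attributes as JSON."""
        prompt = (
            "Extract every item from the clinical scale below. For each "
            "item, return a JSON object with the fields: "
            + ", ".join(ATTRIBUTES)
            + ". Use null for fields that are absent.\n\nScale:\n"
            + scale_text
        )
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # deterministic output aids evaluation
        )
        return resp.choices[0].message.content

Setting temperature to 0 is a common choice for extraction tasks so that repeated runs produce comparable outputs for scoring.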
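The Micro-F1 and Macro-F1 scores reported above follow the standard definitions. Assuming per-attribute-type counts of true positives $TP_i$, false positives $FP_i$, and false negatives $FN_i$ over $n$ attribute types:

$$P_{\mathrm{micro}} = \frac{\sum_i TP_i}{\sum_i (TP_i + FP_i)}, \qquad R_{\mathrm{micro}} = \frac{\sum_i TP_i}{\sum_i (TP_i + FN_i)}, \qquad \mathrm{Micro\text{-}F1} = \frac{2\,P_{\mathrm{micro}}\,R_{\mathrm{micro}}}{P_{\mathrm{micro}} + R_{\mathrm{micro}}}$$

$$\mathrm{Macro\text{-}F1} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{F1}_i$$

Because Macro-F1 averages per-type F1 scores with equal weight, it is more sensitive than Micro-F1 to attribute types that are extracted poorly, such as clinical explanations.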