Subjects: Library Science,Information Science >> Information Science submitted time 2023-04-01 Cooperative journals: 《图书情报工作》
Abstract: [Purpose/significance] To solve the problem that the identification effect of the target domain information is difficult to improve because of not enough samples, we will transfer the results of unsupervised learning from big data to the feature space of the target domain. [Method/process] Used the RoBERTa model, which was pre-trained with Chinese Wikipedia and other data, for transfer learning. After mapping the learning results to the target domain, DPCNN was used to aggregate and condense it, and then fine-tuned the model with part of the labeled data to complete the accurate recognition of domain information. [Result/conclusion] Compared with the model without transfer learning and the classic model TextCNN in 10 fields, the model in this paper is much better than the comparison models. After average, the precision is increased by 4.15% and 3.43%, the recall is increased by 4.55% and 3.44%, and the F1 score is increased by 4.52% and 3.44%. It shows that knowledge transfer using big data can effectively improve the information recognition effect in the target field.