Subjects: Library Science,Information Science >> Library Science submitted time 2023-08-27 Cooperative journals: 《图书情报工作》
Abstract: [Purpose/significance] Patent topic analysis that uses single words as the unit of analysis produces topics that are difficult to interpret. To address this problem, this paper proposes a patent topic discovery model that integrates term knowledge. [Method/process] The proposed model first introduces class entropy to recognize the terms in patent literature effectively. It then uses the Generalized Pólya Urn model to increase the probability that semantically similar terms are assigned to the same topic, which alleviates the data sparsity that arises when terms serve as the basic unit of topic model analysis. [Result/conclusion] The experimental results show that incorporating term information improves the quality of the generated topics, making topic representations more readable and more discriminative.
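The Generalized Pólya Urn step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `GPUCounts`, the `similar` lookup of semantically similar terms, the `promotion` weight, and the smoothing value `beta` are all assumptions made for illustration. The idea shown is only that assigning a term to a topic also adds a fractional count for its similar terms under that topic.

```python
from collections import defaultdict


class GPUCounts:
    """Topic-term counts with a Generalized Polya Urn style promotion (illustrative)."""

    def __init__(self, similar, promotion=0.3):
        # similar: term -> set of semantically similar terms (assumed to be given)
        self.similar = similar
        # promotion: fractional count added for each similar term (assumed value)
        self.promotion = promotion
        self.topic_term = defaultdict(lambda: defaultdict(float))
        self.topic_total = defaultdict(float)

    def _update(self, topic, term, sign):
        # The assigned term itself contributes a full count.
        self.topic_term[topic][term] += sign * 1.0
        self.topic_total[topic] += sign * 1.0
        # Each similar term receives a smaller, promoted count in the same topic.
        for other in self.similar.get(term, ()):
            self.topic_term[topic][other] += sign * self.promotion
            self.topic_total[topic] += sign * self.promotion

    def add(self, topic, term):
        self._update(topic, term, +1)

    def remove(self, topic, term):
        # Called before resampling a term's topic in a Gibbs-style sampler.
        self._update(topic, term, -1)

    def term_prob(self, topic, term, vocab_size, beta=0.01):
        # Smoothed probability of a term under a topic.
        numerator = self.topic_term[topic][term] + beta
        return numerator / (self.topic_total[topic] + vocab_size * beta)


counts = GPUCounts({"lithium battery": {"lithium cell"}})
counts.add(0, "lithium battery")
```

After the single assignment above, "lithium cell" is more probable under topic 0 than under an untouched topic, which is how the promotion pulls similar terms toward the same topic.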
Subjects: Library Science,Information Science >> Library Science submitted time 2023-08-26 Cooperative journals: 《图书情报工作》
Abstract: [Purpose/significance] Research on the automatic selection of domain-specific stopwords for topic models of patent text is insufficient. This paper proposes a new method for automatically selecting domain-specific stopwords for patent text topic modeling, in order to improve the discriminative power and modeling quality of patent topic models. [Method/process] In essence, domain-specific stopwords are less important words that carry relatively little information and are poorly differentiated across different categories of patents. The paper therefore introduces an auxiliary multi-category patent text dataset and measures the distribution of each word across categories through its category entropy. The words with the highest category entropy are then chosen as domain-specific stopwords. [Result/conclusion] Experimental results demonstrate the feasibility and validity of the proposed method, which improves the discriminative power and quality of topic models for patent text analysis.
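A minimal sketch of the category-entropy selection described in this abstract follows. The function names, the use of raw word frequencies per category, and the cutoff `k` are assumptions for illustration; the paper may normalize counts or choose the threshold differently.

```python
import math
from collections import Counter, defaultdict


def category_entropy(docs):
    """docs: iterable of (category, tokens). Returns {word: entropy in bits}."""
    per_word = defaultdict(Counter)  # word -> Counter of category frequencies
    for category, tokens in docs:
        for token in tokens:
            per_word[token][category] += 1
    entropy = {}
    for word, per_cat in per_word.items():
        total = sum(per_cat.values())
        # Shannon entropy of the word's distribution over categories.
        entropy[word] = -sum(
            (c / total) * math.log2(c / total) for c in per_cat.values()
        )
    return entropy


def select_stopwords(docs, k):
    """Return the k words with the highest category entropy (ties broken A-Z)."""
    ent = category_entropy(docs)
    return sorted(ent, key=lambda w: (-ent[w], w))[:k]


# Toy auxiliary dataset with two patent categories (illustrative data only).
docs = [
    ("A", ["device", "battery", "method"]),
    ("B", ["device", "antenna", "method"]),
]
stopwords = select_stopwords(docs, 2)
```

In the toy data, "device" and "method" occur evenly in both categories, so they have the highest entropy and are selected, while "battery" and "antenna" each occur in a single category and have zero entropy.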
Subjects: Library Science,Information Science >> Library Science submitted time 2023-07-26 Cooperative journals: 《图书情报工作》
Abstract: [Purpose/significance] Pattern-matching rules are difficult to craft for different datasets, and the accuracy of Chinese patent term extraction is low. To address these problems, this paper proposes a method for selecting Chinese patent candidate terms based on dependency syntax parsing, in order to improve the accuracy of Chinese patent term extraction. [Method/process] The method comprises three main steps: dependency syntax parsing, pruning, and dependency subtree generation. First, dependency parsing is performed on the Chinese patent text to obtain dependency trees. Then, dependency subtrees are generated by removing the dependency relations that do not meet the requirements. Finally, contiguous word strings are selected as candidate terms for Chinese patent term extraction. [Result/conclusion] The experimental results show that, compared with existing related methods, the proposed method based on dependency syntax parsing effectively improves the accuracy of Chinese patent term extraction.
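The three steps in this abstract (parse, prune, generate subtrees) can be sketched as below. This is an illustration under stated assumptions, not the paper's procedure: the parse is supplied by hand rather than produced by a parser, the function name `candidate_terms` is hypothetical, the rule "keep only attributive arcs" stands in for the unspecified pruning requirements, and the relation labels are placeholders in the style of common Chinese dependency tag sets. English tokens are used only for readability.

```python
def candidate_terms(tokens, keep_relations=frozenset({"ATT"})):
    """tokens: list of (word, head_index, relation); head_index is -1 for the root.

    Returns candidate terms as tuples of words.
    """
    parent = list(range(len(tokens)))  # union-find forest over token positions

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Pruning: an arc survives only if its relation is in keep_relations.
    for i, (_, head, relation) in enumerate(tokens):
        if head >= 0 and relation in keep_relations:
            parent[find(i)] = find(head)

    # Subtree generation: tokens still connected form one subtree.
    subtrees = {}
    for i in range(len(tokens)):
        subtrees.setdefault(find(i), []).append(i)

    # Candidate selection: contiguous runs of two or more tokens in a subtree.
    terms = []
    for members in subtrees.values():
        run = [members[0]]
        for i in members[1:] + [None]:  # None flushes the final run
            if i is not None and i == run[-1] + 1:
                run.append(i)
                continue
            if len(run) > 1:
                terms.append(tuple(tokens[j][0] for j in run))
            run = [i]
    return terms


# Hand-written parse of "invention provides lithium battery electrode".
sentence = [
    ("invention", 1, "SBV"),
    ("provides", -1, "HED"),
    ("lithium", 3, "ATT"),
    ("battery", 4, "ATT"),
    ("electrode", 1, "VOB"),
]
terms = candidate_terms(sentence)
```

Only the two attributive arcs survive pruning, so "lithium", "battery", and "electrode" form one subtree and are emitted as a single contiguous candidate term.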
Subjects: Library Science,Information Science >> Library Science submitted time 2023-07-26 Cooperative journals: 《图书情报工作》
Abstract: [Purpose/significance] Manual extraction of skill information from online recruitment posts is not suitable for analyzing large volumes of data. To address this problem, this paper proposes a method for automatically extracting skill information from large numbers of online recruitment texts. [Method/process] Based on the characteristics of online recruitment texts, candidate skills are identified through dependency syntax analysis and then scored with domain relevance indicators. These indicators are integrated into a traditional terminology extraction method to form a method for the automatic extraction of skill information from online recruitment texts. [Result/conclusion] Experiments show that the proposed method can extract skill information automatically, quickly, and accurately from massive online recruitment texts.
Subjects: Library Science,Information Science >> Library Science submitted time 2023-07-26 Cooperative journals: 《图书情报工作》
Abstract: [Purpose/significance] To help college teachers and students make full use of web recruitment information, this paper proposes a curriculum knowledge model and a method for constructing it automatically through text mining of large-scale web recruitment data. [Method/process] The paper proposes a three-level curriculum knowledge model, "post-curriculum-knowledge point", uses natural language text mining technology to construct it automatically, and verifies the construction process through experiments. [Result/conclusion] The experimental results show that the proposed model and method are feasible and effective, and that they provide a teaching and learning reference for colleges and students.
Subjects: Library Science,Information Science >> Information Science submitted time 2023-04-01 Cooperative journals: 《图书情报工作》
Abstract: [Purpose/significance] The effectiveness of patent term extraction is limited by the shortcomings of the patent text collection itself. To compensate, this paper proposes using rich keyword knowledge to obtain effective features from outside the patent text and thereby improve patent term extraction. [Method/process] Based on the keyword knowledge of related papers, two features, the degree of domain relevance and the degree of head & tail, are proposed to measure the likelihood that a candidate term is a genuine term, and these features are incorporated into the traditional method of patent term extraction. [Result/conclusion] The experimental results show that, with the degree of domain relevance and the degree of head & tail of candidate terms derived from the keyword information of papers, the method that incorporates paper keyword knowledge achieves significantly higher accuracy than the traditional term extraction method.
Subjects: Library Science,Information Science >> Information Science submitted time 2023-04-01 Cooperative journals: 《图书情报工作》
Abstract: [Purpose/significance] Patent keyword extraction algorithms are mainly evaluated by matching the extracted keywords against keywords manually labeled by experts. To address the problems of this approach, an evaluation model for patent keyword extraction algorithms based on information gain and similarity is proposed. [Method/process] The proposed model evaluates the accuracy of a patent keyword extraction algorithm at the intrinsic and extrinsic levels. The intrinsic evaluation measures the information gain of each keyword extracted by the algorithm under evaluation, in order to assess the novelty and creativity of the extracted keywords. The extrinsic evaluation represents each patent by the keyword set extracted by the algorithm, and measures how effectively those keywords describe the patent topic by calculating the similarity of relevant patents. [Result/conclusion] A validation experiment on the evaluation model and an empirical study of its application show that the evaluation model based on information gain and similarity is feasible and effective.
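The two measurements named in this abstract can be sketched as follows. The abstract does not give the formulas, so this sketch assumes the standard text-classification definition of information gain (the reduction in category entropy from knowing whether a keyword is present) and a cosine similarity over binary keyword sets. The function names and toy data are hypothetical, and the patent list is assumed to be non-empty.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of category labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(keyword, patents):
    """patents: non-empty list of (category, set_of_words)."""
    all_labels = [category for category, _ in patents]
    present = [category for category, words in patents if keyword in words]
    absent = [category for category, words in patents if keyword not in words]
    n = len(patents)
    # Expected category entropy once the keyword's presence is known.
    conditional = (len(present) / n) * entropy(present) + (
        len(absent) / n
    ) * entropy(absent)
    return entropy(all_labels) - conditional


def keyword_cosine(a, b):
    """Cosine similarity of two patents represented as binary keyword sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Toy corpus with two patent categories (illustrative data only).
patents = [
    ("A", {"electrode", "method"}),
    ("A", {"electrode", "device"}),
    ("B", {"antenna", "method"}),
    ("B", {"antenna", "device"}),
]
```

In the toy corpus, "electrode" separates the two categories perfectly and so has the maximum gain of one bit, whereas "method" appears equally in both categories and has zero gain.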