ChinaXiv.org: Chinese Academy of Sciences preprint platform for scientific and technical papers

7 results in total.

Search conditions: 俞琰 (Yu Yan)

1. ChinaXiv:202308.00506

Patent Topic Discovery Method Integrated with Term Knowledge

Subjects: Library Science, Information Science >> Library Science
Submitted: 2023-08-27
Cooperative journal: 《图书情报工作》

Yu Yan, Zhao Naixuan

Abstract: [Purpose/significance] Patent topic analysis that uses single words as the unit of analysis produces topics that are difficult to interpret. To address this problem, this paper proposes a patent topic discovery model integrated with term knowledge. [Method/process] The proposed model first introduces class entropy to recognize the terms in the patent literature effectively. It then uses the Generalized Pólya Urn model to increase the probability that semantically similar terms are assigned to the same topic, which alleviates the data sparsity caused by using terms as the basic unit of topic model analysis. [Result/conclusion] The experimental results show that incorporating term information improves the quality of the generated topics, making the topic representations more readable and the topics more discriminative.
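The abstract does not spell out how the Generalized Pólya Urn step is implemented. The following is only a minimal sketch of the usual idea, under stated assumptions: when a term is assigned to a topic, the topic's count for that term grows by one and the counts of semantically similar terms grow by a smaller promotion weight. The function name `gpu_update`, the `similar` dictionary, and the `promotion` value are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def gpu_update(topic_term_counts, topic_totals, topic, term, similar,
               promotion=0.3, sign=1):
    """Add (sign=+1) or remove (sign=-1) one assignment of `term` to `topic`.

    Hypothetical sketch: `similar` maps a term to the terms treated as
    semantically similar, and `promotion` is an assumed weight (not a value
    from the paper) by which those similar terms are also promoted.
    """
    # The assigned term itself always counts as one full observation.
    topic_term_counts[topic][term] += sign * 1.0
    topic_totals[topic] += sign * 1.0
    # Similar terms receive a fractional count in the same topic.
    for other in similar.get(term, ()):
        topic_term_counts[topic][other] += sign * promotion
        topic_totals[topic] += sign * promotion


# Usage with made-up terms.
counts = defaultdict(lambda: defaultdict(float))
totals = defaultdict(float)
similar_terms = {"lithium battery": ["lithium-ion cell"]}
gpu_update(counts, totals, 0, "lithium battery", similar_terms, promotion=0.5)
print(dict(counts[0]), totals[0])
```

In a collapsed Gibbs sampler, the same function with `sign=-1` would be called before resampling a term's topic, so that the promoted counts are withdrawn together with the original assignment.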

Hits 415 Downloads 124

2. ChinaXiv:202308.00273

Automatic Selection of Domain-Specific Stopwords in Topic Model of Patent Text

Subjects: Library Science, Information Science >> Library Science
Submitted: 2023-08-26
Cooperative journal: 《图书情报工作》

Yu Yan, Zhao Naixuan

Abstract: [Purpose/significance] Research on the automatic selection of domain-specific stopwords for topic models of patent text is insufficient. This paper proposes a new automatic selection method for patent text topic model analysis, in order to improve the differentiation and modeling quality of the patent topic model. [Method/process] In essence, domain-specific stopwords are less important words that carry relatively little information and discriminate poorly between different categories of patents. This paper therefore introduces an auxiliary multi-category patent text dataset and measures the distribution of each word across categories through its category entropy. The words with the highest category entropy are then chosen as the domain-specific stopwords. [Result/conclusion] Experimental results show that the proposed method is feasible and valid, and that it can improve the differentiation and quality of topic models for patent text analysis.
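The selection rule in the abstract can be illustrated with a short sketch. It assumes that category entropy is the Shannon entropy of a word's occurrence distribution over patent categories, estimated from raw counts. The paper may define or normalize it differently, and the function names and toy data below are hypothetical.

```python
import math
from collections import Counter, defaultdict


def category_entropy(docs):
    """Return {word: entropy in bits} for an iterable of (category, tokens).

    Assumption: P(category | word) is estimated from raw occurrence counts.
    """
    counts = defaultdict(Counter)  # word -> category -> occurrence count
    for category, tokens in docs:
        for token in tokens:
            counts[token][category] += 1
    entropy = {}
    for word, per_category in counts.items():
        total = sum(per_category.values())
        entropy[word] = -sum(
            (c / total) * math.log2(c / total) for c in per_category.values()
        )
    return entropy


def select_stopwords(docs, k):
    """Pick the k words with the highest category entropy (ties: alphabetical)."""
    entropy = category_entropy(docs)
    return sorted(entropy, key=lambda w: (-entropy[w], w))[:k]


# Toy auxiliary dataset with two patent categories.
corpus = [
    ("A", ["device", "method", "battery"]),
    ("B", ["device", "method", "antenna"]),
]
print(select_stopwords(corpus, 2))
```

Words spread evenly over the categories, such as "device" and "method" here, get the highest entropy and are selected, while category-specific words such as "battery" score zero. With real data, raw counts favor large categories, so some normalization by category size may be needed.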

Hits 336 Downloads 117

3. ChinaXiv:202307.00375

Research on the Selection of Chinese Patent Candidate Term Based on Dependency Syntax Parsing

Subjects: Library Science, Information Science >> Library Science
Submitted: 2023-07-26
Cooperative journal: 《图书情报工作》

Yu Yan, Chen Lei, Jiang Jinde, Zhao Naixuan

Abstract: [Purpose/significance] Different data sets require different pattern matching rules, which are difficult to write, and the accuracy of Chinese patent term extraction is low. To address these problems, this paper proposes a method for selecting Chinese patent candidate terms based on dependency syntax parsing, in order to improve the accuracy of Chinese patent term extraction. [Method/process] The method includes three main steps: dependency syntax parsing, pruning, and dependency subtree generation. First, dependency syntax parsing is carried out on the Chinese patent text to obtain dependency trees. Then, dependency subtrees are generated by removing the dependency relations that do not meet the requirements. Finally, continuous word strings are selected from the subtrees as candidate terms for Chinese patent term extraction. [Result/conclusion] The experimental results show that, compared with existing related methods, the proposed method based on dependency syntax parsing effectively improves the accuracy of Chinese patent term extraction.
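The pruning and subtree steps can be sketched as follows, assuming the sentence has already been parsed by some dependency parser. The abstract does not say which relations are kept, so the relation set here (attribute modifiers labeled `ATT`), the input format, and the function name `candidate_terms` are all hypothetical.

```python
def candidate_terms(parsed, keep_relations=frozenset({"ATT"}), min_len=2):
    """Return candidate terms from one parsed sentence.

    `parsed` is a list of (word, head_index, relation) triples, with
    head_index = -1 for the root. Arcs whose relation is not in
    `keep_relations` are pruned (an assumed rule, not the paper's). Each
    remaining connected group of words is a dependency subtree, and maximal
    runs of adjacent words from the same subtree become candidates.
    """
    parent = list(range(len(parsed)))  # union-find forest over word positions

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pruning: only arcs with a kept relation join a word to its head.
    for i, (_, head, relation) in enumerate(parsed):
        if head >= 0 and relation in keep_relations:
            parent[find(i)] = find(head)

    candidates, run = [], []
    for i in range(len(parsed)):
        if run and find(run[-1]) == find(i):
            run.append(i)  # same subtree and adjacent: extend the string
        else:
            if len(run) >= min_len:
                candidates.append("".join(parsed[j][0] for j in run))
            run = [i]
    if len(run) >= min_len:
        candidates.append("".join(parsed[j][0] for j in run))
    return candidates


# Hand-written parse of a made-up sentence (labels are illustrative).
sentence = [
    ("锂", 1, "ATT"), ("离子", 2, "ATT"), ("电池", 3, "SBV"),
    ("包括", -1, "HED"), ("正极", 5, "ATT"), ("材料", 3, "VOB"),
]
print(candidate_terms(sentence))
```

Because the candidates come from the parse rather than from part-of-speech patterns, the same code applies to a new data set without rewriting matching rules, which is the motivation stated in the abstract.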

Hits 227 Downloads 115

4. ChinaXiv:202307.00466

Research on Skill Information Automatic Extraction from Online Recruitment Texts

Subjects: Library Science, Information Science >> Library Science
Submitted: 2023-07-26
Cooperative journal: 《图书情报工作》

Yu Yan, Chen Lei, Jiang Jinde, Zhao Naixuan

Abstract: [Purpose/significance] Manual extraction of skill information from online job postings does not scale to the analysis of large volumes of data. To address this problem, this paper proposes an automatic skill information extraction method for large collections of online recruitment texts. [Method/process] According to the characteristics of online recruitment texts, candidate skills are identified through dependency syntax analysis and then scored with domain relevance indicators. These indicators are integrated into the traditional terminology extraction method to form a method for the automatic extraction of skill information from online recruitment texts. [Result/conclusion] Experiments show that the proposed method can extract skill information automatically, quickly, and accurately from massive online recruitment texts.

Hits 215 Downloads 98

5. ChinaXiv:202307.00504

Research on Automatic Construction of Curriculum Knowledge Model Based on Web Recruitment Text Mining

Subjects: Library Science, Information Science >> Library Science
Submitted: 2023-07-26
Cooperative journal: 《图书情报工作》

Yu Yan, Chen Lei, Zhao Naixuan

Abstract: [Purpose/significance] In order to help college teachers and students make full use of web recruitment information, this paper proposes a curriculum knowledge model and a method for constructing it automatically by mining large-scale web recruitment texts. [Method/process] This paper proposes a three-level curriculum knowledge model of the form "post-curriculum-knowledge point", uses natural language text mining technology to construct it automatically, and verifies the construction process through experiments. [Result/conclusion] The experimental results show that the proposed model and method are feasible and effective, and that they provide a teaching and learning reference for colleges and students.

Hits 255 Downloads 130

6. ChinaXiv:202304.00168

Patent Term Extraction by Integrating Keyword Knowledge From Paper

Subjects: Library Science, Information Science >> Information Science
Submitted: 2023-04-01
Cooperative journal: 《图书情报工作》

Yu Yan, Chen Lei, Jiang Jinde, Zhao Naixuan

Abstract: [Purpose/significance] Relying on the patent text collection alone limits the effectiveness of patent term extraction. To make up for this shortcoming, this paper proposes using the rich keyword knowledge of papers to obtain effective features from outside the patent text and thereby improve patent term extraction. [Method/process] Based on the keyword knowledge of related papers, two features, the degree of domain relevance and the degree of head & tail, are proposed to measure how likely a candidate term is to be a true term, and these features are incorporated into the traditional method of patent term extraction. [Result/conclusion] The experimental results show that, with the degree of domain relevance and the degree of head & tail of candidate terms derived from the keyword information of papers, the method that incorporates this keyword knowledge achieves significantly higher accuracy than the traditional term extraction method.

Hits 161 Downloads 97

7. ChinaXiv:202304.00818

Research on the Evaluation Method of Patent Keyword Extraction Algorithm Based on Information Gain and Similarity

Subjects: Library Science, Information Science >> Information Science
Submitted: 2023-04-01
Cooperative journal: 《图书情报工作》

Yu Yan, Ju Peng, Shang Mingjie

Abstract: [Purpose/significance] Patent keyword extraction algorithms are currently evaluated mainly by matching the extracted keywords against keywords manually labeled by experts. To address the problems of this approach, this paper proposes an evaluation model for patent keyword extraction algorithms based on information gain and similarity. [Method/process] The proposed model evaluates the accuracy of a patent keyword extraction algorithm at the intrinsic and extrinsic levels. The intrinsic evaluation measures the information gain of each keyword extracted by the algorithm under evaluation, in order to assess the novelty and creativity of the extracted keywords. The extrinsic evaluation represents each patent by the keyword set extracted by the algorithm under evaluation and calculates the similarity of relevant patents, in order to measure how effectively the extracted keywords describe the patent topic. [Result/conclusion] A validation experiment on the evaluation model and an empirical study of its application show that the evaluation model based on information gain and similarity is feasible and effective.

Hits 176 Downloads 93
