ChinaXiv.org, the Chinese Academy of Sciences preprint platform for scientific papers

9 results in total.

Search conditions: 白如江 (Bai Rujiang)

1. ChinaXiv:202308.00528

An Analysis of Zero-cited and Highly-cited Papers in the Perspective of Research Topics: A Case Study of Environmental Science

Subjects: Library Science, Information Science >> Library Science; submitted 2023-08-27; cooperative journal: 《图书情报工作》 (Library and Information Service)

Pan Fei, Wang Xiaoyue, Bai Rujiang, Zhou Yanting

Abstract: [Purpose/significance] This paper analyzes zero-cited papers in environmental science from the perspective of research topics, in order to identify differences in content and external indicators between zero-cited and highly cited papers and to reveal why zero-cited papers exist. [Method/process] First, the PLDA model was used to identify topics in 260 highly cited papers and 907 zero-cited papers on domestic environmental science retrieved from the Web of Science database. Then the relevance of the topics was determined through topic similarity calculation. With topic popularity as an internal indicator and publication time and journal as external evaluation indicators, zero-cited and highly cited papers were compared by combining the papers' topics with the external indicators. [Result/conclusion] The experimental results show that within the same research topic, journal influence is the main factor affecting a paper's citations; across different topics, the topic itself is the main reason papers go uncited.

Hits 554 Downloads 123 Comment 0
2. ChinaXiv:202308.00577

A Method to Evaluate Academic Papers' Innovation Based on the Research Theme Comparing

Subjects: Library Science, Information Science >> Library Science; submitted 2023-08-27; cooperative journal: 《图书情报工作》 (Library and Information Service)

Yang Jing, Wang Fang, Bai Rujiang

Abstract: [Purpose/significance] Innovation is an essential requirement of academic papers, and how to evaluate it effectively has drawn the attention of domestic and foreign experts and scholars. Advances in information technology make it possible to evaluate papers' innovation automatically from their content by computer. [Method/process] This paper presents a method for evaluating papers' innovation based on research theme comparison. First, the KeyGraph algorithm is used to extract keywords that represent a paper's theme. Then, the similarity between the paper's research theme and the research front themes is calculated. Finally, a comprehensive model is presented that determines the level of a paper's innovation using two external indicators, the journal impact factor and altmetrics. [Result/conclusion] An empirical study in the carbon nanotube field demonstrated that this method can evaluate papers' innovation from the perspective of their content effectively, quickly, and accurately.

Hits 530 Downloads 168 Comment 0
3. ChinaXiv:202308.00605

Citation Sentiment Recognition Method Based on Citation Content Analysis

Subjects: Library Science, Information Science >> Library Science; submitted 2023-08-27; cooperative journal: 《图书情报工作》 (Library and Information Service)

Liao Junhua, Liu Ziqiang, Bai Rujiang, Chen Junying

Abstract: [Purpose/significance] The paper proposes a citation sentiment identification method based on citation content analysis, together with a visual display, to address the fact that simple citation frequency counts cannot distinguish between different citation sentiments. [Method/process] First, regular expressions are used to extract citation content from the full text. Then, the TF-IDF algorithm is used to select citation sentiment feature words, which are combined with a sentiment dictionary and sentiment analysis techniques to recognize citation sentiment. Finally, visualization tools show the overall distribution of citation sentiment. [Result/conclusion] The method can effectively identify sentiment information in the anti-aging domain. The experimental results show that positive citations account for 21% of the total citation frequency, neutral citations for 78%, and negative citations for only 1%. Compared with the traditional citation network, the visualization map based on citation sentiment can effectively identify the distribution of different citation sentiments across the whole data set.
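
The pipeline summarized in this abstract (regular-expression extraction followed by dictionary-based sentiment recognition) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the bracketed numeric citation style, the tiny word lists, and the function names `citation_sentences` and `classify` are all assumptions, and the TF-IDF feature selection step is left out.

```python
import re

# Hypothetical mini lexicon; the abstract does not give the actual sentiment dictionary.
POSITIVE = {"effective", "robust", "improved", "novel"}
NEGATIVE = {"fails", "limited", "inaccurate", "flawed"}

# Assumed citation marker style: numeric brackets such as [3] or [4, 7].
CITATION = re.compile(r"\[\d+(?:\s*,\s*\d+)*\]")


def citation_sentences(text):
    """Return the sentences that contain at least one bracketed citation marker."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if CITATION.search(s)]


def classify(sentence):
    """Label a sentence by counting lexicon hits: positive, negative, or neutral."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


sample = (
    "Smith proposed an effective model [1]. The weather was nice. "
    "Earlier work [2, 3] fails on long texts. A survey is given in [4]."
)
labels = [classify(s) for s in citation_sentences(sample)]
```

In this toy input only the three sentences carrying a citation marker are kept, and each receives one label; a real system would need a domain lexicon and handling for negation.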

Hits 703 Downloads 185 Comment 0
4. ChinaXiv:202304.00332

A Study of Knowledge Meme Heredity and Mutation in Academic Paper

Subjects: Library Science, Information Science >> Information Science; submitted 2023-04-01; cooperative journal: 《图书情报工作》 (Library and Information Service)

Bai Rujiang, Zhang Qingzhi, Sun Yigang

Abstract: [Purpose/significance] The accumulation and inheritance of knowledge promote the development of human society. This paper proposes studying the inheritance and variation of knowledge through the knowledge genes of scientific and technological literature, in order to gain a more intuitive and comprehensive view of how knowledge is inherited and developed. [Method/process] By analyzing the narrow and broad definitions of knowledge genes, the research significance of knowledge genes was determined and their specific research objects were discussed. Two forms of expression of knowledge genes in scientific and technological literature were proposed, and the types of knowledge genes in such literature were analyzed. The main factors affecting the inheritance and variation of knowledge genes were summarized, and methods for the inheritance and variation of knowledge genes were designed. [Result/conclusion] Identifying knowledge genes in scientific and technological literature can effectively reveal knowledge inheritance and iteration between documents, and promote the development and refinement of the theoretical system of knowledge memes.

Hits 283 Downloads 129 Comment 0
5. ChinaXiv:202304.00515

Research on SAO Short Text Classification in LIS Based on Semantic Association and BERT

Subjects: Library Science, Information Science >> Information Science; submitted 2023-04-01; cooperative journal: 《图书情报工作》 (Library and Information Service)

Zhang Yujie, Bai Rujiang, Liu Mingyue, Yu Chunliang

Abstract: [Purpose/significance] To address the shortage of semantic features and the lack of domain knowledge in the classification of SAO-structure short texts, this paper proposes an SAO classification method that combines semantic association with BERT in order to improve classification performance. [Method/process] Taking SAO short texts in the library and information science field as the data source, a semantic association scheme with three stages, "Expansion-Reconstruction-Noise Reduction", was first designed. The semantic information of each SAO was extended through semantic expansion and SAO reconstruction, and the noise introduced by the extension was handled by semantic noise reduction. The BERT model was then trained on the SAO short texts after semantic association. Finally, automatic classification was carried out in the classification part. [Result/conclusion] After comparing different association values, learning rates, and classifiers, the experimental results show that the SAO short text classification performs best when the association value is 10 and the learning rate is 4e-5, with an average F1 value of 0.8522; compared with SVM, LSTM, and pure BERT, the F1 value is increased by 0.1031, 0.1538, and 0.1405 respectively.

Hits 201 Downloads 110 Comment 0
6. ChinaXiv:202304.00640

Research on Technology Opportunity Discovery Based on Comment Topic Identification and Multi Dimension Analysis of Technical Attributes

Subjects: Library Science, Information Science >> Information Science; submitted 2023-04-01; cooperative journal: 《图书情报工作》 (Library and Information Service)

Wu Yiping, Bai Rujiang, Liu Mingyue, Wang Xiaoyue

Abstract: [Purpose/significance] This paper proposes a technology opportunity discovery method that integrates comment topic identification with multi-dimensional analysis of technical attributes. It identifies technology opportunities from a demand-driven perspective and provides decision support for enterprises' forward-looking planning of R&D directions and scientific research management. [Method/process] Online product comments were used as the research data source. First, the LDA topic model was used to identify the technical topics of the comments, and two indicators, technical comment topic strength and topic novelty, were proposed to screen out the emerging key technical comment topics. Then, technical attribute words were manually selected from academic papers and technical patents, and high-frequency comment words were obtained by calculating TF-IDF values. Combined with expert knowledge, technical feature words were further selected, and a list of product technical attribute words and technical feature words was constructed. Correlation calculation then yielded the technical attributes related to the comments and to the emerging key technical comment topics, respectively. Finally, an index model was proposed to identify the important technical attributes of products, and a multi-dimensional analysis method was designed to analyze the characteristics of those attributes and identify the emerging technology opportunities contained in the comment text. [Result/conclusion] The experimental results show that this method can effectively identify technology opportunities in a forward-looking way and provide a reference for enterprises' product technology R&D management.

Hits 239 Downloads 130 Comment 0
7. ChinaXiv:201712.01401

An Improved Cosine Text Similarity Calculation Method Based on Semantic Chunk Features

Subjects: Library Science, Information Science >> Information Science; submitted 2017-12-05; cooperative journal: 《数据分析与知识发现》 (Data Analysis and Knowledge Discovery)

Bai Rujiang (白如江), Leng Fuhai (冷伏海), Liao Junhua (廖君华)

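Abstract: [Objective] Use the semantic chunk features of text to improve the performance of cosine text similarity calculation. [Methods] Project data on carbon nanotube research funded by the NSF were collected and preprocessed with stemming and part-of-speech tagging. A conditional random field model was used to label the semantic chunks of the text content. On this basis, an improved cosine text similarity calculation based on semantic chunk features was implemented and compared with the similarity calculated on unlabeled data, and the experimental results were analyzed. [Results] The experiments show that the improved cosine similarity based on semantic chunk features is higher, to varying degrees, than the cosine similarity of the original text, with the largest increase in the experimental data being 26%. [Limitations] The method depends on the performance of semantic chunk labeling. [Conclusions] The method effectively improves the semantic similarity between texts, reduces the dimensionality of the vector space model, improves computational efficiency, and has good generalization ability and robustness.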
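
The comparison this abstract describes, cosine similarity on raw tokens versus on chunked tokens, can be illustrated with a small sketch. It is written under stated assumptions and is not the paper's method: the chunk list is hand-written and hypothetical (the paper labels chunks with a conditional random field model, which is not reproduced here), term weights are raw counts, and the names `cosine` and `merge_chunks` are invented for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token lists, using raw term counts."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def merge_chunks(tokens, chunks):
    """Greedily replace known multi-word chunks with single joined tokens."""
    out, i = [], 0
    while i < len(tokens):
        for chunk in chunks:
            if tuple(tokens[i:i + len(chunk)]) == chunk:
                out.append("_".join(chunk))
                i += len(chunk)
                break
        else:
            # No chunk starts at this position; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical chunk inventory standing in for CRF-labeled semantic chunks.
CHUNKS = [("carbon", "nanotube")]

doc1 = "carbon nanotube growth on silicon".split()
doc2 = "growth of carbon nanotube arrays".split()

plain = cosine(doc1, doc2)
chunked = cosine(merge_chunks(doc1, CHUNKS), merge_chunks(doc2, CHUNKS))
vocab_plain = len(set(doc1) | set(doc2))
vocab_chunked = len(set(merge_chunks(doc1, CHUNKS)) | set(merge_chunks(doc2, CHUNKS)))
```

Merging chunks shrinks the vocabulary, which is the dimensionality reduction the abstract mentions. Whether the similarity score rises or falls depends on the data and the weighting, so this toy example says nothing about the reported 26% improvement.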

Hits 2388 Downloads 1275 Comment 0
8. ChinaXiv:201712.01612

An Improved Cosine Text Similarity Calculation Method Based on Semantic Chunk Features

Subjects: Library Science, Information Science >> Information Science; submitted 2017-11-30; cooperative journal: 《数据分析与知识发现》 (Data Analysis and Knowledge Discovery)

Bai Rujiang (白如江), Leng Fuhai (冷伏海), Liao Junhua (廖君华)

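Abstract: [Objective] Use the semantic chunk features of text to improve the performance of cosine text similarity calculation. [Methods] Project data on carbon nanotube research funded by the NSF were collected and preprocessed with stemming and part-of-speech tagging. A conditional random field model was used to label the semantic chunks of the text content. On this basis, an improved cosine text similarity calculation based on semantic chunk features was implemented and compared with the similarity calculated on unlabeled data, and the experimental results were analyzed. [Results] The experiments show that the improved cosine similarity based on semantic chunk features is higher, to varying degrees, than the cosine similarity of the original text, with the largest increase in the experimental data being 26%. [Limitations] The method depends on the performance of semantic chunk labeling. [Conclusions] The method effectively improves the semantic similarity between texts, reduces the dimensionality of the vector space model, improves computational efficiency, and has good generalization ability and robustness.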

Hits 2397 Downloads 1303 Comment 0
9. ChinaXiv:201711.02034

A Review of Text Semantic Mining Methods for Intelligence Research

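Subjects: Library Science, Information Science >> Information Science; submitted 2017-11-08; cooperative journal: 《数据分析与知识发现》 (Data Analysis and Knowledge Discovery)

Zhao Dongxiao (赵冬晓), Wang Xiaoyue (王效岳), Bai Rujiang (白如江), Liu Ziqiang (刘自强)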

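Abstract: [Objective] Review and analyze the main text semantic mining methods and their applications in intelligence research. [Coverage] The review focuses on applications of mainstream text semantic mining methods in intelligence research in China and abroad over the past 10 years, together with a small number of earlier representative studies and studies on advances in text semantic mining methods. [Methods] Text semantic mining methods and algorithms at the word, sentence, and document granularity are summarized in turn, and the methods are examined through practical applications in topic evolution and technology mining. [Results] Compared with traditional intelligence analysis methods, text semantic mining methods mainly remedy two shortcomings: a focus on structured data with no ability to handle multiple heterogeneous data sources, and analysis that stays at the statistical and syntactic level without reaching the semantic information of the text. [Limitations] Only mainstream text semantic mining methods and their applications in scientific research are reviewed, so the coverage is not comprehensive. [Conclusions] Text semantic mining methods make up for the shortcomings of traditional intelligence analysis methods and are an important direction for the development of intelligence research methods. As the methods mature, the next research focus is the enrichment of external semantic resources.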

Hits 2560 Downloads 1794 Comment 0