ChinaXiv.org: Chinese Academy of Sciences preprint platform for scientific papers

4 results in total.

Search condition: Key Laboratory of Data Engineering and Knowledge Engineering (Ministry of Education), Renmin University of China, Beijing 100872

1. ChinaXiv:202304.00686

Scientific Data Management from the Perspective of Archives: A Study Based on Relevant Achievements of International Organizations

Subjects: Library Science, Information Science >> Information Science; submitted 2023-04-01; cooperating journal: Library and Information Service (《图书情报工作》)

Wang Ning, Liu Yuenan

Abstract: [Purpose/significance] In the context of global e-science development, scientific data management practice increasingly calls for interdisciplinary thinking and methods. Theories and methods from archival science can help improve the quality and efficiency of scientific data preservation, sharing, and reuse. [Method/process] Using text coding analysis and comprehensive integration, the archival methods and the scientific data management tasks they involve were extracted and synthesized from the research outputs of four international organizations (OCLC, DCC, RDA, and ICA) and from other related literature. [Result/conclusion] The study finds that archival science methods, including appraisal and disposal, digital continuity, context management, and long-term preservation, are necessary for scientific data management. It recommends improving the effectiveness of scientific data management by holding interdisciplinary cooperation dialogues, establishing a cross-agency regulatory framework for continuity management, and training data librarians with archival expertise.

Hits 189 Downloads 87 Comment 0
2. ChinaXiv:201711.01966

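Keyword Extraction Using TextRank Weighted by Word-Vector Clustering

Subjects: Library Science, Information Science >> Information Science; submitted 2017-11-08; cooperating journal: Data Analysis and Knowledge Discovery (《数据分析与知识发现》)

Xia Tian

Abstract: [Objective] Incorporate the world knowledge contained in Wikipedia into the TextRank model in the form of word vectors, to improve single-document keyword extraction. [Methods] A Word2Vec model is trained on Chinese Wikipedia data to produce word vectors. The word vectors of the nodes in the TextRank word graph are clustered to adjust the voting importance of the nodes within each cluster. Combining this with node coverage and position factors, the random jump probabilities between nodes are computed to build a transition matrix. Node importance scores are then obtained by iterative computation, and the TopN highest-scoring words are selected as keywords. [Results] For TopN ≤ 7, the word-vector clustering weighting method outperforms all comparison methods. At TopN = 3 the F-score reaches its maximum, an incremental improvement of 3.374% over the previous best result. For TopN > 7, the results are similar to those of the position weighting method. [Limitations] Cluster analysis increases the computational cost. [Conclusions] Word-vector clustering weighting can improve keyword extraction.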

Hits 2119 Downloads 1284 Comment 0
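
The abstract above describes a TextRank variant in which jump probabilities between word-graph nodes are weighted before the scores are iterated. The following is a minimal Python sketch of that general idea, not the author's implementation: the clustering, coverage, and position factors are collapsed into a single hypothetical per-word weight (`node_weight`), and the function names, window size, damping factor, and iteration count are all assumptions chosen for illustration.

```python
from collections import defaultdict


def weighted_textrank(words, node_weight, window=2, damping=0.85, iters=50):
    """Score words with a TextRank variant whose jumps are biased by node weights.

    `node_weight` is a hypothetical stand-in for the combined cluster,
    coverage, and position weights; words missing from it default to 1.0.
    All weights are assumed to be positive.
    """
    # Undirected co-occurrence graph: link each word to the next `window` words.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for v in words[i + 1 : i + 1 + window]:
            if v != w:
                neighbors[w].add(v)
                neighbors[v].add(w)
    nodes = sorted(neighbors)

    # Transition probability u -> v is proportional to the weight of v
    # among the neighbors of u, so each row of the matrix sums to 1.
    trans = {}
    for u in nodes:
        total = sum(node_weight.get(v, 1.0) for v in neighbors[u])
        trans[u] = {v: node_weight.get(v, 1.0) / total for v in neighbors[u]}

    # Power iteration with a uniform teleport term.
    score = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(iters):
        new = {u: (1.0 - damping) / len(nodes) for u in nodes}
        for u in nodes:
            for v, p in trans[u].items():
                new[v] += damping * score[u] * p
        score = new
    return score


def top_keywords(score, n):
    """Return the n highest-scoring words, breaking ties alphabetically."""
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:n]]
```

For example, `top_keywords(weighted_textrank(tokens, weights), 3)` would return three candidate keywords for a tokenized document. In the method described in the abstract, the weights would come from clustering Word2Vec vectors; here they are simply supplied by the caller.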
3. ChinaXiv:201711.01989

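Keyword Extraction Using TextRank Weighted by Word-Vector Clustering

Subjects: Library Science, Information Science >> Information Science; submitted 2017-11-08; cooperating journal: Data Analysis and Knowledge Discovery (《数据分析与知识发现》)

Xia Tian

Abstract: [Objective] Incorporate the world knowledge contained in Wikipedia into the TextRank model in the form of word vectors, to improve single-document keyword extraction. [Methods] A Word2Vec model is trained on Chinese Wikipedia data to produce word vectors. The word vectors of the nodes in the TextRank word graph are clustered to adjust the voting importance of the nodes within each cluster. Combining this with node coverage and position factors, the random jump probabilities between nodes are computed to build a transition matrix. Node importance scores are then obtained by iterative computation, and the TopN highest-scoring words are selected as keywords. [Results] For TopN ≤ 7, the word-vector clustering weighting method outperforms all comparison methods. At TopN = 3 the F-score reaches its maximum, an incremental improvement of 3.374% over the previous best result. For TopN > 7, the results are similar to those of the position weighting method. [Limitations] Cluster analysis increases the computational cost. [Conclusions] Word-vector clustering weighting can improve keyword extraction.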

Hits 3094 Downloads 2256 Comment 0
4. ChinaXiv:201711.01237

Generating Hierarchical Paths for Chinese Texts Based on Wikipedia

Subjects: Library Science, Information Science >> Information Science; submitted 2017-10-11; cooperating journal: Data Analysis and Knowledge Discovery (《数据分析与知识发现》)

Xia Tian

Abstract: [Objective] Generate hierarchical semantic paths for texts using Wikipedia. [Methods] We first built article concept vectors for Chinese texts from Wikipedia through explicit semantic analysis. We then mapped the vectors to the category nodes of a hierarchical, tree-like graph. Finally, we generated the hierarchical paths using seed-node information diffusion, top-down path selection, and optimization techniques. [Results] On the test dataset, the average relevance of the first generated hierarchical path was 54.10%, and the top 20 paths were sorted in descending order of relevance. [Limitations] We did not analyze how the number of explicit concept vectors affects the quality of the generated paths. [Conclusions] The hierarchical paths generated from Wikipedia can reflect the main semantic meaning of the given texts.

Hits 2036 Downloads 1266 Comment 0
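
The first two steps in the abstract above (building a concept vector by explicit semantic analysis, then mapping it to category nodes) can be illustrated with a toy Python sketch. This is an assumption-laden illustration and not the paper's implementation: the function names, the smoothed tf-idf weighting, the whitespace tokenization (real Chinese text would need word segmentation), and the flat concept-to-category mapping are all hypothetical, and the diffusion and path-selection steps are not covered.

```python
import math
from collections import Counter, defaultdict


def build_index(concept_docs):
    """Map each word to {concept: weight} over toy concept articles.

    The weight is term frequency times log(1 + N / df), an assumed
    smoothed tf-idf variant chosen only for illustration.
    """
    n = len(concept_docs)
    tf = {c: Counter(text.split()) for c, text in concept_docs.items()}
    # Document frequency: in how many concept articles each word appears.
    df = Counter(w for counts in tf.values() for w in counts)
    index = defaultdict(dict)
    for c, counts in tf.items():
        for w, k in counts.items():
            index[w][c] = k * math.log(1 + n / df[w])
    return index


def concept_vector(text, index):
    """Sum the concept weights of every known word in the text."""
    vec = defaultdict(float)
    for w in text.split():
        for c, weight in index.get(w, {}).items():
            vec[c] += weight
    return dict(vec)


def category_scores(vec, concept_to_category):
    """Aggregate concept scores onto (hypothetical) category nodes."""
    scores = defaultdict(float)
    for c, s in vec.items():
        if c in concept_to_category:
            scores[concept_to_category[c]] += s
    return dict(scores)
```

In this sketch, the highest-scoring categories would play the role of seed nodes; how the paper diffuses information from them and selects paths top-down is not specified in the abstract and is left out here.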
