Subjects: Library Science,Information Science >> Information Science submitted time 2023-04-01 Cooperative journals: 《图书情报工作》
Abstract: [Purpose/significance] As e-science develops worldwide, scientific data management practice increasingly calls for interdisciplinary thinking and methods. Applying theories and methods from archival science can help improve the quality and efficiency of scientific data preservation, sharing, and reuse. [Method/process] Using text coding analysis and a comprehensive integration method, we extracted and synthesized archival methods, and the scientific data management tasks they apply to, from the research outputs of four international organizations (OCLC, DCC, RDA, and ICA) and from other related literature. [Result/conclusion] We find that archival methods, including appraisal and disposal, digital continuity, context management, and long-term preservation, are necessary for scientific data management. We recommend improving the effectiveness of scientific data management by holding interdisciplinary cooperation dialogues, establishing a cross-agency regulatory framework for continuity management, and training data librarians with archival expertise.
Subjects: Library Science,Information Science >> Information Science submitted time 2017-11-08 Cooperative journals: 《数据分析与知识发现》
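Abstract: [Objective] To incorporate the world knowledge contained in Wikipedia into the TextRank model in the form of word vectors, and thereby improve keyword extraction from single documents. [Methods] We used the Word2Vec model to train word vectors on Chinese Wikipedia data, then clustered the word vectors of the TextRank word-graph nodes to adjust the voting importance of the nodes within each cluster. Combining the coverage and position factors of the nodes, we computed the random jump probabilities between nodes and generated the transition matrix. Finally, we obtained node importance scores through iterative computation and selected the top N words as keywords. [Results] When TopN ≤ 7, the word-vector clustering weighting method outperformed all comparison methods. At TopN = 3, the F-value reached its maximum, an incremental improvement of 3.374% over the previous best result. When TopN > 7, the results were similar to those of the position weighting method. [Limitations] Cluster analysis increases the computational overhead. [Conclusions] Word-vector clustering weighting can improve keyword extraction.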
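The pipeline in this abstract (node weights feeding a transition matrix, then iterative scoring) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `weighted_textrank`, the linear position weight, the co-occurrence window, and the damping value are all assumptions made for illustration. The cluster weights are taken as a ready-made input rather than computed from Word2Vec vectors, and the coverage factor is omitted.

```python
from collections import defaultdict


def weighted_textrank(words, cluster_weight=None, window=2,
                      damping=0.85, iters=50, top_n=None):
    """Rank words with a TextRank variant whose jumps are biased by node weights.

    cluster_weight: hypothetical per-word weights, assumed to come from
    clustering word vectors elsewhere; missing words default to 1.0.
    """
    cluster_weight = cluster_weight or {}
    vocab = sorted(set(words))
    n = len(words)

    # Assumed position factor: words that first appear earlier weigh more.
    first_pos = {}
    for i, w in enumerate(words):
        first_pos.setdefault(w, i)
    pos_weight = {w: 1.0 + (n - first_pos[w]) / n for w in vocab}

    # Undirected co-occurrence graph over a sliding window.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, n)):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)

    # A node's attractiveness combines its cluster and position weights.
    attract = {w: cluster_weight.get(w, 1.0) * pos_weight[w] for w in vocab}

    # Power iteration; the jump probability u -> v is attract[v] normalised
    # over all neighbours of u (one row of the transition matrix).
    score = {w: 1.0 / len(vocab) for w in vocab}
    for _ in range(iters):
        new_score = {}
        for v in vocab:
            incoming = 0.0
            for u in neighbors[v]:
                total = sum(attract[x] for x in neighbors[u])
                incoming += score[u] * attract[v] / total
            new_score[v] = (1 - damping) / len(vocab) + damping * incoming
        score = new_score

    ranked = sorted(vocab, key=lambda w: (-score[w], w))
    return ranked if top_n is None else ranked[:top_n]
```

Calling `weighted_textrank(tokens, cluster_weight=weights, top_n=3)` returns the three highest-scoring words. Raising a word's cluster weight increases the share of each neighbour's vote that it receives, which is the effect the abstract attributes to cluster-based weighting.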
Subjects: Library Science,Information Science >> Information Science submitted time 2017-10-11 Cooperative journals: 《数据分析与知识发现》
Abstract: [Objective] To generate hierarchical semantic paths for texts from Wikipedia. [Methods] We first built article concept vectors for Chinese texts from Wikipedia using explicit semantic analysis. We then mapped each vector to the category nodes of a hierarchical, tree-like graph. Finally, we generated the hierarchical paths using seed-node information diffusion, top-down path selection, and optimization techniques. [Results] On the test dataset, the average relevance of the first generated hierarchical path was 54.10%, and the top 20 paths were sorted by relevance in descending order. [Limitations] We did not analyze how the number of explicit concepts in the vector affects the quality of the generated paths. [Conclusions] The hierarchical paths generated from Wikipedia can reflect the main semantic meaning of the given texts.
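As a rough illustration of the last two steps named above (seed-node diffusion followed by top-down path selection), the sketch below works on a toy category tree. Everything in it is an assumption rather than the authors' method: the helper names `diffuse_up` and `top_down_path`, the geometric decay, the greedy child selection, and the requirement that the categories form a strict tree. The seed scores stand in for the result of mapping a concept vector onto category nodes.

```python
def diffuse_up(parent, seeds, decay=0.5):
    """Spread each seed's score to its ancestors, shrinking by `decay` per level.

    parent: maps a category to its parent (the root has no entry).
    seeds:  hypothetical relevance scores for the seed categories.
    """
    score = {}
    for node, value in seeds.items():
        while node is not None:
            score[node] = score.get(node, 0.0) + value
            value *= decay
            node = parent.get(node)
    return score


def top_down_path(children, score, root):
    """Walk down from the root, always taking the highest-scoring child."""
    path = [root]
    node = root
    while True:
        # Only children that received some diffused score are candidates.
        candidates = [c for c in children.get(node, [])
                      if score.get(c, 0.0) > 0]
        if not candidates:
            return path
        node = max(candidates, key=lambda c: score[c])
        path.append(node)


# Toy category tree and seed scores, invented for illustration.
parent = {"Physics": "Science", "Biology": "Science",
          "Optics": "Physics", "Genetics": "Biology"}
children = {"Science": ["Physics", "Biology"],
            "Physics": ["Optics"], "Biology": ["Genetics"]}
scores = diffuse_up(parent, {"Optics": 1.0, "Genetics": 0.4})
print(top_down_path(children, scores, "Science"))
```

A greedy walk yields only the single best path. Producing a ranked list of several paths, as the abstract reports, would need something like a beam search over the same scores.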