Paper Title

Characteristics of Dataset Retrieval Sessions: Experiences from a Real-life Digital Library

Authors

Zeljko Carevic, Dwaipayan Roy, Philipp Mayr

Abstract

Secondary analysis, or the reuse of existing survey data, is a common practice among social scientists. Searching for relevant datasets in Digital Libraries, however, is a somewhat unfamiliar behaviour for this community. Dataset retrieval, especially in the social sciences, incorporates additional material such as codebooks, questionnaires, raw data files and more. Our assumption is that, due to the diverse nature of datasets, document retrieval models often do not work as efficiently for retrieving datasets. One way of enhancing these types of searches is to incorporate the users' interaction context in order to personalise dataset retrieval sessions. As a first step towards this long-term goal, we study characteristics of dataset retrieval sessions from a real-life Digital Library for the social sciences that incorporates both research data and publications. Previous studies reported a way of distinguishing document search queries from dataset search queries by query length. In this paper, we challenge this claim and report our finding that queries are indistinguishable, whether they aim for a dataset or a document. Among other things, we report our findings on dataset retrieval sessions with respect to query characteristics, interaction sequences and topical drift within 65,000 unique sessions.
