Paper Title
OSCAR: A Semantic-based Data Binning Approach
Paper Authors
Paper Abstract
Binning is applied to categorize data values or to see distributions of data. Existing binning algorithms often rely on statistical properties of data. However, there are semantic considerations for selecting appropriate binning schemes. Surveys, for instance, gather respondent data for demographic-related questions such as age, salary, number of employees, etc., that are bucketed into defined semantic categories. In this paper, we leverage common semantic categories from survey data and Tableau Public visualizations to identify a set of semantic binning categories. We employ these semantic binning categories in OSCAR: a method for automatically selecting bins based on the inferred semantic type of the field. We conducted a crowdsourced study with 120 participants to better understand user preferences for bins generated by OSCAR vs. binning provided in Tableau. We find that users prefer maps and histograms whose bins are generated by OSCAR over those whose bins are based purely on the statistical properties of the data.
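The contrast between semantic and purely statistical binning can be illustrated with a minimal sketch. This is not OSCAR's implementation: the bin edges in `SEMANTIC_EDGES`, the keyword-matching `infer_semantic_type` helper, and the equal-width fallback are all hypothetical stand-ins chosen for illustration, since the abstract does not specify the actual categories or the inference procedure.

```python
from bisect import bisect_right

# Hypothetical semantic bin edges (assumed for illustration, not from the paper).
SEMANTIC_EDGES = {
    "age": [18, 25, 35, 45, 55, 65],
    "salary": [25_000, 50_000, 75_000, 100_000, 150_000],
}


def infer_semantic_type(field_name):
    """Toy inference: crude keyword match on the field name (an assumption)."""
    name = field_name.lower()
    for semantic_type in SEMANTIC_EDGES:
        if semantic_type in name:
            return semantic_type
    return None


def equal_width_edges(values, n_bins=5):
    """Statistical fallback: interior edges of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def bin_labels(edges):
    """One label per bin; each bin includes its lower edge, excludes its upper."""
    labels = [f"< {edges[0]}"]
    labels += [f"{a} to <{b}" for a, b in zip(edges, edges[1:])]
    labels.append(f">= {edges[-1]}")
    return labels


def bin_field(field_name, values):
    """Use semantic edges when the field type is recognized, else equal width."""
    semantic_type = infer_semantic_type(field_name)
    edges = SEMANTIC_EDGES.get(semantic_type) or equal_width_edges(values)
    labels = bin_labels(edges)
    # bisect_right gives the index of the bin whose range contains the value.
    return [labels[bisect_right(edges, v)] for v in values]


print(bin_field("Respondent Age", [17, 18, 25, 70]))
```

Substring matching on the field name is deliberately crude (it would also match "Average"); a realistic system would need a more careful type inference step, which this sketch does not attempt.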