Paper Title

Evaluation of Sampling Methods for Scatterplots

Authors

Jun Yuan, Shouxing Xiang, Jiazhi Xia, Lingyun Yu, Shixia Liu

Abstract

Given a scatterplot with tens of thousands of points or even more, a natural question is which sampling method should be used to create a small but "good" scatterplot for a better abstraction. We present the results of a user study that investigates the influence of different sampling strategies on multi-class scatterplots. The main goal of this study is to understand the capability of sampling methods in preserving the density, outliers, and overall shape of a scatterplot. To this end, we comprehensively review the literature and select seven typical sampling strategies as well as eight representative datasets. We then design four experiments to understand the performance of different strategies in maintaining: 1) region density; 2) class density; 3) outliers; and 4) overall shape in the sampling results. The results show that: 1) random sampling is preferred for preserving region density; 2) blue noise sampling and random sampling have comparable performance with the three multi-class sampling strategies in preserving class density; 3) outlier biased density based sampling, recursive subdivision based sampling, and blue noise sampling perform the best in keeping outliers; and 4) blue noise sampling outperforms the others in maintaining the overall shape of a scatterplot.
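
To make two of the strategies named in the abstract concrete, the following is a minimal sketch of random sampling and a dart-throwing approximation of blue noise sampling on 2D points. It is not the implementation used in the study: the function names, the `radius` parameter, and the synthetic demo data are all assumptions made for illustration.

```python
import math
import random


def random_sample(points, k, seed=0):
    """Uniform random sampling: pick k points without replacement."""
    rng = random.Random(seed)
    return rng.sample(points, k)


def blue_noise_sample(points, radius, seed=0):
    """Dart-throwing approximation of blue noise sampling (illustrative).

    Visits the points in random order and keeps a point only if it lies
    at least `radius` away from every point kept so far.
    """
    rng = random.Random(seed)
    order = points[:]
    rng.shuffle(order)
    kept = []
    for p in order:
        if all(math.dist(p, q) >= radius for q in kept):
            kept.append(p)
    return kept


def make_demo_points(n=2000, seed=1):
    """Hypothetical data: one dense Gaussian cluster plus sparse background points."""
    rng = random.Random(seed)
    dense = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(n)]
    sparse = [(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(n // 20)]
    return dense + sparse


if __name__ == "__main__":
    pts = make_demo_points()
    print(len(random_sample(pts, 100)), "points from random sampling")
    print(len(blue_noise_sample(pts, 0.2)), "points from blue noise sampling")
```

In this sketch, random sampling draws points in proportion to how many the data has in each region, so most of its picks fall in the dense cluster. The minimum-distance rule spreads the kept points evenly over the occupied area, so isolated points far from the cluster are always kept.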
