Evaluation of Sampling Methods for Scatterplots

Jun Yuan, Shouxing Xiang, Jiazhi Xia, Lingyun Yu, Shixia Liu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

27 Citations (Scopus)

Abstract

Given a scatterplot with tens of thousands of points or even more, a natural question is which sampling method should be used to create a small but 'good' scatterplot for a better abstraction. We present the results of a user study that investigates the influence of different sampling strategies on multi-class scatterplots. The main goal of this study is to understand the capability of sampling methods in preserving the density, outliers, and overall shape of a scatterplot. To this end, we comprehensively review the literature and select seven typical sampling strategies as well as eight representative datasets. We then design four experiments to understand the performance of different strategies in maintaining: 1) region density; 2) class density; 3) outliers; and 4) overall shape in the sampling results. The results show that: 1) random sampling is preferred for preserving region density; 2) blue noise sampling and random sampling perform comparably to the three multi-class sampling strategies in preserving class density; 3) outlier-biased density-based sampling, recursive-subdivision-based sampling, and blue noise sampling perform the best in keeping outliers; and 4) blue noise sampling outperforms the others in maintaining the overall shape of a scatterplot.
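
As a rough illustration of two of the strategies named in the abstract, the sketch below contrasts uniform random sampling with a simple greedy dart-throwing form of blue noise sampling. It is not the implementation evaluated in the study: the function names, the `radius` parameter, and the synthetic Gaussian data are assumptions made for this example, and class labels are ignored for brevity.

```python
import random


def random_sample(points, k, seed=0):
    """Uniform random sampling: draw k points without replacement."""
    rng = random.Random(seed)
    return rng.sample(points, k)


def blue_noise_sample(points, radius, seed=0):
    """Greedy dart throwing (one simple blue-noise-style scheme).

    Visit the points in random order and keep a point only if it lies at
    least `radius` away from every point kept so far. This is O(n * m)
    for n input points and m kept points; a spatial grid would speed it up.
    """
    rng = random.Random(seed)
    order = list(points)
    rng.shuffle(order)
    kept = []
    r2 = radius * radius
    for x, y in order:
        if all((x - kx) ** 2 + (y - ky) ** 2 >= r2 for kx, ky in kept):
            kept.append((x, y))
    return kept


# Hypothetical data: 2000 points from a 2D Gaussian blob.
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(2000)]

bn = blue_noise_sample(data, radius=0.2)
rs = random_sample(data, len(bn))  # same budget, for a like-for-like comparison
print(len(bn), len(rs))
```

In this sketch, random sampling keeps points in proportion to the local density, while the minimum-distance constraint spreads the blue noise sample evenly over the occupied area, so sparse regions and isolated points are more likely to survive.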

Original language: English
Article number: 9226404
Pages (from-to): 1720-1730
Number of pages: 11
Journal: IEEE Transactions on Visualization and Computer Graphics
Volume: 27
Issue number: 2
Publication status: Published - Feb 2021

Keywords

  • Scatterplot
  • data sampling
  • empirical evaluation
