Evaluation of Sampling Methods for Scatterplots

Jun Yuan; Shouxing Xiang; Jiazhi Xia; Lingyun Yu; Shixia Liu

doi:10.1109/TVCG.2020.3030432

Jun Yuan, Shouxing Xiang, Jiazhi Xia, Lingyun Yu, Shixia Liu*

*Corresponding author for this work

Department of Computing

Tsinghua University

Research output: Contribution to journal › Article › peer-review

30 Citations (Scopus)

Abstract

Given a scatterplot with tens of thousands of points or even more, a natural question is which sampling method should be used to create a small but 'good' scatterplot for a better abstraction. We present the results of a user study that investigates the influence of different sampling strategies on multi-class scatterplots. The main goal of this study is to understand the capability of sampling methods in preserving the density, outliers, and overall shape of a scatterplot. To this end, we comprehensively review the literature and select seven typical sampling strategies as well as eight representative datasets. We then design four experiments to understand the performance of different strategies in maintaining: 1) region density; 2) class density; 3) outliers; and 4) overall shape in the sampling results. The results show that: 1) random sampling is preferred for preserving region density; 2) blue noise sampling and random sampling have comparable performance with the three multi-class sampling strategies in preserving class density; 3) outlier biased density based sampling, recursive subdivision based sampling, and blue noise sampling perform the best in keeping outliers; and 4) blue noise sampling outperforms the others in maintaining the overall shape of a scatterplot.
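As a rough illustration of two of the strategies named in the abstract, the following is a minimal sketch of uniform random sampling and a naive dart-throwing form of blue noise sampling. It is not the implementation evaluated in the study: the function names, the `radius` parameter, and the point representation as `(x, y)` tuples are assumptions made for this example, and class labels are ignored for brevity.

```python
import math
import random


def random_sample(points, k, seed=0):
    """Uniform random sampling: pick k points without replacement."""
    rng = random.Random(seed)
    return rng.sample(points, k)


def blue_noise_sample(points, radius, seed=0):
    """Naive dart throwing over an existing point set.

    Points are visited in random order and a point is kept only if no
    previously kept point lies closer than `radius`. This is O(n * m)
    for n input points and m kept points; a spatial grid would be
    needed for large inputs.
    """
    rng = random.Random(seed)
    order = list(points)
    rng.shuffle(order)
    kept = []
    for p in order:
        if all(math.dist(p, q) >= radius for q in kept):
            kept.append(p)
    return kept


if __name__ == "__main__":
    gen = random.Random(42)
    # Hypothetical data: two Gaussian clusters of different density.
    data = [(gen.gauss(0.0, 1.0), gen.gauss(0.0, 1.0)) for _ in range(2000)]
    data += [(gen.gauss(5.0, 0.5), gen.gauss(5.0, 0.5)) for _ in range(500)]

    rs = random_sample(data, 200)
    bn = blue_noise_sample(data, radius=0.3)
    print(len(rs), len(bn))
```

In this sketch, random sampling keeps dense regions proportionally dense, while the minimum-distance constraint in the blue noise variant spreads the retained points more evenly across the occupied area. That difference is one plausible reading of why the two strategies behave differently for density and shape in the reported results.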

Original language	English
Article number	9226404
Pages (from-to)	1720-1730
Number of pages	11
Journal	IEEE Transactions on Visualization and Computer Graphics
Volume	27
Issue number	2
DOIs	https://doi.org/10.1109/TVCG.2020.3030432
Publication status	Published - Feb 2021

Keywords

Scatterplot
data sampling
empirical evaluation

Access to Document

10.1109/TVCG.2020.3030432

Cite this

@article{a083bed327134a5298d3467e3f907da2,

title = "Evaluation of Sampling Methods for Scatterplots",

abstract = "Given a scatterplot with tens of thousands of points or even more, a natural question is which sampling method should be used to create a small but 'good' scatterplot for a better abstraction. We present the results of a user study that investigates the influence of different sampling strategies on multi-class scatterplots. The main goal of this study is to understand the capability of sampling methods in preserving the density, outliers, and overall shape of a scatterplot. To this end, we comprehensively review the literature and select seven typical sampling strategies as well as eight representative datasets. We then design four experiments to understand the performance of different strategies in maintaining: 1) region density; 2) class density; 3) outliers; and 4) overall shape in the sampling results. The results show that: 1) random sampling is preferred for preserving region density; 2) blue noise sampling and random sampling have comparable performance with the three multi-class sampling strategies in preserving class density; 3) outlier biased density based sampling, recursive subdivision based sampling, and blue noise sampling perform the best in keeping outliers; and 4) blue noise sampling outperforms the others in maintaining the overall shape of a scatterplot.",

keywords = "Scatterplot, data sampling, empirical evaluation",

author = "Jun Yuan and Shouxing Xiang and Jiazhi Xia and Lingyun Yu and Shixia Liu",

note = "Publisher Copyright: {\textcopyright} 1995-2012 IEEE.",

year = "2021",

month = feb,

doi = "10.1109/TVCG.2020.3030432",

language = "English",

volume = "27",

pages = "1720--1730",

journal = "IEEE Transactions on Visualization and Computer Graphics",

issn = "1077-2626",

number = "2",

}

TY - JOUR

T1 - Evaluation of Sampling Methods for Scatterplots

AU - Yuan, Jun

AU - Xiang, Shouxing

AU - Xia, Jiazhi

AU - Yu, Lingyun

AU - Liu, Shixia

PY - 2021/2

Y1 - 2021/2

N2 - Given a scatterplot with tens of thousands of points or even more, a natural question is which sampling method should be used to create a small but 'good' scatterplot for a better abstraction. We present the results of a user study that investigates the influence of different sampling strategies on multi-class scatterplots. The main goal of this study is to understand the capability of sampling methods in preserving the density, outliers, and overall shape of a scatterplot. To this end, we comprehensively review the literature and select seven typical sampling strategies as well as eight representative datasets. We then design four experiments to understand the performance of different strategies in maintaining: 1) region density; 2) class density; 3) outliers; and 4) overall shape in the sampling results. The results show that: 1) random sampling is preferred for preserving region density; 2) blue noise sampling and random sampling have comparable performance with the three multi-class sampling strategies in preserving class density; 3) outlier biased density based sampling, recursive subdivision based sampling, and blue noise sampling perform the best in keeping outliers; and 4) blue noise sampling outperforms the others in maintaining the overall shape of a scatterplot.

AB - Given a scatterplot with tens of thousands of points or even more, a natural question is which sampling method should be used to create a small but 'good' scatterplot for a better abstraction. We present the results of a user study that investigates the influence of different sampling strategies on multi-class scatterplots. The main goal of this study is to understand the capability of sampling methods in preserving the density, outliers, and overall shape of a scatterplot. To this end, we comprehensively review the literature and select seven typical sampling strategies as well as eight representative datasets. We then design four experiments to understand the performance of different strategies in maintaining: 1) region density; 2) class density; 3) outliers; and 4) overall shape in the sampling results. The results show that: 1) random sampling is preferred for preserving region density; 2) blue noise sampling and random sampling have comparable performance with the three multi-class sampling strategies in preserving class density; 3) outlier biased density based sampling, recursive subdivision based sampling, and blue noise sampling perform the best in keeping outliers; and 4) blue noise sampling outperforms the others in maintaining the overall shape of a scatterplot.

KW - Scatterplot

KW - data sampling

KW - empirical evaluation

UR - http://www.scopus.com/inward/record.url?scp=85100354780&partnerID=8YFLogxK

U2 - 10.1109/TVCG.2020.3030432

DO - 10.1109/TVCG.2020.3030432

M3 - Article

C2 - 33074820

AN - SCOPUS:85100354780

SN - 1077-2626

VL - 27

SP - 1720

EP - 1730

JO - IEEE Transactions on Visualization and Computer Graphics

JF - IEEE Transactions on Visualization and Computer Graphics

IS - 2

M1 - 9226404

ER -

Projects

Interaction Techniques for Visual Storytelling
