SmallClient for big data: an indexing framework towards fast data retrieval

Aisha Siddiqa; Ahmad Karim; Victor Chang

doi:10.1007/s10586-016-0712-4

Aisha Siddiqa^*, Ahmad Karim, Victor Chang

^*Corresponding author for this work

International Business School Suzhou

Research output: Contribution to journal › Article › peer-review

17 Citations (Scopus)

Abstract

Numerous applications continuously generate massive amounts of data, and it has become critical to extract useful information while maintaining acceptable computing performance. The objective of this work is to design an indexing framework that minimizes indexing overhead and improves query execution and data search performance with optimum aggregation of computing performance. We propose SmallClient, an indexing framework to speed up query execution. SmallClient has three modules: block creation, index creation, and query execution. The block creation module improves data retrieval performance with minimal data uploading overhead. The index creation module allows the maximum number of indexes on a dataset, increasing the index hit ratio while keeping indexing overhead low. Finally, the query execution module lets incoming queries use these indexes. The evaluation shows that SmallClient outperforms a Hadoop full scan by more than 90% in search performance. Meanwhile, the indexing overhead of SmallClient is reduced to approximately 50% for index size and 80% for indexing time.
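
The three-module flow named in the abstract can be illustrated with a toy, in-memory sketch. This is not the authors' implementation: the function names (create_blocks, create_index, execute_query, full_scan), the fixed-size block layout, and the dictionary-based index are all assumptions made purely for illustration, and the real framework runs on Hadoop rather than on Python lists.

```python
from collections import defaultdict


def create_blocks(records, block_size):
    """Split a list of records into fixed-size blocks (hypothetical layout)."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def create_index(blocks, field):
    """Map each value of `field` to the (block, offset) positions that hold it."""
    index = defaultdict(list)
    for block_no, block in enumerate(blocks):
        for offset, record in enumerate(block):
            index[record[field]].append((block_no, offset))
    return index


def full_scan(blocks, field, value):
    """Baseline: inspect every record in every block."""
    return [rec for block in blocks for rec in block if rec[field] == value]


def execute_query(blocks, indexes, field, value):
    """Use an index on `field` if one exists (an index hit), else scan everything."""
    index = indexes.get(field)
    if index is None:
        return full_scan(blocks, field, value)
    return [blocks[block_no][offset] for block_no, offset in index.get(value, [])]


# Illustrative data only: ten records, every third one tagged "A".
records = [{"id": i, "tag": "A" if i % 3 == 0 else "B"} for i in range(10)]
blocks = create_blocks(records, block_size=4)
indexes = {"tag": create_index(blocks, "tag")}

hits = execute_query(blocks, indexes, "tag", "A")    # answered from the index
misses = execute_query(blocks, indexes, "id", 5)     # no index on "id": full scan
```

In this sketch, indexing more fields raises the chance that a query finds a usable index, at the cost of extra index size and build time, which is the trade-off the abstract describes.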

Original language	English
Pages (from-to)	1193-1208
Number of pages	16
Journal	Cluster Computing
Volume	20
Issue number	2
DOIs	https://doi.org/10.1007/s10586-016-0712-4
Publication status	Published - 1 Jun 2017

Keywords

Big data
Big data analytics
Big data indexing
Big data retrieval
Data search performance
Query execution

Cite this

@article{5633229f1c114f89aca32f48ec62d5d3,

title = "SmallClient for big data: an indexing framework towards fast data retrieval",

abstract = "Numerous applications are continuously generating massive amount of data and it has become critical to extract useful information while maintaining acceptable computing performance. The objective of this work is to design an indexing framework which minimizes indexing overhead and improves query execution and data search performance with optimum aggregation of computing performance. We propose SmallClient, an indexing framework to speed up query execution. SmallClient has three modules: block creation, index creation and query execution. Block creation module supports improving data retrieval performance with minimum data uploading overhead. Index creation module allows maximum indexes on a dataset to increase index hit ratio with minimized indexing overhead. Finally, query execution module offers incoming queries to utilize these indexes. The evaluation shows that SmallClient outperforms Hadoop full scan with more than 90% search performance. Meanwhile, indexing overhead of SmallClient is reduced to approximately 50 and 80% for index size and indexing time respectively.",

keywords = "Big data, Big data analytics, Big data indexing, Big data retrieval, Data search performance, Query execution",

author = "Aisha Siddiqa and Ahmad Karim and Victor Chang",

note = "Publisher Copyright: {\textcopyright} 2016, Springer Science+Business Media New York.",

year = "2017",

month = jun,

day = "1",

doi = "10.1007/s10586-016-0712-4",

language = "English",

volume = "20",

pages = "1193--1208",

journal = "Cluster Computing",

issn = "1386-7857",

number = "2",

}

TY - JOUR

T1 - SmallClient for big data

T2 - an indexing framework towards fast data retrieval

AU - Siddiqa, Aisha

AU - Karim, Ahmad

AU - Chang, Victor

PY - 2017/6/1

Y1 - 2017/6/1

N2 - Numerous applications are continuously generating massive amount of data and it has become critical to extract useful information while maintaining acceptable computing performance. The objective of this work is to design an indexing framework which minimizes indexing overhead and improves query execution and data search performance with optimum aggregation of computing performance. We propose SmallClient, an indexing framework to speed up query execution. SmallClient has three modules: block creation, index creation and query execution. Block creation module supports improving data retrieval performance with minimum data uploading overhead. Index creation module allows maximum indexes on a dataset to increase index hit ratio with minimized indexing overhead. Finally, query execution module offers incoming queries to utilize these indexes. The evaluation shows that SmallClient outperforms Hadoop full scan with more than 90% search performance. Meanwhile, indexing overhead of SmallClient is reduced to approximately 50 and 80% for index size and indexing time respectively.

KW - Big data

KW - Big data analytics

KW - Big data indexing

KW - Big data retrieval

KW - Data search performance

KW - Query execution

UR - http://www.scopus.com/inward/record.url?scp=85006713715&partnerID=8YFLogxK

U2 - 10.1007/s10586-016-0712-4

DO - 10.1007/s10586-016-0712-4

M3 - Article

AN - SCOPUS:85006713715

SN - 1386-7857

VL - 20

SP - 1193

EP - 1208

JO - Cluster Computing

JF - Cluster Computing

IS - 2

ER -
