Abstract
Numerous applications continuously generate massive amounts of data, and it has become critical to extract useful information from them while maintaining acceptable computing performance. The objective of this work is to design an indexing framework that minimizes indexing overhead and improves query execution and data search performance while optimizing overall computing performance. We propose SmallClient, an indexing framework that speeds up query execution. SmallClient consists of three modules: block creation, index creation and query execution. The block creation module improves data retrieval performance while keeping data uploading overhead to a minimum. The index creation module builds as many indexes on a dataset as possible to increase the index hit ratio while minimizing indexing overhead. Finally, the query execution module lets incoming queries make use of these indexes. The evaluation shows that SmallClient outperforms a Hadoop full scan, improving search performance by more than 90%. Meanwhile, SmallClient reduces indexing overhead to approximately 50% and 80% in terms of index size and indexing time, respectively.
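As a rough illustration of how block-level indexes can let a query skip data that a full scan would read, the sketch below splits records into fixed-size blocks, keeps a per-block (min, max) summary for each indexed attribute, and prunes blocks during a range query. This is not SmallClient's actual design, which the abstract does not detail: the function names (`create_blocks`, `build_index`, `range_query`), the min/max summary technique, the record layout and the block size are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical record layout: each record maps attribute names to int values.
Record = dict[str, int]


@dataclass
class Block:
    """A fixed-size chunk of records (assumed stand-in for 'block creation')."""
    records: list[Record]
    # Per-attribute (min, max) summary, filled in by build_index().
    summary: dict[str, tuple[int, int]] = field(default_factory=dict)


def create_blocks(records: list[Record], block_size: int) -> list[Block]:
    """Split the dataset into consecutive blocks of at most block_size records."""
    return [Block(records[i:i + block_size])
            for i in range(0, len(records), block_size)]


def build_index(blocks: list[Block], attributes: list[str]) -> None:
    """Store the min and max of every indexed attribute in every block."""
    for block in blocks:
        for attr in attributes:
            values = [r[attr] for r in block.records]
            block.summary[attr] = (min(values), max(values))


def range_query(blocks: list[Block], attr: str, lo: int, hi: int):
    """Return (matching records, number of blocks actually scanned).

    Blocks whose [min, max] range for attr cannot overlap [lo, hi] are
    skipped; if attr has no summary, every block is scanned (a full scan).
    """
    matches: list[Record] = []
    scanned = 0
    for block in blocks:
        if attr in block.summary:
            bmin, bmax = block.summary[attr]
            if bmax < lo or bmin > hi:
                continue  # the summary proves no record here can match
        scanned += 1
        matches.extend(r for r in block.records if lo <= r[attr] <= hi)
    return matches, scanned


# Usage: 1,000 synthetic records split into 10 blocks of 100.
data = [{"id": i, "temp": i % 50} for i in range(1000)]
blocks = create_blocks(data, block_size=100)
build_index(blocks, ["id", "temp"])

hits, scanned = range_query(blocks, "id", 250, 260)
print(len(hits), scanned)  # 11 1

temp_hits, temp_scanned = range_query(blocks, "temp", 5, 5)
print(len(temp_hits), temp_scanned)  # 20 10
```

The second query shows the limitation of this particular choice: min/max summaries only prune well when an attribute's values are clustered by block, so an attribute like `temp` that spans its whole range in every block still falls back to scanning all blocks.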
Original language | English |
---|---|
Pages (from-to) | 1193-1208 |
Number of pages | 16 |
Journal | Cluster Computing |
Volume | 20 |
Issue number | 2 |
Publication status | Published - 1 Jun 2017 |
Keywords
- Big data
- Big data analytics
- Big data indexing
- Big data retrieval
- Data search performance
- Query execution