Abstract
The continual growth of big data, driven by electronic and automated devices, degrades the data retrieval performance of contemporary big data analytics technologies and makes the exploration and adoption of improved procedures inevitable. Properly designed indexing on big data facilitates analytics by allowing data sets to be stored, processed, accessed and analyzed more quickly and efficiently. This paper proposes a novel mathematical model that introduces an indexing mechanism and improves data retrieval performance while supporting the growing volume of big data. The model is composed of three modules: block creation, index creation and query execution. The block creation module improves record access performance while avoiding remote access delays. The index creation module allows the maximum possible number of indexes for big data with minimized indexing overhead. The query execution module performs data search and retrieval operations on user search queries. Evaluation of the proposed mathematical model shows that search performance improves for both small and big data sets, with minimized overhead in data uploading and indexing time. We further verify the results by implementing the SmallClient logic on a four-node physical cluster, which confirms the improved performance of the proposed approach.
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 5241-5262 |
Number of pages | 22 |
Journal | Journal of Supercomputing |
Volume | 74 |
Issue number | 10 |
Publication status | Published - 1 Oct 2018 |
Keywords
- Big data
- Big data analytics
- Big data indexing
- Indexing