TY - GEN
T1 - Near Real-Time Big Data Stream Processing Platform Using Cassandra
AU - Pal, Gautam
AU - Li, Gangmin
AU - Atkinson, Katie
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/10
Y1 - 2018/10
N2 - Users of analytics systems expect answers almost instantly. If the time to insight exceeds tens of milliseconds, the value is lost. Applications involving stock market data, sensor data, Twitter feeds, or fraud detection cannot afford to wait. This often means analyzing incoming data before it is even stored in the database of records. Coupled with zero tolerance for data loss, the challenge becomes even more daunting. In a real-time Big Data scenario, rather than waiting for data to be collected as a whole at long periodic intervals, streaming analysis lets us identify patterns and make informed decisions based on them as soon as data start arriving. When data are non-stationary and patterns change over time, streaming systems adapt themselves. This work describes near real-time data storage and processing approaches for analyzing data streams with the Cassandra NoSQL datastore. It provides insight into optimizing Cassandra in a multi-data-center setup for near real-time responses. The classic trade-off between low latency and high accuracy is conceptualized. The theoretical claims are corroborated by several thorough experimental analyses on the Apache and DataStax distributions of Cassandra.
KW - Cassandra
KW - Datastax
KW - Real-Time Big Data Analytics
KW - Real-Time Data Ingestion
UR - http://www.scopus.com/inward/record.url?scp=85084121831&partnerID=8YFLogxK
U2 - 10.1109/I2CT42659.2018.9058101
DO - 10.1109/I2CT42659.2018.9058101
M3 - Conference Proceeding
AN - SCOPUS:85084121831
T3 - 2018 4th International Conference for Convergence in Technology, I2CT 2018
BT - 2018 4th International Conference for Convergence in Technology, I2CT 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 4th International Conference for Convergence in Technology, I2CT 2018
Y2 - 27 October 2018 through 28 October 2018
ER -