Designing Real-time Data Pipelines for Predictive Analytics in Large-scale Systems

Authors:
Divya Kodi

Addresses:
Department of Cyber Security, Truist Bank Financial, California, United States of America. 

Abstract:

In the age of data-driven decision-making, real-time data pipelines have become an integral building block for predictive analytics in large-scale systems. This article outlines the design, deployment, and challenges of creating reliable real-time data pipelines for predictive analytics at scale. The data used in this research is sourced from an e-commerce site and comprises transactional records, customer behaviour, and product browsing and purchasing interactions. The dataset contains both structured and unstructured data and reflects the complexity of processing high-velocity, large-scale data for predictive analytics. We cover the key stages of ingestion, processing, storage, and analytics, and discuss architectures such as Lambda and Kappa that offer fault-tolerant scalability. We examine how machine learning models consume streams of real-time data to derive actionable insights for large systems. Beyond technological and operational requirements, the paper also describes the best practices, tools, and frameworks needed to correctly implement real-time data pipelines for predictive analytics. The study emphasizes pipeline optimization for low latency, high throughput, and fault tolerance to enable reliable and precise predictions over the long term.

Keywords: Real-Time Data Pipelines; Predictive Analytics; Machine Learning; Large-Scale Systems; Data Ingestion; IoT Sensor Data; Product Information; Social Media Data; Transactional Data; Product Browsing and Purchasing Interactions.

Received on: 02/05/2024, Revised on: 30/07/2024, Accepted on: 07/09/2024, Published on: 03/12/2024

DOI: 10.69888/FTSCS.2024.000294

FMDB Transactions on Sustainable Computing Systems, 2024 Vol. 2 No. 4, Pages: 178-188
