FoodFlow Insights: Real-Time and Batch Analytics for Food Delivery

FoodFlow Insights is a data analytics project that demonstrates stream and batch processing side by side for a food delivery platform. Built with Apache Kafka and Apache Spark, it processes order data to provide real-time insights and daily aggregated metrics, showing how the two processing paradigms differ.

What It Does

Real-Time Analytics (Stream Processing)

Using a 60-second window, the project delivers near-instant insights into:

  1. Top-Selling Item: Identifies the most popular item ordered within the window.
  2. Top-Selling Category: Tracks the leading food category within the window.
  3. Maximum Order Value: Highlights the highest-value order placed in the last 60 seconds.
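The windowing logic behind these metrics can be illustrated in plain Python. This is a minimal sketch rather than the project's Spark job: it assumes a tumbling (non-overlapping) 60-second window, ranks items and categories by quantity sold, and uses hypothetical field names (`ts`, `item`, `category`, `qty`, `value`) that do not come from the project.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 60  # window length stated in the text; tumbling behaviour is an assumption


def window_start(ts):
    """Map an epoch-second timestamp to the start of its 60-second window."""
    return ts - (ts % WINDOW_SECONDS)


def summarize_windows(orders):
    """Group orders into tumbling windows and compute the three metrics per window."""
    buckets = defaultdict(list)
    for order in orders:
        buckets[window_start(order["ts"])].append(order)

    summary = {}
    for start, batch in sorted(buckets.items()):
        items = Counter()
        categories = Counter()
        for order in batch:
            # Rank by quantity sold (an assumption; order count would also be reasonable).
            items[order["item"]] += order["qty"]
            categories[order["category"]] += order["qty"]
        summary[start] = {
            "top_item": items.most_common(1)[0][0],
            "top_category": categories.most_common(1)[0][0],
            "max_order_value": max(o["value"] for o in batch),
        }
    return summary


# Hypothetical sample events; field names are illustrative only.
orders = [
    {"ts": 0, "item": "pizza", "category": "italian", "qty": 2, "value": 24.0},
    {"ts": 30, "item": "ramen", "category": "japanese", "qty": 1, "value": 13.5},
    {"ts": 59, "item": "pizza", "category": "italian", "qty": 1, "value": 12.0},
    {"ts": 61, "item": "sushi", "category": "japanese", "qty": 3, "value": 41.0},
]
windows = summarize_windows(orders)
```

Here the first three orders fall into the window starting at second 0 and the last into the window starting at second 60. A streaming engine would compute the same per-window results incrementally as events arrive instead of over a finished list.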

Daily Aggregated Metrics (Batch Processing)

The project computes comprehensive daily summaries, including:

  1. Total Orders: Counts the number of orders placed each day.
  2. Maximum Order Value: Determines the highest order value for the day.
  3. Most Sold Category: Identifies the top food category by sales volume.
  4. Total Revenue: Calculates the day's total revenue.
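As a rough illustration of these four daily metrics, the sketch below aggregates a finished list of orders by calendar day in plain Python. It is not the project's Spark batch job; the field names (`ts`, `category`, `qty`, `value`), the use of UTC days, and the choice to measure "sales volume" by quantity are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


def daily_metrics(orders):
    """Compute per-day totals from a complete batch of orders."""
    by_day = defaultdict(list)
    for order in orders:
        # Bucket by UTC calendar day (an assumption; the project may use local time).
        day = datetime.fromtimestamp(order["ts"], tz=timezone.utc).date().isoformat()
        by_day[day].append(order)

    report = {}
    for day, batch in sorted(by_day.items()):
        categories = Counter()
        for order in batch:
            categories[order["category"]] += order["qty"]
        report[day] = {
            "total_orders": len(batch),
            "max_order_value": max(o["value"] for o in batch),
            "most_sold_category": categories.most_common(1)[0][0],
            "total_revenue": round(sum(o["value"] for o in batch), 2),
        }
    return report


# Hypothetical sample data spanning two days (epoch seconds).
sample_orders = [
    {"ts": 0, "category": "italian", "qty": 2, "value": 24.0},
    {"ts": 3600, "category": "japanese", "qty": 1, "value": 13.5},
    {"ts": 7200, "category": "italian", "qty": 1, "value": 12.0},
    {"ts": 86410, "category": "japanese", "qty": 3, "value": 41.0},
]
report = daily_metrics(sample_orders)
```

Unlike the 60-second stream window, this computation sees the whole day's data at once, which is the essential difference between the two paradigms.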

Order Status Tracking

  1. Monitors and aggregates order status updates (e.g., "order-cancelled") to show how orders move through their lifecycle and how often each status occurs.
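One simple way to aggregate status updates is to count how often each status appears and to keep the most recent status per order. The sketch below does this in plain Python. Only "order-cancelled" appears in the project description; the other status names and the fields `order_id`, `status`, and `ts` are hypothetical.

```python
from collections import Counter


def count_statuses(updates):
    """Count how many updates carry each status value."""
    return Counter(update["status"] for update in updates)


def latest_status(updates):
    """Return the most recent status per order, based on the update timestamp."""
    latest = {}
    for update in sorted(updates, key=lambda u: u["ts"]):
        latest[update["order_id"]] = update["status"]
    return latest


# Hypothetical updates; only "order-cancelled" is named in the project description.
updates = [
    {"order_id": 1, "status": "order-dispatched", "ts": 10},
    {"order_id": 2, "status": "order-cancelled", "ts": 12},
    {"order_id": 1, "status": "order-delivered", "ts": 40},
    {"order_id": 3, "status": "order-cancelled", "ts": 45},
]
status_counts = count_statuses(updates)
current = latest_status(updates)
```

The counts give a view of lifecycle trends (for example, how many cancellations occurred), while the per-order map shows where each order currently stands.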

Data Flow

  1. Data Generation: A Python script generates realistic order data, including new orders and status updates, in JSON format.
  2. Kafka Integration: The generated data is published to the NEW_ORDER and ORDER_UPDATE Kafka topics for processing.
  3. Database Storage: A batch consumer listens to the NEW_ORDER topic and stores item details in a database for persistent analysis.
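The three steps above can be sketched end to end without a running Kafka cluster. In this illustration the `publish` callable stands in for a Kafka producer, an in-memory list stands in for the topics, and SQLite stands in for the database. The menu, the message fields, the status names other than "order-cancelled", and the `order_items` table are all hypothetical. Only the topic names NEW_ORDER and ORDER_UPDATE come from the project.

```python
import json
import random
import sqlite3

# Hypothetical menu: (item, category, unit price).
MENU = [("pizza", "italian", 12.0), ("ramen", "japanese", 13.5), ("burger", "american", 9.0)]
STATUSES = ["order-cancelled", "order-delivered"]  # second value is illustrative


def make_new_order(order_id, rng):
    """Build one new-order event as a plain dict."""
    item, category, price = rng.choice(MENU)
    qty = rng.randint(1, 3)
    return {"order_id": order_id, "item": item, "category": category,
            "qty": qty, "value": round(price * qty, 2)}


def generate(n, publish, seed=0):
    """Emit n new orders and one status update per order as JSON strings."""
    rng = random.Random(seed)
    for order_id in range(1, n + 1):
        publish("NEW_ORDER", json.dumps(make_new_order(order_id, rng)))
        update = {"order_id": order_id, "status": rng.choice(STATUSES)}
        publish("ORDER_UPDATE", json.dumps(update))


def store_items(messages, conn):
    """Persist item details from NEW_ORDER messages; returns the number of rows stored."""
    conn.execute("CREATE TABLE IF NOT EXISTS order_items "
                 "(order_id INTEGER, item TEXT, category TEXT, qty INTEGER, value REAL)")
    rows = [json.loads(payload) for topic, payload in messages if topic == "NEW_ORDER"]
    conn.executemany(
        "INSERT INTO order_items VALUES (:order_id, :item, :category, :qty, :value)", rows)
    conn.commit()
    return len(rows)


sent = []  # in-memory stand-in for the Kafka topics
generate(3, lambda topic, payload: sent.append((topic, payload)))

conn = sqlite3.connect(":memory:")  # stand-in for the project's database
stored = store_items(sent, conn)
```

Passing `publish` as a parameter keeps the generator independent of any particular Kafka client; in a real deployment it could be a thin wrapper around a producer's send call.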

Why It Matters

FoodFlow Insights illustrates how stream and batch processing can work together to provide both immediate and historical insights for a food delivery business. It’s a practical example of leveraging big data technologies to drive decision-making.

Explore the source code on my GitHub!

Happy coding!