EventFlow-LOGO

From Event-Driven Prototype to Cloud-Ready Streaming Platform

February 12, 2026

From Event-Driven Prototype to Cloud-Ready Streaming Platform

Most Kafka demos stop at:

  • producer publishes\
  • consumer reads\
  • done

That's not how real systems behave.

Real systems receive unreliable webhooks.
They face retries, duplicates, malformed payloads, and partial failures.
They need observability.
They need CI.
They need infrastructure you can explain.

So I built EventFlow --- not as a toy demo, but as a reproducible streaming backbone.

šŸ”— Source code:
https://github.com/sofman65/EventFlow


High-Level Architecture

HIGH-LEVEL-ARCHITECTURE
HIGH-LEVEL-ARCHITECTURE

EventFlow models a payment authorization pipeline:

  1. External client calls /authorize
  2. External payment simulator signs and delivers webhook
  3. API Producer verifies signature and publishes immutable event
  4. Kafka routes events through processing stages
  5. Consumers operate independently
  6. Failures go to a Dead Letter Queue (DLQ)

No synchronous service calls.
No shared databases.
No hidden coupling.

Everything moves through Kafka.


The Services

The-different-seevices
The-different-seevices

The system consists of:

  • external-payment-simulator (FastAPI)
    • retries\
    • duplicate injection\
    • corruption injection
  • api-producer (FastAPI)
    • verifies webhook signature\
    • validates timestamp window\
    • publishes events.raw.v1
  • validator-consumer (Python)
    • enforces schema + business rules\
    • routes invalid events to events.dlq.v1
  • analytics-consumer (Python)
    • emits processing metrics
  • persistence-consumer (Java / Spring Boot)
    • writes validated events to PostgreSQL\
    • routes DB failures to DLQ

Each service owns exactly one responsibility.


Kafka Topic Flow

The system revolves around explicit topic stages:

  • events.raw.v1
  • events.validated.v1
  • events.dlq.v1
  • events.enriched.v1 (planned)

Flow:

raw → validated → analytics + persistence
ā†˜ DLQ (failures)

Events are immutable.
Failures are data.
New consumers can attach without touching producers.


Local Infrastructure (Docker Compose)

DOCKER-COMPOSE-UP-INFRA-SCREENSHOT
DOCKER-COMPOSE-UP-INFRA-SCREENSHOT

Local stack includes:

  • Kafka (KRaft mode)
  • PostgreSQL
  • Prometheus
  • Grafana

Everything starts with:

cd infra
docker compose up -d

Reproducibility matters.


Observability (Prometheus + Grafana)

PROMETHEUS-TARGETS
PROMETHEUS-TARGETS

Prometheus scrapes:

  • validator metrics
  • analytics metrics
  • persistence /actuator/prometheus
  • simulator metrics

You can see:

  • processing throughput
  • error rates
  • DLQ counts
  • event age

GRAPHANA-REALTIME
GRAPHANA-REALTIME

Grafana shows real-time streaming behavior.

A key lesson:

A system can be healthy and silent.

Without traffic shape, dashboards mean nothing.


CI Pipeline

CI-PIPELINE
CI-PIPELINE

CI runs:

  1. Python compile checks\
  2. Java Maven tests\
  3. Docker Compose smoke validation\
  4. End-to-end pipeline test

Unit tests don't prove distributed systems work.
End-to-end validation does.


Terraform Deployment Readiness

EventFlow is deployment-ready as a reproducible demo/interview cloud baseline.

Already in place:

  • Terraform stack for VPC, ECS Fargate, MSK, RDS, ALB, ECR, CloudWatch
  • Release CD workflow on tags (v*, release-*)
  • Image build and push automation
  • End-to-end CI validation before merge

Intentionally not production-hardened yet:

  • Kafka auth/TLS hardening
  • private-only networking for all tasks
  • stricter IAM boundaries
  • backup/restore + HA policies

This is intentional: the project optimizes for architectural clarity and reproducibility first.


Cost Breakdown (AWS us-east-1, 24/7)

Estimated with current Terraform defaults (as of February 11, 2026):

ComponentMonthly (USD)
ECS Fargate (compute + memory)63.07
MSK (2 x kafka.t3.small + 200 GB storage)86.58
RDS (db.t4g.micro + 20 GB gp3)13.98
ALB (fixed + low LCU usage)17.01 to 22.27
Public IPv4 charges25.55
Estimated base total206 to 212
ECR / logs / transfer overhead+5 to +30

Practical expected range: ~210 to ~240 USD/month.

Key takeaway: MSK is the dominant cost driver.

Context for larger Kafka sizing:

  • 3 x kafka.t3.small -> ~250 USD/month total
  • 3 x kafka.m5.large -> ~610 USD/month total
  • 3 x kafka.m5.xlarge -> ~1,070 USD/month total

What This Project Demonstrates

  • Clear event contracts
  • Isolation via consumer groups
  • DLQ-driven failure modeling
  • Observability-first mindset
  • CI for distributed flows
  • Infrastructure as Code
  • Cost awareness

šŸ”— GitHub
https://github.com/sofman65/EventFlow

šŸ“– Blog
https://sofianos-lampropoulos.com/blog