You Need Storage. But Not Every Storage Is the Same.

If you’re building anything beyond a small app, you’ll eventually hit this problem:

Where do we put all this data?

At first, everything fits:

Your database works
Your backend runs fine
Your events flow through Kafka

Then one day:

You have 10 TB of logs.
Or telemetry dumps.
Or simulation outputs.
Or validation datasets.

And suddenly… your architecture starts to feel wrong.

Databases Are Not Designed for This

Relational databases are great for:

Transactions
Queries
Structured data

But they are NOT built to store:

Massive binary files
Multi-GB exports
TB-scale historical dumps

You can store blobs.

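You just shouldn’t: large blobs bloat backups, slow down replication, and compete with your transactional queries for I/O.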
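A common alternative is to keep the bytes outside the database and store only a small row that points at them. Here is a minimal sketch of that idea, assuming a local directory standing in for a bucket. The `blobs` table and the helpers `open_catalog`, `put_blob`, and `get_blob` are made up for illustration, not part of any library.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def open_catalog(db_path=":memory:"):
    """Create the (hypothetical) metadata table that replaces BLOB columns."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blobs ("
        "name TEXT PRIMARY KEY, object_key TEXT, size_bytes INTEGER, sha256 TEXT)"
    )
    return conn


def put_blob(conn, bucket_dir, name, data):
    """Write the bytes to the stand-in bucket and record only a pointer row."""
    digest = hashlib.sha256(data).hexdigest()
    object_key = f"{digest[:2]}/{digest}"  # content-addressed key (an assumption)
    target = Path(bucket_dir) / object_key
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    conn.execute(
        "INSERT OR REPLACE INTO blobs VALUES (?, ?, ?, ?)",
        (name, object_key, len(data), digest),
    )
    conn.commit()
    return object_key


def get_blob(conn, bucket_dir, name):
    """Look up the pointer in the database, then read the bytes from the bucket."""
    row = conn.execute(
        "SELECT object_key, sha256 FROM blobs WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    object_key, expected = row
    data = (Path(bucket_dir) / object_key).read_bytes()
    if hashlib.sha256(data).hexdigest() != expected:
        raise ValueError(f"checksum mismatch for {name}")
    return data


if __name__ == "__main__":
    conn = open_catalog()
    bucket = tempfile.mkdtemp()  # stand-in for a real bucket
    put_blob(conn, bucket, "simulation-output.bin", b"\x00" * 1024)
    print(len(get_blob(conn, bucket, "simulation-output.bin")))
```

The database row stays a few hundred bytes no matter how large the file is, so queries and backups are not affected by blob size.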

Event Streaming Is Not Storage Either

Kafka is amazing for:

Real-time processing
State updates
Event-driven systems

But Kafka is not your data lake.

Keeping TBs of historical data in your event system:

Makes replay expensive
Makes retention tricky
Makes scaling harder

Event systems move data.

They don’t replace storage.
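One way to keep the two roles separate is to send a small event that points at the heavy object instead of carrying it. The sketch below only builds and serializes such an event. The event type, bucket name, object key, and the helpers `make_pointer_event` and `encode_event` are all hypothetical, and actually producing the message to Kafka is left out.

```python
import hashlib
import json


def make_pointer_event(bucket, object_key, payload, event_type="dataset.exported"):
    """Build a small event that references a large object instead of embedding it."""
    return {
        "type": event_type,
        "bucket": bucket,
        "object_key": object_key,
        "size_bytes": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }


def encode_event(event):
    """Serialize the event to bytes, as a producer would before sending it."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


payload = b"\x00" * (5 * 1024 * 1024)  # stand-in for a 5 MiB export
event = make_pointer_event("telemetry", "exports/2024-01-01/run-42.bin", payload)
message = encode_event(event)

# The message size depends on the metadata only, not on the payload size.
print(len(payload), len(message))
```

Consumers that need the bytes fetch them from storage using the key. Consumers that only need to know an export happened never touch the payload.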

The Real Solution: Object Storage

When your data becomes big and heavy, you need:

Object storage.

Think of it as:

A scalable, distributed store where every object is addressed by a key
built for huge files
with versioning and lifecycle control.

This is the model Amazon S3 made popular.

But the S3 API is now a de facto standard, and many systems outside AWS implement it.
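Lifecycle control is usually expressed as rules attached to a bucket. The sketch below builds one rule in the JSON shape used by the S3 lifecycle configuration API. The helper `lifecycle_rule`, the prefix, and the day counts are made-up values, and how much of this a given S3-compatible store honors is something to check in its own documentation.

```python
import json


def lifecycle_rule(rule_id, prefix, expire_days, noncurrent_days=None):
    """Build one lifecycle rule: expire objects under a prefix after N days."""
    rule = {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Days": expire_days},
    }
    if noncurrent_days is not None:
        # Only meaningful on versioned buckets: clean up old versions too.
        rule["NoncurrentVersionExpiration"] = {"NoncurrentDays": noncurrent_days}
    return rule


config = {
    "Rules": [
        lifecycle_rule("expire-raw-logs", "logs/raw/", 90, noncurrent_days=30),
    ]
}
print(json.dumps(config, indent=2))
```

Applying the configuration to a bucket is done through whatever S3 client you use and is not shown here.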

Two Ways to See It

Before choosing a tool, choose the model.

#Centralized

All big datasets go into one shared storage cluster.

Pros:

Easy replay
Easy validation
One catalog
Cleaner governance

Cons:

You must operate it

#Federated

Each team keeps its own raw data.

Only selected exports are shared.

Pros:

Less central responsibility

Cons:

Harder to reproduce results
Harder to replay everything

If you care about reproducibility → centralized wins.
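Part of why a single catalog helps is that it can pin a result to the exact inputs that produced it. Here is a rough sketch of that idea, assuming a manifest that maps object keys to checksums. The manifest layout and the helpers `build_manifest` and `verify` are hypothetical, and a plain dict stands in for the storage cluster.

```python
import hashlib


def build_manifest(dataset, version, objects):
    """Pin each object key in a dataset version to its size and checksum."""
    entries = [
        {
            "key": key,
            "size_bytes": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        }
        for key, data in sorted(objects.items())
    ]
    return {"dataset": dataset, "version": version, "objects": entries}


def verify(manifest, objects):
    """Return the keys that are missing or no longer match the manifest."""
    mismatched = []
    for entry in manifest["objects"]:
        data = objects.get(entry["key"])
        if data is None or hashlib.sha256(data).hexdigest() != entry["sha256"]:
            mismatched.append(entry["key"])
    return mismatched


# A dict stands in for the shared storage cluster in this sketch.
store = {
    "validation/v1/part-0.bin": b"first chunk",
    "validation/v1/part-1.bin": b"second chunk",
}
manifest = build_manifest("validation", "v1", store)
print(verify(manifest, store))
```

In a federated setup the same check needs every team's storage to be reachable and unchanged, which is the part that tends to break.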

5 Self-Hosted Object Storage Options

If you don’t want cloud storage, here are real options:

#MinIO

Lightweight. Easy to run. Implements most of the S3 API.

#Ceph

Enterprise-grade. Massive scale. More complex.

#SeaweedFS

Optimized for fast access to very large numbers of small files.

#Garage

Minimal and efficient distributed object storage.

#OpenStack Swift

Mature and proven in OpenStack ecosystems.

When Should You Self-Host?

Self-hosting makes sense if:

You care about privacy
You need full control
Your data grows fast
You want predictable costs
You’re building infrastructure, not just apps

Cloud is easier.

Self-hosted is more controlled.

Pick your tradeoff.

Final Thought

If you’re building distributed systems:

Event systems move data.
Object storage keeps data.

Don’t confuse the two.

That mistake becomes expensive.