You Need Storage. But Not All Storage Is the Same.
If you’re building anything beyond a small app, you’ll eventually hit this problem:
Where do we put all this data?
At first, everything fits:
- Your database works
- Your backend runs fine
- Your events flow through Kafka
Then one day:
You have 10TB of logs.
Or telemetry dumps.
Or simulation outputs.
Or validation datasets.
And suddenly… your architecture starts to feel wrong.
Databases Are Not Designed for This
Relational databases are great for:
- Transactions
- Queries
- Structured data
But they are NOT built to store:
- Massive binary files
- Multi-GB exports
- TB-scale historical dumps
You can store blobs.
You just shouldn’t: they bloat backups, slow down replication, and crowd out the data your queries actually need.
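
A common way to act on this is to keep only a pointer in the database and put the bytes somewhere else. The sketch below is a minimal illustration, not a production design: the `exports` table, the key layout, and the `store_export` helper are all hypothetical, and a local directory stands in for the object store so the example runs on its own.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def store_export(db: sqlite3.Connection, bucket_dir: Path, name: str, payload: bytes) -> str:
    """Write the bytes to the stand-in object store and only a pointer row to the database."""
    digest = hashlib.sha256(payload).hexdigest()
    key = f"exports/{digest[:2]}/{digest}"  # hypothetical key layout
    target = bucket_dir / key
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    # The database row stays small: name, location, size, checksum.
    db.execute(
        "INSERT INTO exports (name, object_key, size_bytes, sha256) VALUES (?, ?, ?, ?)",
        (name, key, len(payload), digest),
    )
    db.commit()
    return key


def demo() -> tuple:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE exports ("
        "name TEXT PRIMARY KEY, object_key TEXT, size_bytes INTEGER, sha256 TEXT)"
    )
    with tempfile.TemporaryDirectory() as tmp:
        bucket = Path(tmp)  # assumption: a directory plays the role of a bucket
        key = store_export(db, bucket, "run-001.bin", b"\x00" * 1024)
        row = db.execute(
            "SELECT object_key, size_bytes FROM exports WHERE name = ?",
            ("run-001.bin",),
        ).fetchone()
        stored_size = len((bucket / key).read_bytes())
    return row, key, stored_size
```

The database still answers "which exports exist and how big are they?", but it never has to page a multi-GB file through its buffer pool.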
Event Streaming Is Not Storage Either
Kafka is amazing for:
- Real-time processing
- State updates
- Event-driven systems
But Kafka is not your data lake.
Keeping TBs of historical data in your event system:
- Makes replay expensive
- Makes retention tricky
- Makes scaling harder
Event systems move data.
They don’t replace storage.
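
One way to keep the event system light is to send a reference instead of the payload, sometimes called the claim-check pattern. The sketch below assumes a hypothetical `DatasetReady` event and made-up bucket and key names. It only shows the message shape, not a real Kafka producer.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DatasetReady:
    """Event that announces a dataset instead of carrying it (hypothetical schema)."""
    dataset_id: str
    bucket: str
    key: str
    size_bytes: int
    sha256: str


def encode_event(event: DatasetReady) -> bytes:
    # Serialize to JSON; any schema format would do for this illustration.
    return json.dumps(asdict(event), sort_keys=True).encode("utf-8")


def decode_event(raw: bytes) -> DatasetReady:
    return DatasetReady(**json.loads(raw.decode("utf-8")))


event = DatasetReady(
    dataset_id="telemetry-2024-06-01",          # illustrative values only
    bucket="raw-telemetry",
    key="telemetry/2024/06/01/dump.parquet",
    size_bytes=12 * 1024**3,                     # a 12 GiB object
    sha256="<hex digest computed at upload time>",
)
message = encode_event(event)
```

The object is 12 GiB, but the message that travels through the topic is a few hundred bytes. Consumers that need the data fetch it from storage by `bucket` and `key`.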
The Real Solution: Object Storage
When your data becomes big and heavy, you need:
Object storage.
Think of it as:
A scalable, distributed store
built for huge files,
addressed by key over an HTTP API,
with versioning and lifecycle control.
This is what S3 is.
But S3 is not only an AWS product. Its API has become a de facto standard that many other systems implement.
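
In practice this means the same client code can talk to AWS or to a self-hosted server by changing the endpoint. The sketch below assumes the third-party `boto3` library and a hypothetical local endpoint and bucket. The helper names are made up for illustration. Support for versioning and lifecycle rules varies between S3-compatible servers, so check your server's documentation before relying on them.

```python
def s3_client_kwargs(endpoint_url: str, access_key: str, secret_key: str) -> dict:
    """Keyword arguments for boto3.client("s3", ...) aimed at a non-AWS endpoint."""
    return {
        "endpoint_url": endpoint_url,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "region_name": "us-east-1",  # assumption: your server may expect a different value
    }


def expiry_rules(prefix: str, expire_days: int, noncurrent_days: int) -> dict:
    """Lifecycle document in the shape used by put_bucket_lifecycle_configuration."""
    return {
        "Rules": [
            {
                "ID": f"expire-{prefix.strip('/')}",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                # Delete current objects after expire_days.
                "Expiration": {"Days": expire_days},
                # Delete old versions after noncurrent_days.
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
            }
        ]
    }


def apply_policy(bucket: str, client_kwargs: dict, rules: dict) -> None:
    """Turn on versioning and attach the lifecycle rules (needs boto3 and a live server)."""
    import boto3  # imported lazily so the helpers above run without it

    s3 = boto3.client("s3", **client_kwargs)
    s3.put_bucket_versioning(
        Bucket=bucket, VersioningConfiguration={"Status": "Enabled"}
    )
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=rules
    )


# Hypothetical endpoint and credentials for a server running locally.
kwargs = s3_client_kwargs("http://localhost:9000", "example-key", "example-secret")
rules = expiry_rules("logs/", expire_days=90, noncurrent_days=30)
```

Calling `apply_policy("raw-logs", kwargs, rules)` against a running server would enable versioning and expire anything under `logs/` after 90 days.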
Two Ways to See It
Before choosing a tool, choose the model.
#Centralized
All big datasets go into one shared storage cluster.
Pros:
- Easy replay
- Easy validation
- One catalog
- Cleaner governance
Cons:
- You must operate it
#Federated
Each team keeps its own raw data.
Only selected exports are shared.
Pros:
- Less central responsibility
Cons:
- Harder to reproduce results
- Harder to replay everything
If you care about reproducibility → centralized wins.
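
Reproducibility gets easier when every run can pin the exact bytes it read. One possible shape for the "one catalog" idea is a manifest per dataset that maps object keys to content hashes. This is a minimal sketch: `build_manifest` and `verify` are hypothetical helpers, and a plain dict stands in for the shared bucket.

```python
import hashlib
import json


def build_manifest(dataset: str, objects: dict) -> dict:
    """Pin a dataset to exact contents: object key -> sha256 of its bytes."""
    entries = {
        key: hashlib.sha256(data).hexdigest()
        for key, data in sorted(objects.items())
    }
    # The manifest ID hashes the sorted entries, so identical inputs give identical IDs.
    manifest_id = hashlib.sha256(
        json.dumps(entries, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return {"dataset": dataset, "manifest_id": manifest_id, "objects": entries}


def verify(manifest: dict, objects: dict) -> list:
    """Return the keys that are missing or whose bytes no longer match the manifest."""
    return [
        key
        for key, digest in manifest["objects"].items()
        if key not in objects or hashlib.sha256(objects[key]).hexdigest() != digest
    ]


# Assumption: this dict plays the role of the central bucket.
bucket = {
    "sim/run-1/output.bin": b"first result",
    "sim/run-1/params.json": b'{"seed": 7}',
}
manifest = build_manifest("sim-run-1", bucket)

# Later, someone overwrites one object in place.
changed = dict(bucket)
changed["sim/run-1/output.bin"] = b"different result"
```

Here `verify(manifest, bucket)` returns an empty list, while `verify(manifest, changed)` reports the overwritten key. With federated storage the same check is possible, but only for the objects each team chooses to export.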
5 Self-Hosted Object Storage Options
If you don’t want cloud storage, here are real options:
#MinIO
Lightweight. Easy to run. S3-compatible.

#Ceph
Enterprise-grade. Massive scale. More complex to operate. Serves the S3 API through its RADOS Gateway.

#SeaweedFS
Optimized for fast access to very large numbers of files. Offers an S3-compatible API.

#Garage
Minimal and efficient distributed object storage, aimed at small, geo-distributed clusters. S3-compatible.

#OpenStack Swift
Mature and proven in OpenStack ecosystems. Has its own native API, with S3 compatibility available as middleware.

When Should You Self-Host?
Self-hosting makes sense if:
- You care about privacy
- You need full control
- Your data grows fast
- You want predictable costs
- You’re building infrastructure, not just apps
Cloud is easier.
Self-hosted is more controlled.
Pick your tradeoff.
Final Thought
If you’re building distributed systems:
Event systems move data.
Object storage keeps data.
Don’t confuse the two.
That mistake becomes expensive.
