Data Architecture — Fri Feb 20


Storage isn't always about Money

#database #dba #architecture #organizing data #on-prem #performance #cloud data #data lake #partitioning #data-architecture #scalability

When considering storage, I like to think of this analogy:

I own a freezer just like you do! When I’m pushing my old trolley around the supermarket, I like to check the offers on meat! Meat is expensive these days. So if it’s on offer, I like to grab something and pop it in the freezer when I get home. Ergo, I make a saving!

However, I know people who have two freezers and do the same. But once that extra meat is in the second freezer, they don’t necessarily “need” it. There is a timing element too: the second freezer costs money to run, so if they keep that purchase long enough it stops being a saving and actually becomes more expensive.
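
To put rough numbers on that timing element, here is a minimal Python sketch. It assumes the only ongoing cost is what the second freezer costs to run each month, and every figure in it is made up for illustration. `months_until_saving_is_gone` is a hypothetical helper of mine, not a real costing model.

```python
def months_until_saving_is_gone(saving: float, monthly_running_cost: float) -> float:
    """Months a bargain can sit in the freezer before it stops being a saving.

    Both inputs are assumptions for illustration: `saving` is the discount
    banked at the till, `monthly_running_cost` is the share of the second
    freezer's running cost attributed to this purchase.
    """
    if monthly_running_cost <= 0:
        raise ValueError("monthly_running_cost must be positive")
    return saving / monthly_running_cost


# Hypothetical figures: 6.00 saved on offer, 1.50 a month to keep it frozen.
break_even = months_until_saving_is_gone(saving=6.00, monthly_running_cost=1.50)
print(break_even)  # 4.0
```

With these invented numbers, anything still in the freezer after four months has cost more than it saved.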

Through that lens, it doesn’t make sense!

So, are there other reasons they have this extra freezer?

Some people are stocking up for a rainy day. Perhaps being extra stocked gives them peace of mind. Perhaps they have a massive family, so demand on their stocks is greater? Maybe they have a little food business? Maybe they shoot or fish?

There are tons of reasons to double up that have nothing to do with money.


Data Storage Works Exactly the Same Way

In technology, we often ask:

Why do we need more storage?
Why do we keep old data?
Why are we duplicating it?

From a pure cost perspective, extra storage can look wasteful, especially when cloud invoices arrive.

But just like the second freezer, storage is rarely just about price.

It is about purpose.


1. Availability & Redundancy

Sometimes we duplicate data because we cannot afford to lose it.

  • Backups
  • Geo-replication
  • Disaster recovery copies
  • Snapshots

If your primary system fails, that second “freezer” keeps the business running.

You don’t buy insurance because it’s financially efficient.
You buy it because failure is expensive.


2. Performance

Duplicated data is often about speed.

  • Data warehouses copy data from transactional systems.
  • Reporting layers duplicate raw data into aggregated models.
  • Caches duplicate frequently accessed information.

Why?

Because you don’t want your checkout system grinding to a halt while someone runs a massive analytical query.

In freezer terms: You don’t want to defrost everything every time you cook dinner.
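
As a rough illustration of the cache case, here is a minimal Python sketch using the standard library's `functools.lru_cache`. The function `fetch_price_from_primary`, the counter and the prices are all hypothetical stand-ins for a real transactional system.

```python
from functools import lru_cache

# Counts how often we actually bother the (pretend) primary system.
CALLS_TO_PRIMARY = 0


def fetch_price_from_primary(product_id: int) -> float:
    """Hypothetical stand-in for an expensive query against the primary."""
    global CALLS_TO_PRIMARY
    CALLS_TO_PRIMARY += 1
    return 9.99 + product_id


@lru_cache(maxsize=1024)
def get_price(product_id: int) -> float:
    """Serve repeat reads from an in-memory copy instead of the primary."""
    return fetch_price_from_primary(product_id)


# A thousand reads of the same product hit the primary only once.
for _ in range(1000):
    get_price(42)

print(CALLS_TO_PRIMARY)  # 1
```

The trade-off is that the cached copy can go stale. In this sketch, `get_price.cache_clear()` would throw the copies away, and how often to do that depends on how fresh the data needs to be.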


3. Isolation of Workloads

Production systems need stability.

Data science teams need freedom.

So we copy data into:

  • Sandboxes
  • Data lakes
  • Feature stores
  • Analytical clusters

This duplication allows experimentation without risking the operational core.

It’s like having a second freezer for the business stock — separate from family food.
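
Here is a small Python sketch of that idea, using the standard library's `sqlite3` module and its `Connection.backup` method. The in-memory databases, the `orders` table and its rows are invented for illustration; a real sandbox would be fed by whatever copy mechanism your platform provides.

```python
import sqlite3

# In-memory databases stand in for the production system and the sandbox.
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
production.executemany(
    "INSERT INTO orders (total) VALUES (?)", [(10.0,), (25.5,), (7.25,)]
)
production.commit()

# Copy the whole production database into the sandbox.
sandbox = sqlite3.connect(":memory:")
production.backup(sandbox)

# Experiment freely on the copy, including destructive changes.
sandbox.execute("DELETE FROM orders WHERE total < 20")
sandbox.commit()

prod_rows = production.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
sandbox_rows = sandbox.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(prod_rows, sandbox_rows)  # 3 1
```

The delete only ever touched the copy, so production still has all three orders.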


4. Compliance & Audit

Some data must be retained:

  • Financial records
  • Energy production logs
  • Asset histories
  • Regulatory reporting data

Sometimes you are required to keep exact historical copies, even if the operational system has moved on.

In this case, duplication isn’t optional.

It’s a legal requirement.


5. Historical Preservation

Businesses evolve.

Schemas change. Columns get renamed. Systems get replaced.

If you don’t snapshot or preserve historical states, you lose the ability to answer:

  • What did we know at the time?
  • What did the data look like when that decision was made?
  • What was the turbine configuration before failure?

Duplicated data can act as a time capsule.
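
One way to picture the time capsule is an "as of" lookup over dated snapshots. The Python sketch below is a toy, not a recommendation for any particular product: `SnapshotHistory` is a hypothetical class, and the turbine fields and dates are invented.

```python
from bisect import bisect_right
from datetime import date
from typing import Optional


class SnapshotHistory:
    """Keeps dated copies of a record so past states can be looked up."""

    def __init__(self) -> None:
        self._dates = []   # snapshot dates, kept in increasing order
        self._states = []  # the state captured on each date

    def snapshot(self, taken_on: date, state: dict) -> None:
        # The bisect lookup below relies on dates being strictly increasing.
        if self._dates and taken_on <= self._dates[-1]:
            raise ValueError("snapshots must be added in increasing date order")
        self._dates.append(taken_on)
        self._states.append(dict(state))  # store a copy, not a reference

    def as_of(self, when: date) -> Optional[dict]:
        """Return the latest snapshot taken on or before `when`, if any."""
        index = bisect_right(self._dates, when)
        return self._states[index - 1] if index else None


# Hypothetical turbine configuration history.
history = SnapshotHistory()
history.snapshot(date(2024, 1, 1), {"blade_pitch_deg": 12, "firmware": "1.0"})
history.snapshot(date(2024, 6, 1), {"blade_pitch_deg": 15, "firmware": "1.1"})

print(history.as_of(date(2024, 3, 15)))
# {'blade_pitch_deg': 12, 'firmware': '1.0'}
```

Asking for a date before the first snapshot returns `None`, which is exactly the "we can't answer that" situation the snapshots are there to avoid.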


6. Psychological Safety (Yes, Really)

Just like the second freezer can bring peace of mind, duplicated data can reduce organisational anxiety.

Engineers sleep better knowing:

  • Backups exist
  • Rollback is possible
  • A migration can fail safely

Storage isn’t always rational. Sometimes it’s emotional.

But emotional stability in engineering teams has value.


7. Business Growth & Scale

Sometimes the second freezer is simply preparation.

  • Growing customer base
  • Expanding product lines
  • Increasing telemetry
  • More turbines, more sensors, more logs

Storage often scales ahead of immediate need.

Because rebuilding under pressure is far more expensive.


When Duplication Becomes a Problem

However — and this is important — not all duplication is intentional.

Bad duplication looks like:

  • Multiple versions of truth
  • Unclear ownership
  • Uncontrolled S3 buckets
  • Ad hoc exports
  • Manual CSV archives

That’s not strategic redundancy.

That’s entropy.

There is a difference between:

  • Designed duplication
  • Accidental sprawl

The first is architecture.
The second is drift.
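
To make the distinction concrete, here is a minimal Python sketch of an inventory check that flags copies nobody owns or can explain. The `DataCopy` record, the purpose labels and the inventory entries are all hypothetical; a real audit would pull this metadata from your own catalogue or tagging scheme.

```python
from dataclasses import dataclass
from typing import List, Optional

# Purposes treated as "designed duplication" in this sketch (an assumption).
KNOWN_PURPOSES = {
    "availability", "performance", "isolation", "compliance", "history", "scale",
}


@dataclass
class DataCopy:
    name: str
    owner: Optional[str]
    purpose: Optional[str]


def find_sprawl(copies: List[DataCopy]) -> List[str]:
    """Return names of copies with no owner or no recognised purpose."""
    return [
        c.name for c in copies
        if not c.owner or c.purpose not in KNOWN_PURPOSES
    ]


# Invented inventory for illustration.
inventory = [
    DataCopy("orders-dr-replica", owner="platform-team", purpose="availability"),
    DataCopy("sales-warehouse", owner="analytics-team", purpose="performance"),
    DataCopy("export-final-v2.csv", owner=None, purpose=None),
    DataCopy("tmp-bucket-2019", owner="someone-who-left", purpose=None),
]

print(find_sprawl(inventory))
# ['export-final-v2.csv', 'tmp-bucket-2019']
```

The check itself is trivial. The hard part is having an inventory with an owner and a purpose recorded against every copy in the first place.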


The Real Question

The question is not:

“Why do we have duplicated data?”

The question is:

“Do we understand why it exists?”

If the answer is:

  • Availability
  • Performance
  • Compliance
  • Experimentation
  • Historical preservation
  • Scale

Then it likely makes sense.

If the answer is:

“I’m not sure who created that bucket…”

You may have too many freezers.


Storage is not just a cost line.

It is:

  • Risk mitigation
  • Operational stability
  • Analytical freedom
  • Regulatory compliance
  • Psychological comfort
  • Business growth

And sometimes, yes —

It’s just meat on offer.


Gareth Winterman