Data Architecture — Thu Feb 12

← Home | ← data-architecture

Choosing the Right Database: There is no such thing as “the best database.”

Thu Feb 12
#database #dba #architecture

Reality check before we start!

They build what delivers value first — the bread and butter systems that actually keep the lights on. The database choices made at that stage aren’t usually driven by architectural purity. They’re driven by speed, budget, and survival.

By the time anyone steps back to review those decisions, the systems are live. They’re generating revenue. They’re supporting customers. They’re carrying risk.

Rewriting them isn’t realistic.

As an architect, you don’t walk in and demand perfection. You work with what exists. You understand the constraints. And you influence the decisions that haven’t been made yet.

Good architecture isn’t about starting from scratch. It’s about steering what’s already there in a better direction.


Choosing the Right Database: Use Cases for Every Type

This bit isn’t complicated:

The right database for the workload!

Databases are tools. Different shapes. Different strengths. Different failure modes.

Here’s how each major type actually fits in the real world.


Relational Databases (OLTP)

Use when: You need transactions, integrity, and structured business logic.

Best for:

  • Financial systems
  • ERP / CRM
  • Inventory
  • Billing
  • Core application backends

Why they exist:

  • ACID compliance
  • Referential integrity
  • Strong schema enforcement
  • Mature tooling
  • Predictable behaviour

These are your systems of record.

Don’t use them for:

  • Massive horizontal scale
  • Petabyte analytics
  • Deep relationship traversal
  • High-volume telemetry ingestion

They are incredibly strong — but not infinite.


Analytical / Data Warehouse (OLAP)

Use when: You need to scan large datasets and aggregate everything.

Best for:

  • BI dashboards
  • Historical reporting
  • Data science workloads
  • Large joins across datasets
  • ELT pipelines

Why they exist:

  • Columnar storage
  • Massive parallel processing
  • Compression
  • Optimised for scans

These databases answer questions like:

“What happened over the last 5 years?”

They are not built for:

  • Millisecond transactional APIs
  • Row-by-row updates
  • Real-time CRUD systems

OLTP and OLAP are different animals.


Document Databases (NoSQL)

Use when: Your schema changes frequently or is naturally JSON-based.

Best for:

  • Content management
  • User profiles
  • Product catalogues
  • Flexible metadata
  • Rapid prototyping

Why they exist:

  • Schema flexibility
  • JSON-native storage
  • Horizontal scaling

They’re ideal when structure is fluid.

They’re not ideal for:

  • Complex joins
  • Strict financial integrity
  • Deep relational logic

If your system depends heavily on relationships — this is not your tool.


Key–Value Stores

Use when: You need speed. Extreme speed.

Best for:

  • Caching
  • Session storage
  • Rate limiting
  • Feature flags
  • Real-time counters

Why they exist:

  • In-memory performance
  • Simple key lookups
  • Minimal overhead

They are infrastructure accelerators.

They are not:

  • Analytical engines
  • Systems of record
  • Relationship databases

If you try to turn a cache into a database, you’ll regret it.


Wide Column / BigTable Style Databases

Use when: You need massive distributed writes at scale.

Best for:

  • IoT ingestion
  • Event logs
  • High-write workloads
  • Global distributed systems

Why they exist:

  • Horizontal partitioning
  • Fault tolerance
  • Distributed-first design

These shine in scale-out systems.

They are painful for:

  • Complex joins
  • Ad-hoc analytics
  • Strict multi-row ACID transactions

They trade relational elegance for scale.


Graph Databases

Use when: The question is about relationships.

Best for:

  • Fraud detection
  • Access control modelling
  • Dependency mapping
  • Network topology
  • Recommendation engines

Why they exist:

  • Relationship traversal is native
  • Multi-hop queries are efficient
  • Real-world connections map naturally

They answer:

“How is this connected to that?”

They are not built for:

  • Heavy aggregation
  • Time-series ingestion
  • Simple CRUD applications

Use them when relationships are the product.


Time-Series Databases

Use when: Time is the dominant dimension.

Best for:

  • Sensor data
  • Telemetry
  • SCADA
  • Financial tick data
  • Monitoring metrics

Why they exist:

  • Optimised time partitioning
  • Fast window functions
  • Compression for historical data

They answer:

“What changed over time?”

They are not for:

  • Deep graph traversal
  • CMS content
  • Text search

Time-series is about trends and signals.


Search Index Databases

Use when: You need to find things quickly — especially text.

Best for:

  • Log search
  • Observability
  • Full-text search
  • Fuzzy matching
  • Autocomplete

Why they exist:

  • Inverted indexes
  • Relevance scoring
  • Fast retrieval

They are search engines, not truth stores.

If your only copy of business data lives in a search engine, that’s an architectural smell.


Distributed SQL

Use when: You need global distribution with SQL semantics.

Best for:

  • Multi-region SaaS
  • Always-on systems
  • Global financial applications
  • Horizontally scaled OLTP

Why they exist:

  • ACID + distributed consensus
  • Automatic replication
  • Fault tolerance across regions

They solve a specific problem:

“How do we keep strong consistency without a single primary?”

They are overkill for:

  • Single-region systems
  • Small workloads
  • Simple applications

Distributed systems always come with complexity.


Embedded Databases

Use when: You need a database inside an application.

Best for:

  • Mobile apps
  • Edge devices
  • Local applications
  • Single-file analytics

Why they exist:

  • No server required
  • Lightweight
  • Portable

They are not enterprise multi-user systems.

They are quiet workhorses.


The Architect’s Question

Before choosing a database, ask:

  1. Is this transactional?
  2. Is this analytical?
  3. Is time the dominant dimension?
  4. Are relationships central?
  5. Is search critical?
  6. Does it need global distribution?
  7. Is this just caching?
  8. Is this embedded?

Most architectural mistakes happen when:

Teams choose the database they know Instead of the database the workload needs.


Final Thought

Modern systems rarely use one database.

They use several:

  • Relational for truth
  • Time-series for metrics
  • Search for retrieval
  • Warehouse for analytics
  • Cache for speed

Understanding how they fit together is what separates a developer from an architect.


CHEAT SHEET: A list of databases in each section

Relational Databases (SQL / OLTP)

Classic row-store, ACID, schema-based databases.

Microsoft SQL Server Oracle Database PostgreSQL MySQL MariaDB IBM Db2 SQLite Firebird Ingres SAP HANA Amazon Aurora CockroachDB YugabyteDB TiDB

Analytical / Data Warehouse Databases (OLAP)

Columnar, massively parallel, analytics-focused.

Snowflake

Amazon Redshift Google BigQuery Azure Synapse Analytics Teradata Vertica Greenplum ClickHouse Apache Doris Apache Druid

NoSQL – Document Databases

Schema-flexible JSON-like storage.

MongoDB CouchDB Couchbase Amazon DocumentDB Azure Cosmos DB RavenDB ArangoDB

NoSQL – Key–Value Stores

Ultra-fast lookups, caching, session state.

Redis Memcached Amazon DynamoDB Riak Aerospike Etcd

NoSQL – Wide Column / Big Table

Designed for massive scale. Apache Cassandra Apache HBase Google Cloud Bigtable ScyllaDB

#️ Graph Databases

Relationships first, not rows.

Neo4j Amazon Neptune JanusGraph TigerGraph ArangoDB

#️ Time-Series Databases (your wheelhouse)

Optimised for metrics, sensors, SCADA, telemetry.

TimescaleDB InfluxDB Prometheus QuestDB OpenTSDB VictoriaMetrics

Search / Index Databases

Optimised for text, logs, observability.

Elasticsearch OpenSearch Apache Solr Meilisearch Typesense

Embedded / Local / Edge Databases

Used in apps, devices, and IoT.

SQLite LevelDB RocksDB DuckDB Realm

Multi-Model Databases

More than one paradigm in one engine.

Azure Cosmos DB ArangoDB OrientDB MarkLogic

#️ Legacy / Historical (still encountered)

MS Access Sybase ASE dBase Paradox Informix

Gareth Winterman