Skip to content
>GLB_
Go back

Why Small Tables Can Explode: Understanding JOIN Cardinality in SQL

It is common to assume that joining two small tables will produce a small result set. In practice, this assumption frequently fails. Even tables with only a handful of rows can generate unexpectedly large outputs due to cardinality effects.

This article explains why that happens, using a minimal reproducible example.


The Setup

Consider two tables:

Table A

A
A
B
C

Table B

A
A
C
B
NULL

Both tables are small. At first glance, one might expect a join to return approximately four rows. That intuition is incorrect.

SELECT *
FROM A
INNER JOIN B
  ON A.col = B.col;

What Happens?

Therefore:

Additionally:

Final Row Count

ValueMatches
A4
B1
C1
Total6 rows

Two small tables. Six rows in the result. This is not an anomaly — it is expected relational algebra behavior.


Why This Happens

SQL JOINs are based on set theory and relational algebra, not row-by-row pairing.

When keys are not unique:

The formula is:

Rows produced per key = count_in_A × count_in_B

If a value appears:

You get: 3 × 5 = 15 rows

Even if both tables are small.


NULL Behavior

In SQL:

NULL = NULL

evaluates to UNKNOWN, not TRUE.

Therefore:

To match NULLs intentionally:

ON A.col IS NOT DISTINCT FROM B.col

(Supported in PostgreSQL and some engines.)


Diagnosing Cardinality Problems

Before joining, always inspect key uniqueness.

Check duplicates

SELECT col, COUNT(*)  FROM B  GROUP BY col  HAVING COUNT(*) > 1;

Determine relationship type

Only one-to-one joins preserve row counts.


Common Fixes

1. Deduplicate Before Joining

WITH b_dedup AS (    SELECT DISTINCT col    FROM B  )  SELECT *  FROM A  LEFT JOIN b_dedup    ON A.col = b_dedup.col;

Use when duplicates are irrelevant.


2. Select a Canonical Record

When duplicates are meaningful but you need one row:

WITH ranked AS (    SELECT *,           ROW_NUMBER() OVER (PARTITION BY col ORDER BY updated_at DESC) AS rn    FROM B  )  SELECT *  FROM A  LEFT JOIN ranked r    ON A.col = r.col   AND r.rn = 1;

This enforces deterministic selection.


3. Redesign the Data Model

If a many-to-many relationship exists, model it explicitly with:

Do not rely on accidental uniqueness.


Conceptual Takeaway

A JOIN does not “attach rows.”
It combines sets according to relational algebra rules.

Small input does not guarantee small output.
Cardinality drives row growth.

Understanding join multiplicity is fundamental in:


Share this post:

Previous Post
Why You Can’t Get Full Social Analytics from the HubSpot API (Even with Marketing Hub Pro)
Next Post
Resolving the Node.js Error: Cannot find module jsonwebtoken