Skip to main content

ADR-002: UUID v4 as Primary Keys for All Entities

Date: 2026-02-19 Status: Accepted

Context

All 57 entities in the Curriculab data model require a primary key strategy. The choice affects:

  • Security — sequential integer PKs expose row counts and allow enumeration attacks against REST API endpoints (e.g., GET /enrollments/1234).
  • API design — PKs appear directly in URLs and response bodies; they must be safe to expose to all client roles including students.
  • Application-layer ID generation — generating an ID before the INSERT simplifies certain patterns (audit log references, idempotency keys, client-side optimistic UI).
  • Distributed compatibility — even if Curriculab runs on a single PostgreSQL instance today, the schema should not assume that IDs are generated by a single sequence.
  • Cross-entity FK chains — the Student table uses a shared PK with User (see ADR-003); UUIDs make this pattern unambiguous.

Decision

All primary keys use UUID version 4 (randomly generated, 128-bit). PostgreSQL's gen_random_uuid() function (available without extensions from PG 13+) is the recommended generation method. Application-layer generation (e.g., via the uuid package in Node.js or Python's uuid.uuid4()) is also acceptable when an ID must be known before the INSERT.

Consequences

Positive

  • Non-guessable: exposure of an ID in a URL or log line does not allow enumeration of adjacent records.
  • Globally unique without coordination: IDs can be generated in the application, a queue, or a migration script without risk of collision.
  • No information leakage: an ID does not reveal the creation order or total row count.
  • Consistent: every FK across all 57 entities has the same type (uuid), simplifying ORMs, query builders, and generated client SDKs.

Negative

  • Storage: a UUID is 16 bytes (128 bits) vs. 4–8 bytes for a 32/64-bit integer PK. At scale this affects index and row sizes.
  • Index fragmentation: random UUID v4 values insert at random B-tree positions, causing index page splits and bloat over time. This can be mitigated with pg_repack or periodic VACUUM FULL.
  • Readability: UUIDs are harder to reference in logs and support conversations than short integers.

Neutral

  • The PostgreSQL uuid column type stores values in 16 bytes (binary), not as the 36-character hyphenated string. Wire protocol and ORM mapping handle the conversion transparently.

Alternatives Considered

Auto-incrementing integer (SERIAL / BIGSERIAL) Simple and storage-efficient. However, sequential IDs are enumerable through the API, which creates an IDOR (Insecure Direct Object Reference) risk unless every endpoint enforces authorization checks. Even with authorization, leaking row count via max-ID is undesirable. Rejected in favour of UUID v4.

UUID v7 (time-ordered) UUID v7 embeds a millisecond-precision timestamp in the most-significant bits, making values roughly sortable and reducing B-tree fragmentation significantly compared to v4. This is the most compelling alternative. Rejected for the initial schema because PostgreSQL does not include a native gen_uuid_v7() function in v15 (requires the pg_uuidv7 extension or application-layer generation), and UUID v7 library support across languages is less uniform than v4. This decision should be revisited as UUID v7 support matures; migrating from v4 to v7 requires only a schema and generation change, not a data-model restructure.

ULID (Universally Unique Lexicographically Sortable Identifier) Sortable and URL-friendly. Less standardised than UUID; adds a non-standard library dependency in every repo. Rejected in favour of the more widely supported UUID standard.

Composite natural keys Several entities have strong natural uniqueness constraints (e.g., Enrollment(student_id, section_id)). Using these as PKs reduces one level of indirection but complicates FK chains (every referencing table must carry all columns of the composite key) and API design (URLs become multi-segment). These constraints are retained as UNIQUE constraints alongside a surrogate UUID PK.