
This page documents measured Databunker Pro performance on a representative workload — a 120-field user-profile vault sized at 5 million records. It is intended as a starting point for capacity planning, not as a marketing benchmark; the numbers are produced from an internal performance run and the methodology is documented so you can reproduce or refine against your own payloads.

Benchmark setup

  • Workload: UserCreate (PII tokenisation) — full user profiles
  • Profile shape: 120 fields per profile (mixed PII + custom application fields)
  • Record count: 5,000,000
  • Host: AWS EC2 m6i.2xlarge (8 vCPU Intel Ice Lake, 32 GB RAM)
  • Backend database: PostgreSQL, co-located on the same EC2 host
  • Databunker Pro: single instance, stateless application server (same host)
  • Load generator: Python scripts from the databunkerpro-python repo — bulk_user_creator.py for writes and bulk_user_fetcher.py for reads — running locally on the same host (no client-side network bottleneck)
  • Write mode: bulk (UserCreateBulk) at 4,000 records per request
  • Indexing: default hashed indexes on email, phone, login, custom
  • Encryption: AES-256 at rest, TLS in transit (defaults)
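The setup above implies some easily checked arithmetic: 5,000,000 records in 4,000-record batches is 1,250 bulk requests, and the sustained rate predicts the measured wall-clock time. A quick sanity check, using only the numbers from this table:

```python
# Back-of-envelope for the measured run: 5,000,000 records written in
# 4,000-record UserCreateBulk batches at ~1,700 records/sec sustained.
TOTAL_RECORDS = 5_000_000
BATCH_SIZE = 4_000
THROUGHPUT_RPS = 1_700  # records/sec, sustained

num_batches = -(-TOTAL_RECORDS // BATCH_SIZE)  # ceiling division
wall_clock_min = TOTAL_RECORDS / THROUGHPUT_RPS / 60

print(num_batches)            # 1250 bulk requests
print(round(wall_clock_min))  # ~49 minutes, matching the measured ~50
```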
The numbers below were produced on a modest single EC2 instance with the database co-located. They are not the upper bound — separating the database onto a dedicated managed PostgreSQL service (e.g., Aurora) and scaling the application tier horizontally produces materially higher throughput.

Measured throughput and latency

  • Sustained bulk write throughput: ~1,700 records/sec
  • Per-record latency, amortised within a bulk: ~0.6 ms
  • Per-bulk request latency: ~2.4 seconds for a 4,000-record bulk
  • Time to write 1,000,000 records: ~10 minutes
  • Time to write 5,000,000 records: ~50 minutes — same per-million rate as 1 M (linear scaling)
  • Single-record synchronous write: 3–10 ms range over a real network
  • Detokenisation (UserGet): hashed-index point lookup; faster than a write (precise number tracked in a follow-up benchmark)

Storage footprint

  • Total database size (5 M records): 27 GB
  • Heap table (users): 832 MB
  • Indexes: 655 MB
  • Encrypted record TOAST: ~25.5 GB (the bulk of the footprint — encrypted profile blobs)
  • Per-record on disk: ~5.4 KB per fully-encrypted 120-field profile, including indexes

What the numbers mean

For capacity planning

  • ~1,700 records/sec sustained on one instance is more than enough for almost any single-organisation steady-state workload. A typical large enterprise with 100,000 users could ingest its entire user base in under a minute.
  • Linear scaling from 1 M to 5 M records means the vault does not slow down as it grows. Plan for storage growth, not for throughput penalties.
  • ~5.4 KB per encrypted 120-field record — multiply by your expected record count to size your PostgreSQL volume. A 10-million-record deployment needs roughly 54 GB plus operational headroom.
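The sizing rule above is mechanical enough to script. A minimal sketch, using the measured ~5.4 KB per record and an assumed 30% operational headroom factor (the headroom figure is our assumption, not part of the benchmark):

```python
# Volume sizing from the measured footprint: ~5.4 KB on disk per
# fully-encrypted 120-field profile, including indexes.
PER_RECORD_KB = 5.4
HEADROOM = 1.3  # assumed 30% operational headroom; tune to your ops policy

def vault_size_gb(records: int) -> float:
    """Estimated PostgreSQL volume in decimal GB for a given record count."""
    return records * PER_RECORD_KB / 1e6

print(round(vault_size_gb(10_000_000)))             # 54 GB raw
print(round(vault_size_gb(10_000_000) * HEADROOM))  # 70 GB with headroom
```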

Bulk vs single-record latency

The headline ~0.6 ms per record is amortised inside a 4,000-record bulk request. It includes network, auth, encryption, hashing, indexing, and transaction commit averaged across the batch. Single-record synchronous writes will be higher — 3–10 ms is typical over a real network — because each call carries its own connection, auth, and transaction overhead. Plan accordingly:
  • Backfills and batch flows: use UserCreateBulk and assume ~0.6 ms per record.
  • Real-time API flows: assume 3–10 ms per single-record write. Throughput in production is then (concurrency × 1 / per-call latency) — i.e., 100 concurrent worker threads writing at 5 ms apiece gives 20,000 records/sec.
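The real-time throughput rule above can be expressed as a one-line model — sustained records/sec equals concurrency divided by per-call latency:

```python
# Throughput model for real-time single-record writes:
# sustained records/sec = concurrency / per-call latency (seconds).
def sustained_rps(concurrency: int, latency_s: float) -> float:
    return concurrency / latency_s

print(sustained_rps(100, 0.005))  # 20000.0 — the 100-worker / 5 ms example
print(sustained_rps(100, 0.010))  # 10000.0 — the pessimistic 10 ms end
```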

Why no detokenisation number yet?

Detokenisation (UserGet) is a hashed-index point lookup, which is structurally faster than a write. Most internal runs put it well below the per-record write latency. We are documenting a precise number in a follow-up benchmark and will publish it here.

When to scale horizontally

The application server is stateless, so horizontal scaling on Kubernetes is straightforward. Add instances when:
  • The PostgreSQL backend is healthy and CPU-headroom on the application pod is exhausted (vault is CPU-bound on encryption + indexing under heavy bulk load).
  • You need geographic locality — see Multi-jurisdiction deployment.
  • You want isolation between workload classes (e.g., dedicate one instance to batch jobs, another to real-time API).
The backend database usually becomes the bottleneck before the application server does. For production deployments, use AWS Aurora PostgreSQL with auto-scaling (or an equivalent managed service on Azure / GCP) — this lifts the typical wall-clock bottleneck.

Running your own benchmark

The numbers above are a starting point. Real workloads vary by:
  • Payload size — a 12-field profile tokenises faster than a 120-field profile; a 1 KB profile differs from a 10 KB profile.
  • Search-index field count — each hashed index adds a small per-record cost.
  • Access-control policy depth — CRBAC evaluation on reads scales with policy complexity.
  • Database backend choice and sizing — PostgreSQL vs MySQL, instance class, IOPS, network latency to the DB.
To reproduce or refine for your context, start from the open-source Python load scripts we used:
  • bulk_user_creator.py — drives UserCreateBulk to generate and tokenise a configurable number of user profiles. This is the script that produced the bulk-write numbers above.
  • bulk_user_fetcher.py — drives UserGet to retrieve previously-tokenised records, for measuring detokenisation throughput and latency.
Both scripts are in the official databunkerpro-python SDK repo and are designed to be edited — adjust the profile shape, batch size, and concurrency to match your workload.
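Adjusting the profile shape is the main edit the scripts expect. A minimal sketch of a profile generator along those lines — the field names here (`email`, `phone`, `login`, `custom_field_*`) are illustrative assumptions for this page, not the schema the SDK scripts actually ship with:

```python
import random
import string

def make_profile(n_fields: int = 120) -> dict:
    """Generate a synthetic user profile with n_fields mixed PII/custom fields.
    Field names are illustrative, not the SDK scripts' actual schema."""
    profile = {
        "email": f"user{random.randrange(10**9)}@example.com",
        "phone": f"+1555{random.randrange(10**7):07d}",
        "login": "".join(random.choices(string.ascii_lowercase, k=12)),
    }
    # Pad with opaque custom fields up to the requested field count.
    for i in range(n_fields - len(profile)):
        profile[f"custom_field_{i}"] = "".join(
            random.choices(string.ascii_letters + string.digits, k=24))
    return profile

print(len(make_profile()))  # 120 fields, matching the benchmark shape
```

Vary `n_fields` and the per-field value length to explore the payload-size sensitivity noted above.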

Steps

  1. Adjust the profile shape in bulk_user_creator.py to match the actual JSON your application sends.
  2. Pre-warm the backend database (vacuum, statistics).
  3. Run bulk_user_creator.py co-located with the Databunker Pro instance to remove client-side network noise.
  4. Measure: sustained records/sec, p50 / p95 / p99 per-bulk latency, database CPU and IOPS.
  5. Run bulk_user_fetcher.py against the same vault to capture UserGet performance.
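The measurement in step 4 can be sketched as a thin timing wrapper around whatever client call your edited copy of bulk_user_creator.py makes; `send_bulk` below is a placeholder, not a function from the SDK:

```python
import statistics
import time

def measure(batches, send_bulk, batch_size=4_000):
    """Drive send_bulk over each batch and return
    (sustained records/sec, p50, p95, p99 per-bulk latency in seconds)."""
    latencies = []
    start = time.perf_counter()
    for batch in batches:
        t0 = time.perf_counter()
        send_bulk(batch)  # placeholder for the real client call
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    percentiles = statistics.quantiles(latencies, n=100)  # 99 cut points
    rps = len(latencies) * batch_size / elapsed
    return rps, percentiles[49], percentiles[94], percentiles[98]
```

Database CPU and IOPS (the rest of step 4) come from the host or cloud monitoring side, not from the client.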
Contact us at office@databunker.org if you would like the professional services team to design and run a benchmark against your representative payloads as part of an engagement.

What we’ll add next

  • Precise detokenisation latency numbers (UserGet p50 / p95 / p99).
  • Multi-instance horizontal-scaling curve (records/sec per added pod).
  • Format-preserving tokenisation throughput (TokenCreate / TokenCreateBulk).
  • MySQL backend comparison.
  • Cold-cache vs warm-cache UserGet profile.