
This page documents measured Databunker Pro performance on a representative workload — a 120-field user-profile vault sized at 5 million records. It is intended as a starting point for capacity planning, not as a marketing benchmark; the numbers are produced from an internal performance run and the methodology is documented so you can reproduce or refine against your own payloads.

Benchmark setup

  • Workload: UserCreate (PII tokenisation) — full user profiles
  • Profile shape: 120 fields per profile (mixed PII + custom application fields)
  • Record count: 5,000,000
  • Host: AWS EC2 m6i.2xlarge (8 vCPU Intel Ice Lake, 32 GB RAM)
  • Backend database: PostgreSQL, co-located on the same EC2 host
  • Databunker Pro: single instance, stateless application server (same host)
  • Load generator: Python scripts from the databunkerpro-python repo — bulk_user_creator.py for writes and bulk_user_fetcher.py for reads — running locally on the same host (no client-side network bottleneck)
  • Write mode: bulk (UserCreateBulk) at 4,000 records per request
  • Indexing: default hashed indexes on email, phone, login, custom
  • Encryption: AES-256 at rest, TLS in transit (defaults)
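The setup above implies some easily checked arithmetic: 5,000,000 records in 4,000-record batches is 1,250 bulk requests, and the sustained rate predicts the measured wall-clock time. A quick sanity check, using only the numbers from this table:

```python
# Back-of-envelope for the measured run: 5,000,000 records written in
# 4,000-record UserCreateBulk batches at ~1,700 records/sec sustained.
TOTAL_RECORDS = 5_000_000
BATCH_SIZE = 4_000
THROUGHPUT_RPS = 1_700  # records/sec, sustained

num_batches = -(-TOTAL_RECORDS // BATCH_SIZE)  # ceiling division
wall_clock_min = TOTAL_RECORDS / THROUGHPUT_RPS / 60

print(num_batches)            # 1250 bulk requests
print(round(wall_clock_min))  # ~49 minutes, matching the measured ~50
```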
The numbers below were produced on a modest single EC2 instance with the database co-located. They are not the upper bound — separating the database onto a dedicated managed PostgreSQL service (e.g., Aurora) and scaling the application tier horizontally produces materially higher throughput.

Measured throughput and latency

  • Sustained bulk write throughput: ~1,700 records/sec
  • Per-record latency, amortised within a bulk: ~0.6 ms
  • Per-bulk request latency: ~2.4 seconds for a 4,000-record bulk
  • Time to write 1,000,000 records: ~10 minutes
  • Time to write 5,000,000 records: ~50 minutes — same per-million rate as 1 M (linear scaling)
  • Single-record synchronous write: 3–10 ms range over a real network
  • Detokenisation (UserGet): hashed-index point lookup; faster than a write (precise number tracked in a follow-up benchmark)

Storage footprint

  • Total database size (5 M records): 27 GB
  • Heap table (users): 832 MB
  • Indexes: 655 MB
  • Encrypted record TOAST: ~25.5 GB (the bulk of the footprint — encrypted profile blobs)
  • Per-record on disk: ~5.4 KB per fully-encrypted 120-field profile, including indexes

What the numbers mean

For capacity planning

  • ~1,700 records/sec sustained on one instance is more than enough for almost any single-organisation steady-state workload. A typical large enterprise with 100,000 users could ingest its entire user base in under a minute.
  • Linear scaling from 1 M to 5 M records means the vault does not slow down as it grows. Plan for storage growth, not for throughput penalties.
  • ~5.4 KB per encrypted 120-field record — multiply by your expected record count to size your PostgreSQL volume. A 10-million-record deployment needs roughly 54 GB plus operational headroom.
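The sizing rule above is mechanical enough to script. A minimal sketch, using the measured ~5.4 KB per record and an assumed 30% operational headroom factor (the headroom figure is our assumption, not part of the benchmark):

```python
# Volume sizing from the measured footprint: ~5.4 KB on disk per
# fully-encrypted 120-field profile, including indexes.
PER_RECORD_KB = 5.4
HEADROOM = 1.3  # assumed 30% operational headroom; tune to your ops policy

def vault_size_gb(records: int) -> float:
    """Estimated PostgreSQL volume in decimal GB for a given record count."""
    return records * PER_RECORD_KB / 1e6

print(round(vault_size_gb(10_000_000)))             # 54 GB raw
print(round(vault_size_gb(10_000_000) * HEADROOM))  # 70 GB with headroom
```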

Bulk vs single-record latency

The headline ~0.6 ms per record is amortised inside a 4,000-record bulk request. It includes network, auth, encryption, hashing, indexing, and transaction commit averaged across the batch. Single-record synchronous writes will be higher — 3–10 ms is typical over a real network — because each call carries its own connection, auth, and transaction overhead. Plan accordingly:
  • Backfills and batch flows: use UserCreateBulk and assume ~0.6 ms per record.
  • Real-time API flows: assume 3–10 ms per single-record write. Throughput in production is then (concurrency × 1 / per-call latency) — i.e., 100 concurrent worker threads writing at 5 ms apiece gives 20,000 records/sec.
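The real-time throughput rule above can be expressed as a one-line model — sustained records/sec equals concurrency divided by per-call latency:

```python
# Throughput model for real-time single-record writes:
# sustained records/sec = concurrency / per-call latency (seconds).
def sustained_rps(concurrency: int, latency_s: float) -> float:
    return concurrency / latency_s

print(sustained_rps(100, 0.005))  # 20000.0 — the 100-worker / 5 ms example
print(sustained_rps(100, 0.010))  # 10000.0 — the pessimistic 10 ms end
```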

Why no detokenisation number yet?

Detokenisation (UserGet) is a hashed-index point lookup, which is structurally faster than a write. Most internal runs put it well below the per-record write latency. We are documenting a precise number in a follow-up benchmark and will publish it here.

When to scale horizontally

The application server is stateless, so horizontal scaling on Kubernetes is straightforward. Add instances when:
  • The PostgreSQL backend is healthy and CPU-headroom on the application pod is exhausted (vault is CPU-bound on encryption + indexing under heavy bulk load).
  • You need geographic locality — see Multi-jurisdiction deployment.
  • You want isolation between workload classes (e.g., dedicate one instance to batch jobs, another to real-time API).
The backend database usually becomes the bottleneck before the application server does. For production deployments, use AWS Aurora PostgreSQL with auto-scaling (or an equivalent managed service on Azure / GCP) — this lifts the typical wall-clock bottleneck.

Running your own benchmark

The numbers above are a starting point. Real workloads vary by:
  • Payload size — a 12-field profile tokenises faster than a 120-field profile; a 1 KB profile differs from a 10 KB profile.
  • Search-index field count — each hashed index adds a small per-record cost.
  • Access-control policy depth — CRBAC evaluation on reads scales with policy complexity.
  • Database backend choice and sizing — PostgreSQL vs MySQL, instance class, IOPS, network latency to the DB.
To reproduce or refine for your context, start from the open-source Python load scripts we used:
  • bulk_user_creator.py — drives UserCreateBulk to generate and tokenise a configurable number of user profiles. This is the script that produced the bulk-write numbers above.
  • bulk_user_fetcher.py — drives UserGet to retrieve previously-tokenised records, for measuring detokenisation throughput and latency.
Both scripts are in the official databunkerpro-python SDK repo and are designed to be edited — adjust the profile shape, batch size, and concurrency to match your workload.
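Adjusting the profile shape is the main edit the scripts expect. A minimal sketch of a profile generator along those lines — the field names here (`email`, `phone`, `login`, `custom_field_*`) are illustrative assumptions for this page, not the schema the SDK scripts actually ship with:

```python
import random
import string

def make_profile(n_fields: int = 120) -> dict:
    """Generate a synthetic user profile with n_fields mixed PII/custom fields.
    Field names are illustrative, not the SDK scripts' actual schema."""
    profile = {
        "email": f"user{random.randrange(10**9)}@example.com",
        "phone": f"+1555{random.randrange(10**7):07d}",
        "login": "".join(random.choices(string.ascii_lowercase, k=12)),
    }
    # Pad with opaque custom fields up to the requested field count.
    for i in range(n_fields - len(profile)):
        profile[f"custom_field_{i}"] = "".join(
            random.choices(string.ascii_letters + string.digits, k=24))
    return profile

print(len(make_profile()))  # 120 fields, matching the benchmark shape
```

Vary `n_fields` and the per-field value length to explore the payload-size sensitivity noted above.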

Steps

  1. Adjust the profile shape in bulk_user_creator.py to match the actual JSON your application sends.
  2. Pre-warm the backend database (vacuum, statistics).
  3. Run bulk_user_creator.py co-located with the Databunker Pro instance to remove client-side network noise.
  4. Measure: sustained records/sec, p50 / p95 / p99 per-bulk latency, database CPU and IOPS.
  5. Run bulk_user_fetcher.py against the same vault to capture UserGet performance.
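The measurement in step 4 can be sketched as a thin timing wrapper around whatever client call your edited copy of bulk_user_creator.py makes; `send_bulk` below is a placeholder, not a function from the SDK:

```python
import statistics
import time

def measure(batches, send_bulk, batch_size=4_000):
    """Drive send_bulk over each batch and return
    (sustained records/sec, p50, p95, p99 per-bulk latency in seconds)."""
    latencies = []
    start = time.perf_counter()
    for batch in batches:
        t0 = time.perf_counter()
        send_bulk(batch)  # placeholder for the real client call
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    percentiles = statistics.quantiles(latencies, n=100)  # 99 cut points
    rps = len(latencies) * batch_size / elapsed
    return rps, percentiles[49], percentiles[94], percentiles[98]
```

Database CPU and IOPS (the rest of step 4) come from the host or cloud monitoring side, not from the client.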
Contact us at office@databunker.org if you would like the professional services team to design and run a benchmark against your representative payloads as part of an engagement.

What we’ll add next

  • Precise detokenisation latency numbers (UserGet p50 / p95 / p99).
  • Multi-instance horizontal-scaling curve (records/sec per added pod).
  • Format-preserving tokenisation throughput (TokenCreate / TokenCreateBulk).
  • MySQL backend comparison.
  • Cold-cache vs warm-cache UserGet profile.