This page documents measured Databunker Pro performance on a representative workload — a 120-field user-profile vault sized at 5 million records. It is intended as a starting point for capacity planning, not as a marketing benchmark; the numbers come from an internal performance run, and the methodology is documented below so you can reproduce or refine it against your own payloads.
Benchmark setup
| Parameter | Value |
|---|---|
| Workload | UserCreate (PII tokenisation) — full user profiles |
| Profile shape | 120 fields per profile (mixed PII + custom application fields) |
| Record count | 5,000,000 |
| Host | AWS EC2 m6i.2xlarge (8 vCPU Intel Ice Lake, 32 GB RAM) |
| Backend database | PostgreSQL, co-located on the same EC2 host |
| Databunker Pro | Single instance, stateless application server (same host) |
| Load generator | Python scripts from the databunkerpro-python repo — bulk_user_creator.py for writes and bulk_user_fetcher.py for reads — running locally on the same host (no client-side network bottleneck) |
| Write mode | Bulk (UserCreateBulk) at 4,000 records per request |
| Indexing | Default hashed indexes on email, phone, login, custom |
| Encryption | AES-256 at rest, TLS in transit (defaults) |
The numbers below were produced on a modest single EC2 instance with the database co-located. They are not the upper bound — separating the database onto a dedicated managed PostgreSQL service (e.g., Aurora) and scaling the application tier horizontally produces materially higher throughput.
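For orientation, the write path looks roughly like the sketch below: a minimal bulk-write loop under the assumption of a `DatabunkerproAPI` client exposing a `create_users_bulk` method. Those names are illustrative, not the SDK's confirmed interface; `bulk_user_creator.py` in the databunkerpro-python repo is the authoritative implementation.

```python
# Sketch of the benchmark's bulk-write loop. DatabunkerproAPI and
# create_users_bulk are ASSUMED names for illustration; see
# bulk_user_creator.py in the databunkerpro-python repo for the real code.
import time

from databunkerpro import DatabunkerproAPI  # assumed import path

BATCH_SIZE = 4_000   # records per UserCreateBulk request (as benchmarked)
TOTAL = 100_000      # scale up to 5,000,000 to reproduce the full run

def make_profile(i: int) -> dict:
    """One 120-field profile: indexed PII fields plus custom app fields."""
    profile = {"email": f"user{i}@example.com", "phone": f"+1555{i:07d}",
               "login": f"user{i}"}
    profile.update({f"field{k}": f"value-{i}-{k}" for k in range(117)})
    return profile

api = DatabunkerproAPI("https://vault.example.com", "ACCESS-TOKEN")

start = time.monotonic()
for offset in range(0, TOTAL, BATCH_SIZE):
    batch = [make_profile(i) for i in range(offset, offset + BATCH_SIZE)]
    api.create_users_bulk(batch)  # one UserCreateBulk call per 4,000 records

rate = TOTAL / (time.monotonic() - start)
print(f"sustained ~{rate:,.0f} records/sec")
```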
Measured throughput and latency
| Metric | Value |
|---|---|
| Sustained bulk write throughput | ~1,700 records/sec |
| Amortised per-record latency (within a bulk) | ~0.6 ms |
| Per-bulk request latency | ~2.4 seconds for a 4,000-record bulk |
| Time to write 1,000,000 records | ~10 minutes |
| Time to write 5,000,000 records | ~50 minutes — same per-million rate as 1 M (linear scaling) |
| Single-record sync write | 3–10 ms range over a real network |
| Detokenisation (UserGet) | Hashed-index point lookup; faster than a write (precise number tracked in a follow-up benchmark) |
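These figures are internally consistent; a few lines of arithmetic reproduce each headline number from the batch size and per-bulk latency alone:

```python
# Cross-check: every headline number above follows from the batch size
# and the measured per-bulk latency.
batch_size = 4_000      # records per UserCreateBulk request
bulk_latency_s = 2.4    # measured per-bulk latency

throughput = batch_size / bulk_latency_s            # ~1,667 records/sec
per_record_ms = bulk_latency_s / batch_size * 1e3   # ~0.6 ms amortised
minutes_per_million = 1_000_000 / throughput / 60   # ~10 minutes

print(f"{throughput:,.0f} rec/s | {per_record_ms:.1f} ms/record | "
      f"{minutes_per_million:.0f} min per million")
```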
Storage footprint
| Metric | Value |
|---|---|
| Total database size (5 M) | 27 GB |
| Heap table (users) | 832 MB |
| Indexes | 655 MB |
| Encrypted record TOAST | ~25.5 GB (the bulk of the footprint — encrypted profile blobs) |
| Per-record on disk | ~5.4 KB per fully-encrypted 120-field profile, including indexes |
What the numbers mean
For capacity planning
- ~1,700 records/sec sustained on one instance is more than enough for almost any single-organisation steady-state workload: a large enterprise with 100,000 users could ingest its entire user base in under a minute.
- Linear scaling 1 M → 5 M means the vault doesn’t slow down as it grows. Plan storage; don’t plan throughput penalties.
- 5.4 KB per encrypted 120-field record — multiply by your expected record count to size your PostgreSQL volume. A 10-million-record deployment needs roughly 54 GB plus operational headroom.
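The same sizing arithmetic can be scripted. In the sketch below, the constants are the measurements from this page and the record count is the only input to substitute:

```python
# Back-of-envelope capacity planner built on the measured constants above.
PER_RECORD_KB = 5.4         # encrypted 120-field profile, incl. indexes
WRITE_RATE_PER_SEC = 1_700  # sustained bulk-write throughput, one instance

records = 10_000_000        # substitute your expected record count

storage_gb = records * PER_RECORD_KB / 1e6   # decimal GB, as used above
backfill_min = records / WRITE_RATE_PER_SEC / 60

print(f"{records:,} records -> ~{storage_gb:.0f} GB disk, "
      f"~{backfill_min:.0f} min initial backfill")
# 10,000,000 records -> ~54 GB disk, ~98 min initial backfill
```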
Bulk vs single-record latency
The headline ~0.6 ms per record is amortised inside a 4,000-record bulk request. It includes network, auth, encryption, hashing, indexing, and transaction commit averaged across the batch. Single-record synchronous writes will be higher — 3–10 ms is typical over a real network — because each call carries its own connection, auth, and transaction overhead. Plan accordingly:
- Backfills and batch flows: use `UserCreateBulk` and assume ~0.6 ms per record.
- Real-time API flows: assume 3–10 ms per single-record write. Throughput in production is then `concurrency × (1 / per-call latency)` — e.g., 100 concurrent worker threads writing at 5 ms apiece gives 20,000 records/sec (the sketch below demonstrates this).
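The formula is easy to verify empirically. The sketch below simulates 5 ms synchronous writes with a thread pool; `time.sleep` stands in for the real single-record API call:

```python
# Demonstrates the concurrency math with a simulated 5 ms write.
import time
from concurrent.futures import ThreadPoolExecutor

WORKERS = 100        # concurrent writer threads
RECORDS = 10_000
PER_CALL_S = 0.005   # 5 ms simulated single-record write latency

def write_one(i: int) -> None:
    time.sleep(PER_CALL_S)  # replace with the real single-record API call

start = time.monotonic()
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    list(pool.map(write_one, range(RECORDS)))
elapsed = time.monotonic() - start

# Expect roughly WORKERS / PER_CALL_S = 20,000 records/sec
print(f"{RECORDS / elapsed:,.0f} records/sec with {WORKERS} workers")
```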
Why no detokenisation number yet?
Detokenisation (UserGet) is a hashed-index point lookup, which is structurally faster than a write. Most internal runs put it well below the per-record write latency. We are documenting a precise number in a follow-up benchmark and will publish it here.
When to scale horizontally
The application server is stateless, so horizontal scaling on Kubernetes is straightforward. Add instances when:
- The PostgreSQL backend is healthy and CPU headroom on the application pod is exhausted (the vault is CPU-bound on encryption + indexing under heavy bulk load).
- You need geographic locality — see Multi-jurisdiction deployment.
- You want isolation between workload classes (e.g., dedicate one instance to batch jobs, another to real-time API).
Running your own benchmark
The numbers above are a starting point. Real workloads vary by:
- Payload size — a 12-field profile tokenises faster than a 120-field profile; a 1 KB profile differs from a 10 KB profile.
- Search-index field count — each hashed index adds a small per-record cost.
- Access-control policy depth — CRBAC evaluation on reads scales with policy complexity.
- Database backend choice and sizing — PostgreSQL vs MySQL, instance class, IOPS, network latency to the DB.
Two load-generation scripts cover the write and read paths:
- `bulk_user_creator.py` — drives `UserCreateBulk` to generate and tokenise a configurable number of user profiles. This is the script that produced the bulk-write numbers above.
- `bulk_user_fetcher.py` — drives `UserGet` to retrieve previously-tokenised records, for measuring detokenisation throughput and latency.

Both scripts live in the databunkerpro-python SDK repo and are designed to be edited — adjust the profile shape, batch size, and concurrency to match your workload.
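A matching read-side sketch for measuring detokenisation, reusing the `api` client from the write-loop sketch earlier on this page; `get_user` is an assumed method name, and `bulk_user_fetcher.py` is the real implementation:

```python
# Read-side sketch: time detokenisation (UserGet) point lookups.
# Reuses the `api` client from the write-loop sketch above; get_user is
# an ASSUMED method name -- bulk_user_fetcher.py is the real implementation.
import time

latencies = []
for i in range(10_000):
    t0 = time.monotonic()
    api.get_user("email", f"user{i}@example.com")  # hashed-index point lookup
    latencies.append(time.monotonic() - t0)

mean_ms = sum(latencies) / len(latencies) * 1e3
print(f"mean UserGet latency: {mean_ms:.2f} ms "
      f"({len(latencies) / sum(latencies):,.0f} lookups/sec)")
```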
Steps
- Adjust the profile shape in `bulk_user_creator.py` to match the actual JSON your application sends.
- Pre-warm the backend database (vacuum, statistics).
- Run `bulk_user_creator.py` co-located with the Databunker Pro instance to remove client-side network noise.
- Measure: sustained records/sec, p50 / p95 / p99 per-bulk latency, database CPU and IOPS (a percentile sketch follows after this list).
- Run `bulk_user_fetcher.py` against the same vault to capture `UserGet` performance.
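For the measurement step, per-bulk latency percentiles can be computed from recorded request timings with the standard library alone; a minimal sketch:

```python
# Compute p50 / p95 / p99 from a list of per-bulk request durations
# (seconds) collected during the run; standard library only.
import statistics

def percentiles(latencies: list[float]) -> dict[str, float]:
    """Return p50/p95/p99 in milliseconds."""
    qs = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
    return {"p50": qs[49] * 1e3, "p95": qs[94] * 1e3, "p99": qs[98] * 1e3}

# Synthetic timings clustered around the measured ~2.4 s per-bulk latency:
sample = [2.31, 2.38, 2.40, 2.42, 2.44, 2.39, 2.47, 2.52, 2.61, 3.05]
print(percentiles(sample))
```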
What we’ll add next
- Precise detokenisation latency numbers (`UserGet` p50 / p95 / p99).
- Multi-instance horizontal-scaling curve (records/sec per added pod).
- Format-preserving tokenisation throughput (`TokenCreate` / `TokenCreateBulk`).
- MySQL backend comparison.
- Cold-cache vs warm-cache `UserGet` profile.