PgDog: Connection Pooling, Read Splits, and Sharding

PgDog sits in front of unmodified Postgres as a pooler, read/write balancer, and sharder. Here are the sharp edges to map before it hits production traffic.

PgDog is the Postgres proxy that just raised $5.5M from Basis Set, Y Combinator, and Pioneer Fund. The funding post claims it already serves more than 2M queries per second across dozens of production deployments, with over 20 TB sharded. It is three things at once: a connection pooler, a read/write load balancer, and a database sharder, written in Rust on Tokio, sitting in front of unmodified Postgres so you scale out without switching engines or rewriting queries.

That last part is the pitch and also the trap. Most teams reaching for PgDog do not need sharding yet. They need pooling and a read/write split, and they should stop there. These tips are for the platform engineer about to put PgDog in the hot path who wants the sharp edges mapped before traffic finds them. If you have run PgBouncer, most of this will feel familiar, and the places it does not are exactly where people get burned.

The tips

Land it as a pooler first, prove it for a week, then think about shards. PgDog listens on a PgBouncer-style port (6432 in the docs) and you adopt it by pointing DATABASE_URL at it. Nothing in the app changes. Ship it as a plain transaction pooler in front of your existing primary and let it bake before you touch sharding. If pooling alone kills your connection storm, you are done and you skipped the hardest part of the project.

postgres://app:pass@pgdog-host:6432/prod
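If the connection string lives in configuration, the swap can be scripted rather than hand-edited. A minimal sketch, assuming the app reads a standard Postgres URL; `point_at_pooler` and the `db-primary` host are illustrative names, not anything PgDog ships:

```python
from urllib.parse import urlsplit, urlunsplit


def point_at_pooler(database_url: str, pooler_host: str, pooler_port: int = 6432) -> str:
    """Return database_url with only the host and port swapped for the pooler's."""
    parts = urlsplit(database_url)
    # Keep the credentials exactly as written, including any percent-encoding.
    userinfo, at, _old_hostport = parts.netloc.rpartition("@")
    netloc = f"{userinfo}{at}{pooler_host}:{pooler_port}"
    return urlunsplit(parts._replace(netloc=netloc))


# Hypothetical direct-to-primary URL, rewritten to go through the proxy.
print(point_at_pooler("postgres://app:pass@db-primary:5432/prod", "pgdog-host"))
# postgres://app:pass@pgdog-host:6432/prod
```

The database name, credentials, and any query parameters such as sslmode pass through untouched, which is the point: nothing else about the app's configuration should move.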

Default to transaction pooling, and size the pool to cores, not clients. PgDog supports transaction and session pooling like PgBouncer. The pgbouncer-vs-pgdog comparison puts it ahead by about 10% on average and faster specifically once you pass 50 client connections, because Tokio multithreads where libevent does not. Set default_pool_size near your Postgres core count, not your client count. Trigger: if active server connections sit above roughly 2x CPU cores while clients wait, your pool is oversized and you are thrashing the backend, so cut it.
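The trigger above is simple enough to encode as an alert rule. A rough sketch, assuming you already collect active server connections and waiting clients from your own monitoring; the function name and the 2x default are illustrative, not PgDog settings:

```python
def pool_looks_oversized(active_server_conns: int, waiting_clients: int,
                         cpu_cores: int, factor: float = 2.0) -> bool:
    """True when busy server connections exceed factor x cores while clients queue."""
    return waiting_clients > 0 and active_server_conns > factor * cpu_cores


# Hypothetical 16-core primary: 40 busy connections with clients queued is a cut signal.
print(pool_looks_oversized(active_server_conns=40, waiting_clients=12, cpu_cores=16))  # True
# Clients are waiting, but the backend is not saturated, so the pool is not the culprit.
print(pool_looks_oversized(active_server_conns=14, waiting_clients=12, cpu_cores=16))  # False
```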
Prepare statements at the protocol level, never with SQL PREPARE. Transaction pooling historically broke prepared statements because consecutive transactions land on different server connections. PgBouncer 1.21 fixed this for protocol-level prepares by tracking and renaming them, but per the pganalyze writeup it still cannot intercept the SQL PREPARE text. The same rule applies here. Use your driver's protocol prepare (libpq PQprepare, or the equivalent in your client), not PREPARE foo AS ..., or your statements silently miss the cache under pooling and you eat the parse cost on every call.
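One way to keep SQL-level PREPARE out of the codebase is a small guard in the data layer or a CI check. This is a sketch under assumptions: `uses_sql_prepare` is a hypothetical helper, it only looks at the start of the statement, and it deliberately lets PREPARE TRANSACTION through because that is two-phase commit, not a prepared statement:

```python
import re

# Matches "PREPARE <name> ..." at the start of a statement, but not "PREPARE TRANSACTION".
_SQL_PREPARE = re.compile(r'^\s*PREPARE\s+(?!TRANSACTION\b)[\w"]', re.IGNORECASE)


def uses_sql_prepare(sql: str) -> bool:
    """Flag statements that prepare via SQL text instead of the driver's protocol prepare."""
    return _SQL_PREPARE.match(sql) is not None


print(uses_sql_prepare("PREPARE foo AS SELECT * FROM users WHERE id = $1"))  # True
print(uses_sql_prepare("PREPARE TRANSACTION 'tx-1'"))                        # False
print(uses_sql_prepare("SELECT * FROM users WHERE id = $1"))                 # False
```

It will miss a PREPARE hidden behind a leading comment or inside a multi-statement string, so treat it as a tripwire rather than a parser.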
Take the read/write split, but gate read-after-write paths explicitly. PgDog parses every query with the Postgres parser and routes INSERT, UPDATE, CREATE TABLE, and friends to the primary while sending SELECT to replicas. It tracks replication lag and pulls a lagging replica out of rotation, but it cannot know your read-after-write requirements. For any path that just wrote and must read its own data, keep it on the primary. For everything safe to serve slightly stale, mark it so the router is certain:

BEGIN READ ONLY;
SELECT ... ;
COMMIT;
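In application code, the stale-safe pattern can live in one helper so call sites opt in deliberately. A minimal sketch, assuming a DB-API style connection in autocommit mode (otherwise the driver opens its own transaction first); `read_stale_ok` is a hypothetical name:

```python
def read_stale_ok(conn, select_sql, params=()):
    """Run a SELECT inside an explicit read-only transaction and return its rows.

    Assumes conn is a DB-API connection with autocommit enabled, so the
    BEGIN READ ONLY below is the statement that actually opens the transaction.
    """
    cur = conn.cursor()
    cur.execute("BEGIN READ ONLY")
    try:
        cur.execute(select_sql, params)
        rows = cur.fetchall()
        cur.execute("COMMIT")
        return rows
    except Exception:
        # Leave the connection clean for the next caller before re-raising.
        cur.execute("ROLLBACK")
        raise
```

Read-after-write paths simply never call this helper, which keeps the decision visible in code review instead of buried in routing behavior.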

Choose a sharding key that appears in the vast majority of your queries. Direct-to-shard routing only works when PgDog can extract the key from the query. Otherwise it broadcasts to every shard, collects the rows, and reassembles them in memory. That fan-out is fine for the occasional analytical query and miserable as your default path. Pick a key (tenant_id, customer_id) that rides on most of your WHERE clauses. Trigger: if more than 10% of your query volume runs without the key, fix the schema before you shard, not after, because retrofitting a key under load is its own outage.
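To put a number on that 10% trigger, a rough first pass is to weigh each statement by how often it runs and check whether the key column appears in it at all. The sketch below assumes you can export (query text, call count) pairs, for example from pg_stat_statements; `keyless_fraction` and the sample data are made up for illustration:

```python
import re


def keyless_fraction(statements, key_column):
    """statements: iterable of (query_text, calls). Returns the share of calls lacking the key."""
    has_key = re.compile(rf"\b{re.escape(key_column)}\b", re.IGNORECASE)
    total = keyless = 0
    for query, calls in statements:
        total += calls
        if not has_key.search(query):
            keyless += calls
    return keyless / total if total else 0.0


# Hypothetical workload: one keyed lookup, one cross-tenant aggregate.
sample = [
    ("SELECT * FROM orders WHERE tenant_id = $1", 850),
    ("SELECT count(*) FROM orders", 150),
]
share = keyless_fraction(sample, "tenant_id")
print(f"{share:.0%} of calls lack the key")  # 15% of calls lack the key
```

A text match counts any mention of the column, not just a usable WHERE predicate, so the real keyless share is probably somewhat higher than what this reports.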
When the parser cannot see the key, inject it out of band. Some queries genuinely lack the column: joins through a lookup table, or aggregates across tenants. Rather than eat a broadcast, PgDog lets you supply the shard key in a query comment or via a SET. Wire this into your data layer for the handful of known offenders so they route to one shard instead of all of them.

SET pgdog.sharding_key TO '42';
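Wiring that into a data layer mostly means rendering the SET safely and keeping it next to the query it is meant to route. A sketch with two assumptions flagged: the helper names are hypothetical, and wrapping the pair in one transaction is this example's way of keeping the hint and the query on the same server connection under transaction pooling, not documented PgDog behavior:

```python
def sharding_key_statement(key) -> str:
    """Render the SET shown above. SET takes a literal, not a bind parameter."""
    literal = str(key).replace("'", "''")  # double any embedded single quotes
    return f"SET pgdog.sharding_key TO '{literal}'"


def with_sharding_key(sql: str, key) -> list[str]:
    """Return the statements to send, in order, for one explicitly routed query."""
    # Assumption: one transaction keeps the hint and the query together.
    return ["BEGIN", sharding_key_statement(key), sql, "COMMIT"]


# audit_log is a made-up table standing in for a query with no key column.
for stmt in with_sharding_key("SELECT count(*) FROM audit_log", 42):
    print(stmt)
```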

Load and migrate data with COPY sharding, not a hand-rolled splitter. PgDog ships a text, CSV, and binary COPY parser that splits incoming rows across shards by the sharding key automatically. Your bulk imports and backfills do not need client-side partitioning logic. You COPY into the proxy and it scatters the rows. This is the difference between a one-line ingest and a weekend writing a sharding script that you will get subtly wrong on the edge cases.
Reshard online with logical replication, and do not schedule a maintenance window for it. PgDog speaks the Postgres logical replication protocol and orchestrates splitting data between databases in the background without downtime. When a shard runs hot, add capacity and let it move ranges live rather than taking an outage. Verify on a staging copy first, because resharding is the operation where a wrong key definition costs you the most.
Turn on two-phase commit only for the cross-shard writes that need it, and watch for orphans. PgDog can use Postgres prepared transactions and run the two-phase commit (2PC) exchange on the client's behalf for atomic multi-shard writes, committing or rolling back on failure. It is not free: every such commit costs two round trips per shard. Reserve it for writes that truly span shards, and monitor for stuck prepared transactions so a crash mid-protocol does not leave locks behind:

SELECT gid, prepared, database FROM pg_prepared_xacts;
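A healthy 2PC exchange leaves rows in that view only briefly, so age is the signal to alert on. A minimal sketch that filters the query's rows in the monitoring job; the five-minute threshold and the `stuck_transactions` name are assumptions to tune, and `prepared` is expected to arrive as a timezone-aware datetime, as drivers typically return for timestamptz:

```python
from datetime import datetime, timedelta, timezone


def stuck_transactions(rows, max_age=timedelta(minutes=5), now=None):
    """rows: (gid, prepared, database) tuples from pg_prepared_xacts.

    Returns (gid, database) for every prepared transaction older than max_age.
    """
    now = now or datetime.now(timezone.utc)
    return [(gid, database) for gid, prepared, database in rows
            if now - prepared > max_age]


# Hypothetical snapshot: one fresh prepared transaction, one left over for an hour.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    ("tx-fresh", now - timedelta(seconds=2), "shard_0"),
    ("tx-orphan", now - timedelta(hours=1), "shard_1"),
]
print(stuck_transactions(rows, now=now))  # [('tx-orphan', 'shard_1')]
```

Resolving an orphan still takes a human decision between COMMIT PREPARED and ROLLBACK PREPARED on the affected shard, so the job should page someone rather than clean up on its own.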

Instrument it before cutover, and pin the version. PgDog exposes an OpenMetrics endpoint, a PgBouncer-style admin database, and OTEL push, so wire pooler metrics into your dashboards before it carries real traffic. Watch waiting clients and per-shard query distribution. One operational detail people miss: the funding post says a new version ships every Thursday. Pin a specific Docker tag and promote through staging on your cadence. Do not run latest in production and inherit a weekly surprise.

image: pgdogdev/pgdog:<pinned-tag>   # not :latest
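A pin is only useful if nothing can quietly undo it, so it may be worth a one-function check in CI against your deployment manifests. This is a sketch, not part of PgDog; `image_is_pinned` and the `v1.2.3` tag are hypothetical:

```python
def image_is_pinned(image: str) -> bool:
    """True if the image reference carries an explicit tag other than latest, or a digest."""
    if "@" in image:  # digest reference, e.g. name@sha256:...
        return True
    # Only the last path component can hold a tag; a registry host may contain ':' for its port.
    last_component = image.rsplit("/", 1)[-1]
    _name, sep, tag = last_component.partition(":")
    return bool(sep) and tag not in ("", "latest")


print(image_is_pinned("pgdogdev/pgdog:latest"))  # False
print(image_is_pinned("pgdogdev/pgdog"))         # False (an untagged reference resolves to latest)
print(image_is_pinned("pgdogdev/pgdog:v1.2.3"))  # True (hypothetical tag)
```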

Wrap-up

If you take one habit from this, make it the order of operations: pool first, split reads second, shard last and only when a single primary genuinely cannot hold your write volume. PgDog makes sharding feel close enough to touch that teams reach for it early, and the fan-out on a bad key will hurt more than the problem you started with. Prove each layer in front of vanilla Postgres, keep the sharding key on your hot path, and let logical replication do the dangerous parts so you never trade an outage for scale.
