
PgDog: The Future of Horizontal PostgreSQL Sharding

Discover how PgDog's $5.5M funding is set to revolutionize PostgreSQL scaling with Rust-based horizontal sharding and high-performance connection pooling.

  • NV Trends
  • 9 min read

For over three decades, PostgreSQL has been the bedrock of modern software development. From small side projects to massive enterprise systems, it has earned its reputation as the most reliable, extensible, and feature-rich relational database in the world. However, even the “world’s most advanced open-source database” has a ceiling. When a company experiences explosive growth—think of a fast-growing Indian fintech startup or a quick-commerce giant like Zepto—they eventually hit the “Postgres Wall.” This is the point where a single database node can no longer handle the sheer volume of write traffic or the terabytes of data being poured into it.

Traditionally, hitting this wall meant one of two painful choices: either you embarked on a multi-month engineering effort to manually shard your database at the application level, or you abandoned Postgres entirely in favor of NoSQL solutions like Amazon DynamoDB or MongoDB. Neither path is ideal. NoSQL often requires sacrificing ACID compliance and the rich querying power of SQL, while manual sharding creates a maintenance nightmare for developers. But a new player has just entered the arena, promising to make these compromises a thing of the past. PgDog, a high-performance database proxy, has officially announced its $5.5 million seed funding round, and it’s coming to a database near you.

Led by Lev Kokotov—the engineer who helped scale Instacart through its massive pandemic-era surge—PgDog is designed to bring horizontal sharding to the Postgres ecosystem without the usual complexity. By sitting between your application and your database nodes, PgDog handles the dirty work of routing queries, managing connections, and distributing data across multiple servers. With backing from heavyweight investors like Basis Set and Y Combinator, PgDog isn’t just another open-source tool; it is a serious attempt to redefine how we scale the internet’s favorite database.


The PostgreSQL Scaling Dilemma

To understand why PgDog is generating so much buzz on platforms like Hacker News, we first need to look at the specific problem it solves. PostgreSQL is a “vertical scaling” champion. If your database is getting slow, you can usually fix it by “throwing more hardware at it”—adding more RAM, faster CPUs, or better NVMe storage. In the world of cloud computing, this might mean moving from a small instance to a massive one costing Rs. 50,000 or more per month.

But eventually, you run out of bigger boxes. When you reach millions of transactions per second or dozens of terabytes of data, even the largest AWS or Azure instances can’t keep up. This is particularly relevant in the Indian context, where the sheer scale of the user base can overwhelm systems in a matter of months. A successful UPI-based payment app or a national-scale e-commerce platform can quickly find its Postgres instance gasping for air.

The “traditional” fix is horizontal sharding: splitting your data across multiple database servers. However, Postgres does not support native horizontal sharding out of the box. You have to use extensions like Citus, or worse, write complex logic in your application code to decide which “shard” a specific user’s data lives on. This adds massive technical debt and makes every database migration a high-stakes gamble.

What is PgDog?

PgDog is an open-source network proxy written in Rust that aims to make horizontal sharding transparent. In simple terms, your application talks to PgDog as if it were a single PostgreSQL database. Behind the scenes, PgDog talks to dozens or even hundreds of Postgres nodes. It parses your SQL queries in real-time, identifies where the data lives, and routes the request to the correct shard.

The choice of Rust is not accidental. In the world of database proxies, every millisecond of latency counts. Traditional proxies like PgBouncer are written in C and are extremely fast but can be difficult to extend with complex logic like SQL parsing. Newer proxies written in Go often struggle with “garbage collection pauses” that can cause unpredictable spikes in latency. Rust provides the “bare-metal” performance of C with modern safety features, allowing PgDog to handle over 2 million queries per second with minimal overhead.

Key Features of PgDog:

  • Transparent Sharding: Distribute data across multiple servers without changing a single line of your application code.
  • Connection Pooling: Efficiently manages thousands of application connections, preventing your database from being overwhelmed by “connection churn.”
  • Query Routing: Automatically sends read-only queries to replicas and write queries to the primary of the relevant shard.
  • Cross-Shard Operations: Supports atomic writes and even complex schema changes across multiple shards using two-phase commits.
  • Query Rewriting: Can block or optimize “dangerous” queries (like a SELECT * without a WHERE clause) before they ever reach the database.
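The read/write split from the Query Routing bullet can be sketched in a few lines of Python. This is an illustrative toy, not PgDog's implementation: the `Router` class, the `is_read_only` helper, and the node names are hypothetical, and the keyword check stands in for the real SQL parsing a proxy would perform.

```python
from itertools import cycle


def is_read_only(sql: str) -> bool:
    """Rough heuristic: only plain SELECT statements count as reads."""
    normalized = " ".join(sql.upper().split())
    if not normalized.startswith("SELECT"):
        return False
    # SELECT ... FOR UPDATE takes row locks, so treat it as a write.
    return "FOR UPDATE" not in normalized


class Router:
    """Sends reads to replicas in round-robin order and everything else to the primary."""

    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        # With no replicas configured, every query falls back to the primary.
        self._replicas = cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        if self._replicas is not None and is_read_only(sql):
            return next(self._replicas)
        return self.primary


router = Router("primary", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM orders WHERE user_id = 42"))
print(router.route("UPDATE orders SET paid = true WHERE id = 7"))
```

A keyword check like this misclassifies some statements (for example, a SELECT that calls a function with side effects), which is one reason a production proxy parses the query properly rather than inspecting its first word.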

The Instacart Connection: Built in the Trenches

The most compelling part of the PgDog story is its pedigree. The founder, Lev Kokotov, didn’t build this in a vacuum. He was one of the lead engineers at Instacart during 2020, when the grocery delivery service saw years of projected growth happen in just a few weeks due to global lockdowns.

During that time, Instacart’s massive PostgreSQL clusters were under immense strain. Kokotov first built PgCat, an open-source pooler and proxy that helped Instacart survive that traffic. PgCat was a success, but it was primarily focused on connection pooling and basic load balancing. PgDog is the spiritual and technical successor to PgCat, built with the specific goal of solving the sharding problem once and for all.

While PgCat was a tool built to solve a specific company’s problem, PgDog is being built as a platform for the entire industry. The transition from a “side project” to a $5.5M-funded startup indicates that the team is ready to provide the enterprise-grade support and stability that large-scale Indian enterprises require.

Why Sharding Matters for Indian Startups

For a startup based in Bangalore or Gurgaon, the economics of database scaling are a constant concern. Cloud costs are often the second-largest expense after payroll. When you use “managed” NoSQL services like DynamoDB, you are often locked into a proprietary ecosystem with unpredictable costs. A single unoptimized query can lead to a surprise bill of thousands of dollars (lakhs of rupees) at the end of the month.

By contrast, PostgreSQL is open-source. You can run it on cheap “commodity” hardware, on-premise servers, or any cloud provider. PgDog allows Indian companies to keep the flexibility and cost-efficiency of Postgres while scaling to levels that were previously only possible for “Big Tech” firms.

The Cost Factor (A Practical Example)

Imagine a logistics startup handling 500,000 deliveries a day.

  • Scenario A: They use a massive, vertically scaled Postgres instance on AWS RDS. The monthly bill might be Rs. 1,50,000, and they are already at 80% CPU utilization.
  • Scenario B: They move to a proprietary NoSQL database. Their monthly bill becomes unpredictable, potentially spiking to Rs. 4,00,000 during peak festival sales (like Diwali or Big Billion Days).
  • Scenario C: They use PgDog to shard their data across five smaller Postgres instances. The total cost of the hardware might only be Rs. 80,000, and they have enough “headroom” to grow their traffic by 5x without hitting a limit.

This “horizontal” approach is not just about raw performance; it’s about economic sustainability.
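To make the comparison concrete, the three scenarios can be normalized to a cost per 1,000 deliveries. The rupee figures below are the article's own illustrative numbers; the 30-day month and the helper name `cost_per_thousand_deliveries` are assumptions added for this sketch.

```python
DELIVERIES_PER_DAY = 500_000
DAYS_PER_MONTH = 30  # assumption: a 30-day billing month

# Monthly cost in rupees, taken from the scenarios above.
MONTHLY_COST_RS = {
    "A: single large instance": 150_000,
    "B: NoSQL at festival peak": 400_000,
    "C: five sharded instances": 80_000,
}


def cost_per_thousand_deliveries(monthly_cost_rs, deliveries_per_day=DELIVERIES_PER_DAY):
    """Return the database cost in rupees for every 1,000 deliveries handled."""
    monthly_deliveries = deliveries_per_day * DAYS_PER_MONTH
    return monthly_cost_rs * 1000 / monthly_deliveries


for scenario, cost in MONTHLY_COST_RS.items():
    print(f"{scenario}: Rs. {cost_per_thousand_deliveries(cost):.2f} per 1,000 deliveries")

# Scenario C with traffic grown 5x on the same hardware.
grown = cost_per_thousand_deliveries(80_000, deliveries_per_day=5 * DELIVERIES_PER_DAY)
print(f"C at 5x traffic: Rs. {grown:.2f} per 1,000 deliveries")
```

On these figures, Scenario A works out to Rs. 10.00 per 1,000 deliveries and Scenario C to roughly Rs. 5.33, falling to about Rs. 1.07 if traffic grows 5x without new hardware. The Scenario B figure reflects a peak month, not a typical one.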

Technical Deep Dive: How PgDog Handles the Load

One of the biggest criticisms of proxy-level sharding is the “latency tax.” If every query has to be parsed by a proxy before reaching the database, doesn’t that make everything slower?

PgDog minimizes this through several clever engineering choices. First, its SQL parser is highly optimized. It doesn’t need to understand every single nuance of your query; it only needs to find the “sharding key” (e.g., user_id or org_id). Once it finds that key, it looks up the destination shard in a high-speed internal map and forwards the raw bytes to the database.
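The key-extraction and lookup steps described above might look like the following sketch. It is deliberately simplified and does not reflect PgDog's actual parser or hash function: the regular expression, the SHA-256-modulo mapping, and the shard names are all assumptions chosen for illustration.

```python
import hashlib
import re

# Hypothetical shard names; a real deployment would hold connection details here.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]

# Looks only for `user_id = <integer>`; everything else in the query is ignored.
KEY_PATTERN = re.compile(r"\buser_id\s*=\s*(\d+)", re.IGNORECASE)


def find_sharding_key(sql: str):
    """Return the integer user_id found in the query, or None if there is none."""
    match = KEY_PATTERN.search(sql)
    return int(match.group(1)) if match else None


def shard_index(key: int, shard_count: int = len(SHARDS)) -> int:
    """Map a key to a shard with a stable hash (Python's built-in hash() varies per process for strings)."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def route(sql: str) -> list:
    """Return the shards that must see this query."""
    key = find_sharding_key(sql)
    if key is None:
        # No sharding key: the query has to go to every shard.
        return list(SHARDS)
    return [SHARDS[shard_index(key)]]


print(route("SELECT * FROM orders WHERE user_id = 42"))
print(route("SELECT count(*) FROM orders"))
```

The important property is that the same key always maps to the same shard, so the proxy never needs to consult the databases to decide where a query belongs.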

Handling “Cross-Shard” Queries

The “holy grail” of sharding is handling queries that need data from multiple shards at once. For example, “Show me the total sales across all regions.” Many hand-rolled, application-level sharding setups simply cannot answer these queries, or return an error when they are attempted.

PgDog takes a different approach. It can send the query to all relevant shards in parallel, collect the results, and merge them in its own memory before sending the final answer back to the user. While this is naturally slower than a single-node query, it ensures that your application remains functional even as your data is fragmented.
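This scatter-gather pattern can be illustrated with a small Python sketch. The shards here are in-memory lists and `query_shard` is a hypothetical stand-in for running a per-shard `GROUP BY` query; it shows the general technique, not PgDog's internals.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-shard rows of (region, sale amount).
SHARD_DATA = {
    "shard-0": [("north", 120), ("south", 80)],
    "shard-1": [("north", 30), ("east", 50)],
    "shard-2": [("south", 20)],
}


def query_shard(name: str) -> dict:
    """Stand-in for asking one shard for its sales total per region."""
    totals = {}
    for region, amount in SHARD_DATA[name]:
        totals[region] = totals.get(region, 0) + amount
    return totals


def total_sales_by_region() -> dict:
    """Query every shard in parallel, then merge the partial sums in memory."""
    merged = {}
    with ThreadPoolExecutor(max_workers=len(SHARD_DATA)) as pool:
        for partial in pool.map(query_shard, SHARD_DATA):
            for region, subtotal in partial.items():
                merged[region] = merged.get(region, 0) + subtotal
    return merged


print(total_sales_by_region())  # {'north': 150, 'south': 100, 'east': 50}
```

Sums merge cleanly because partial sums can simply be added together. Other aggregates need more care: an average, for instance, has to be rebuilt from per-shard sums and counts, not by averaging the per-shard averages.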

High Availability and Failover

In a sharded environment, if one database node goes down, you don’t want your whole application to crash. PgDog monitors the health of every Postgres node in real-time. If a primary node fails, PgDog can automatically promote a replica to take its place and update its internal routing tables in milliseconds. To the application, it looks like a brief flicker in performance rather than a catastrophic outage.
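The routing-table side of failover can be sketched as follows. The `Shard` dataclass, `current_primary`, and the node names are hypothetical, and the sketch covers only the proxy's bookkeeping: actually promoting a Postgres replica, and making sure it has caught up with the failed primary, is a separate step that is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Shard:
    primary: str
    replicas: list = field(default_factory=list)


def current_primary(shard: Shard, is_healthy: Callable[[str], bool]) -> str:
    """Return the node that should receive writes, switching to a replica if the primary is down."""
    if is_healthy(shard.primary):
        return shard.primary
    for candidate in list(shard.replicas):
        if is_healthy(candidate):
            # Update the routing entry: the replica becomes the write target.
            shard.replicas.remove(candidate)
            shard.primary = candidate
            return candidate
    raise RuntimeError("no healthy node available for this shard")


shard = Shard(primary="pg-a", replicas=["pg-b", "pg-c"])
down = {"pg-a"}  # pretend the health check for pg-a is failing
print(current_primary(shard, lambda node: node not in down))
```

Because the application only ever connects to the proxy, a change like this is invisible to it apart from the queries that were in flight when the primary failed.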

The Roadmap: $5.5 Million and Beyond

The recent seed round led by Basis Set is a clear signal that the market is hungry for a Postgres-native scaling solution. The funding will be used to grow the core engineering team and build out an “Enterprise Edition” of PgDog.

While the core proxy will remain open-source, the company plans to offer features specifically for the cloud era. This includes deep integration with AWS (Amazon Web Services), providing a “managed” experience that rivals Aurora or DynamoDB but with the freedom of Postgres. For large Indian enterprises that are hesitant to move away from SQL, this “Enterprise PgDog” could provide the SLA-backed security they need to migrate their most critical workloads.

The roadmap also includes:

  1. Automated Resharding: The ability to add new database nodes and move data between them while the system is live.
  2. Advanced Observability: A dashboard to see exactly which shards are under the most pressure and which queries are causing bottlenecks.
  3. Security Layers: Built-in role-based access control (RBAC) and encryption that works across all shards.

Conclusion

The funding of PgDog marks a pivotal moment in the “Postgres vs. NoSQL” war. For years, the conventional wisdom was that if you wanted to go “big,” you had to leave SQL behind. PgDog aims to show that you can have your cake and eat it too: the reliability and power of PostgreSQL with the horizontal scalability of a distributed system.

For developers and tech leaders in India, this is a development worth watching closely. As our digital economy continues to explode, the ability to scale efficiently without breaking the bank—or the dev team’s spirit—will be the ultimate competitive advantage. Whether you are building the next big fintech app or scaling a local delivery service, PgDog might just be the tool that keeps your database from becoming your biggest bottleneck.

The “Postgres Wall” has officially been breached. It’s time to start thinking about what your application could do with two million queries per second.


Written by: NV Trends

NV Trends shares concise, easy-to-read insights on tech, lifestyle, finance, and the latest trends.
