Restoring data into PostgreSQL is often treated as a solved problem: run pg_dump, run pg_restore, move on. That holds for small databases or offline systems, but it breaks down on live systems with ongoing writes. I have seen large restores fail after hours of work and leave the database in a state that is hard to reason about. The failure mode is not a bug in pg_restore. It is a workflow problem.
The core takeaway is simple: pg_restore is a low-level primitive, not a safe restore strategy by itself. If you treat it as a full restore plan on a live system, you will eventually hit failure modes the tool cannot protect you from.
Why data-only restores fail midstream
The default pattern looks like this:
```
pg_restore --data-only -d production_db dumpfile
```

Partway through execution, the restore gets interrupted. At that point, PostgreSQL cannot tell you which rows were inserted, which tables were partially populated, or whether rerunning the restore is safe. In data-only mode, pg_restore emits COPY commands and executes them in big transactions. There is no durable checkpoint at a row boundary, and no idempotent record of what was loaded.
If you rerun the restore, you might duplicate data, hit unique constraint failures halfway through a table, or leave the system in a mixed state. The tool expects a clean run from start to finish. There is no resume mechanism because it is a bulk load, not a resumable protocol.
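You can see why there is nothing to resume by looking at the SQL a data-only restore would execute. A minimal sketch, assuming PostgreSQL 12 or newer, where -f - writes the generated script to stdout instead of running it:

```
# Print the SQL a data-only restore would run, without executing it
# (-f - means "write the script to stdout"; available in PostgreSQL 12+)
pg_restore --data-only -f - dumpfile | less
```

The output is essentially one COPY ... FROM stdin block per table, with no bookkeeping between them, which is why an interrupted run cannot report where it stopped.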
Live-system contention during COPY
Restoring directly into production tables also causes severe contention. Application inserts slow to seconds or minutes, and some writes appear to hang. No obvious blocking locks are visible. The root cause is not locking. It is resource saturation.
COPY generates heavy WAL and large transactions. That increases checkpoint pressure and disk I/O, which in turn slows down everything else that needs disk access. Even without a visible lock, the system is effectively starved. This is hard to diagnose in the moment because it looks like random latency.
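When it happens, the cumulative statistics views give a rough way to confirm that the problem is WAL and checkpoint pressure rather than locking. A minimal sketch, assuming PostgreSQL 14 to 16 (pg_stat_wal appears in 14, and 17 moves the checkpoint counters into pg_stat_checkpointer):

```
-- Requested checkpoints climbing during the load is a sign of checkpoint pressure
SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint
FROM pg_stat_bgwriter;

-- WAL volume generated since the last stats reset
SELECT wal_records, wal_bytes, wal_buffers_full
FROM pg_stat_wal;
```

Sampling these before and during the restore makes the saturation visible even when pg_locks looks quiet.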
Why this is a workflow problem
pg_restore is doing what it should: bulk load data efficiently. The mismatch is between the tool and the environment. A long COPY looks like a single massive write to the system. The live workload has to compete with it. If the restore fails, you lose time and you do not know how much data was loaded. If it succeeds, you still have to worry about side effects on production tables.
That is why I stopped asking “how do I make pg_restore safer” and started asking “why am I restoring into production at all?”
The staging-first principle
The staging-first approach makes restores safe by separating ingestion from cutover. It guarantees isolation, repeatability, sequence safety, and an explicit cutover. A failed restore is a staging problem, not a production incident. A successful restore is still not production until you explicitly promote it.
This one change shifts the risk profile from unknown to controlled.
Create a staging schema and tables
Start by creating a dedicated staging schema. I treat the schema boundary as non-negotiable:
```
CREATE SCHEMA restore_staging;
```

It makes restored data easy to identify and cleanup obvious. If a restore fails, drop the schema and start over without touching production.
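Starting over is then a single statement rather than a cleanup project:

```
-- Throw away a failed or abandoned staging attempt in one step
DROP SCHEMA restore_staging CASCADE;
```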
Next, create staging tables using the production table as a template:
```
CREATE UNLOGGED TABLE restore_staging.reading (
  LIKE public.reading
  INCLUDING DEFAULTS
  INCLUDING GENERATED
  INCLUDING IDENTITY
);
```

UNLOGGED tables reduce WAL overhead during the restore. The trade-off is that they are wiped after a crash and are not replicated. That is acceptable during staging because the dump is still the source of truth. If staging is your only copy, do not use UNLOGGED.
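If you later decide the staged copy must survive a crash or be replicated before cutover, you can convert it back. A minimal sketch; note that the conversion rewrites the table and writes its full contents to WAL:

```
-- Make the staging table durable again before relying on it.
-- This rewrites the table and WAL-logs its contents, so expect I/O
-- comparable to loading it a second time.
ALTER TABLE restore_staging.reading SET LOGGED;
```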
Pitfalls I have seen
- Rerunning a data-only restore into production without cleaning up first.
- Assuming the absence of locks means the system is safe, while WAL pressure is starving live traffic.
- Restoring tables and then discovering sequences have been reset (a quick check follows this list).
- Treating a failed restore as a repair task instead of redesigning the workflow.
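The sequence pitfall is cheap to check for. A minimal sketch, where the id column and the reading_id_seq sequence are illustrative names, not something taken from the dump:

```
-- Compare the sequence position with the data it is supposed to feed.
-- Column and sequence names here are illustrative.
SELECT last_value FROM public.reading_id_seq;
SELECT max(id) FROM public.reading;
-- If max(id) is ahead of last_value, new inserts will collide with existing ids.
```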
Staging-first checklist
- Create a dedicated staging schema for all restored tables.
- Build staging tables with UNLOGGED when durability is not required during the restore.
- Keep production tables unchanged until staging is fully restored and validated.
- Treat pg_restore as an ingestion primitive, not a complete restore strategy.
- Plan an explicit cutover step you can execute and roll back (a minimal sketch follows this list).
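For the last item, here is one minimal sketch of an explicit, reversible cutover, using the illustrative names from above:

```
-- Swap the validated staging table into place inside one transaction,
-- so readers see either the old data or the new data, never a mix.
-- Make the staging table durable first (SET LOGGED, shown earlier).
BEGIN;
ALTER TABLE public.reading RENAME TO reading_old;
ALTER TABLE restore_staging.reading SET SCHEMA public;
COMMIT;

-- Rollback path: reverse the two ALTER statements while reading_old
-- is still around.
```

In practice the staging table also needs its indexes, constraints, and grants in place before the swap; the point here is only that cutover is a small, explicit step you control, not a side effect of the restore.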
Next step
Once staging is in place, pg_restore stops being the problem. It becomes a tool you can control. The next step is to make the restore stream deterministic and safe. In Part 2, I walk through TOC lists, sequence safety, and metadata-only cutovers. Continue with Deterministic PostgreSQL restores with TOC lists and zero-downtime cutover.

