PostgreSQL Performance: Indexing Strategies and Query Optimization
Introduction
PostgreSQL is one of the most capable relational databases available — but raw capability doesn't help if your queries time out under load. This guide covers the indexing strategies, query patterns, and diagnostic tools that separate a database that struggles at 1M rows from one that handles 1B rows without breaking a sweat.
EXPLAIN ANALYZE: Your Most Important Tool
Before optimising anything, read the query plan:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.created_at > NOW() - INTERVAL '30 days'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 20;
Key things to look for:
- Seq Scan on large tables — usually bad, needs an index
- Hash Join vs Nested Loop — hash is usually better for large sets
- Rows — actual vs estimated; huge differences mean stale statistics
- Buffers hit — cache hits vs disk reads
Index Types and When to Use Each
B-Tree (default) — equality and range queries
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_created_at ON orders(created_at DESC);
Composite — multiple columns, order matters
-- Covers: WHERE status = 'active' ORDER BY created_at DESC
CREATE INDEX idx_orders_status_created
ON orders(status, created_at DESC);
-- Rule: equality columns first, then range/sort columns
Partial — index only a subset of rows
-- Only index active orders — much smaller, faster
CREATE INDEX idx_orders_active
ON orders(user_id, created_at)
WHERE status = 'active';
-- Only index unprocessed jobs
CREATE INDEX idx_jobs_pending
ON jobs(priority DESC, created_at)
WHERE processed_at IS NULL;
GIN — JSONB and full-text search
CREATE INDEX idx_products_metadata
ON products USING GIN(metadata);
-- Now this is fast:
SELECT * FROM products WHERE metadata @> '{"category": "electronics"}';
BRIN — time-series data with natural ordering
-- Tiny index for append-only tables (logs, events)
CREATE INDEX idx_events_occurred_at
ON events USING BRIN(occurred_at);
Common Query Anti-Patterns
1. Function on indexed column (index skip)
-- Bad — index on email is skipped
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
-- Good — use a functional index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
2. LIKE with leading wildcard
-- Bad — can't use B-tree index
SELECT * FROM products WHERE name LIKE '%phone%';
-- Good — use full-text search
CREATE INDEX idx_products_fts ON products USING GIN(to_tsvector('english', name));
SELECT * FROM products WHERE to_tsvector('english', name) @@ to_tsquery('phone');
3. N+1 hidden in code
-- Instead of loading users then querying each user's orders — use a JOIN
SELECT u.id, u.name, COALESCE(order_counts.total, 0) AS total_orders
FROM users u
LEFT JOIN (
SELECT user_id, COUNT(*) AS total
FROM orders
GROUP BY user_id
) order_counts ON order_counts.user_id = u.id;
Materialized Views for Expensive Aggregations
-- Create a materialized view for the dashboard stats
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT
DATE_TRUNC('month', created_at) AS month,
SUM(total_amount) AS revenue,
COUNT(*) AS order_count,
AVG(total_amount) AS avg_order_value
FROM orders
WHERE status = 'completed'
GROUP BY DATE_TRUNC('month', created_at);
CREATE INDEX ON monthly_revenue(month DESC);
-- Refresh on a schedule (or after bulk inserts)
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue;
CONCURRENTLY allows reads during the refresh — essential for production.
Connection Pooling with PgBouncer
PostgreSQL creates a new process per connection — it does not scale to thousands of concurrent connections. Always use a connection pooler:
# pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
[pgbouncer]
pool_mode = transaction ; best for web apps
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 5
reserve_pool_size = 10
Partitioning for Massive Tables
-- Partition events table by month
CREATE TABLE events (
id BIGSERIAL,
user_id UUID NOT NULL,
event_type TEXT NOT NULL,
payload JSONB,
occurred_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (occurred_at);
CREATE TABLE events_2026_01 PARTITION OF events
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE events_2026_02 PARTITION OF events
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
-- Queries with WHERE occurred_at filter only scan the relevant partition
Conclusion
PostgreSQL performance is 80% about indexes and 20% about query rewrites. Use EXPLAIN ANALYZE as a habit, not an afterthought. Add partial indexes for filtered queries, use JSONB GIN indexes for flexible schemas, and always run behind a connection pooler.
Key takeaways:
- Composite index column order = equality first, then range
- Partial indexes are often 10x smaller and faster than full indexes
- CONCURRENTLY is your friend for materialized view refreshes
- Never connect your app directly to Postgres — always use PgBouncer or equivalent