*** LONG RUNING QUERIES
✅ STEP-BY-STEP: Investigate and Resolve Long-Running Queries
🧠STEP 1: Identify Long-Running Queries
Use this SQL to list all currently running queries ordered by duration:
SELECT
pid,
usename,
datname,
client_addr,
application_name,
state,
now() - query_start AS duration,
wait_event_type,
wait_event,
query
FROM pg_stat_activity
WHERE state != 'idle'
AND now() - query_start > interval '2 minutes'
ORDER BY duration DESC;
You can adjust the
2 minutes
threshold based on your SLAs.
🧠STEP 2: Investigate Query Type and Intent
Look at the output:
-
SELECT
→ check if it's scanning large tables without indexes -
UPDATE
/DELETE
→ large data change without batching -
DDL like
CREATE INDEX
→ may block writes unless concurrent
🛠Check if the query is:
-
Holding locks
-
Waiting on disk/network (
wait_event_type = IO/Client
)
🧠STEP 3: Check for Blocking Locks (if any)
Long-running queries may be blocked or are blocking others.
Run:
SELECT
blocked_locks.pid AS blocked_pid,
blocked_activity.query AS blocked_query,
now() - blocked_activity.query_start AS blocked_duration,
blocking_locks.pid AS blocking_pid,
blocking_activity.query AS blocking_query,
now() - blocking_activity.query_start AS blocking_duration
FROM pg_locks blocked_locks
JOIN pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_locks blocking_locks ON blocking_locks.transactionid = blocked_locks.transactionid
JOIN pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
🧠STEP 4: Check Query Plan (Optional but Useful)
If you're unsure why it's slow, run EXPLAIN (ANALYZE, BUFFERS)
for the same query manually (on a test DB).
This shows:
-
Seq Scan vs Index Scan
-
Sort/Join bottlenecks
-
Rows returned vs expected
🧠STEP 5: Terminate a Truly Problematic Query
If the query is:
-
Running for an unusual time
-
Not critical
-
Blocking other sessions
Terminate it safely:
SELECT pg_terminate_backend(<pid>);
Or just cancel it (safer):
SELECT pg_cancel_backend(<pid>);
🧠STEP 6: Audit User and Source (Who’s Running It?)
SELECT pid, usename, application_name, client_addr, query_start, query
FROM pg_stat_activity
WHERE now() - query_start > interval '2 minutes'
ORDER BY query_start;
Use this to check:
-
Who triggered the query
-
Is it a batch job or application process?
🛡️ STEP 7: Prevent Long Queries in Future
Problem | Prevention |
---|---|
Full table scans | Add appropriate indexes |
Large batch updates | Use LIMIT + OFFSET batching |
Idle transactions | Set idle_in_transaction_session_timeout |
Infinite reports | Use LIMIT in queries |
Application retries | Throttle, set timeouts at app level |
Example config (postgresql.conf):
statement_timeout = '2min'
log_min_duration_statement = '30s'
🧠BONUS: Monitor Long Queries in Health Checks
Add this SQL in your pp_health_checks_01.sql
:
-- Long Running Queries (above 2 min)
SELECT
now() as captured_at,
pid,
usename,
client_addr,
now() - query_start AS runtime,
query
FROM pg_stat_activity
WHERE state != 'idle'
AND now() - query_start > interval '2 minutes'
ORDER BY runtime DESC;
You can output this to .html
format from your shell script as part of daily health logs.
No comments:
Post a Comment