Tuesday, June 24, 2025

LONG RUNING QUERIES

 *** LONG RUNING QUERIES

✅ STEP-BY-STEP: Investigate and Resolve Long-Running Queries


🧭 STEP 1: Identify Long-Running Queries

Use this SQL to list all currently running queries ordered by duration:

SELECT
    pid,
    usename,
    datname,
    client_addr,
    application_name,
    state,
    now() - query_start AS duration,
    wait_event_type,
    wait_event,
    query
FROM pg_stat_activity
WHERE state != 'idle'
  AND now() - query_start > interval '2 minutes'
ORDER BY duration DESC;

You can adjust the 2 minutes threshold based on your SLAs.


🧭 STEP 2: Investigate Query Type and Intent

Look at the output:

  • SELECT → check if it's scanning large tables without indexes

  • UPDATE / DELETE → large data change without batching

  • DDL like CREATE INDEX → may block writes unless concurrent

🛠 Check if the query is:

  • Holding locks

  • Waiting on disk/network (wait_event_type = IO/Client)


🧭 STEP 3: Check for Blocking Locks (if any)

Long-running queries may be blocked or are blocking others.

Run:

SELECT
    blocked_locks.pid     AS blocked_pid,
    blocked_activity.query AS blocked_query,
    now() - blocked_activity.query_start AS blocked_duration,
    blocking_locks.pid     AS blocking_pid,
    blocking_activity.query AS blocking_query,
    now() - blocking_activity.query_start AS blocking_duration
FROM pg_locks blocked_locks
JOIN pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_locks blocking_locks ON blocking_locks.transactionid = blocked_locks.transactionid
JOIN pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;

🧭 STEP 4: Check Query Plan (Optional but Useful)

If you're unsure why it's slow, run EXPLAIN (ANALYZE, BUFFERS) for the same query manually (on a test DB).

This shows:

  • Seq Scan vs Index Scan

  • Sort/Join bottlenecks

  • Rows returned vs expected


🧭 STEP 5: Terminate a Truly Problematic Query

If the query is:

  • Running for an unusual time

  • Not critical

  • Blocking other sessions

Terminate it safely:

SELECT pg_terminate_backend(<pid>);

Or just cancel it (safer):

SELECT pg_cancel_backend(<pid>);

🧭 STEP 6: Audit User and Source (Who’s Running It?)

SELECT pid, usename, application_name, client_addr, query_start, query
FROM pg_stat_activity
WHERE now() - query_start > interval '2 minutes'
ORDER BY query_start;

Use this to check:

  • Who triggered the query

  • Is it a batch job or application process?


🛡️ STEP 7: Prevent Long Queries in Future

Problem Prevention
Full table scans Add appropriate indexes
Large batch updates Use LIMIT + OFFSET batching
Idle transactions Set idle_in_transaction_session_timeout
Infinite reports Use LIMIT in queries
Application retries Throttle, set timeouts at app level

Example config (postgresql.conf):

statement_timeout = '2min'
log_min_duration_statement = '30s'

🧠 BONUS: Monitor Long Queries in Health Checks

Add this SQL in your pp_health_checks_01.sql:

-- Long Running Queries (above 2 min)
SELECT
    now() as captured_at,
    pid,
    usename,
    client_addr,
    now() - query_start AS runtime,
    query
FROM pg_stat_activity
WHERE state != 'idle'
  AND now() - query_start > interval '2 minutes'
ORDER BY runtime DESC;

You can output this to .html format from your shell script as part of daily health logs.



No comments:

Post a Comment