Tuesday, June 24, 2025

LONG RUNNING QUERIES

✅ STEP-BY-STEP: Investigate and Resolve Long-Running Queries


🧭 STEP 1: Identify Long-Running Queries

Use this SQL to list all currently running queries ordered by duration:

SELECT
    pid,
    usename,
    datname,
    client_addr,
    application_name,
    state,
    now() - query_start AS duration,
    wait_event_type,
    wait_event,
    query
FROM pg_stat_activity
WHERE state != 'idle'
  AND now() - query_start > interval '2 minutes'
ORDER BY duration DESC;

You can adjust the 2-minute threshold based on your SLAs.


🧭 STEP 2: Investigate Query Type and Intent

Look at the output:

  • SELECT → check if it's scanning large tables without indexes

  • UPDATE / DELETE → large data change without batching

  • DDL like CREATE INDEX → may block writes unless concurrent

🛠 Check whether the query is (a quick wait-state breakdown follows this list):

  • Holding locks

  • Waiting on disk/network (wait_event_type = IO/Client)
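
To see at a glance what all active sessions are waiting on, a quick aggregate over pg_stat_activity helps (the wait_event columns assume PostgreSQL 9.6 or later):

SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE state != 'idle'
GROUP BY wait_event_type, wait_event
ORDER BY count(*) DESC;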


🧭 STEP 3: Check for Blocking Locks (if any)

A long-running query may itself be blocked, or it may be blocking other sessions.

Run:

SELECT
    blocked_locks.pid     AS blocked_pid,
    blocked_activity.query AS blocked_query,
    now() - blocked_activity.query_start AS blocked_duration,
    blocking_locks.pid     AS blocking_pid,
    blocking_activity.query AS blocking_query,
    now() - blocking_activity.query_start AS blocking_duration
FROM pg_locks blocked_locks
JOIN pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_locks blocking_locks
    ON blocking_locks.transactionid = blocked_locks.transactionid
    AND blocking_locks.granted
    AND blocking_locks.pid != blocked_locks.pid
JOIN pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;

This short version only matches waits on transaction IDs; the fuller join in the LOCKS section below covers every lock type.
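
On PostgreSQL 9.6+, the built-in pg_blocking_pids() function gives the same answer with far less SQL:

SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       now() - query_start AS waiting_for,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;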

🧭 STEP 4: Check Query Plan (Optional but Useful)

If you're unsure why it's slow, run EXPLAIN (ANALYZE, BUFFERS) for the same query manually (on a test DB); a sketch follows the list below.

This shows:

  • Seq Scan vs Index Scan

  • Sort/Join bottlenecks

  • Rows returned vs expected
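
A minimal sketch; orders and customer_id are hypothetical stand-ins for the real slow query. Note that EXPLAIN ANALYZE actually executes the statement, so wrap an UPDATE or DELETE in a transaction you roll back:

-- replace orders / customer_id with the table and filter from the slow query
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders
WHERE customer_id = 42;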


🧭 STEP 5: Terminate a Truly Problematic Query

If the query is:

  • Running for an unusually long time

  • Not critical

  • Blocking other sessions

Terminate it safely:

SELECT pg_terminate_backend(<pid>);

Or cancel just the current query, which is safer because the session stays alive:

SELECT pg_cancel_backend(<pid>);
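
If several runaway queries have piled up, they can be cancelled in one pass. A sketch; the 10-minute threshold is an assumption to adjust:

-- CAUTION: cancels every non-idle query older than 10 minutes,
-- excluding this session itself
SELECT pid, pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE state != 'idle'
  AND now() - query_start > interval '10 minutes'
  AND pid != pg_backend_pid();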

🧭 STEP 6: Audit User and Source (Who’s Running It?)

SELECT pid, usename, application_name, client_addr, query_start, query
FROM pg_stat_activity
WHERE now() - query_start > interval '2 minutes'
ORDER BY query_start;

Use this to check:

  • Who triggered the query

  • Is it a batch job or application process?


🛡️ STEP 7: Prevent Long Queries in Future

Problem → Prevention

  • Full table scans → add appropriate indexes
  • Large batch updates → use LIMIT + OFFSET batching (sketch below)
  • Idle transactions → set idle_in_transaction_session_timeout
  • Unbounded reports → use LIMIT in queries
  • Application retries → throttle and set timeouts at the app level
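
A sketch of the batching idea, using a LIMIT subselect (simpler than OFFSET pagination); big_table, id, created_at, and the 10,000-row chunk size are all hypothetical:

-- Delete old rows in small chunks so each transaction
-- holds locks only briefly; rerun until 0 rows are affected
DELETE FROM big_table
WHERE id IN (
    SELECT id
    FROM big_table
    WHERE created_at < now() - interval '90 days'
    LIMIT 10000
);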

Example config (postgresql.conf):

statement_timeout = '2min'
log_min_duration_statement = '30s'
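
The same timeouts can also be scoped more narrowly instead of server-wide; app_user here is a hypothetical role name:

-- per role
ALTER ROLE app_user SET statement_timeout = '2min';

-- or only for the current session
SET statement_timeout = '2min';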

🧠 BONUS: Monitor Long Queries in Health Checks

Add this SQL to your pp_health_checks_01.sql:

-- Long Running Queries (above 2 min)
SELECT
    now() as captured_at,
    pid,
    usename,
    client_addr,
    now() - query_start AS runtime,
    query
FROM pg_stat_activity
WHERE state != 'idle'
  AND now() - query_start > interval '2 minutes'
ORDER BY runtime DESC;

You can render the output to HTML from your shell script (e.g., with psql -H) as part of the daily health logs.



LOCKS



๐Ÿ” STEP-BY-STEP: Check and Resolve Locks in PostgreSQL


Step 1: List Waiting (Ungranted) Locks

SELECT
    pid,
    usename,
    datname,
    locktype,
    relation::regclass AS table_name,
    mode,
    granted,
    application_name,
    client_addr,
    now() - pg_stat_activity.query_start AS query_age,
    pg_stat_activity.query
FROM pg_locks
JOIN pg_stat_activity USING (pid)
WHERE NOT granted
ORDER BY query_age DESC;
  • This shows who is waiting for a lock, and which object is blocked.
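
To inspect every lock held or requested on one particular table, a variant (my_table is a hypothetical name; substitute the table you care about):

SELECT l.pid, l.mode, l.granted, a.usename, a.state, a.query
FROM pg_locks l
JOIN pg_stat_activity a USING (pid)
WHERE l.relation = 'my_table'::regclass;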


Step 2: Check Blocked vs Blocking Sessions

SELECT
    blocked_locks.pid     AS blocked_pid,
    blocked_activity.usename  AS blocked_user,
    blocked_activity.query    AS blocked_query,
    blocking_locks.pid     AS blocking_pid,
    blocking_activity.usename AS blocking_user,
    blocking_activity.query   AS blocking_query,
    blocking_activity.state   AS blocking_state,
    now() - blocking_activity.query_start AS blocking_duration
FROM pg_locks blocked_locks
JOIN pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_locks blocking_locks
    ON blocking_locks.locktype = blocked_locks.locktype
    AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
    AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
    AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
    AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
    AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
    AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
    AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
    AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
    AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
    AND blocked_locks.pid != blocking_locks.pid
JOIN pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
  • This gives blocking → blocked chains, with full query details and duration.


Step 3: Identify the Type of Lock

Common mode values:

  • AccessExclusiveLock → blocks all other access

  • RowExclusiveLock, ShareLock, etc. → moderate impact

Use this table (a query to see which modes are currently held follows it):

Lock Mode → Typical Source

  • AccessExclusiveLock → full table lock (DROP, TRUNCATE, most ALTER TABLE, VACUUM FULL)
  • ShareUpdateExclusiveLock → VACUUM (without FULL), ANALYZE, CREATE INDEX CONCURRENTLY
  • ShareLock → CREATE INDEX (non-concurrent)
  • RowExclusiveLock → INSERT / UPDATE / DELETE
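
To see which of these modes are currently held or requested, grouped per table:

SELECT relation::regclass AS table_name,
       mode,
       granted,
       count(*)
FROM pg_locks
WHERE relation IS NOT NULL
GROUP BY 1, 2, 3
ORDER BY 1, 2;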

Step 4: Find and Terminate Blocking Queries

๐Ÿ” Option 1: Manually Terminate Blocking PID

SELECT pg_terminate_backend(<blocking_pid>);

Only do this after checking:

  • The blocking query is idle or stuck

  • It is not a system/critical session


Step 5: Monitor Lock Wait Time

To monitor session locks in real-time:

SELECT pid, now() - query_start AS wait_duration, query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
ORDER BY wait_duration DESC;

Step 6: Prevent Future Locks — Good Practices

Tip → Action

  • 🧽 Keep transactions short → avoid leaving BEGIN sessions open
  • 🛑 No DDL in business hours → schedule ALTER / DROP off-hours
  • 🧵 Index your foreign keys → missing indexes cause long share locks
  • 🔁 Avoid large UPDATE / DELETE in one go → break into batches
  • 💡 Set lock_timeout or statement_timeout → prevent endless waits

SET lock_timeout = '5s';  -- optional, per session

🛠 Automate Blocking-Session Detection

To log blocking chains on a schedule, run the Step 2 query from cron (for example, save it to a file and call psql -f, appending the output to a log) alongside your pp_health_checks.sh.




ALL BASIC HEALTH CHECKS


*** BASIC CHECKS

*** LOCKS

*** DEADLOCK

*** ALERT LOGS CHECKS

*** HIGH CPU

*** LONG RUNNING QUERIES

*** CANCEL QUERY BY PID


Thursday, June 19, 2025

ALL HEALTH CHECKS AND ALL ALERTS

https://prasadappsdba.blogspot.com/2025/06/postgresql-high-alerts-and-checks.html

 

## POSTGRES DB SCRIPTS

https://dbaclass.com/postgres-db-scripts/

 

## pg_dba_scripts - PostgreSQL DBA scripts

https://github.com/Azmodey/pg_dba_scripts

 

## PostgreSQL – Architecture

https://awsprocert.com/databases/postgresql-dba/

 

## Daily used SQL query statements for PostgreSQL DBA

https://pankajconnect.medium.com/daily-used-sql-query-statements-for-postgresql-dba-f697b63968a8

POSTGRESQL ALL CHECKS

✅ 1. Backend Connection Exhaustion (PostgreSQL max_connections hit)

  • If PgBouncer is in session or transaction mode, it reuses backend connections.

  • But if PostgreSQL has no free backend slots (max_connections reached), PgBouncer cannot hand server connections to clients, and they stall.

👉 You must check PostgreSQL's max_connections:

SHOW max_connections;
SELECT count(*) FROM pg_stat_activity;

If count(*) ≈ max_connections, this causes backpressure.
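
For a quick breakdown of where those connections are going (plain pg_stat_activity, safe to run anywhere):

SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;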


✅ 2. Client Storm or Stuck Sessions

  • A few client apps (or one bad job) may create 80+ simultaneous requests, and they get stuck.

  • Once PgBouncer's default_pool_size is fully occupied, no new clients can proceed.

Run:

SHOW POOLS;       -- in PgBouncer shell

Look for:

  • "in_use" = 80, and "waiting" clients

  • "cl_active" or "sv_active" = saturated pool


✅ 3. Backend Query or Transaction Stuck (Locks or Long Queries)

If queries behind those 80 connections are:

  • Stuck in lock waits

  • Doing full table scans

  • Or idle in transaction (not releasing pool)

PgBouncer never gets those server connections back, so waiting clients keep queueing.

Run in PostgreSQL:

SELECT pid, state, wait_event_type, wait_event, query, now() - query_start AS runtime
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC;
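
Idle-in-transaction sessions deserve special attention, since they hold a pooled server connection while doing nothing:

SELECT pid, usename, application_name,
       now() - xact_start AS xact_age,
       query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_age DESC;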

✅ 4. Health Checks / Monitoring Apps Flooding PgBouncer

Sometimes a monitoring tool (like Zabbix or a REST check) hits PgBouncer repeatedly.

Run:

SELECT application_name, count(*) FROM pg_stat_activity GROUP BY 1 ORDER BY 2 DESC;

You may find a single app is creating 80 connections on its own.


✅ 5. PgBouncer Timeouts or Misconfigurations

Check if PgBouncer is killing idle sessions too aggressively, or not recycling sessions:

In pgbouncer.ini:

max_client_conn = 1000
default_pool_size = 100
reserve_pool_size = 20
reserve_pool_timeout = 2
server_idle_timeout = 300
client_idle_timeout = 600

Also verify:

pool_mode = transaction   # better than session

If in session mode, connections are sticky and won't release until client disconnects.


✅ Recommended Investigation Steps

Step → What to Check → Command

  • 🔎 PostgreSQL backend limit → is it hitting max_connections? → SHOW max_connections;
  • 🔎 Active PgBouncer pools → who is using connections → SHOW POOLS;
  • 🔎 Stuck backend queries → locks or long-running queries → query pg_stat_activity
  • 🔎 Session ownership → top users/applications → SELECT usename, application_name...
  • 🔎 PgBouncer config → pool_mode, timeouts, pool size → cat /etc/pgbouncer/pgbouncer.ini

🛠️ Permanent Solution Suggestions

✔ Tune PostgreSQL:

  • max_connections = 500+

  • Monitor with pg_stat_activity

✔ Tune PgBouncer:

pool_mode = transaction
default_pool_size = 100
max_client_conn = 1000
reserve_pool_size = 20

✔ Enable Logging in PgBouncer:

log_connections = 1
log_disconnections = 1
log_pooler_errors = 1

Check logs in /var/log/pgbouncer.log or similar.


🔧 If You Can't Access PgBouncer Server

From PostgreSQL side, still run:

SELECT client_addr, application_name, usename, backend_start
FROM pg_stat_activity
WHERE backend_start > now() - interval '30 minutes';

This helps you identify which clients connected, through which applications, and when. Note that when clients come in via PgBouncer, client_addr shows the pooler's address, so application_name is usually the more telling field.
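
To see the same information aggregated per client and application:

SELECT client_addr, usename, application_name, count(*)
FROM pg_stat_activity
GROUP BY 1, 2, 3
ORDER BY count(*) DESC;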