Which SQL dialect should I study?

Start with ANSI SQL then learn one of PostgreSQL or MySQL deeply. Interviewers are usually flexible with syntax as long as the logic is correct.

How important are window functions?

Very — most medium/hard interview problems use ROW_NUMBER, RANK, LAG, or running totals. They come up in nearly every data-heavy loop.

Know how B-trees work, when indexes help and hurt (writes), and how to read an EXPLAIN plan. This separates senior candidates from juniors.

For data/backend roles, basic understanding (document vs key-value vs wide-column) is enough. Deep NoSQL usually has its own loop.

How do you practice SQL?

Sites like StrataScratch, LeetCode SQL, and HackerRank have real interview-style problems. Run queries locally on sample datasets too.

SQL Interview Questions (2026) — 100 Q&A with Queries

SELECT retrieves rows and columns from one or more tables. You specify the columns you want (or * for all), the source table in a FROM clause, and optional filtering, sorting, and limiting clauses.

sql
SELECT first_name, last_name, email
FROM users
WHERE active = TRUE
ORDER BY last_name ASC
LIMIT 10;

Every SQL query starts with SELECT, making it the most fundamental statement to master.

WHERE filters rows before they are returned, keeping only rows where the condition evaluates to TRUE. NULL values are excluded because NULL comparisons return UNKNOWN, not TRUE.

sql
SELECT * FROM orders
WHERE status = 'shipped' AND total > 100;

WHERE is evaluated before GROUP BY and HAVING, so you cannot reference aggregate functions in it.

ORDER BY sorts the result set by one or more columns (ASC by default). LIMIT caps the number of rows returned; OFFSET skips the first N rows, enabling pagination.

sql
SELECT id, title
FROM articles
ORDER BY published_at DESC
LIMIT 20 OFFSET 40;  -- page 3, 20 per page

Without ORDER BY, the row order is non-deterministic, so OFFSET-based pagination can return duplicate or missing rows across pages on a live table.

DISTINCT removes duplicate rows from the result set, comparing all selected columns together. It is useful when a join or query can produce repeated combinations you do not care about.

sql
SELECT DISTINCT country FROM customers;

DISTINCT adds a sort/hash step and can be expensive on large tables. If you find yourself adding it to fix "too many rows," it is often a sign of an incorrect join or missing filter.

Aggregate functions collapse a set of rows into a single value. COUNT(*) counts all rows; COUNT(col) counts non-NULL values. SUM and AVG work on numeric columns. MIN/MAX work on any orderable type.

sql
SELECT
  COUNT(*)        AS total_orders,
  SUM(amount)     AS revenue,
  AVG(amount)     AS avg_order,
  MIN(created_at) AS first_order,
  MAX(created_at) AS last_order
FROM orders;

All aggregates ignore NULL values except COUNT(*).

GROUP BY collapses rows that share the same values in the specified columns into a single group, allowing aggregates to be computed per group. Every column in the SELECT that is not inside an aggregate function must appear in GROUP BY.

sql
SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;

Omitting a non-aggregated column from GROUP BY is a SQL error in strict mode (and in PostgreSQL).

WHERE filters individual rows before grouping; HAVING filters groups after GROUP BY and aggregate calculation. You cannot use aggregate functions in WHERE, but you can in HAVING.

sql
SELECT department, COUNT(*) AS cnt
FROM employees
WHERE active = TRUE          -- filters rows first
GROUP BY department
HAVING COUNT(*) > 5;         -- filters groups after aggregation

The execution order is: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT.

NULL represents an unknown or missing value. Any arithmetic or comparison with NULL yields NULL (or UNKNOWN in a boolean context), meaning NULL = NULL is not TRUE — use IS NULL or IS NOT NULL instead. COALESCE returns the first non-NULL argument; NULLIF returns NULL when two arguments are equal.

sql
SELECT COALESCE(phone, 'N/A') AS phone_display
FROM contacts
WHERE deleted_at IS NULL;

NULL propagation is a frequent source of bugs in WHERE clauses and JOIN conditions.

INNER JOIN returns rows where a matching row exists in both tables based on the join condition. Rows that have no match in either table are excluded from the result set.

sql
SELECT o.id, c.name
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id;

Use INNER JOIN when you only care about records that have a corresponding entry in both tables.

LEFT JOIN (also LEFT OUTER JOIN) returns all rows from the left table and matching rows from the right table. Where no match exists, right-table columns are NULL. Use it when you want all records from the primary table regardless of whether a related record exists.

sql
SELECT u.id, u.email, p.plan
FROM users u
LEFT JOIN subscriptions p ON u.id = p.user_id;
-- users without a subscription still appear, p.plan is NULL

RIGHT JOIN returns all rows from the right table and matching rows from the left table, with NULLs for unmatched left-table columns. It is the mirror of LEFT JOIN and is less commonly used because you can always rewrite a RIGHT JOIN as a LEFT JOIN by swapping the table order.

sql
SELECT e.name, d.name AS dept
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.id;
-- departments with no employees still appear

FULL OUTER JOIN returns all rows from both tables, filling in NULLs where there is no matching row on either side. Use it to find unmatched rows in both tables simultaneously.

sql
SELECT a.id AS a_id, b.id AS b_id
FROM table_a a
FULL OUTER JOIN table_b b ON a.key = b.key
WHERE a.id IS NULL OR b.id IS NULL;  -- only unmatched rows

PostgreSQL supports FULL OUTER JOIN natively; MySQL did not until version 8.

CROSS JOIN produces the Cartesian product of two tables — every row in the first table paired with every row in the second. A 10-row table crossed with a 100-row table yields 1,000 rows.

sql
SELECT s.size, c.color
FROM sizes s
CROSS JOIN colors c;

Cross joins are intentional when you need all combinations (e.g., generating a calendar, pricing matrix), but accidental cross joins from a missing ON clause are a common performance bug.

A SELF JOIN joins a table to itself using aliases, treating the same table as two logical tables. Common use cases include hierarchical data (employees and their managers) and comparing rows within the same table.

sql
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

Self joins require aliases to distinguish which copy of the table each column reference comes from.

Aliases assign a temporary name to a table or column for the duration of the query. Column aliases appear in the output header; table aliases shorten join syntax. The AS keyword is optional in most databases.

sql
SELECT
  u.first_name || ' ' || u.last_name AS full_name,
  o.total                              AS order_total
FROM users AS u
JOIN orders AS o ON o.user_id = u.id;

Column aliases defined in SELECT are not available in WHERE or HAVING in most databases (they are in PostgreSQL ORDER BY).

INSERT INTO adds one or more rows to a table. You can specify explicit column names (recommended) or rely on column order (fragile).

sql
INSERT INTO products (name, price, stock)
VALUES ('Widget', 9.99, 100);

-- Multi-row insert
INSERT INTO tags (name) VALUES ('sql'), ('database'), ('backend');

Omitting the column list requires values for every column in table-definition order and breaks if the schema changes.

UPDATE modifies existing rows in a table. Always include a WHERE clause — omitting it updates every row in the table.

sql
UPDATE users
SET last_login = NOW(), login_count = login_count + 1
WHERE id = 42;

PostgreSQL also supports UPDATE ... FROM for joining another table during the update, and UPDATE ... RETURNING to fetch affected rows without a second query.

DELETE removes rows matching the WHERE condition. Without WHERE, all rows are deleted (but the table structure remains). Always run the equivalent SELECT first to confirm what will be deleted.

sql
DELETE FROM sessions
WHERE expires_at < NOW();

DELETE fires row-level triggers and generates WAL entries, making it slower than TRUNCATE for clearing a whole table.

DELETE removes rows one at a time, fires triggers, and can have a WHERE clause. TRUNCATE removes all rows at once, is much faster, does not fire row-level triggers, and resets sequences. TRUNCATE acquires an ACCESS EXCLUSIVE lock on the table.

sql
-- Slower, transactional, supports WHERE
DELETE FROM audit_log WHERE created_at < '2023-01-01';

-- Faster, no WHERE, resets sequences
TRUNCATE TABLE session_cache;

In PostgreSQL, TRUNCATE is transactional and can be rolled back, unlike in MySQL.

A PRIMARY KEY uniquely identifies each row in a table. It automatically implies NOT NULL and UNIQUE. A table can have only one primary key, which may span multiple columns (composite key).

sql
CREATE TABLE orders (
  id         SERIAL PRIMARY KEY,
  user_id    INT NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT NOW()
);

Databases create a unique B-tree index on the primary key automatically, making lookups by primary key very fast.

A FOREIGN KEY enforces referential integrity by requiring that the value in the referencing column exists in the referenced table's column (usually a primary key). It prevents orphaned rows.

sql
CREATE TABLE order_items (
  id       SERIAL PRIMARY KEY,
  order_id INT REFERENCES orders(id) ON DELETE CASCADE,
  sku      TEXT NOT NULL
);

ON DELETE CASCADE, SET NULL, or RESTRICT control what happens when the referenced row is deleted.

UNIQUE ensures no two rows have the same value in the constrained column(s); NULLs are usually exempt (multiple NULLs are allowed because NULL != NULL). NOT NULL prevents storing a NULL in the column.

sql
CREATE TABLE users (
  id    SERIAL PRIMARY KEY,
  email TEXT NOT NULL UNIQUE,
  name  TEXT NOT NULL
);

A composite UNIQUE constraint covers combinations of columns, not each column individually.

CREATE TABLE defines the table name, column names, data types, and optional constraints. Column-level constraints follow the type; table-level constraints (like composite keys) appear after all columns.

sql
CREATE TABLE products (
  id          SERIAL        PRIMARY KEY,
  sku         VARCHAR(50)   NOT NULL UNIQUE,
  name        TEXT          NOT NULL,
  price       NUMERIC(10,2) NOT NULL CHECK (price >= 0),
  created_at  TIMESTAMP     NOT NULL DEFAULT NOW()
);

A subquery is a SELECT nested inside another SQL statement. It can appear in WHERE, FROM (as a derived table), or SELECT (scalar subquery). Use subqueries when the inner result is needed as a filter set or inline value.

sql
SELECT name FROM products
WHERE id IN (
  SELECT product_id FROM order_items
  WHERE quantity > 10
);

Correlated subqueries (referencing the outer query) run once per outer row and can be slow; CTEs or JOINs are often better alternatives.

IN checks whether a value appears in a static list or subquery result set. EXISTS checks whether a subquery returns at least one row and short-circuits as soon as it finds one. EXISTS is generally faster when the subquery result is large because it does not materialise the full set.

sql
-- IN: loads all product_ids
SELECT * FROM products WHERE id IN (SELECT product_id FROM deals);

-- EXISTS: stops at first match
SELECT * FROM products p
WHERE EXISTS (SELECT 1 FROM deals d WHERE d.product_id = p.id);

With NULLs, IN can produce surprising results — if the subquery contains a NULL, the NOT IN version may return zero rows.

UNION combines the result sets of two queries and removes duplicates (adds a sort/hash step). UNION ALL combines them and keeps all rows including duplicates, making it faster.

sql
SELECT email FROM customers
UNION ALL
SELECT email FROM leads;
-- keeps duplicates, faster

SELECT email FROM customers
UNION
SELECT email FROM leads;
-- removes duplicates, slower

Both require the same number of columns and compatible data types in each SELECT. Prefer UNION ALL unless duplicate removal is required.

CASE WHEN is SQL's conditional expression, evaluating conditions in order and returning the first matching THEN value. It can appear anywhere an expression is valid: SELECT, WHERE, ORDER BY, aggregates.

sql
SELECT
  name,
  CASE
    WHEN score >= 90 THEN 'A'
    WHEN score >= 70 THEN 'B'
    WHEN score >= 50 THEN 'C'
    ELSE 'F'
  END AS grade
FROM students;

The simple CASE form (CASE col WHEN val THEN ...) is equivalent to equality checks.

LIKE performs case-sensitive pattern matching using % (any sequence of characters) and _ (any single character). ILIKE in PostgreSQL does case-insensitive matching.

sql
SELECT * FROM products WHERE name LIKE 'Bolt%';    -- starts with Bolt
SELECT * FROM users  WHERE email LIKE '%@gmail.com'; -- Gmail addresses
SELECT * FROM codes  WHERE code LIKE 'A_C';         -- A, any char, C

Leading % prevents index use because the database cannot do a range scan. For heavy text search, use full-text search (tsvector) instead.

Window functions compute a value across a "window" of rows related to the current row, without collapsing the result set to one row per group. The OVER() clause defines the window. GROUP BY collapses rows; window functions preserve every row.

sql
SELECT
  employee_id,
  department,
  salary,
  AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;
-- Each employee row is returned, plus the department average alongside it.

Common window functions: ROW_NUMBER, RANK, DENSE_RANK, NTILE, LAG, LEAD, FIRST_VALUE, LAST_VALUE, SUM/AVG/COUNT with OVER().

All three assign a numeric rank within a partition ordered by a column. They differ when ties exist. ROW_NUMBER always assigns unique sequential integers regardless of ties. RANK skips numbers after a tie (1,1,3). DENSE_RANK never skips (1,1,2).

sql
SELECT
  name, score,
  ROW_NUMBER()  OVER (ORDER BY score DESC) AS row_num,
  RANK()        OVER (ORDER BY score DESC) AS rnk,
  DENSE_RANK()  OVER (ORDER BY score DESC) AS dense_rnk
FROM leaderboard;
-- Scores: 100,100,90 → row_num: 1,2,3 | rank: 1,1,3 | dense_rank: 1,1,2

Use DENSE_RANK when gaps in ranking numbers would confuse users; use ROW_NUMBER to pick exactly one row per group.

NTILE(n) divides the ordered partition into n roughly equal buckets and assigns each row a bucket number from 1 to n. If the rows do not divide evenly, earlier buckets get one extra row.

sql
SELECT
  customer_id,
  total_spend,
  NTILE(4) OVER (ORDER BY total_spend DESC) AS quartile
FROM customer_summary;
-- quartile 1 = top 25% spenders

NTILE is commonly used for percentile bucketing and A/B test stratification.

LAG accesses a column value from a previous row within the partition; LEAD accesses a future row. Both accept an offset (default 1) and an optional default for when no prior/next row exists.

sql
SELECT
  date,
  revenue,
  LAG(revenue, 1, 0) OVER (ORDER BY date) AS prev_day_revenue,
  revenue - LAG(revenue, 1, 0) OVER (ORDER BY date) AS day_over_day_change
FROM daily_revenue;

Useful for day-over-day comparisons, detecting changes in status, and computing time-between-events without a self-join.

Using SUM as a window function with an ORDER BY in the OVER clause computes a cumulative (running) sum. The default frame is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

sql
SELECT
  order_date,
  amount,
  SUM(amount) OVER (ORDER BY order_date
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                  ) AS running_total
FROM orders;

Adding PARTITION BY resets the running total per group (e.g., per customer or per month).

PARTITION BY divides rows into independent groups (partitions) before the window function is applied. The function restarts for each partition, similar to GROUP BY but without collapsing rows.

sql
SELECT
  region,
  salesperson,
  sales,
  SUM(sales) OVER (PARTITION BY region) AS region_total,
  sales * 1.0 / SUM(sales) OVER (PARTITION BY region) AS pct_of_region
FROM sales_data;

Without PARTITION BY, the window spans the entire result set.

A CTE is a named temporary result set defined in a WITH clause at the start of a query. It improves readability by breaking complex queries into named steps and can be referenced multiple times.

sql
WITH monthly_revenue AS (
  SELECT
    DATE_TRUNC('month', created_at) AS month,
    SUM(amount) AS revenue
  FROM orders
  GROUP BY 1
)
SELECT
  month,
  revenue,
  LAG(revenue) OVER (ORDER BY month) AS prev_month
FROM monthly_revenue;

In PostgreSQL, CTEs are optimisation fences by default (materialised), but you can use WITH ... AS NOT MATERIALIZED to let the planner inline them.

A recursive CTE references itself to traverse hierarchical or graph data. It consists of an anchor (base case) and a recursive member joined with UNION ALL, and runs until no new rows are produced.

sql
WITH RECURSIVE org_chart AS (
  -- anchor: top-level employees
  SELECT id, name, manager_id, 0 AS depth
  FROM employees WHERE manager_id IS NULL
  UNION ALL
  -- recursive: each employee's reports
  SELECT e.id, e.name, e.manager_id, oc.depth + 1
  FROM employees e
  JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart ORDER BY depth;

Common uses: org charts, folder trees, bill-of-materials, graph traversal, path finding.

A correlated subquery references columns from the outer query, causing it to execute once per outer row. Unlike an uncorrelated subquery (which runs once), it can be slow on large datasets.

sql
SELECT e.name, e.salary
FROM employees e
WHERE e.salary > (
  SELECT AVG(e2.salary)
  FROM employees e2
  WHERE e2.department = e.department  -- references outer row
);

This returns employees earning above their department average. The subquery runs for every row in `e`. Rewriting with a window function (AVG OVER PARTITION BY department) is usually much faster.

A B-tree (balanced tree) index keeps column values in a sorted structure of tree nodes, allowing the database to find a value in O(log n) time instead of scanning all rows (O(n)). Each leaf node stores the indexed value and a pointer to the heap row.

sql
CREATE INDEX idx_orders_user_id ON orders (user_id);
-- Now lookups like WHERE user_id = 42 go through the index tree,
-- not a full table scan.

B-tree indexes support equality (=), range (<, >, BETWEEN), and prefix LIKE ('abc%') lookups. They are the default index type in PostgreSQL.

A composite (multi-column) index covers multiple columns in a specified order. The leftmost column(s) can be used independently; columns further right are only used when all preceding columns are also constrained.

sql
CREATE INDEX idx_orders_user_status ON orders (user_id, status);
-- Efficient for: WHERE user_id = 1
-- Efficient for: WHERE user_id = 1 AND status = 'shipped'
-- NOT efficient for: WHERE status = 'shipped' alone (no left-prefix)

Place the most selective column first, and put equality conditions before range conditions for maximum index utility.

An expression index indexes the result of a function or expression rather than a raw column value. The query must use the identical expression to benefit from the index.

sql
-- Index the lowercased email for case-insensitive lookups
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

-- Now this query uses the index:
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

PostgreSQL evaluates the expression at write time and stores the result, so reads are fast but writes are slightly slower. Ensure the function is immutable (same output for same input).

EXPLAIN shows the query plan with estimated costs. EXPLAIN ANALYZE actually executes the query and shows real timings. Key fields: "cost=startup..total" (planner estimate), "rows" (estimated row count), "actual time=start..end" (real time), "loops" (how many times the node executed).

sql
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 42;
-- Seq Scan: full table scan (bad for large tables)
-- Index Scan: uses an index (usually good)
-- Hash Join / Nested Loop / Merge Join: join strategies

Look for large discrepancies between "rows" (estimate) and "actual rows" — that signals stale statistics. Nodes with high "actual time" and many "loops" are tuning targets.

A covering index includes all columns referenced by a query so the database can answer the query entirely from the index without accessing the main table (heap) at all. This eliminates the extra I/O of following row pointers.

sql
-- Query needs id, status, and created_at
SELECT id, status, created_at FROM orders WHERE user_id = 42;

-- A covering index on (user_id, status, created_at, id)
-- lets PostgreSQL use an Index Only Scan.
CREATE INDEX idx_orders_cover ON orders (user_id, status, created_at, id);

PostgreSQL shows "Index Only Scan" in EXPLAIN when a covering index is used. Over-indexing wastes write performance, so add extra columns to an index only when the query is very hot.

A partial index is built on a subset of rows defined by a WHERE clause. It is smaller, faster to update, and more likely to fit in cache than a full index on the same column.

sql
-- Index only unprocessed orders (far fewer rows than all orders)
CREATE INDEX idx_orders_pending
  ON orders (created_at)
  WHERE status = 'pending';

-- This query benefits:
SELECT * FROM orders
WHERE status = 'pending' AND created_at > NOW() - INTERVAL '1 day';

The query's WHERE clause must imply the index predicate for the partial index to be used.

From weakest to strongest: READ UNCOMMITTED (can see dirty reads — not supported in PostgreSQL, treated as READ COMMITTED), READ COMMITTED (default in PostgreSQL — sees only committed rows, but non-repeatable reads possible), REPEATABLE READ (same row returns same value within a transaction, prevents non-repeatable reads; PostgreSQL also prevents phantom reads at this level), SERIALIZABLE (full isolation — transactions appear to execute one at a time).

sql
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- your queries
COMMIT;

Higher isolation levels reduce concurrency and can increase deadlock risk.

COMMIT makes all changes in the current transaction permanent and visible to other sessions. ROLLBACK undoes all changes back to the start of the transaction (or to the last SAVEPOINT). SAVEPOINT creates a named checkpoint within a transaction; ROLLBACK TO SAVEPOINT undoes only the work since that checkpoint.

sql
BEGIN;
  INSERT INTO accounts (id, balance) VALUES (1, 1000);
  SAVEPOINT after_insert;
  UPDATE accounts SET balance = -500 WHERE id = 1;  -- oops
  ROLLBACK TO SAVEPOINT after_insert;               -- undo the bad update
  UPDATE accounts SET balance = 900 WHERE id = 1;
COMMIT;

Pessimistic locking acquires a row lock at read time, preventing others from modifying the row until the transaction completes (SELECT ... FOR UPDATE). Optimistic locking reads the row without locking, attempts the update, and checks a version counter or timestamp to detect concurrent modification — rolling back if the version changed.

sql
-- Pessimistic
SELECT * FROM inventory WHERE product_id = 5 FOR UPDATE;

-- Optimistic: application re-reads and retries if version changed
UPDATE inventory
SET quantity = quantity - 1, version = version + 1
WHERE product_id = 5 AND version = 7;

Optimistic is better for low-contention workloads (avoids lock overhead); pessimistic is safer under high contention.

INSERT ... ON CONFLICT allows you to specify what to do when the inserted row violates a unique or primary key constraint: either DO NOTHING (silently skip) or DO UPDATE SET (update the existing row — an atomic upsert).

sql
INSERT INTO page_views (page_id, views)
VALUES (42, 1)
ON CONFLICT (page_id)
DO UPDATE SET views = page_views.views + EXCLUDED.views,
              updated_at = NOW();

EXCLUDED refers to the row that was proposed for insertion. This pattern is safe under concurrent writes because the entire operation is atomic at the row level.

STRING_AGG concatenates values from a group into a single string, separated by a delimiter. An ORDER BY inside the aggregate controls the order of concatenation.

sql
-- PostgreSQL
SELECT
  order_id,
  STRING_AGG(product_name, ', ' ORDER BY product_name) AS products
FROM order_items
GROUP BY order_id;

-- MySQL / SQLite equivalent
SELECT order_id, GROUP_CONCAT(product_name ORDER BY product_name SEPARATOR ', ')
FROM order_items GROUP BY order_id;

Useful for creating comma-separated lists without application-side looping.

PostgreSQL has two JSON types: json (stored as text, preserves whitespace and key order) and jsonb (stored as a binary decomposed format — faster for querying, supports GIN indexing, deduplicates keys). Prefer jsonb in almost all cases.

sql
-- Query a jsonb column
SELECT data->>'name' AS name,
       (data->'address'->>'city') AS city
FROM profiles
WHERE data @> '{"role": "admin"}';

-- Index a jsonb field
CREATE INDEX idx_profiles_role ON profiles USING GIN (data);

`->` returns jsonb; `->>` returns text. `@>` checks containment and uses GIN indexes.

A view is a named saved SELECT query that behaves like a virtual table. It does not store data (unless materialised); every query against it re-executes the underlying SELECT.

sql
CREATE VIEW active_users AS
SELECT id, email, created_at
FROM users
WHERE deleted_at IS NULL;

SELECT * FROM active_users WHERE id = 5;

Use views to: simplify complex queries, apply row-level security by exposing subsets, or provide stable interfaces over tables that may change schema. Do not use views as a performance tool — the query still executes in full unless it is a materialized view.

A materialized view stores the query result physically on disk at creation or refresh time. Reads are fast (it is a real table), but data is stale until refreshed. A regular view always reflects current data but re-runs the query on every access.

sql
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT DATE_TRUNC('month', created_at) AS month,
       SUM(amount) AS revenue
FROM orders GROUP BY 1;

-- Refresh (blocks reads in PostgreSQL by default)
REFRESH MATERIALIZED VIEW monthly_sales;

-- Non-blocking refresh (requires a unique index)
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_sales;

Materialized views are ideal for expensive aggregations that do not need to be real-time.

A function returns a value and can be used inside a SELECT or WHERE clause. A stored procedure is called with CALL, does not return a value directly, and (in PostgreSQL 11+) can manage transactions (COMMIT/ROLLBACK inside the procedure). Functions in PostgreSQL run inside the caller's transaction and cannot commit independently.

sql
-- Function
CREATE FUNCTION get_user_tier(uid UUID)
RETURNS TEXT AS $$
  SELECT plan FROM subscriptions WHERE user_id = uid;
$$ LANGUAGE sql STABLE;

SELECT get_user_tier('abc-123');

-- Procedure (PostgreSQL 11+)
CREATE PROCEDURE archive_old_orders(cutoff DATE)
LANGUAGE plpgsql AS $$
BEGIN
  INSERT INTO order_archive SELECT * FROM orders WHERE created_at < cutoff;
  DELETE FROM orders WHERE created_at < cutoff;
END;
$$;
CALL archive_old_orders('2023-01-01');

FIRST_VALUE returns the value of an expression from the first row of the window frame; LAST_VALUE returns it from the last row. The default window frame for LAST_VALUE is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which means it returns the current row — you usually need to extend the frame to get the true last row.

sql
SELECT
  product_id,
  sale_date,
  amount,
  FIRST_VALUE(amount) OVER w AS first_sale,
  LAST_VALUE(amount)  OVER w AS last_sale
FROM sales
WINDOW w AS (
  PARTITION BY product_id
  ORDER BY sale_date
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);

Always specify the frame explicitly when using LAST_VALUE to avoid the default frame surprise.

ROW_NUMBER within a PARTITION of the duplicate group lets you label duplicates and then delete all but the first (or last).

sql
WITH ranked AS (
  SELECT
    id,
    ROW_NUMBER() OVER (
      PARTITION BY email
      ORDER BY created_at ASC
    ) AS rn
  FROM users
)
DELETE FROM users
WHERE id IN (
  SELECT id FROM ranked WHERE rn > 1
);

Always verify with a SELECT first before running the DELETE. Wrapping it in a transaction lets you ROLLBACK if needed.

The FILTER clause (SQL standard, supported in PostgreSQL) adds a conditional to an aggregate function without requiring a CASE WHEN inside the aggregate. It is cleaner and can be more efficient than the CASE workaround.

sql
SELECT
  COUNT(*)                                      AS total,
  COUNT(*) FILTER (WHERE status = 'shipped')    AS shipped,
  COUNT(*) FILTER (WHERE status = 'cancelled')  AS cancelled,
  AVG(amount) FILTER (WHERE status = 'shipped') AS avg_shipped_amount
FROM orders;

Equivalent CASE workaround: COUNT(CASE WHEN status = 'shipped' THEN 1 END). FILTER is also valid inside window functions.

These extensions to GROUP BY compute multiple grouping combinations in a single pass. ROLLUP generates hierarchical subtotals from left to right. CUBE generates all possible subtotal combinations. GROUPING SETS explicitly lists the grouping combinations you want.

sql
-- ROLLUP: totals by (year, quarter), (year), and grand total
SELECT year, quarter, SUM(revenue)
FROM sales
GROUP BY ROLLUP (year, quarter);

-- CUBE: all combinations of region and product
SELECT region, product, SUM(sales)
FROM facts
GROUP BY CUBE (region, product);

-- GROUPING SETS: explicit
GROUP BY GROUPING SETS ((region, product), (region), ());

NULL in the output marks a rolled-up dimension; use GROUPING() to distinguish intentional NULLs from rollup NULLs.

Most SQL databases (including PostgreSQL) lack a PIVOT syntax, but you can simulate it using conditional aggregation with CASE WHEN inside aggregate functions.

sql
-- Pivot monthly revenue into columns
SELECT
  product_id,
  SUM(CASE WHEN month = 1 THEN revenue ELSE 0 END) AS jan,
  SUM(CASE WHEN month = 2 THEN revenue ELSE 0 END) AS feb,
  SUM(CASE WHEN month = 3 THEN revenue ELSE 0 END) AS mar
FROM monthly_revenue
GROUP BY product_id;

This approach requires knowing the pivot values at query-write time. For dynamic pivots (unknown values), generate the SQL dynamically in application code or a stored procedure. The FILTER clause is a cleaner modern alternative: SUM(revenue) FILTER (WHERE month = 1).

The N+1 problem occurs when an application fetches a list of N records (1 query) and then executes a separate query for each record to fetch related data (N queries). For 1,000 users, this becomes 1,001 round-trips to the database.

sql
-- N+1 pattern (bad): one query per user to get their latest order
SELECT id FROM users;  -- returns N users
-- then per user:
SELECT * FROM orders WHERE user_id = ? ORDER BY created_at DESC LIMIT 1;

-- Fixed with a single query using LATERAL or ROW_NUMBER
SELECT u.id, u.email, o.id AS latest_order_id
FROM users u
LEFT JOIN LATERAL (
  SELECT id FROM orders WHERE user_id = u.id
  ORDER BY created_at DESC LIMIT 1
) o ON TRUE;

ORM solutions: eager loading (SELECT IN or JOIN), DataLoader pattern for batching. Always profile with pg_stat_statements to detect repeated similar queries.

The window frame clause inside OVER(...) refines which rows participate in the window function computation for each current row. ROWS counts physical rows; RANGE groups rows with identical ORDER BY values together (logical peers).

sql
SELECT
  sale_date, amount,
  -- ROWS: exactly the 3 rows before current row
  SUM(amount) OVER (ORDER BY sale_date
    ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS rolling_3row,
  -- RANGE: all rows with same or earlier sale_date value
  SUM(amount) OVER (ORDER BY sale_date
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM daily_sales;

Rrows BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is the safe default for running totals. RANGE is the default when you write just ORDER BY with no frame clause — which can cause surprising results with duplicate ORDER BY values.

A trigger is a stored function that executes automatically in response to INSERT, UPDATE, or DELETE events on a table. Triggers can fire BEFORE or AFTER the event, and FOR EACH ROW or FOR EACH STATEMENT.

sql
CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS TRIGGER AS $$
BEGIN
  NEW.updated_at := NOW();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_set_updated_at
  BEFORE UPDATE ON products
  FOR EACH ROW EXECUTE FUNCTION set_updated_at();

Common uses: auto-updating updated_at timestamps, audit logging to a history table, enforcing complex constraints, replicating changes to a denormalised summary table. Pitfalls: triggers are invisible — they make behaviour surprising. Cascading triggers (trigger A fires trigger B) can cause infinite loops. Heavy trigger logic degrades write throughput. Prefer application-layer logic or generated columns for simple cases.

PostgreSQL can parallelise sequential scans, aggregations, joins, and index scans by spawning worker processes that each handle a slice of the data. The leader process gathers partial results from workers via a Gather or Gather Merge node.

sql
SET max_parallel_workers_per_gather = 4;
EXPLAIN ANALYZE SELECT SUM(amount) FROM orders;
-- Look for: Parallel Seq Scan, Partial Aggregate, Gather

Key parameters: max_parallel_workers, max_parallel_workers_per_gather, min_parallel_table_scan_size. Parallelism is disabled for functions marked VOLATILE or for tables with row-level security in some configurations. Not all query shapes parallelise — correlated subqueries and CTEs (materialised) typically do not. Parallel query helps most for large analytical scans on tables that do not fit in shared_buffers.

Partitioning divides a large logical table into smaller physical child tables (partitions) while queries target the parent table transparently. Three built-in strategies: Range — rows go into the partition whose range covers the partition key value (ideal for time-series data, e.g., one partition per month). List — rows are routed by explicit discrete values (e.g., country code). Hash — rows are distributed by a hash of the key, giving even distribution across N partitions.

sql
CREATE TABLE events (
  id         BIGSERIAL,
  created_at TIMESTAMPTZ NOT NULL,
  payload    JSONB
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_q1
  PARTITION OF events
  FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');

Partitioning benefits: partition pruning (the planner skips irrelevant partitions), faster bulk deletes (DROP the old partition), and better cache utilisation. Drawbacks: foreign keys into partitioned tables have restrictions, and global unique indexes are not supported (only per-partition).

Partition pruning is the query planner's ability to eliminate child partitions that cannot possibly contain rows matching the WHERE clause, avoiding scanning them entirely. For a range-partitioned table with 36 monthly partitions, a query with WHERE created_at BETWEEN '2026-01-01' AND '2026-01-31' scans only the January 2026 partition.

sql
EXPLAIN SELECT * FROM events
WHERE created_at >= '2026-01-01' AND created_at < '2026-02-01';
-- Plan shows: Seq Scan on events_2026_01 (1 partition, not 36)

Pruning works when the partition key appears in the WHERE clause with a constant or stable expression. Dynamic pruning (runtime pruning for parameterised queries) was added in PostgreSQL 11. Without proper partition key predicates, the planner scans all partitions — potentially worse than an unpartitioned table due to partition overhead.

pg_stat_statements is a PostgreSQL extension that tracks execution statistics for every unique query text: total time, mean time, call count, rows returned, buffer hits/misses, and I/O time. It is the first tool to reach for when identifying slow queries.

sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 slowest queries by total time
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

Key columns: calls (frequency), mean_exec_time (average latency), stddev_exec_time (variance — high variance = occasional slow outliers), blk_read_time (I/O-bound). Reset with pg_stat_statements_reset(). Enable track_io_timing = on in postgresql.conf to populate I/O timing columns.

Key patterns: (1) Range-partition by time (one partition per day/month) so old data can be dropped cheaply and queries on recent data only scan recent partitions. (2) Use TIMESTAMPTZ for all timestamps. (3) Create composite indexes on (entity_id, recorded_at) for per-entity time queries. (4) Use materialised views or summary tables for common aggregate windows (hourly/daily rollups). (5) TimescaleDB (a PostgreSQL extension) adds automatic time-based partitioning, continuous aggregates, and compression.

sql
-- Efficient hourly rollup query
SELECT
  DATE_TRUNC('hour', recorded_at) AS hour,
  AVG(cpu_pct) AS avg_cpu
FROM metrics
WHERE device_id = 'srv-01'
  AND recorded_at >= NOW() - INTERVAL '24 hours'
GROUP BY 1
ORDER BY 1;

Avoid storing one row per metric per timestamp for thousands of devices — use JSONB or array columns for batched metrics to reduce row count.

EAV stores flexible attributes as rows in a three-column table (entity_id, attribute_name, value) instead of as columns. It enables schema-free extensibility without ALTER TABLE but comes with severe costs: no data types (everything is text or requires multiple value columns), no NOT NULL constraints per attribute, joins to reconstruct an entity are complex, query performance degrades quickly, and it is hard to enforce referential integrity.

sql
-- EAV (anti-pattern)
SELECT entity_id,
  MAX(CASE WHEN attr='price' THEN value END)::NUMERIC AS price,
  MAX(CASE WHEN attr='name'  THEN value END)          AS name
FROM attributes
WHERE entity_id = 5
GROUP BY entity_id;

Better alternatives: PostgreSQL JSONB column for sparse/variable attributes (typed, indexed, queryable); hstore; partial columns with NULLs for known optional fields; or a proper polymorphic design with a base table and type-specific extension tables.

SCDs handle how historical changes to dimension data (e.g., customer address, product price) are tracked in a data warehouse. Type 1: Overwrite — just update the current value, no history kept. Simple but loses history. Type 2: Add a new row — each change creates a new record with effective_from/effective_to dates and a current_flag. Full history is preserved; the natural key + effective_from uniquely identifies a version.

sql
-- SCD Type 2 customer dimension
UPDATE dim_customer SET effective_to = NOW(), current_flag = FALSE
WHERE customer_id = 42 AND current_flag = TRUE;

INSERT INTO dim_customer (customer_id, address, effective_from, effective_to, current_flag)
VALUES (42, '456 New St', NOW(), '9999-12-31', TRUE);

Type 3: Add a column — store both old_value and new_value columns. Only the most recent change is tracked, not the full history. Type 2 is the most common in data warehouses.

PostgreSQL converts text to a tsvector (sorted list of normalised lexemes with position info) using to_tsvector(). Queries are expressed as tsquery with Boolean operators. The @@ operator checks if a tsquery matches a tsvector. A GIN index on the tsvector column makes search fast.

sql
-- Add a generated tsvector column (PostgreSQL 12+)
ALTER TABLE articles ADD COLUMN fts tsvector
  GENERATED ALWAYS AS
  (to_tsvector('english', coalesce(title,'')||' '||coalesce(body,''))) STORED;

CREATE INDEX idx_articles_fts ON articles USING GIN (fts);

-- Search
SELECT title, ts_rank(fts, q) AS rank
FROM articles, to_tsquery('english', 'database & performance') q
WHERE fts @@ q
ORDER BY rank DESC
LIMIT 10;

ts_rank and ts_rank_cd compute relevance scores. For production use, consider a dedicated search engine (Elasticsearch, Typesense) for richer features like fuzzy matching and facets.

PostGIS is a PostgreSQL extension that adds geometry and geography data types plus hundreds of spatial functions. GEOMETRY uses a Cartesian coordinate system; GEOGRAPHY uses spherical coordinates (longitude/latitude) and computes real-world distances in meters.

sql
-- Find restaurants within 1 km of a point
SELECT name, ST_Distance(location::geography,
  ST_MakePoint(-73.9857, 40.7484)::geography) AS dist_m
FROM restaurants
WHERE ST_DWithin(
  location::geography,
  ST_MakePoint(-73.9857, 40.7484)::geography,
  1000  -- 1 km radius
)
ORDER BY dist_m;

-- Speed it up with a GIST index
CREATE INDEX idx_restaurants_location ON restaurants USING GIST (location);

Common functions: ST_Distance, ST_Contains, ST_Intersects, ST_Within, ST_Buffer, ST_Area, ST_AsGeoJSON.

RLS lets you attach WHERE-clause-like policies to a table so that different database users (or application roles) see and can modify only the rows they are permitted to access. The policy is enforced at the storage layer regardless of which query is issued — even SELECT * FROM users only returns the allowed rows.

sql
ALTER TABLE notes ENABLE ROW LEVEL SECURITY;

-- Users can only see their own notes
CREATE POLICY notes_isolation ON notes
  USING (owner_id = current_setting('app.current_user_id')::INT);

-- Application sets the context variable at session start
SET app.current_user_id = 42;
SELECT * FROM notes;  -- returns only notes where owner_id = 42

RLS is used by Supabase to implement per-user data isolation without application-layer filtering. BYPASSRLS privilege allows superusers and service roles to skip policies. Always test policies with a non-superuser role.

SQL injection occurs when untrusted user input is concatenated into a SQL string, allowing an attacker to alter query logic. Classic example: input' OR '1'='1 turns a login check into an always-true condition.

sql
-- VULNERABLE (never do this)
query = "SELECT * FROM users WHERE email = '" + user_input + "'";

-- SAFE: parameterized query (PostgreSQL node-postgres)
const result = await client.query(
  'SELECT * FROM users WHERE email = $1',
  [userInput]
);

With a parameterised query the driver sends the SQL template and the parameter value separately — the database treats the value purely as data, never as SQL syntax. Additional defences: least-privilege DB roles (no DROP/ALTER), input validation, WAF rules, and pg_audit logging. ORMs use parameterised queries by default; raw string interpolation is the primary injection vector.

VACUUM reclaims storage occupied by dead tuples (MVCC leftovers) and updates the visibility map to allow index-only scans. It does not shrink the table file; VACUUM FULL rewrites the table to return space to the OS but takes an exclusive lock. ANALYZE collects column statistics (histograms, most-common values) used by the query planner to estimate row counts. Autovacuum handles both automatically, but run them manually when: (1) after a large bulk DELETE/UPDATE to reclaim space immediately; (2) after a bulk COPY/INSERT before a performance-critical query (so planner has fresh stats); (3) after enabling a new index to ensure the planner knows about updated statistics.

sql
VACUUM ANALYZE orders;          -- reclaim + refresh stats
VACUUM VERBOSE orders;          -- shows what was done
VACUUM FULL orders;             -- full rewrite (exclusive lock!)

A sequence is a persistent counter object that generates unique integers. SERIAL is a legacy shorthand that creates a sequence and sets the column default to nextval(). Identity columns (PostgreSQL 10+, SQL standard) use GENERATED ALWAYS AS IDENTITY or GENERATED BY DEFAULT AS IDENTITY and are more portable.

sql
-- Modern identity column
CREATE TABLE events (
  id   BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  data JSONB
);

-- Inspect or alter the sequence
SELECT last_value FROM events_id_seq;
ALTER SEQUENCE events_id_seq RESTART WITH 1000;

Sequences are non-transactional: if an INSERT rolls back, the sequence value is NOT returned, creating gaps. Gaps are normal and expected. Use uuid_generate_v4() or gen_random_uuid() (PostgreSQL 13+) when you need globally unique IDs across distributed systems.

Advisory locks are application-defined locks that PostgreSQL stores and manages but does not automatically associate with any table row or transaction. Applications acquire them by passing an integer key; the lock is exclusive or shared, and can be session-level or transaction-level.

sql
-- Acquire an exclusive session-level advisory lock (blocks until available)
SELECT pg_advisory_lock(12345);
-- ... do work that requires mutual exclusion ...
SELECT pg_advisory_unlock(12345);

-- Non-blocking try-acquire (returns false instead of waiting)
SELECT pg_try_advisory_lock(12345);

Common uses: leader election (only one cron worker runs at a time), distributed mutex (prevent duplicate job processing), co-ordination between background workers. They are lightweight compared to row locks and do not cause MVCC bloat. Session-level locks persist until released or the session ends; transaction-level locks release on COMMIT/ROLLBACK.

A LATERAL join allows a subquery in the FROM clause to reference columns from preceding FROM items (like a correlated subquery, but in FROM position). It is essential when the subquery must produce multiple rows per outer row — something a scalar correlated subquery cannot do.

sql
-- For each user, get their 3 most recent orders
SELECT u.id, u.email, o.id AS order_id, o.created_at
FROM users u
CROSS JOIN LATERAL (
  SELECT id, created_at
  FROM orders
  WHERE user_id = u.id
  ORDER BY created_at DESC
  LIMIT 3
) o;

LATERAL is also useful with set-returning functions like unnest(), jsonb_array_elements(), and generate_series() applied per row. Without LATERAL, a subquery in FROM cannot reference the outer table — it is evaluated independently.

Bulk DELETE or UPDATE of millions of rows in a single transaction: holds locks for the entire duration, generates massive WAL, can bloat tables severely, and may cause replication lag to spike. Best practices: 1. Batch in small chunks with a loop:

sql
DO $$
DECLARE deleted INT;
BEGIN
  LOOP
    DELETE FROM events
    WHERE id IN (
      SELECT id FROM events WHERE archived = TRUE LIMIT 5000
    );
    GET DIAGNOSTICS deleted = ROW_COUNT;
    EXIT WHEN deleted = 0;
    PERFORM pg_sleep(0.05);  -- throttle WAL / replication
  END LOOP;
END $$;

2. Use partition drops for time-based data instead of DELETE. 3. Run VACUUM ANALYZE after large deletes. 4. Schedule during low-traffic windows. 5. Monitor replication lag (pg_stat_replication) during the operation.

INTERSECT returns only rows that appear in both query result sets (the mathematical intersection). EXCEPT (MINUS in Oracle) returns rows from the first query that do not appear in the second. Both remove duplicates by default; append ALL to keep duplicates (INTERSECT ALL, EXCEPT ALL).

sql
-- Customers who have both placed orders AND left reviews
SELECT customer_id FROM orders
INTERSECT
SELECT customer_id FROM reviews;

-- Customers who ordered but have NOT reviewed anything
SELECT customer_id FROM orders
EXCEPT
SELECT customer_id FROM reviews;

Like UNION, both operators require matching column counts and compatible types. EXCEPT is a clean alternative to NOT IN / NOT EXISTS when the semantics are set-based and NULLs could cause NOT IN to misbehave.

A bitmap index scan is a two-phase strategy: phase 1 scans one or more indexes and builds an in-memory bitmap where each bit corresponds to a heap page, marking pages that contain matching rows. Phase 2 reads only the marked heap pages in physical page order (avoiding random I/O). PostgreSQL can AND/OR multiple bitmaps from separate indexes before hitting the heap — enabling multi-index queries.

sql
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE status = 'pending' AND amount > 500;
-- May show:
-- Bitmap Heap Scan on orders
--   -> BitmapAnd
--        -> Bitmap Index Scan on idx_orders_status
--        -> Bitmap Index Scan on idx_orders_amount

Bitmap scans are chosen when a plain Index Scan would require too many random reads (moderately selective queries on large tables). For highly selective queries (few matching rows) a regular Index Scan is used; for very low selectivity a Seq Scan wins. The bitmap is stored in work_mem — if it overflows, PostgreSQL uses "lossy" bitmaps (page-level, not row-level), adding a recheck step.

SQL Interview Questions (2026)

Frequently Asked Questions

Which SQL dialect should I study?

How important are window functions?

What about indexes?

Do I need NoSQL too?

How do you practice SQL?

Related Topics

Ready to apply?