Basic query patterns for fast recall
This note is literally just a list of queries for fast recall. It is not intended to be educational in any way.
A in B
Q: Give me a query for rows in A that have a match in B
SELECT a.*
FROM A a
WHERE EXISTS (
SELECT 1 FROM B b WHERE b.key = a.key
);
-- NOT IN (SELECT b.key FROM B b) can be less than ideal as an alternative if sub-query can return NULL
I.e. “if (key in b)”
A not in B
Q: Give me a query for rows in A with no match in B
SELECT a.*
FROM A a
WHERE NOT EXISTS (
SELECT 1 FROM B b WHERE b.key = a.key
);
A XOR B
Q: Give me a query for rows that are either in A or B, but not both
SELECT a.key
FROM A a
WHERE NOT EXISTS (SELECT 1 FROM B b WHERE b.key = a.key)
UNION ALL
SELECT b.key
FROM B b
WHERE NOT EXISTS (SELECT 1 FROM A a WHERE a.key = b.key);
TL;DR: NOT EXISTS (SELECT 1 FROM B WHERE B.key = A.key) is basically
(NR==FNR) {
b[$1]=1;
next;
}
!($1 in b) {
print;
}
Find top-n (ROW_NUMBER())
Q: Give me the second highest salary in an employees table
SELECT salary
FROM (
SELECT salary, ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn
FROM employees
) ranked
WHERE rn = 2;
Find top and bottom n
Q: Give me the second highest and second lowest salary in an employees table
SELECT salary
FROM (
SELECT salary, ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn,
COUNT('x') OVER () AS tc
FROM employees
) ranked
WHERE rn = 2 OR rn = (tc-1);
NOTE: doesn’t consider ties -> DENSE_RANK() if that matters
Find top-n per group (ROW_NUMBER() + PARTITION BY)
Q: Give me the top three salaries per department in an employees table
SELECT *
FROM (
SELECT
dept,
employee,
salary,
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
FROM employees
) ranked
WHERE rn <= 3;
Find top-n of summed value
Q: How do you find the top 3 customers by revenue?
WITH total AS (
SELECT
c.customer_name,
SUM(t.price) AS price_sum,
DENSE_RANK() OVER (ORDER BY SUM(t.price) DESC) as rnk
FROM customers c
JOIN transaction t ON c.customer_id = t.customer_id
GROUP BY c.customer_id, c.customer_name
)
SELECT customer_name, price_sum FROM total WHERE rnk <= 3;
NOTE: If customers with 0 transactions should show up:
- ->
JOINtoLEFT JOINwrapSUM(t.price)insideCOALESCE(..., 0)
Why not ROW_NUMBER()?
->
ROW_NUMBER()assigns a unique sequential number to each result, which gets exactlynrows->
DENSE_RANK()assigns the same number for ties, may not exactly getnrows, but gets topnvalues correctly in case ties need to be considered
a,100
b,100
c,90
d,80
->
ROW_NUMBER()-> 1,2,3,4->
DENSE_RANK()-> 1,1,2,3
Same attribute grouping
Q: Give me all the employees who have the same manager
SELECT manager_id, GROUP_CONCAT(employee_name) AS team_members
FROM employees
WHERE manager_id IS NOT NULL
GROUP BY manager_id;
PostgreSQL version:
SELECT
manager_id,
STRING_AGG(employee_name, ', ' ORDER BY employee_name) AS team_members
FROM employees
WHERE manager_id IS NOT NULL
GROUP BY manager_id;
PostgreSQL’s STRING_AGG requires an explicit delimiter.
Duplicate Filtering
Q: Give me a query to delete duplicate products from a product table
WITH duplicates AS (
SELECT product_id, product_name, price,
ROW_NUMBER() OVER (PARTITION BY product_name, price ORDER BY product_id) AS rn
FROM product
)
DELETE p FROM product p JOIN duplicates d ON p.product_id = d.product_id
WHERE d.rn > 1;
For small ad-hoc tasks/reports, consider piping output through awk and rebuilding from the results, the idiom is easier to remember than this.
PostgreSQL syntax differs ever so slightly:
WITH duplicates AS (
SELECT product_id, product_name, price,
ROW_NUMBER() OVER (PARTITION BY product_name, price ORDER BY product_id) AS rn
FROM product
)
DELETE p FROM product p USING duplicates d WHERE p.product_id = d.product_id
AND d.rn > 1;
Relational division (GROUP BY + HAVING COUNT = n)
Q: Give me a query to find customers who made purchases in every month of a year.
WITH purchases AS (
SELECT customer_id, MONTH(date) as transaction_month
FROM transaction
WHERE YEAR(date) = 2025
GROUP BY customer_id, MONTH(date)
)
SELECT customer_id FROM purchases GROUP BY customer_id HAVING COUNT(transaction_month) = 12;
PostgreSQL has no MONTH(), use DATE_TRUNC().
Contribution analysis
Q: Give me a query to calculate the % contribution of each product to total value
WITH product_summary AS (
SELECT
t.product_id,
SUM(t.price) AS price_sum
FROM
transaction t
GROUP BY
t.product_id
)
SELECT
ps.product_id,
ps.price_sum,
ROUND((ps.price_sum * 100.0) / SUM(ps.price_sum) OVER (), 2) AS contrib
FROM
product_summary ps;
Anti-Join
Q: Give me a query to look for products that had no sales in the last 30 days
WITH product_summary AS (
SELECT DISTINCT
t.product_id
FROM
transaction t
WHERE
t.date >= (NOW() - INTERVAL 30 DAY)
)
SELECT p.product_id FROM product p
LEFT JOIN product_summary ps
ON p.product_id = ps.product_id
WHERE ps.product_id IS NULL;
In a real system, this kind of query is not always feasible. A transaction ledger might be hundreds of millions, or even billions of rows. This can potentially turn into a full table scan. A summary table would be the realistic option, if this information is frequently requested.
Without CTE for old DB versions
-- products that have at least one transaction (Semi-Join)
SELECT p.*
FROM product p
WHERE EXISTS (
SELECT 1 FROM transaction t WHERE t.product_id = p.product_id AND t.date >= (NOW() - INTERVAL 30 DAY)
);
-- products with no transactions (Anti-Join)
SELECT p.*
FROM product p
WHERE NOT EXISTS (
SELECT 1 FROM transaction t WHERE t.product_id = p.product_id AND t.date >= (NOW() - INTERVAL 30 DAY)
);
Self-Join
Q: Give me a query to find all pairs of employees who work for the same department
SELECT e1.name AS employee1, e2.name AS employee2, e1.department
FROM employees e1
JOIN employees e2 ON e1.department = e2.department
WHERE e1.employee_id < e2.employee_id; -- Avoid duplicates and self-pairs
The WHERE clause makes sure we don’t pick up things like [Bob,Bob].
Gaps and Islands
Q: Give me a query to find all periods of consecutive days where a user logged in
WITH unique_login_days AS (
-- 1. Ensure one record per user per day
SELECT DISTINCT user_id, CAST(login_at AS DATE) AS login_date
FROM user_logins
),
login_summary AS (
-- 2. Summary of each login day sequentially for each user
SELECT user_id, login_date,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) AS rn
FROM unique_login_days
),
buckets AS (
-- 3. Subtract the rank from the date.
-- Consecutive dates will result in the same 'bucket_id' date.
SELECT user_id, login_date,
(login_date - INTERVAL rn DAY) AS bucket_id
FROM login_summary
)
-- 4. Group by the bucket_id to find the start, end, and duration
SELECT user_id,
MIN(login_date) AS streak_start,
MAX(login_date) AS streak_end,
COUNT('x') AS consecutive_days
FROM buckets
GROUP BY user_id, bucket_id
ORDER BY user_id, streak_start;
If only streaks of a certain length are of interest, slap a HAVING onto the final query, e.g. HAVING COUNT('x') >= 3;
PostgreSQL: login_date - rn * INTERVAL '1 day'
Conditional aggregation (SUM(CASE WHEN …))
Q: Give me a query to find out how many windows/linux machines are available to a department
SELECT
dept,
SUM(CASE WHEN symbol= 'W' THEN 1 ELSE 0 END) AS windows,
SUM(CASE WHEN symbol = 'L' THEN 1 ELSE 0 END) AS linux
FROM machines
GROUP BY dept;
Running totals
Q: Give me a query for a cumulative sum of positive sales
WITH transaction_summary AS (
SELECT DAYOFYEAR(date) AS d, COUNT('x') AS c
FROM transaction
WHERE price > 0
GROUP BY d
)
SELECT d, c, SUM(c) OVER (ORDER BY d) FROM transaction_summary ORDER BY d;
PostgreSQL: No DAYOFYEAR(), use EXTRACT(DOY FROM TIMESTAMP date)
Compare with previous - LAG()
Q: Give me a query to find instances where sales increased from the previous day
WITH sales_summary AS (
SELECT
date,
sales,
LAG(sales) OVER (ORDER BY date) AS previous_sales
FROM daily_sales
)
SELECT date, sales, previous_sales
FROM sales_summary
WHERE sales > previous_sales;
UPSERT
MariaDB/MySQL:
INSERT INTO price_history (product_id, price)
VALUES (1, 9.99)
ON DUPLICATE KEY UPDATE price = VALUES(price);
PostgreSQL:
INSERT INTO price_history (product_id, price)
VALUES (1, 9.99)
ON CONFLICT (product_id)
DO UPDATE SET price = EXCLUDED.price;