Core Query Mechanics
SELECT fundamentals: Retrieve columns from tables. Use SELECT * sparingly in production—specify columns for performance and clarity.
-- SELECT: Specify columns explicitly
SELECT user_id, email, created_at
FROM users;FROM & JOIN: Combine data from multiple tables. Master INNER, LEFT, RIGHT, and FULL OUTER joins. Understand the difference between JOIN (explicit) and WHERE-based joins (implicit, avoid in modern code).
-- INNER JOIN: Only matching rows from both tables
SELECT u.user_id, u.email, o.order_id, o.total
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id;
-- LEFT JOIN: All users, even without orders
SELECT u.user_id, u.email, o.order_id
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id;
-- Multiple joins
SELECT u.email, o.order_id, oi.product_id, oi.quantity
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id
INNER JOIN order_items oi ON o.order_id = oi.order_id;
JOINdefaults toINNER JOINif no join type is specified. Explicitly writingINNER JOINis clearer and makes intent obvious, especially in queries with multiple joins mixing INNER and LEFT/RIGHT joins
WHERE clause: Filter rows before aggregation. Supports comparison operators, pattern matching (LIKE, ILIKE), range checks (BETWEEN), and set membership (IN).
-- WHERE: Basic filtering
SELECT * FROM orders
WHERE total > 100 AND status = 'completed';
-- Pattern matching
SELECT * FROM users
WHERE email ILIKE '%@gmail.com'; -- case-insensitive
-- Range and set membership
SELECT * FROM orders
WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31'
AND status IN ('pending', 'processing');DISTINCT: Remove duplicate rows. Often signals a data modeling issue if overused.
-- DISTINCT: Remove duplicates
SELECT DISTINCT country FROM users;Filtering & Sorting
WHERE vs HAVING: WHERE filters rows; HAVING filters aggregated results after GROUP BY.
-- WHERE filters before aggregation, HAVING filters after
SELECT
status,
COUNT(*) as order_count,
SUM(total) as total_revenue
FROM orders
WHERE created_at >= '2024-01-01' -- WHERE filters rows
GROUP BY status
HAVING COUNT(*) > 10 -- HAVING filters groups
ORDER BY total_revenue DESC;ORDER BY: Sort results. Use ASC (default) or DESC. Can order by multiple columns, expressions, or column positions (though explicit names are clearer).
-- ORDER BY: Sorting
SELECT * FROM users
ORDER BY created_at DESC, email ASC;LIMIT & OFFSET: Paginate results. Be cautious with OFFSET on large datasets—performance degrades linearly.
-- LIMIT & OFFSET: Pagination
SELECT * FROM products
ORDER BY price DESC
LIMIT 20 OFFSET 40; -- Get 3rd page (20 items per page)NULL handling: Use IS NULL / IS NOT NULL. Remember NULL != NULL in comparisons. Use COALESCE for default values.
-- NULL handling
SELECT * FROM users
WHERE last_login IS NULL;
SELECT COALESCE(phone_number, 'N/A') AS phone FROM users;Aggregation & Grouping
Aggregate functions: COUNT, SUM, AVG, MIN, MAX. COUNT(*) vs COUNT(column)—the latter excludes NULLs.
-- Aggregate functions
SELECT
COUNT(*) as total_users,
COUNT(email) as users_with_email,
SUM(order_total) as total_revenue,
AVG(order_total) as avg_order_value,
MIN(created_at) as oldest_user,
MAX(created_at) as newest_user
FROM orders;GROUP BY: Combine rows sharing common values. Every non-aggregated column in SELECT must appear in GROUP BY.
-- GROUP BY
SELECT country, COUNT(*) AS user_count
FROM users
GROUP BY country
ORDER BY user_count DESC;
-- Multiple aggregations
SELECT
user_id,
COUNT(*) AS order_count,
SUM(total) AS total_spent,
AVG(total) AS avg_order_value,
MAX(total) AS largest_order
FROM orders
GROUP BY user_id;HAVING: Filter aggregated results after GROUP BY.
-- HAVING: Filter aggregated results
SELECT
status,
COUNT(*) as order_count,
SUM(total) as total_revenue
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY status
HAVING COUNT(*) > 10
ORDER BY total_revenue DESC;Window functions: Perform calculations across row sets without collapsing them (unlike GROUP BY). Common ones: ROW_NUMBER(), RANK(), LAG(), LEAD(), SUM() OVER().
-- Window functions: Running totals without collapsing rows
-- PARTITION BY: Like GROUP BY, but just for the calculation. It restarts the calculation for each group.
-- ORDER BY: Defines the sequence in which the calculation happens (crucial for running totals or ranking).
SELECT
order_id,
user_id,
total,
-- Calculates running total by date (earliest first)
SUM(total) OVER (PARTITION BY user_id ORDER BY created_at) AS running_total,
-- Calculates order rank by date (latest first)
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS order_rank
FROM orders;
-- LAG/LEAD: Access previous/next row values
-- LAG: Accesses the value from the previous row in the window.
-- LEAD: Accesses the value from the next row in the window.
SELECT
order_id,
created_at,
total,
-- Calculates the previous order total
LAG(total) OVER (ORDER BY created_at) AS previous_order_total,
-- Calculates the difference between current and previous order
total - LAG(total) OVER (ORDER BY created_at) AS difference
FROM orders;The best way to understand window functions is to think of them as “aggregations that don’t squash your rows.” In a normal GROUP BY query, you lose the individual row details—everything gets collapsed into one summary row. Window functions let you keep the individual rows while also adding a column that has calculated data from surrounding rows (the “window”).
Data Modification
INSERT: Add rows. Use RETURNING clause to get inserted values (very useful for getting auto-generated IDs).
-- INSERT: Single row
INSERT INTO users (email, name, country)
VALUES ('user@example.com', 'John Doe', 'US');
-- INSERT: Multiple rows
INSERT INTO users (email, name, country)
VALUES
('user1@example.com', 'Alice', 'UK'),
('user2@example.com', 'Bob', 'CA');
-- INSERT with RETURNING
INSERT INTO users (email, name)
VALUES ('new@example.com', 'Jane')
RETURNING user_id, created_at;UPDATE: Modify existing rows. Always test with SELECT first using the same WHERE clause.
-- UPDATE
UPDATE users
SET last_login = NOW()
WHERE user_id = 123;
-- UPDATE with RETURNING
UPDATE orders
SET status = 'shipped'
WHERE order_id = 456
RETURNING order_id, status, updated_at;DELETE: Remove rows. Use WHERE carefully—omitting it deletes all rows.
-- DELETE
DELETE FROM orders
WHERE status = 'cancelled' AND created_at < NOW() - INTERVAL '1 year';UPSERT (INSERT … ON CONFLICT): PostgreSQL-specific. Insert or update based on constraint violations.
-- UPSERT (INSERT ... ON CONFLICT)
INSERT INTO user_preferences (user_id, theme, language)
VALUES (123, 'dark', 'en')
ON CONFLICT (user_id)
DO UPDATE SET
theme = EXCLUDED.theme,
language = EXCLUDED.language,
updated_at = NOW();RETURNING is a PostgreSQL feature that some databases have adopted in various forms.
CTEs & Subqueries
-- CTE: Break complex queries into readable parts
WITH high_value_users AS (
SELECT user_id, SUM(total) AS total_spent
FROM orders
GROUP BY user_id
HAVING SUM(total) > 1000
)
SELECT u.email, hvu.total_spent
FROM high_value_users hvu
JOIN users u ON hvu.user_id = u.user_id
ORDER BY hvu.total_spent DESC;
-- Multiple CTEs
WITH
user_stats AS (
SELECT user_id, COUNT(*) AS order_count
FROM orders
GROUP BY user_id
),
active_users AS (
SELECT user_id FROM users WHERE last_login > NOW() - INTERVAL '30 days'
)
SELECT u.email, us.order_count
FROM user_stats us
JOIN active_users au ON us.user_id = au.user_id
JOIN users u ON us.user_id = u.user_id;
-- Recursive CTE: Hierarchical data
WITH RECURSIVE employee_hierarchy AS (
SELECT employee_id, name, manager_id, 1 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
SELECT e.employee_id, e.name, e.manager_id, eh.level + 1
FROM employees e
JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM employee_hierarchy ORDER BY level, name;Performance & Indexing
Indexes: Speed up lookups but slow down writes. B-tree (default) for equality/range, GIN for JSON/arrays, GiST for geometric data.
-- B-tree index (default, good for equality and range queries)
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_created_at ON orders(created_at DESC);
-- Composite index
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Partial index (smaller, faster for specific queries)
CREATE INDEX idx_active_users ON users(user_id)
WHERE deleted_at IS NULL;
-- GIN index for JSONB
CREATE INDEX idx_events_data ON events USING GIN(data);
-- Index for text search
CREATE INDEX idx_products_name ON products USING GIN(to_tsvector('english', name));Rule of thumb: If you’re searching inside a value (JSON keys, words in text, elements in arrays), use GIN. If you’re searching for a value (exact match, range), use B-tree.
EXPLAIN / EXPLAIN ANALYZE: Understand query execution plans. Look for sequential scans on large tables — often indicate missing indexes.
-- EXPLAIN: See query plan (doesn't execute)
EXPLAIN
SELECT * FROM users WHERE email = 'test@example.com';
-- EXPLAIN ANALYZE: See actual execution (runs query)
EXPLAIN ANALYZE
SELECT u.email, COUNT(o.order_id)
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.email;
-- Check index usage
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'test@example.com';Query optimization: Avoid SELECT *, use indexes wisely, consider denormalization for read-heavy workloads, use prepared statements to prevent SQL injection and improve performance.
-- Avoid N+1 queries: Use JOIN instead of multiple queries
-- Bad: Query in loop
-- Good: Single query with JOIN
SELECT u.user_id, u.email, json_agg(json_build_object('id', o.order_id, 'total', o.total)) AS orders
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.email;PostgreSQL-Specific Features
JSON/JSONB: Store and query JSON. JSONB is binary, supports indexing, and is generally preferred.
-- JSONB operations
CREATE TABLE events (
event_id SERIAL PRIMARY KEY,
data JSONB
);
INSERT INTO events (data)
VALUES ('{"user_id": 123, "action": "login", "metadata": {"ip": "1.2.3.4"}}');
-- Query JSON fields
SELECT * FROM events
WHERE data->>'action' = 'login';
SELECT * FROM events
WHERE data->'metadata'->>'ip' = '1.2.3.4';
-- JSON aggregation
SELECT
data->>'user_id' AS user_id,
jsonb_agg(data->'action') AS actions
FROM events
GROUP BY data->>'user_id';Array operations: Native array support with functions like array_agg, unnest, ANY/ALL.
-- Arrays
SELECT ARRAY[1, 2, 3, 4, 5];
SELECT array_agg(order_id) AS order_ids
FROM orders
WHERE user_id = 123;
SELECT unnest(ARRAY[1, 2, 3]) AS value;
-- Check if value in array
SELECT * FROM products
WHERE 'electronics' = ANY(categories);
-- Array operations
UPDATE products
SET tags = array_append(tags, 'featured')
WHERE product_id = 456;Transactions: BEGIN, COMMIT, ROLLBACK. Use for atomic operations across multiple statements.
-- Basic transaction
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;
-- Rollback on error
BEGIN;
UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 123;
-- If this fails, entire transaction rolls back
INSERT INTO orders (user_id, product_id) VALUES (456, 123);
COMMIT;
-- Savepoints for partial rollback
BEGIN;
INSERT INTO logs (message) VALUES ('Step 1');
SAVEPOINT step1;
INSERT INTO logs (message) VALUES ('Step 2');
SAVEPOINT step2;
-- Rollback to step1, keeping first insert
ROLLBACK TO SAVEPOINT step1;
COMMIT;