GROUP BY and HAVING
An introduction to the GROUP BY and HAVING statements in SQL
Aggregate functions take multiple inputs and provide a single output. The most common are:
AVG() which returns the average value
SELECT AVG(column_name) FROM table_name
COUNT() which returns the number of values
SELECT COUNT(*) FROM table_name
MAX() which returns the maximum value
SELECT MAX(column_name) FROM table_name
MIN() which returns the minimum value
SELECT MIN(column_name) FROM table_name
SUM() which adds up all the values and returns the answer
SELECT SUM(column_name) FROM table_name
ROUND() this rounds off a number up to a specified number of decimal places. Used a lot with AVG() with get a response that is easier to read and work with
SELECT ROUND(AVG(column_name)) FROM table_name
Aggregate functions can only be called in the SELECT or HAVING clause
Aggregate functions return a single value, and as such you can’t include multiple columns in your query when using the aggregate functions on their own. You are able to run multiple aggregate functions in a single query because they will both return a single value. Each function will return its own result
SELECT MAX(column_name), MIN(column_name), ROUND(AVG(column_name),2) FROM table_name
GROUP BY allows us to aggregate and apply functions to better understand how data is distributed per category. It allows us to aggregate columns by category.
SELECT category_column, AGGREGATE_FUNCTION(data_column) FROM table_name GROUP BY (category_column)
First we need a category column to group by. They are treated as non continuous for the sake of the group by. They can appear to be continuous, but will be treated as not, for the sake of the GROUP BY. Also, numeric values can also be categorical.
The way it works is that the category column is first split into various mini tables, then an aggregate function is run on those mini tables, and then a result is given with a “new table” grouped into those specified categories, but with aggregated values.
A | 3 |
B | 5 |
C | 9 |
C | 6 |
A | 8 |
B | 2 |
Becomes three tables like
A | 3 |
A | 8 |
B | 5 |
B | 2 |
C | 9 |
C | 6 |
Then they are put back into one table, but after being aggregated by their group. The below example uses SUM() as the aggregator.
A | 11 |
B | 7 |
C | 15 |
The GROUP BY clause must always appear immediately after a FROM or WHERE statement. All the columns that are in the GROUP BY call must also be present in the SELECT call. The WHERE statement should also not refer to the aggregate result. That can be done with the HAVING statement
The HAVING clause allows us to filter after an aggregation has already taken place. In the below query, we are filtering using WHERE before executing the GROUP BY
SELECT company, SUM(sales) FROM finance_table WHERE company != ‘Google’ GROUP BY company
HAVING allows us to use the aggregate result as a filter along with a GROU BY. It filters what has already been aggregated by the GROUP BY. In the below example, the WHERE is filtered first, ten the results are grouped according to the GROUP BY, and then the HAVING filter is applied
SELECT company, SUM(sales) FROM finance_table WHERE company != ‘Google’ GOUP BY company HAVING SUM(sales) > 1000