GROUP BY and HAVING

GROUP BY and HAVING

An introduction to the GROUP BY and HAVING statements in SQL

Aggregate functions take multiple inputs and provide a single output. The most common are:

AVG() which returns the average value

SELECT AVG(column_name) FROM table_name

COUNT() which returns the number of values

SELECT COUNT(*) FROM table_name

MAX() which returns the maximum value

SELECT MAX(column_name) FROM table_name

MIN() which returns the minimum value

SELECT MIN(column_name) FROM table_name

SUM() which adds up all the values and returns the answer

SELECT SUM(column_name) FROM table_name

ROUND() this rounds off a number up to a specified number of decimal places. Used a lot with AVG() with get a response that is easier to read and work with

SELECT ROUND(AVG(column_name)) FROM table_name

Aggregate functions can only be called in the SELECT or HAVING clause

Aggregate functions return a single value, and as such you can’t include multiple columns in your query when using the aggregate functions on their own. You are able to run multiple aggregate functions in a single query because they will both return a single value. Each function will return its own result

SELECT MAX(column_name), MIN(column_name), ROUND(AVG(column_name),2) FROM table_name

GROUP BY allows us to aggregate and apply functions to better understand how data is distributed per category. It allows us to aggregate columns by category.

SELECT category_column, AGGREGATE_FUNCTION(data_column) FROM table_name GROUP BY (category_column)

First we need a category column to group by. They are treated as non continuous for the sake of the group by. They can appear to be continuous, but will be treated as not, for the sake of the GROUP BY. Also, numeric values can also be categorical.

The way it works is that the category column is first split into various mini tables, then an aggregate function is run on those mini tables, and then a result is given with a “new table” grouped into those specified categories, but with aggregated values.

A

3

B

5

C

9

C

6

A

8

B

2

Becomes three tables like

A

3

A

8

B

5

B

2

C

9

C

6

Then they are put back into one table, but after being aggregated by their group. The below example uses SUM() as the aggregator.

A

11

B

7

C

15

The GROUP BY clause must always appear immediately after a FROM or WHERE statement. All the columns that are in the GROUP BY call must also be present in the SELECT call. The WHERE statement should also not refer to the aggregate result. That can be done with the HAVING statement

The HAVING clause allows us to filter after an aggregation has already taken place. In the below query, we are filtering using WHERE before executing the GROUP BY

SELECT company, SUM(sales) FROM finance_table WHERE company != ‘Google’ GROUP BY  company

HAVING allows us to use the aggregate result as a filter along with a GROU BY. It filters what has already been aggregated by the GROUP BY. In the below example, the WHERE is filtered first, ten the results are grouped according to the GROUP BY, and then the HAVING filter is applied

SELECT company, SUM(sales) FROM finance_table WHERE company != ‘Google’ GOUP BY company HAVING SUM(sales) > 1000