Grouping and Sorting Data
Objectives
Explain the importance of grouping and sorting data stored in a database
Group and sort data with the
GROUP BY()
andORDER BY()
keywordsCraft advanced queries using aggregator functions along with sorting keywords and other conditional clauses
Grouping and Sorting Data
SQL isn't picky about how it returns data to you, based on your queries. It will simply return the relevant table rows in the order in which they exist in the table. This is often insufficient for the purposes of data analysis and organization.
How common is it to order a list of items alphabetically? Or numerically from least to greatest?
We can tell our SQL queries and aggregate functions to group and sort our data using several clauses:
ORDER BY()
LIMIT
GROUP BY()
HAVING
andWHERE
ASC
/DESC
Let's take a closer look at how we use these keywords to narrow our search criteria as well as to order and group it.
Setting up the Database
Some cats are very famous, and accordingly very wealthy. Our Pets Database will have a cats
table in which each cat has a name, age, breed, and net worth. Our database will also have an owners
table and cats_owners
join table so that a cat can have many owners and an owner can have many cats.
Creating the Database:
Create the database in your terminal with the following:
Creating the tables:
In the sqlite3>
prompt in your terminal:
cats
table:
owners
Table:
cats_owners
Table:
Inserting the values:
cats
:
owners
:
cats_owners
:
Code Along I: ORDER BY()
ORDER BY()
Syntax
ORDER BY()
will automatically sort the returned values in ascending order. Use the DESC
keyword, as above, to sort in descending order.
Exercise
Imagine you're working for an important investment firm in Manhattan. The investors are interested in investing in a lucrative and popular cat. They need your help to decide which cat that will be. They want a list of famous and wealthy cats. We can do that with a basic SELECT
statement:
This will return:
Our investors are busy people though. They don't have time to manually sort through this list of cats for the best candidate. They want you to return the list to them with the cats sorted by net worth, from greatest to least.
We can do so with the following lines:
This will return:
Code Along II: The LIMIT
Keyword
LIMIT
KeywordTurns out our investors are very impatient. They don't want to review the list themselves, they just want you to return to them the wealthiest cat. We can accomplish this by using the LIMIT
keyword with the above query:
Which will return:
The LIMIT
keyword specifies how many of the records resulting from the query you'd like to actually return.
Code Along III: GROUP BY()
GROUP BY()
The GROUP BY()
keyword is very similar to ORDER BY()
. The only difference is that ORDER BY()
sorts the resulting data set of basic queries while GROUP BY()
sorts the result sets of aggregate functions.
Syntax
Exercise
Let's calculate the sum of the net worth of all of the cats, grouped by owner name:
This should return:
In the above query, we've implemented two joins. First, we're joining owners
and cat_owners
on owners.id = cats_owners.owner_id
. This first joined table would look like the following if we were to query it:
With this table, we then implement a second join with cats
on cats_owners.cat_id = cats.id
. To better understand this, try running the provided query, but select everything rather than just the owner's name and the sum of their cats' net worth, and remove the GROUP BY
line. You'll be able to see all three tables have been joined.
In our example query above, when we use the SUM(cats.net_worth)
aggregator in conjunction with GROUP BY
, the combination changes the way that our query behaves. Without GROUP BY
, we would get a sum of the net worth of all the cats:
By adding GROUP BY
, we now get the net_worth of all cats by owner. In our original data, Sophie is the owner of Maru and Hana (100000 + 21800), while Penny is the owner of Lil' Bub (181600).
SUM
looks at all of the values in the net_worth
column of the cats
table (or whichever column you specify in parentheses) and takes the sum of the those values, but only after those cats have been grouped.
Note: If you forget to add
SUM
here and just try to getcats.net_worth
, you'll still group by owner, but it will only display the first cat's net worth, not the aggregate.
Code Along IV: HAVING
vs WHERE
clause
HAVING
vs WHERE
clauseSuppose we have a table called employee_bonus
as shown below. Note that the table has multiple entries for employees Abigail and Matthew.
employee_bonus
:
Employee
Bonus
Matthew
1000
Abigail
2000
Matthew
500
Tom
700
Abigail
1250
To calculate the total bonus that each employee received, we would write a SQL statement like this:
This should return:
Employee
Bonus
Abigail
3250
Matthew
1500
Tom
700
Now, suppose we wanted to find the employees who received more than $1,000 in bonuses for the year of 2007. You might think that we could write a query like this:
Unfortunately, the above will not work because the WHERE
clause doesn’t work with aggregates – like SUM
, AVG
, MAX
, etc. What we need to use is the HAVING
clause. The HAVING
clause was added to SQL so that we could compare aggregates to other values – just how the WHERE
clause can be used with non-aggregates. Now, the correct SQL will look like this:
Difference between HAVING
and WHERE
clause
HAVING
and WHERE
clauseThe difference between the HAVING
and WHERE
clause in SQL is that the WHERE
clause can not be used with aggregates but the HAVING
clause can. HAVING filters out groups of rows, created by 'GROUP BY' and WHERE filters out rows. Another way to think of it is that the HAVING
clause is an additional filter to the WHERE
clause.
HAVING supports aggregate functions as it has to work with groups of rows. so for example, if there are multiple integers in a group it can filter out the groups with a low average, a high total (sum) or count how many rows are in the group.
WHERE on the other hand deals with each row individually, so aggregate functions wouldn't work for what would you be aggregating.
Also, note syntax differences:
HAVING is after GROUP BY and WHERE is before GROUP BY changing the order will produce a syntax error.
Resources
View Grouping and Sorting Data on Learn.co and start learning to code for free.
Last updated