Advanced Database Operation
In PostgreSQL, the "HAVING" clause is used together with the "GROUP BY" clause to filter a query's results by a
condition that applies to groups of rows rather than to individual rows.
The "HAVING" and "WHERE" clauses are not interchangeable. Both filter data, but they act at different stages of
a query.
The "WHERE" clause filters individual rows before any grouping takes place. It narrows the input down to the rows
that meet a condition, it cannot reference aggregate functions, and it is written before the "GROUP BY" clause.
The "HAVING" clause, on the other hand, filters the groups produced by "GROUP BY". Its condition typically tests
the result of an aggregate function such as COUNT, SUM, AVG, MAX, or MIN, and it is written after the "GROUP BY"
clause.
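The difference between the two filters can be seen on a tiny table. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server, and the "sales" table and its rows are made up rather than taken from the Chinook database.

```python
import sqlite3

# Hypothetical toy table; not part of the Chinook database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 5.0), ("north", 20.0), ("south", 8.0), ("south", 3.0), ("east", 50.0)],
)

# WHERE removes individual rows before the groups are formed.
where_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "WHERE amount > 4 GROUP BY region ORDER BY region"
).fetchall()

# HAVING removes whole groups after the aggregate has been computed.
having_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "GROUP BY region HAVING SUM(amount) > 20 ORDER BY region"
).fetchall()

print(where_rows)   # [('east', 50.0), ('north', 25.0), ('south', 8.0)]
print(having_rows)  # [('east', 50.0), ('north', 25.0)]
```

With WHERE, the "south" group survives but loses its 3.0 row; with HAVING, the "south" group is dropped entirely because its total is too small.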
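The general syntax of a query that uses the "HAVING" clause in PostgreSQL is as follows:
SELECT column1, column2, aggregate_function(column3)
FROM table_name
GROUP BY column1, column2
HAVING condition;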
In this syntax, "column1" and "column2" are the columns that you want to group by, and "column3" is the
column that you want to apply an aggregate function to, such as SUM, COUNT, AVG, MAX, or MIN. The
"condition" in the "HAVING" clause specifies the condition that a group must meet in order to be included
in the query results.
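Suppose you want to find the total revenue generated by each album, and you only want to include albums that
have a total revenue greater than $10. The first query below computes the revenue of every album; the second
adds a "HAVING" clause to keep only the albums above the threshold: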
In [1]:
df_6 = _deepnote_execute_sql(
    'SELECT "Album"."Title", SUM("Track"."UnitPrice" * "InvoiceLine"."Quantity") AS "RevenueTotal"\n'
    'FROM "Album"\n'
    'JOIN "Track" ON "Album"."AlbumId" = "Track"."AlbumId"\n'
    'JOIN "InvoiceLine" ON "Track"."TrackId" = "InvoiceLine"."TrackId"\n'
    'GROUP BY "Album"."AlbumId"',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444',
)
df_6
Out[1]:
Title RevenueTotal
In [4]:
df_1 = _deepnote_execute_sql(
    'SELECT "Album"."Title", SUM("Track"."UnitPrice" * "InvoiceLine"."Quantity") AS "RevenueTotal"\n'
    'FROM "Album"\n'
    'JOIN "Track" ON "Album"."AlbumId" = "Track"."AlbumId"\n'
    'JOIN "InvoiceLine" ON "Track"."TrackId" = "InvoiceLine"."TrackId"\n'
    'GROUP BY "Album"."AlbumId"\n'
    'HAVING SUM("Track"."UnitPrice" * "InvoiceLine"."Quantity") > 10\n'
    'ORDER BY "RevenueTotal" DESC',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444',
)
df_1
Out[4]:
Title RevenueTotal
64 rows × 2 columns
In this example, the "JOIN" clauses link the "Album", "Track", and "InvoiceLine" tables together, and the
"GROUP BY" clause groups the results by album. The expression "SUM(Track.UnitPrice * InvoiceLine.Quantity)"
calculates the total revenue for each album from the unit price of each track and the quantity sold on each
invoice line. Finally, the "HAVING" clause keeps only the albums whose total revenue is greater than $10.
Note that in the Chinook database, the "Album" table does not have a total revenue column, so we need the
"JOIN" clauses and the SUM function to calculate it.
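The join, aggregate, and filter steps described above can be tried on a few rows. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL; the table and column names follow the Chinook queries above, but the rows and the helper album_revenue are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, AlbumId INTEGER, UnitPrice REAL);
CREATE TABLE InvoiceLine (TrackId INTEGER, Quantity INTEGER);
-- Made-up rows for illustration only.
INSERT INTO Album VALUES (1, 'Album A'), (2, 'Album B');
INSERT INTO Track VALUES (1, 1, 1.0), (2, 1, 2.0), (3, 2, 1.0);
INSERT INTO InvoiceLine VALUES (1, 3), (2, 4), (3, 2);
""")


def album_revenue(conn, min_revenue):
    """Return (title, revenue) for albums whose revenue exceeds min_revenue."""
    query = """
        SELECT Album.Title,
               SUM(Track.UnitPrice * InvoiceLine.Quantity) AS RevenueTotal
        FROM Album
        JOIN Track ON Album.AlbumId = Track.AlbumId
        JOIN InvoiceLine ON Track.TrackId = InvoiceLine.TrackId
        GROUP BY Album.AlbumId, Album.Title
        HAVING SUM(Track.UnitPrice * InvoiceLine.Quantity) > ?
        ORDER BY RevenueTotal DESC
    """
    return conn.execute(query, (min_revenue,)).fetchall()


print(album_revenue(conn, 0))   # every album that has sales
print(album_revenue(conn, 10))  # only albums above the threshold
```

Passing the threshold as a query parameter makes it easy to compare several cut-offs without editing the SQL text.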
QUIZ: Write a query that lists each genre with more than 100 tracks, together with its track count
In [5]:
df_2 = _deepnote_execute_sql(
    'SELECT "Genre"."Name", COUNT("Track"."TrackId")\n'
    'FROM "Genre"\n'
    'JOIN "Track" ON "Track"."GenreId" = "Genre"."GenreId"\n'
    'GROUP BY "Genre"."Name"\n'
    'HAVING COUNT("Track"."TrackId") > 100',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444',
)
df_2
Out[5]:
Name count
0 Latin 579
1 Rock 1297
3 Metal 374
4 Jazz 130
In [8]:
DeepnoteChart(df_2, """{"layer":[{"layer":[{"mark":{"clip":true,"type":"bar","tooltip":true},
"encoding":{"x":{"sort":null,"type":"nominal","field":"Name","scale":{"type":"linear"}},
"y":{"sort":null,"type":"quantitative","field":"count","scale":{"type":"linear"},"format":{"type":"default","decimals":null},"aggregate":"sum","formatType":"numberFormatFromNumberType"},
"color":{"sort":null,"type":"nominal","field":"Name","scale":{"scheme":"tableau10"}}}}]}],
"title":"","config":{"legend":{}},
"$schema":"https://vega.github.io/schema/vega-lite/v5.json","encoding":{}}""")
Out[8]:
<__main__.DeepnoteChart at 0x7f4bd2d37550>
In [10]:
df_7 = _deepnote_execute_sql(
    'SELECT * FROM "Album" WHERE "Title" = \'Big Ones\' OR "Title" = \'Facelift\'',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444',
)
df_7
Out[10]:
AlbumId Title ArtistId
0 5 Big Ones 3
1 7 Facelift 5
In [11]:
df_8 = _deepnote_execute_sql(
    'SELECT * FROM "Album" WHERE "Title" IN (\'Big Ones\', \'Facelift\')',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444',
)
df_8
Out[11]:
AlbumId Title ArtistId
0 5 Big Ones 3
1 7 Facelift 5
Subquery operations
A subquery, or inner query, is a query that is embedded within another query in SQL. Subqueries are used to
retrieve data that will be used in the main query as a filter or as a value for a calculation.
A subquery is enclosed in parentheses and can be placed in various parts of a SQL statement, such as the
WHERE clause, HAVING clause, or SELECT clause. The results of the subquery are used by the outer query to
generate the final result set. For example, consider the following SQL query:
SELECT *
FROM orders
WHERE customerid IN (SELECT customerid FROM customers WHERE country = 'USA');
In this query, the subquery is used in the WHERE clause to retrieve the customer IDs of customers who are
located in the USA. The outer query then uses these customer IDs to filter the orders table to only include orders
from customers in the USA.
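To see how the inner and outer queries cooperate, the same statement can be run against a couple of made-up rows. This is a sketch under stated assumptions: sqlite3 stands in for PostgreSQL, and the "orderid" column and all row values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerid INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE orders (orderid INTEGER PRIMARY KEY, customerid INTEGER);
-- Made-up rows for illustration only.
INSERT INTO customers VALUES (1, 'USA'), (2, 'Canada'), (3, 'USA');
INSERT INTO orders VALUES (10, 1), (11, 2), (12, 3), (13, 2);
""")

# The subquery on its own: the IDs of the customers located in the USA.
usa_customer_ids = conn.execute(
    "SELECT customerid FROM customers WHERE country = 'USA' ORDER BY customerid"
).fetchall()

# The outer query keeps only the orders whose customerid is in that set.
usa_orders = conn.execute(
    "SELECT * FROM orders "
    "WHERE customerid IN (SELECT customerid FROM customers WHERE country = 'USA') "
    "ORDER BY orderid"
).fetchall()

print(usa_customer_ids)  # [(1,), (3,)]
print(usa_orders)        # [(10, 1), (12, 3)]
```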
Suppose you want to find all the tracks that have the same genre as the track with ID 1. You can use a subquery
in the WHERE clause to achieve this:
In [12]:
df_3 = _deepnote_execute_sql(
    'SELECT *\n'
    'FROM "Track"\n'
    'WHERE "GenreId" = (SELECT "GenreId" FROM "Track" WHERE "TrackId" = 1);',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444',
)
df_3
Out[12]:
2 3 Fast As a Shark 3 2 1 F. Baltes, S. Kaufman, U. Dirkscneider & W. Ho... 230619 3990994 0.99
... ... ... ... ... ... ... ... ... ...
1292 3297 Tease Me Please Me 257 2 1 None 287229 4811894 0.99
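A scalar subquery like the one in this cell must produce a single value for the "=" comparison. The sketch below reproduces the pattern on a hypothetical four-row "Track" table, using sqlite3 as a stand-in for PostgreSQL. Note that PostgreSQL raises an error when such a subquery returns more than one row, whereas SQLite does not, so filtering the subquery on a unique key such as "TrackId" is the safe choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, GenreId INTEGER);
-- Made-up rows for illustration only.
INSERT INTO Track VALUES (1, 1), (2, 1), (3, 2), (4, 1);
""")

# The inner query returns exactly one GenreId because TrackId is unique.
same_genre = conn.execute(
    'SELECT "TrackId" FROM "Track" '
    'WHERE "GenreId" = (SELECT "GenreId" FROM "Track" WHERE "TrackId" = 1) '
    'ORDER BY "TrackId"'
).fetchall()

print(same_genre)  # [(1,), (2,), (4,)]
```

The result includes track 1 itself; adding a condition such as "TrackId" <> 1 to the outer WHERE clause would exclude it.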
In [14]:
df_9 = _deepnote_execute_sql(
    'EXPLAIN ANALYZE SELECT *\n'
    'FROM "Track"\n'
    'WHERE "GenreId" = (SELECT "GenreId" FROM "Track" WHERE "TrackId" = 1);',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444',
)
df_9
Out[14]:
QUERY PLAN
Subqueries can also be used in the SELECT clause to calculate a value based on a subquery. For example:
In [15]:
df_4 = _deepnote_execute_sql(
    'SELECT "AlbumId", "Title", (SELECT COUNT(*) FROM "Track" WHERE "Track"."AlbumId" = "Album"."AlbumId") AS "TrackCount"\n'
    'FROM "Album";',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444',
)
df_4
Out[15]:
AlbumId Title TrackCount
4 5 Big Ones 15
343 344 Schubert: The Late String Quartets & String Qu... 1
In this query, the subquery is used in the SELECT clause to count the number of tracks for each album, and the
outer query returns the album ID, title, and track count for each album.
Subqueries are useful for retrieving data based on complex conditions or for performing calculations based on
other tables in the database. They can, however, slow a query down when written inefficiently. A correlated
subquery such as the one above, for example, may be evaluated once for every row of the outer query, so it is
worth checking the plan with EXPLAIN ANALYZE and rewriting the subquery if it turns out to be the bottleneck.
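One common way to rewrite a subquery in the SELECT clause is a LEFT JOIN with GROUP BY. The sketch below checks on made-up rows that both forms return the same track counts; it assumes sqlite3 as a stand-in for PostgreSQL. Whether the join form is actually faster depends on the data and the query planner, so treat it as something to measure rather than a rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, AlbumId INTEGER);
-- Made-up rows for illustration only; Album C has no tracks.
INSERT INTO Album VALUES (1, 'Album A'), (2, 'Album B'), (3, 'Album C');
INSERT INTO Track VALUES (1, 1), (2, 1), (3, 2);
""")

# Correlated subquery: the inner COUNT refers to the current outer row.
correlated = conn.execute("""
    SELECT AlbumId, Title,
           (SELECT COUNT(*) FROM Track WHERE Track.AlbumId = Album.AlbumId) AS TrackCount
    FROM Album
    ORDER BY AlbumId
""").fetchall()

# Join form: LEFT JOIN keeps albums without tracks, and COUNT(Track.TrackId)
# ignores the NULLs produced for them, so they get a count of 0.
joined = conn.execute("""
    SELECT Album.AlbumId, Album.Title, COUNT(Track.TrackId) AS TrackCount
    FROM Album
    LEFT JOIN Track ON Track.AlbumId = Album.AlbumId
    GROUP BY Album.AlbumId, Album.Title
    ORDER BY Album.AlbumId
""").fetchall()

print(correlated)
print(joined)
```

An inner JOIN would silently drop Album C, which is why the sketch uses LEFT JOIN to stay equivalent to the subquery version.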
In [21]:
df_10 = _deepnote_execute_sql(
    '-- SELECT * FROM "Artist" WHERE "Name" =\n'
    'SELECT * FROM "Artist" WHERE LOWER("Name") = LOWER(\'LED ZEPPELIN\')',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444',
)
df_10
Out[21]:
ArtistId Name
0 22 Led Zeppelin
In [24]:
df_11 = _deepnote_execute_sql(
    'SELECT * FROM "Artist" WHERE LOWER("Name") LIKE LOWER(\'%/%\')',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444',
)
df_11
Out[24]:
ArtistId Name
0 1 AC/DC
2 201 Luciana Souza/Romero Lubambo
QUIZ: Write a query that selects the name of each artist whose name contains "Black" and calculates the average track length for each of them
In [29]:
df_12 = _deepnote_execute_sql(
    'SELECT "Artist"."Name", AVG("Track"."Milliseconds") FROM "Artist"\n'
    'JOIN "Album" ON "Album"."ArtistId" = "Artist"."ArtistId"\n'
    'JOIN "Track" ON "Track"."AlbumId" = "Album"."AlbumId"\n'
    'WHERE UPPER("Artist"."Name") LIKE UPPER(\'%black%\')\n'
    'GROUP BY "Artist"."Name"',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444',
)
df_12
Out[29]:
Name avg
A common table expression (CTE) is a temporary result set that is defined within a SQL statement and can be
referenced within the same statement. CTEs are useful for breaking down complex queries into smaller, more
manageable parts, and for improving the readability and maintainability of SQL code.
In PostgreSQL, a CTE is defined using the WITH clause. The syntax of a CTE is as follows:
WITH cte_name AS (
SELECT ...
FROM ...
WHERE ...
)
SELECT ...
FROM ...
JOIN cte_name ON ...
WHERE ...
In this syntax, "cte_name" is the name of the CTE, and the SELECT statement inside the parentheses defines the
temporary result set. The CTE can then be referenced in the main SELECT statement or in subsequent JOIN or
WHERE clauses.
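The skeleton above can be filled in with a small concrete case. This sketch assumes sqlite3 as a stand-in for PostgreSQL (both accept this WITH syntax); the CTE name album_counts and all rows are hypothetical, while the table and column names follow the Chinook queries used elsewhere in this notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
-- Made-up rows for illustration only.
INSERT INTO Artist VALUES (1, 'Artist A'), (2, 'Artist B');
INSERT INTO Album VALUES (1, 'First', 1), (2, 'Second', 1), (3, 'Third', 2);
""")

query = """
WITH album_counts AS (           -- the CTE: one row per artist
    SELECT ArtistId, COUNT(*) AS AlbumCount
    FROM Album
    GROUP BY ArtistId
)
SELECT Artist.Name, album_counts.AlbumCount
FROM Artist
JOIN album_counts ON album_counts.ArtistId = Artist.ArtistId
WHERE album_counts.AlbumCount > 1
"""

prolific_artists = conn.execute(query).fetchall()
print(prolific_artists)  # [('Artist A', 2)]
```

The main query treats album_counts like an ordinary table, which is what lets the aggregation and the final filter be written as two separate, readable steps.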
Here's an example of using a CTE with the Chinook sample database in PostgreSQL:
Suppose you want to find the total revenue generated by each artist, but you also want to include the artist's
name in the query results. You can use a CTE to achieve this:
In [31]:
df_5 = _deepnote_execute_sql(
    'WITH artist_revenue AS (\n'
    '    SELECT "Artist"."ArtistId", "Artist"."Name", SUM("Track"."UnitPrice" * "InvoiceLine"."Quantity") AS "TotalRevenue"\n'
    '    FROM "Artist"\n'
    '    JOIN "Album" ON "Artist"."ArtistId" = "Album"."ArtistId"\n'
    '    JOIN "Track" ON "Album"."AlbumId" = "Track"."AlbumId"\n'
    '    JOIN "InvoiceLine" ON "Track"."TrackId" = "InvoiceLine"."TrackId"\n'
    '    GROUP BY "Artist"."ArtistId"\n'
    ')\n\n'
    'SELECT "artist_revenue"."Name", "artist_revenue"."TotalRevenue"\n'
    'FROM "artist_revenue"\n'
    'ORDER BY "artist_revenue"."TotalRevenue" DESC;',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444',
)
df_5
Out[31]:
Name TotalRevenue
1 U2 105.93
2 Metallica 90.09
4 Lost 81.59
160 Academy of St. Martin in the Fields & Sir Nevi... 0.99
In this query, the CTE named "artist_revenue" calculates the total revenue for each artist based on the unit price
of each track and the quantity sold on each invoice. The CTE also includes the artist's name for reference. The
main SELECT statement then selects the artist's name and total revenue from the CTE, and orders the results by
total revenue in descending order.
Here the CTE separates the revenue calculation from the final selection and ordering, so each part of the query
can be read and changed on its own.
QUIZ: Make a CTE named track_counts to calculate the number of tracks in each album in the Track table
In [34]:
df_13 = _deepnote_execute_sql(
    'SELECT "Album"."Title", COUNT("Track"."TrackId") FROM "Album"\n'
    'JOIN "Track" ON "Track"."AlbumId" = "Album"."AlbumId"\n'
    'GROUP BY "Album"."Title"',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444',
)
df_13
Out[34]:
Title count
3 Minha Historia 34
346 Load 14
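The cell above answers the quiz with a plain JOIN and GROUP BY. One possible CTE form, using the track_counts name the quiz asks for, might look like the sketch below. It assumes sqlite3 as a stand-in for PostgreSQL and runs on made-up rows; the "TrackCount" alias is a hypothetical choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, AlbumId INTEGER);
-- Made-up rows for illustration only.
INSERT INTO Album VALUES (1, 'Album A'), (2, 'Album B');
INSERT INTO Track VALUES (1, 1), (2, 1), (3, 1), (4, 2);
""")

query = """
WITH track_counts AS (
    -- Count the tracks per album using only the Track table.
    SELECT "AlbumId", COUNT(*) AS "TrackCount"
    FROM "Track"
    GROUP BY "AlbumId"
)
SELECT "Album"."Title", track_counts."TrackCount"
FROM "Album"
JOIN track_counts ON track_counts."AlbumId" = "Album"."AlbumId"
ORDER BY "Album"."AlbumId"
"""

album_track_counts = conn.execute(query).fetchall()
print(album_track_counts)  # [('Album A', 3), ('Album B', 1)]
```

Counting by "AlbumId" inside the CTE and joining to "Album" afterwards keeps two albums that happen to share a title from being merged into one group.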