
Advanced Database Operation

The HAVING clause is used with the GROUP BY clause to filter groups of rows based on aggregate functions like COUNT, SUM, etc. It is applied after groups are formed by GROUP BY, while WHERE filters rows before grouping. A subquery can be used in the WHERE or HAVING clauses to filter based on the results of an inner query.

Uploaded by

famasya
Copyright: © All Rights Reserved

HAVING operations

In PostgreSQL, the "HAVING" clause is used together with the "GROUP BY" clause to filter the groups that
"GROUP BY" produces. Its condition applies to each group of rows as a whole rather than to individual rows.

The "HAVING" and "WHERE" clauses are not interchangeable in SQL. Although they both filter data, they
serve different purposes and are used in different parts of a SQL query.

The "WHERE" clause filters the rows of a query based on a condition that applies to individual rows. It is
typically used to narrow the input down to the subset of rows that meet a certain criterion. The "WHERE"
clause is written before the "GROUP BY" clause and is evaluated before the groups are formed, so it cannot
refer to the result of an aggregate function.

The "HAVING" clause, on the other hand, filters the groups generated by "GROUP BY" based on a condition that
applies to each group. It is typically used to filter on the result of an aggregate function such as COUNT,
SUM, AVG, MAX, or MIN. The "HAVING" clause is written after the "GROUP BY" clause in a SQL query.
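The difference between the two clauses can be seen on a small example. The following is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database instead of PostgreSQL, and the "sales" table and its rows are made up for illustration (they are not part of the Chinook database).

```python
import sqlite3

# Hypothetical toy data, not the Chinook database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (album TEXT, price REAL, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("A", 1.0, 5), ("A", 2.0, 4), ("B", 1.0, 3), ("B", 10.0, 1), ("C", 1.0, 2)],
)

# WHERE removes individual rows before the groups are formed:
# the single expensive sale of album B is dropped, the album itself stays.
where_rows = conn.execute(
    "SELECT album, SUM(price * quantity) AS revenue FROM sales "
    "WHERE price < 5 GROUP BY album ORDER BY album"
).fetchall()

# HAVING removes whole groups after the aggregate has been computed:
# album C disappears because its total is not above 10.
having_rows = conn.execute(
    "SELECT album, SUM(price * quantity) AS revenue FROM sales "
    "GROUP BY album HAVING SUM(price * quantity) > 10 ORDER BY album"
).fetchall()

print(where_rows)
print(having_rows)
```

In the first query every album still appears, but with a smaller total for B; in the second query the totals are unchanged and only the set of albums shrinks.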

The general syntax of a query that uses the "HAVING" clause in PostgreSQL is as follows:

SELECT column1, column2, aggregate_function(column3)
FROM table
GROUP BY column1, column2
HAVING condition;

In this syntax, "column1" and "column2" are the columns that you want to group by, and "column3" is the
column that you want to apply an aggregate function to, such as SUM, COUNT, AVG, MAX, or MIN. The
"condition" in the "HAVING" clause specifies the condition that a group must meet in order to be included
in the query results.

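Suppose you want to find the total revenue generated by each album, and you only want to include albums that
have a total revenue greater than $10. The first query below computes the revenue for every album; the second
adds a "HAVING" clause that keeps only the albums above $10 and sorts them by revenue.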

In [1]:
df_6 = _deepnote_execute_sql(
    'SELECT "Album"."Title", SUM("Track"."UnitPrice" * "InvoiceLine"."Quantity") AS "RevenueTotal"\n'
    'FROM "Album"\n'
    'JOIN "Track" ON "Album"."AlbumId" = "Track"."AlbumId"\n'
    'JOIN "InvoiceLine" ON "Track"."TrackId" = "InvoiceLine"."TrackId"\n'
    'GROUP BY "Album"."AlbumId"',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444')
df_6
Out[1]:

Title RevenueTotal

0 Quanta Gente Veio ver--Bônus De Carnaval 1.98

1 Os Cães Ladram Mas A Caravana Não Pára 8.91

2 Emergency On Planet Earth 5.94

3 Up An' Atom 16.83

4 Djavan Ao Vivo - Vol. 1 8.91

... ... ...

299 Chronicle, Vol. 2 18.81

300 Black Album 8.91


301 In Through The Out Door 7.92

302 Carry On 3.96

303 Minha Historia 26.73

304 rows × 2 columns

In [4]:
df_1 = _deepnote_execute_sql(
    'SELECT "Album"."Title", SUM("Track"."UnitPrice" * "InvoiceLine"."Quantity") AS "RevenueTotal"\n'
    'FROM "Album"\n'
    'JOIN "Track" ON "Album"."AlbumId" = "Track"."AlbumId"\n'
    'JOIN "InvoiceLine" ON "Track"."TrackId" = "InvoiceLine"."TrackId"\n'
    'GROUP BY "Album"."AlbumId"\n'
    'HAVING SUM("Track"."UnitPrice" * "InvoiceLine"."Quantity") > 10\n'
    'ORDER BY "RevenueTotal" DESC',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444')
df_1

Out[4]:

Title RevenueTotal

0 Battlestar Galactica (Classic), Season 1 35.82

1 The Office, Season 3 31.84

2 Minha Historia 26.73

3 Heroes, Season 1 25.87

4 Lost, Season 2 25.87

... ... ...

59 Vozes do MPB 10.89

60 Kill 'Em All 10.89

61 Body Count 10.89

62 B-Sides 1980-1990 10.89

63 Greatest Hits I 10.89

64 rows × 2 columns

In this example, the "JOIN" clauses link the "Album", "Track", and "InvoiceLine" tables together, and the
"GROUP BY" clause groups the results by album. The "SUM(Track.UnitPrice * InvoiceLine.Quantity)" expression
calculates the total revenue for each album based on the unit price of each track and the quantity sold on each
invoice line. Finally, the "HAVING" clause filters the results to only include albums that have a total revenue
greater than $10, which leaves 64 of the 304 albums.

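NOTE: In PostgreSQL, the "HAVING" clause cannot refer to a column alias defined in the SELECT list (such as
"RevenueTotal"), which is why the query repeats the full SUM expression in "HAVING". "ORDER BY" can use the alias.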
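If repeating the aggregate expression feels error-prone, one possible workaround is to compute it in a derived table and filter on the alias with "WHERE" in the outer query. The sketch below runs on SQLite through Python's sqlite3 module with a made-up "sales" table (prices in cents), so it is an illustration under those assumptions rather than a PostgreSQL session. SQLite is more permissive about aliases than PostgreSQL, so it cannot reproduce the PostgreSQL error; it only shows that both forms give the same result.

```python
import sqlite3

# Hypothetical toy data; price_cents is an integer to avoid float rounding.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (album TEXT, price_cents INTEGER, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("A", 99, 12), ("B", 99, 3), ("C", 199, 6)],
)

# Form 1: repeat the aggregate expression inside HAVING.
repeat_rows = conn.execute(
    "SELECT album, SUM(price_cents * quantity) AS revenue FROM sales "
    "GROUP BY album HAVING SUM(price_cents * quantity) > 1000 ORDER BY album"
).fetchall()

# Form 2: aggregate in a derived table, then filter on the alias with WHERE.
alias_rows = conn.execute(
    "SELECT album, revenue FROM ("
    "  SELECT album, SUM(price_cents * quantity) AS revenue FROM sales GROUP BY album"
    ") AS totals WHERE revenue > 1000 ORDER BY album"
).fetchall()

print(repeat_rows)
print(alias_rows)
```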

Note that in the Chinook database, the "Album" table does not have a total revenue column, so we need to join
it to "Track" and "InvoiceLine" and use the SUM aggregate to calculate it.

QUIZ: Write a query that lists every genre with more than 100 tracks, together with its track count

In [5]:
df_2 = _deepnote_execute_sql(
    'SELECT "Genre"."Name", COUNT("Track"."TrackId")\n'
    'FROM "Genre"\n'
    'JOIN "Track" ON "Track"."GenreId" = "Genre"."GenreId"\n'
    'GROUP BY "Genre"."Name"\n'
    'HAVING COUNT("Track"."TrackId") > 100',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444')
df_2
Out[5]:

Name count

0 Latin 579

1 Rock 1297

2 Alternative & Punk 332

3 Metal 374

4 Jazz 130

In [8]:
DeepnoteChart(df_2, """{"layer":[{"layer":[{"mark":{"clip":true,"type":"bar","tooltip":true},
"encoding":{"x":{"sort":null,"type":"nominal","field":"Name","scale":{"type":"linear"}},
"y":{"sort":null,"type":"quantitative","field":"count","scale":{"type":"linear"},
"format":{"type":"default","decimals":null},"aggregate":"sum","formatType":"numberFormatFromNumberType"},
"color":{"sort":null,"type":"nominal","field":"Name","scale":{"scheme":"tableau10"}}}}]}],
"title":"","config":{"legend":{}},
"$schema":"https://vega.github.io/schema/vega-lite/v5.json","encoding":{}}""")
Out[8]:
<__main__.DeepnoteChart at 0x7f4bd2d37550>

In [10]:
df_7 = _deepnote_execute_sql(
    'SELECT * FROM "Album" WHERE "Title" = \'Big Ones\' OR "Title" = \'Facelift\'',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444')
df_7
Out[10]:

AlbumId Title ArtistId

0 5 Big Ones 3

1 7 Facelift 5

In [11]:
df_8 = _deepnote_execute_sql(
    'SELECT * FROM "Album" WHERE "Title" IN (\'Big Ones\', \'Facelift\')',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444')
df_8
Out[11]:

AlbumId Title ArtistId

0 5 Big Ones 3

1 7 Facelift 5

Subquery operations

A subquery, or inner query, is a query that is embedded within another query in SQL. Subqueries are used to
retrieve data that will be used in the main query as a filter or as a value for a calculation.

A subquery is enclosed in parentheses and can be placed in various parts of a SQL statement, such as the
WHERE clause, HAVING clause, or SELECT clause. The results of the subquery are used by the outer query to
generate the final result set. For example, consider the following SQL query:

SELECT *
FROM orders
WHERE customerid IN (SELECT customerid FROM customers WHERE country = 'USA');

In this query, the subquery is used in the WHERE clause to retrieve the customer IDs of customers who are
located in the USA. The outer query then uses these customer IDs to filter the orders table to only include orders
from customers in the USA.
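The same pattern can be tried out end to end with a throwaway database. This is a sketch only: it uses SQLite via Python's sqlite3 module rather than PostgreSQL, and the rows in "customers" and "orders" (plus the "orderid" column) are invented for the example.

```python
import sqlite3

# Hypothetical tables and rows matching the column names used in the query above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerid INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE orders (orderid INTEGER PRIMARY KEY, customerid INTEGER);
INSERT INTO customers VALUES (1, 'USA'), (2, 'Brazil'), (3, 'USA');
INSERT INTO orders VALUES (10, 1), (11, 2), (12, 3), (13, 2);
""")

# The inner query returns the set of customer IDs located in the USA (1 and 3);
# the outer query keeps only the orders whose customerid is in that set.
usa_orders = conn.execute(
    "SELECT * FROM orders "
    "WHERE customerid IN (SELECT customerid FROM customers WHERE country = 'USA') "
    "ORDER BY orderid"
).fetchall()

print(usa_orders)
```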

Suppose you want to find all the tracks that have the same genre as the track with ID 1. You can use a subquery
in the WHERE clause to achieve this:

In [12]:
df_3 = _deepnote_execute_sql(
    'SELECT *\n'
    'FROM "Track"\n'
    'WHERE "GenreId" = (SELECT "GenreId" FROM "Track" WHERE "TrackId" = 1);',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444')
df_3
Out[12]:

TrackId Name AlbumId MediaTypeId GenreId Composer Milliseconds Bytes UnitPrice

0 1 For Those About To Rock (We Salute You) 1 1 1 Angus Young, Malcolm Young, Brian Johnson 343719 11170334 0.99

1 2 Balls to the Wall 2 2 1 None 342562 5510424 0.99

2 3 Fast As a Shark 3 2 1 F. Baltes, S. Kaufman, U. Dirkscneider & W. Ho... 230619 3990994 0.99

3 4 Restless and Wild 3 2 1 F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. D... 252051 4331779 0.99

4 5 Princess of the Dawn 3 2 1 Deaffy & R.A. Smith-Diesel 375418 6290521 0.99

... ... ... ... ... ... ... ... ... ...

1292 3297 Tease Me Please Me 257 2 1 None 287229 4811894 0.99

1293 3298 Wind of Change 257 2 1 None 315325 5268002 0.99

1294 3299 Send Me an Angel 257 2 1 None 273041 4581492 0.99

1295 3353 I Guess You're Right 265 5 1 Darius "Take One" Minwalla/Jon Auer/Ken String... 212044 3453849 0.99

1296 3355 Love Comes 265 5 1 Darius "Take One" Minwalla/Jon Auer/Ken String... 199923 3240609 0.99

1297 rows × 9 columns

In [14]:
df_9 = _deepnote_execute_sql(
    'EXPLAIN ANALYZE SELECT *\n'
    'FROM "Track"\n'
    'WHERE "GenreId" = (SELECT "GenreId" FROM "Track" WHERE "TrackId" = 1);',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444')
df_9
Out[14]:

QUERY PLAN

0 Index Scan using "IFK_TrackGenreId" on "Track"...

1 Index Cond: ("GenreId" = $0)

2 InitPlan 1 (returns $0)

3 -> Index Scan using "PK_Track" on "Track"...

4 Index Cond: ("TrackId" = 1)

5 Planning Time: 0.105 ms

6 Execution Time: 0.359 ms


In this query, the subquery is used in the WHERE clause to retrieve the genre ID of the track with ID 1. The outer
query then returns all the tracks that have the same genre ID as the track with ID 1.

Subqueries can also be used in the SELECT clause to calculate a value based on a subquery. For example:

In [15]:
df_4 = _deepnote_execute_sql(
    'SELECT "AlbumId", "Title", (SELECT COUNT(*) FROM "Track" WHERE "Track"."AlbumId" = "Album"."AlbumId") AS "TrackCount"\n'
    'FROM "Album";',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444')
df_4
Out[15]:

AlbumId Title TrackCount

0 1 For Those About To Rock We Salute You 10

1 2 Balls to the Wall 1

2 3 Restless and Wild 3

3 4 Let There Be Rock 8

4 5 Big Ones 15

... ... ... ...

342 343 Respighi:Pines of Rome 1

343 344 Schubert: The Late String Quartets & String Qu... 1

344 345 Monteverdi: L'Orfeo 1

345 346 Mozart: Chamber Music 1

346 347 Koyaanisqatsi (Soundtrack from the Motion Pict... 1

347 rows × 3 columns

In this query, the subquery is used in the SELECT clause to count the number of tracks for each album, and the
outer query returns the album ID, title, and track count for each album.

Subqueries can be very useful for retrieving data based on complex conditions or for performing calculations
based on other tables in the database. However, they can slow a query down if they are written carelessly. A
correlated subquery, such as the "TrackCount" subquery above, refers to the current row of the outer query, so
conceptually it is evaluated once for every outer row. Running the query with EXPLAIN ANALYZE, as in the earlier
cell, shows how PostgreSQL actually executes the subquery and whether it is worth rewriting, for example as a
JOIN combined with GROUP BY.
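One common rewrite of a per-row subquery like the "TrackCount" query is a LEFT JOIN with GROUP BY. The sketch below checks that the two forms return the same rows. It runs on SQLite through Python's sqlite3 module, and the lowercase "album" and "track" tables with their three albums are made up for illustration. Whether the rewrite is actually faster depends on the planner and the data, so it is worth comparing plans before changing a real query.

```python
import sqlite3

# Hypothetical miniature of the Album/Track relationship.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE album (albumid INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE track (trackid INTEGER PRIMARY KEY, albumid INTEGER);
INSERT INTO album VALUES (1, 'First'), (2, 'Second'), (3, 'Empty');
INSERT INTO track VALUES (1, 1), (2, 1), (3, 2);
""")

# Correlated subquery: the inner COUNT refers to the current outer album row.
correlated = conn.execute(
    "SELECT albumid, title, "
    "(SELECT COUNT(*) FROM track WHERE track.albumid = album.albumid) AS trackcount "
    "FROM album ORDER BY albumid"
).fetchall()

# Join form: LEFT JOIN keeps albums without tracks, and COUNT(track.trackid)
# ignores the NULLs produced for them, so they are reported with a count of 0.
joined = conn.execute(
    "SELECT album.albumid, album.title, COUNT(track.trackid) AS trackcount "
    "FROM album LEFT JOIN track ON track.albumid = album.albumid "
    "GROUP BY album.albumid, album.title ORDER BY album.albumid"
).fetchall()

print(correlated)
print(joined)
```

Note that an inner JOIN would silently drop the album without tracks, which is why the sketch uses LEFT JOIN to match the subquery's behaviour.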

In [21]:
df_10 = _deepnote_execute_sql(
    '-- SELECT * FROM "Artist" WHERE "Name" =\n'
    'SELECT * FROM "Artist" WHERE LOWER("Name") = LOWER(\'LED ZEPPELIN\')',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444')
df_10
Out[21]:

ArtistId Name

0 22 Led Zeppelin

In [24]:
df_11 = _deepnote_execute_sql(
    'SELECT * FROM "Artist" WHERE LOWER("Name") LIKE LOWER(\'%/%\')',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444')
df_11
Out[24]:

ArtistId Name

0 1 AC/DC

1 188 Mundo Livre S/A

2 201 Luciana Souza/Romero Lubambo

QUIZ: Select the name of each artist whose name contains "Black" and calculate the average track length for each of them

In [29]:
df_12 = _deepnote_execute_sql(
    'SELECT "Artist"."Name", AVG("Track"."Milliseconds") FROM "Artist"\n'
    'JOIN "Album" ON "Album"."ArtistId" = "Artist"."ArtistId"\n'
    'JOIN "Track" ON "Track"."AlbumId" = "Album"."AlbumId"\n'
    'WHERE UPPER("Artist"."Name") LIKE UPPER(\'%black%\')\n'
    'GROUP BY "Artist"."Name"',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444')
df_12
Out[29]:

Name avg

0 The Black Crowes 329794.105263

1 Black Sabbath 288042.470588

2 Black Label Society 305981.888889

Common Table Expressions (CTEs)

A common table expression (CTE) is a temporary result set that is defined within a SQL statement and can be
referenced within the same statement. CTEs are useful for breaking down complex queries into smaller, more
manageable parts, and for improving the readability and maintainability of SQL code.

In PostgreSQL, a CTE is defined using the WITH clause. The syntax of a CTE is as follows:

WITH cte_name AS (
    SELECT ...
    FROM ...
    WHERE ...
)
SELECT ...
FROM ...
JOIN cte_name ON ...
WHERE ...

In this syntax, "cte_name" is the name of the CTE, and the SELECT statement inside the parentheses defines the
temporary result set. The CTE can then be referenced in the main SELECT statement or in subsequent JOIN or
WHERE clauses.
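A filled-in instance of this template might look like the following sketch. It runs on SQLite through Python's sqlite3 module instead of PostgreSQL, and the "artist" and "sale" tables, their rows, and the CTE name "artist_total" are hypothetical (amounts are stored in cents).

```python
import sqlite3

# Hypothetical toy schema, not the Chinook database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (artistid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sale (artistid INTEGER, amount_cents INTEGER);
INSERT INTO artist VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO sale VALUES (1, 99), (1, 199), (2, 99), (3, 99), (3, 99), (3, 99);
""")

# The CTE computes one total per artist; the main query joins it back to
# artist for the name and filters on the precomputed total.
rows = conn.execute("""
WITH artist_total AS (
    SELECT artistid, SUM(amount_cents) AS total
    FROM sale
    GROUP BY artistid
)
SELECT artist.name, artist_total.total
FROM artist
JOIN artist_total ON artist_total.artistid = artist.artistid
WHERE artist_total.total > 200
ORDER BY artist_total.total DESC
""").fetchall()

print(rows)
```

Because the aggregate is computed inside the CTE, the main query can filter on "total" with a plain WHERE clause.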

Here's an example of using a CTE with the Chinook sample database in PostgreSQL:

Suppose you want to find the total revenue generated by each artist, but you also want to include the artist's
name in the query results. You can use a CTE to achieve this:

In [31]:
df_5 = _deepnote_execute_sql(
    'WITH artist_revenue AS (\n'
    '    SELECT "Artist"."ArtistId", "Artist"."Name", SUM("Track"."UnitPrice" * "InvoiceLine"."Quantity") AS "TotalRevenue"\n'
    '    FROM "Artist"\n'
    '    JOIN "Album" ON "Artist"."ArtistId" = "Album"."ArtistId"\n'
    '    JOIN "Track" ON "Album"."AlbumId" = "Track"."AlbumId"\n'
    '    JOIN "InvoiceLine" ON "Track"."TrackId" = "InvoiceLine"."TrackId"\n'
    '    GROUP BY "Artist"."ArtistId"\n'
    ')\n\n'
    'SELECT "artist_revenue"."Name", "artist_revenue"."TotalRevenue"\n'
    'FROM "artist_revenue"\n'
    'ORDER BY "artist_revenue"."TotalRevenue" DESC;',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444')
df_5
Out[31]:

Name TotalRevenue

0 Iron Maiden 138.60

1 U2 105.93

2 Metallica 90.09

3 Led Zeppelin 86.13

4 Lost 81.59

... ... ...

160 Academy of St. Martin in the Fields & Sir Nevi... 0.99

161 Dread Zeppelin 0.99

162 Academy of St. Martin in the Fields, John Birc... 0.99

163 The King's Singers 0.99

164 Adrian Leaper & Doreen de Feis 0.99

165 rows × 2 columns

In this query, the CTE named "artist_revenue" calculates the total revenue for each artist based on the unit price
of each track and the quantity sold on each invoice. The CTE also includes the artist's name for reference. The
main SELECT statement then selects the artist's name and total revenue from the CTE, and orders the results by
total revenue in descending order.

Note that in this example, the CTE separates the revenue calculation from the final selection and ordering,
which keeps each part of the query easy to read. The results include only each artist's name and total
revenue, ordered by total revenue in descending order.

QUIZ: Write a CTE named track_counts that calculates the number of tracks in each album from the Track table

In [34]:
df_13 = _deepnote_execute_sql(
    'SELECT "Album"."Title", COUNT("Track"."TrackId") FROM "Album"\n'
    'JOIN "Track" ON "Track"."AlbumId" = "Album"."AlbumId"\n'
    'GROUP BY "Album"."Title"',
    'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444')
df_13
Out[34]:

Title count

0 Heart of the Night 12

1 The Cream Of Clapton 18

2 Van Halen III 12

3 Minha Historia 34

4 Chill: Brazil (Disc 2) 17

... ... ...

342 Fauré: Requiem, Ravel: Pavane & Others 1

343 The Beast Live 10

344 The Office, Season 2 22

345 O Samba Poconé 11

346 Load 14

347 rows × 2 columns
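The cell above answers the quiz with a plain JOIN and GROUP BY rather than a CTE. A variant that actually defines a CTE named track_counts could look like the sketch below. It is written against SQLite via Python's sqlite3 module with a made-up two-album dataset and lowercase table names, so treat it as one possible shape of the answer, not as the notebook's own solution.

```python
import sqlite3

# Hypothetical miniature of the Album and Track tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE album (albumid INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE track (trackid INTEGER PRIMARY KEY, albumid INTEGER);
INSERT INTO album VALUES (1, 'First'), (2, 'Second');
INSERT INTO track VALUES (1, 1), (2, 1), (3, 2);
""")

# track_counts is computed from the track table alone, grouped by album ID;
# the main query only attaches the album title to each count.
rows = conn.execute("""
WITH track_counts AS (
    SELECT albumid, COUNT(*) AS trackcount
    FROM track
    GROUP BY albumid
)
SELECT album.title, track_counts.trackcount
FROM album
JOIN track_counts ON track_counts.albumid = album.albumid
ORDER BY album.albumid
""").fetchall()

print(rows)
```

Grouping by the album ID inside the CTE, instead of by title as in the cell above, keeps two albums that happen to share a title from being merged into one row.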


Created in Deepnote
