The Basics of SQL + List of Courses
Consider our online bookstore. We need to store information about books. Instead of just listing facts randomly,
we organize them in a table called books:
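book_id | title                 | author         | price | publication_date | in_stock | pages
--------+-----------------------+----------------+-------+------------------+----------+------
1       | The Lord of the Rings | J.R.R. Tolkien | 19.99 | 1954-07-29       | TRUE     | 1178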
● Relation (Table): The entire table, books, is a relation. It represents a collection of related data.
● Attributes (Columns): book_id, title, author, price, publication_date, in_stock, and pages
are attributes. They describe the characteristics of each book.
● Tuples (Rows): Each row represents a single book – a specific instance of the "book" entity.
● Domain: Each attribute has a domain, which is the set of all possible values it can take. For example, the
domain of in_stock is {TRUE, FALSE}. The domain of price is all positive decimal numbers up to a
certain precision. Data types (which we'll discuss next) are how we define these domains in SQL.
This structure is based on the relational model, a formal data model where data is organized into relations (tables).
This model provides a consistent and mathematically sound way to manage and query data.
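As a minimal sketch of how the domains described above might be declared, the table definition below maps each attribute to a data type. The specific types, sizes, and the CHECK constraint are assumptions chosen for illustration (written in PostgreSQL-style SQL); the inserted row is the sample row shown in the table above.

```sql
-- Hypothetical definition of the books table; types and sizes are illustrative.
CREATE TABLE books (
    book_id          INT PRIMARY KEY,
    title            VARCHAR(255) NOT NULL,
    author           VARCHAR(255),
    price            DECIMAL(5, 2) CHECK (price > 0),  -- positive, two decimal places
    publication_date DATE,
    in_stock         BOOLEAN,                          -- TRUE or FALSE
    pages            INT
);

-- The sample row from the books table above.
INSERT INTO books (book_id, title, author, price, publication_date, in_stock, pages)
VALUES (1, 'The Lord of the Rings', 'J.R.R. Tolkien', 19.99, '1954-07-29', TRUE, 1178);
```

With this definition, an attempt to insert a negative price or a non-date value into publication_date would be rejected, which is how the data type and constraint enforce the domain.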
We can think about data at different levels: we understand conceptually that a book has a title (conceptual level);
we represent this in SQL with a books table and a title column (logical level); and the database system stores
this data on disk in a specific file format (physical level). SQL operates at the logical level.
The distinction between raw data and information is also important. "19.99" is raw data. "The price of 'The Lord of
the Rings' is 19.99" is information – we've added context and meaning. Databases help us transform raw data into
information. Also, as defined above, data can be structured, unstructured, or semi-structured.
Let's examine the common SQL data types, with examples from our bookstore:
1. Numeric Types:
SQL (Structured Query Language) is how we interact with the database. SELECT and FROM are fundamental
clauses for retrieving data.
● FROM Clause:
○ Specifies the table (or tables) we want to query. It's the starting point – where is the data?
○ FROM books: We're getting data from the books table.
● SELECT Clause:
○ Specifies which columns (attributes) we want to retrieve. It's a projection operation in relational
algebra terms.
○ SELECT *: Retrieve all columns.
■ SELECT * FROM books; (This gets all columns and all rows from the books table).
○ SELECT title, author: Retrieve only the title and author columns.
■ SELECT title, author FROM books;
○ SELECT title AS book_title, price AS cost: Retrieve title and price, but rename
the columns in the output. The AS keyword creates an alias.
■ SELECT pages * 2 AS double_pages FROM books; (We can even perform
calculations within the SELECT clause).
Example Queries:
SQL
-- Get the title, author, and publication date, renaming the columns.
SELECT title AS book_title, author AS book_author, publication_date AS pub_date
FROM books;
In the context of relational databases, a key is one or more columns whose values are used to identify rows
(tuples) in a table. Think of them as unique identifiers or addresses for each record. Different types of keys serve
different purposes.
2. Types of Keys
● Super Key:
○ A superkey is any set of columns that can uniquely identify a row. It might have more columns than
strictly necessary for uniqueness.
○ Example (Bookstore): In our books table, {book_id} is a superkey. But so is {book_id,
title}, {book_id, author}, {book_id, title, author, price}, and even the set of
all columns in the table. Any combination that includes book_id will be a superkey, because
book_id itself guarantees uniqueness. A superkey is a very broad concept.
● Candidate Key:
○ A candidate key is a minimal superkey. This means it's a set of columns that uniquely identifies
rows, and no subset of those columns can also uniquely identify rows. You can't remove any
columns from a candidate key and still maintain uniqueness.
○ Example (Bookstore): In our books table, {book_id} is a candidate key. It uniquely identifies
each book, and it's minimal (we can't remove any columns).
○ Another Example: Let's say we also have an isbn (International Standard Book Number) column,
and we enforce that ISBNs must be unique. Then, {isbn} would also be a candidate key. A table
can have multiple candidate keys.
○ Finding Candidate Keys: This requires understanding the meaning of the data. You need to know
which combinations of attributes should be unique based on the real-world entities you're
modeling.
● Primary Key:
○ The primary key is one of the candidate keys that you choose to be the main identifier for rows in
the table. It's a design decision.
○ Example (Bookstore): We would likely choose {book_id} as the primary key for our books
table.
○ Characteristics of a Primary Key:
■ Uniqueness: The primary key must be unique for every row. The database will enforce this.
■ Non-NULL: The primary key cannot contain NULL values. Every row must have a value for
the primary key.
■ Immutability (Ideally): While not strictly enforced by all databases, it's generally best
practice for primary key values to be immutable (never change). This helps maintain data
integrity and simplifies relationships.
■ Single Column (Usually): While a primary key can be composed of multiple columns (a
composite primary key), it's often simpler and more efficient to use a single-column primary
key.
○ Example (Composite Primary Key): Imagine a table order_items that stores the individual
items in each order. It might have columns order_id and product_id. Neither order_id nor
product_id is unique on its own (an order can have multiple products, and a product can be in
multiple orders). But the combination {order_id, product_id} is unique – a specific product
within a specific order. This would be a composite primary key.
● Alternate Key (or Secondary Key):
○ Any candidate key that is not chosen as the primary key is called an alternate key (or sometimes
secondary key).
○ Example (Bookstore): If book_id is the primary key, and isbn is also unique, then isbn is an
alternate key.
○ Usefulness: Alternate keys are often used for indexing and enforcing uniqueness constraints on
other columns besides the primary key.
● Foreign Key:
○ A foreign key is a column (or set of columns) in one table that refers to the primary key (or a
candidate key) of another table. This is how relationships between tables are established.
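○ Example (Bookstore): Suppose books stores an author_id column instead of the author's name, and a
separate authors table lists each author once. The author_id column in books is then a foreign key: it
refers to the author_id column (the primary key) in the authors table.
■ Each value in the author_id column of the books table should have a corresponding value in the
author_id column of the authors table.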
○ Referential Integrity: Foreign keys enforce referential integrity. This means the database will
prevent you from:
■ Inserting a row into the books table with an author_id that doesn't exist in the
authors table.
■ Deleting a row from the authors table if there are still rows in the books table that
reference that author.
■ Updating the author_id in authors if there are dependent rows in books.
○ ON DELETE and ON UPDATE Clauses: You can specify what should happen when a referenced
primary key value is deleted or updated:
■ ON DELETE CASCADE: If an author is deleted from authors, all books by that author are
also deleted from books.
■ ON DELETE SET NULL: If an author is deleted, the author_id in the corresponding
books rows is set to NULL.
■ ON DELETE RESTRICT (or NO ACTION): Prevents the deletion of the author if there are
related books. NO ACTION is the default in standard SQL and in most databases.
■ Similar options exist for ON UPDATE.
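The referential actions above can be sketched as follows. This is a minimal illustration, not a recommended schema: the authors table's name column and the choice of SET NULL and CASCADE are assumptions made for the example.

```sql
-- Hypothetical authors table; only the columns needed for the example.
CREATE TABLE authors (
    author_id INT PRIMARY KEY,
    name      VARCHAR(255) NOT NULL
);

CREATE TABLE books (
    book_id   INT PRIMARY KEY,
    title     VARCHAR(255),
    author_id INT,
    FOREIGN KEY (author_id) REFERENCES authors(author_id)
        ON DELETE SET NULL   -- deleting an author keeps the book and clears author_id
        ON UPDATE CASCADE    -- changing an author's id updates the matching books
);

INSERT INTO authors (author_id, name) VALUES (1, 'J.R.R. Tolkien');
INSERT INTO books (book_id, title, author_id) VALUES (1, 'The Lord of the Rings', 1);

-- Would be rejected: there is no author with author_id = 99.
-- INSERT INTO books (book_id, title, author_id) VALUES (2, 'Unknown Book', 99);

-- Allowed: the book row remains, with author_id set to NULL.
DELETE FROM authors WHERE author_id = 1;
```

Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON; has been run for the connection.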
● Unique Key:
○ A unique key constraint ensures that all values in a column (or set of columns) are unique. It's
similar to a primary key, but:
■ A table can have multiple unique keys.
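■ Unique keys can allow NULL values. How many depends on the database: PostgreSQL, MySQL, and
SQLite allow multiple NULLs in a unique column, while SQL Server allows only one.
■ Most databases create a unique index to enforce a unique key.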
Example:
CREATE TABLE books (
    book_id INT PRIMARY KEY,
    title VARCHAR(255),
    author_id INT,
    price DECIMAL(5, 2),
    publication_date DATE,
    in_stock BOOLEAN,
    pages INT,
    isbn VARCHAR(20) UNIQUE, -- Enforces uniqueness on ISBN
    FOREIGN KEY (author_id) REFERENCES authors(author_id)
);
● Composite Key:
○ A key that consists of two or more columns. This is necessary when a single column is not sufficient
to guarantee uniqueness.
○ Example (Order Items):
CREATE TABLE order_items (
    order_id INT,
    product_id INT,
    quantity INT,
    PRIMARY KEY (order_id, product_id), -- Composite primary key
    FOREIGN KEY (order_id) REFERENCES orders(order_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);
● In this case, a single order can have multiple products, and a single product can be part of multiple orders.
The combination of order_id and product_id uniquely identifies each item within an order.
● Data Integrity: Keys enforce uniqueness and referential integrity, preventing data inconsistencies and
errors.
● Data Relationships: Foreign keys define relationships between tables, allowing you to link related data.
● Query Performance: Keys are often used to create indexes, which dramatically speed up data retrieval.
● Data Normalization: Keys are a fundamental part of database normalization, a process of organizing data
to reduce redundancy and improve data integrity.
5. Relationships in Relational Databases
Relationships in a database describe how different tables are connected. There are three main types:
● One-to-One (1:1)
● One-to-Many (1:N) or Many-to-One (N:1)
● Many-to-Many (M:N)
The easiest way to figure out the relationship type is to ask "how many?" questions. Let's use our bookstore and
some new examples to illustrate this. We'll use "A" and "B" to represent the two entities (tables) we're
considering.
Examples (Bookstore)
● Tables: books (book_id, ...), categories (category_id, ...), book_categories (book_id, category_id)
● Question 1 (Books to Categories): For one book, how many categories can there be? Answer: Many (A
book can belong to multiple categories: Fiction, Mystery, etc.).
● Question 2 (Categories to Books): For one category, how many books can there be? Answer: Many (A
category can contain many books).
● Conclusion: Many-to-Many (M:N) relationship. We need a junction table (book_categories) to
represent this.
● Tables: students (student_id, ...), courses (course_id, ...), student_courses (student_id, course_id)
● Question 1 (Students to Courses): For one student, how many courses can they take? Answer: Many
● Question 2 (Courses to Students): For one course, how many students can be enrolled? Answer: Many
● Conclusion: Many-to-Many (M:N). Requires a junction table (student_courses).
● One-to-Many: Foreign key on the "many" side table, referencing the primary key of the "one" side table.
● Many-to-Many: Create a junction table with foreign keys referencing the primary keys of both original
tables. The primary key of the junction table is usually a composite key of these two foreign keys.
● One-to-One: Foreign key in one table referencing the primary key of the other. Often, the foreign key
column is also made the primary key of its table.
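The one-to-one pattern in the last bullet can be sketched as below. The book_details table and its summary column are hypothetical, introduced only to show a foreign key that doubles as the primary key.

```sql
CREATE TABLE books (
    book_id INT PRIMARY KEY,
    title   VARCHAR(255)
);

-- Hypothetical table: at most one details row per book.
CREATE TABLE book_details (
    book_id INT PRIMARY KEY,   -- primary key and foreign key at the same time
    summary VARCHAR(1000),
    FOREIGN KEY (book_id) REFERENCES books(book_id)
);
```

Because book_id is the primary key of book_details, a second details row for the same book would violate uniqueness, which is what keeps the relationship one-to-one.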
6. Relationships in Relational Databases & ER Diagrams
We'll cover:
Entity-Relationship Diagrams (ERDs) are visual tools for representing the entities (tables) in a database and the
relationships between them. They provide a clear way to understand the database structure at a glance.
Entities: Represented by rectangles. Each rectangle corresponds to a table. The name of the entity (table) is inside
the rectangle.
+-------------+
| Customers |
+-------------+
Attributes: Listed inside the entity rectangle (often shown as ovals connected to the rectangle in more detailed
diagrams). Key attributes (primary keys) are usually underlined.
+-------------+
|  Customers  |
+-------------+
| customer_id |  (underlined: primary key)
| name        |
| address     |
+-------------+
Relationships: Represented by diamonds connected to the entities involved. The type of relationship (1:1, 1:N,
M:N) is indicated by crow's foot notation on the lines connecting the relationship diamond to the entities.
One-to-Many (1:N):
+-------------+      +----------+      +----------+
|  Customers  |-----<   Orders   >-----|  Orders  |
+-------------+      +----------+      +----------+
Many-to-Many (M:N):
+---------+      +----------+      +------------+
|  Books  |>----<  Related   >----<| Categories |
+---------+      +----------+      +------------+
2. Relationship Types:
Let's revisit each relationship type, focusing on the key structure (primary, foreign, composite) and providing clear
SQL examples.
One-to-Many (1:N)
● Key Structure:
○ Primary Key: Each table has its own primary key.
○ Foreign Key: The table on the "many" side has a foreign key that references the primary key of the
table on the "one" side.
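For the one-to-many case, a minimal sketch using the Customers and Orders entities from the diagrams above might look like this. The order_date column and the column sizes are assumptions added for illustration.

```sql
-- The "one" side: each customer appears once.
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(255),
    address     VARCHAR(255)
);

-- The "many" side: each order points back to exactly one customer.
CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT NOT NULL,   -- foreign key lives on the "many" side
    order_date  DATE,           -- hypothetical column
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
```

Many rows in orders can share the same customer_id, but each order row holds only one, which is exactly the 1:N shape.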
Many-to-Many (M:N)
Key Structure:
● Primary Keys: Each of the original tables has its own primary key.
● Junction Table: A third table (junction table) is created.
● Foreign Keys: The junction table has two foreign keys: one referencing each of the original tables' primary
keys.
● Composite Primary Key: The junction table's primary key is usually a composite key consisting of the two
foreign keys.
ERD: (As shown above, with the M:N relationship implicitly requiring the junction table). A more detailed ERD
might explicitly show the Book_Categories table.
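The many-to-many key structure can be sketched with the books, categories, and book_categories tables named earlier. The name column and the sample category values are assumptions made for the example.

```sql
CREATE TABLE books (
    book_id INT PRIMARY KEY,
    title   VARCHAR(255)
);

CREATE TABLE categories (
    category_id INT PRIMARY KEY,
    name        VARCHAR(100)    -- hypothetical column
);

-- Junction table: one row per (book, category) pairing.
CREATE TABLE book_categories (
    book_id     INT,
    category_id INT,
    PRIMARY KEY (book_id, category_id),   -- composite key of the two foreign keys
    FOREIGN KEY (book_id) REFERENCES books(book_id),
    FOREIGN KEY (category_id) REFERENCES categories(category_id)
);

-- Illustrative data: one book placed in two categories.
INSERT INTO books (book_id, title) VALUES (1, 'The Lord of the Rings');
INSERT INTO categories (category_id, name) VALUES (1, 'Fiction'), (2, 'Fantasy');
INSERT INTO book_categories (book_id, category_id) VALUES (1, 1), (1, 2);
```

The composite primary key also prevents the same book from being linked to the same category twice.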
One-to-One (1:1)
Key Structure:
1. Identify Entities: What are the main "things" you're storing information about? These are your tables.
2. Ask the "How Many?" Questions: For each pair of entities, ask the "how many?" questions to determine
the relationship type (1:1, 1:N, M:N).
3. Draw the ERD: Sketch a simple ERD to visualize the relationships. Use crow's foot notation.
4. Determine Primary Keys: Choose a primary key for each table. This should uniquely identify each row.
5. Determine Foreign Keys:
○ 1:N: The "many" side table gets the foreign key.
○ M:N: Create a junction table with two foreign keys and a composite primary key.
○ 1:1: One table gets a foreign key referencing the other. Often, this foreign key is also the primary
key.
6. Consider Composite Keys: Only use composite primary keys when a single column is not sufficient for
uniqueness (most commonly in junction tables).
By following this systematic approach, you can quickly and accurately determine the relationships between your
entities and design the appropriate key structure for your database. The "how many?" questions and the ERD
visualization are powerful tools for understanding and designing relational databases.
The WHERE clause comes after the FROM clause in a SELECT statement:
condition: An expression that evaluates to either TRUE, FALSE, or NULL for each row in the table. Only rows
for which the condition is TRUE are included in the results.
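The general shape described above can be sketched as follows. The names in the commented template are placeholders; the concrete query assumes the books table from the start of this document.

```sql
-- General form (column1, column2, table_name, and condition are placeholders):
--   SELECT column1, column2 FROM table_name WHERE condition;

-- Concrete instance: only rows for which the condition is TRUE are returned.
SELECT title, price
FROM books
WHERE price < 20.00;
```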
2. Comparison Operators
● = (Equal to): WHERE price = 19.99 (Selects books with a price exactly equal to 19.99).
● <> or != (Not equal to): WHERE in_stock <> TRUE (Selects books that are not in stock). <> is the
standard SQL operator; != is also widely supported.
● > (Greater than): WHERE pages > 500 (Selects books with more than 500 pages).
● < (Less than): WHERE publication_date < '2000-01-01' (Selects books published before the year
2000).
● >= (Greater than or equal to): WHERE price >= 10.00 (Selects books with a price of 10.00 or more).
● <= (Less than or equal to): WHERE pages <= 200 (Selects books with 200 pages or fewer).
Example:
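A few of the comparison operators above, placed in complete queries. This is a sketch against the books table described at the start of this document; the threshold values are the ones used in the operator list.

```sql
-- Books that cost 10.00 or more.
SELECT title, price
FROM books
WHERE price >= 10.00;

-- Books published before the year 2000.
SELECT title, publication_date
FROM books
WHERE publication_date < '2000-01-01';

-- Books with more than 500 pages.
SELECT title, pages
FROM books
WHERE pages > 500;
```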
Example (Bookstore):
6. IN and NOT IN Operators
Example (Bookstore):
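As a minimal sketch, IN tests whether a value matches any item in a list, and NOT IN tests that it matches none. The queries assume the books table with an author column, and the author names are illustrative values.

```sql
-- Books by any of the listed authors (same result as chaining OR conditions).
SELECT title, author
FROM books
WHERE author IN ('J.R.R. Tolkien', 'Jane Austen');

-- Books by anyone except the listed authors.
SELECT title, author
FROM books
WHERE author NOT IN ('J.R.R. Tolkien', 'Jane Austen');
```

Rows whose author is NULL are returned by neither query, because both conditions evaluate to UNKNOWN for them.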
Example (Bookstore):
8. IS NULL and IS NOT NULL (Checking for Null Values)
● NULL represents a missing or unknown value. It's not the same as zero or an empty string.
● You cannot use = to check for NULL. You must use IS NULL or IS NOT NULL.
Example (Bookstore): Let's assume we added a subtitle column to books that might be NULL.
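A minimal sketch of the NULL checks, assuming books has the nullable subtitle column supposed above (that column is not part of the original table).

```sql
-- Books without a subtitle.
SELECT title
FROM books
WHERE subtitle IS NULL;

-- Books that have a subtitle.
SELECT title, subtitle
FROM books
WHERE subtitle IS NOT NULL;

-- Returns no rows: subtitle = NULL is never TRUE, even for rows where subtitle is NULL.
SELECT title
FROM books
WHERE subtitle = NULL;
```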
Important:
● Comparison with NULL using operators like =, <, >, etc. will result in UNKNOWN, which does not satisfy
the condition.
● A condition like price = NULL therefore never matches any row; write price IS NULL instead.
You can create complex filtering conditions by combining all of the above operators and using parentheses to
control precedence.
EXAMPLE
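-- Complex example:
-- Get books that are in stock AND (have a price between 10 and 20 OR are by 'Jane Austen').
SELECT title, author, price
FROM books
WHERE in_stock = TRUE
  AND (price BETWEEN 10 AND 20 OR author = 'Jane Austen');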
● Purpose: The ORDER BY clause sorts the result set of a query based on one or more columns.
● Syntax:
● column1, column2, ...: The columns to sort by. You can sort by multiple columns.
● ASC: Sorts in ascending order (smallest to largest, A to Z, earliest to latest). This is the default if you don't
specify ASC or DESC.
● DESC: Sorts in descending order (largest to smallest, Z to A, latest to earliest).
Examples (Bookstore):
-- Get all books, sorted by author (A-Z) and then by title (A-Z) within each author.
SELECT title, author
FROM books
ORDER BY author ASC, title ASC;
○ The placement of NULL values in the sorted output depends on the database system. Some
systems place them first, others last. You may be able to control this behavior with
database-specific extensions (e.g., NULLS FIRST or NULLS LAST in PostgreSQL).
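The ORDER BY options above can be sketched as follows. The commented template uses placeholder names; the queries assume the books table, and the last one relies on NULLS LAST, which PostgreSQL supports but some other databases do not.

```sql
-- General form (placeholders):
--   SELECT ... FROM table_name ORDER BY column1 [ASC | DESC], column2 [ASC | DESC];

-- Most expensive books first; ties broken alphabetically by title.
SELECT title, price
FROM books
ORDER BY price DESC, title ASC;

-- PostgreSQL: list books with no publication date at the end.
SELECT title, publication_date
FROM books
ORDER BY publication_date ASC NULLS LAST;
```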
● Purpose: The LIMIT clause (or its equivalent in some databases) restricts the number of rows returned by
a query.
● Syntax:
Examples (Bookstore):
● LIMIT without ORDER BY: While you can use LIMIT without ORDER BY, the results are not guaranteed
to be in any particular order. The database will simply return any number_of_rows rows that meet the
other criteria (if any). It's generally best practice to use ORDER BY with LIMIT to get predictable results.
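A minimal sketch of LIMIT combined with ORDER BY, in the form accepted by MySQL, PostgreSQL, and SQLite. number_of_rows is a placeholder, and the query assumes the books table.

```sql
-- General form (number_of_rows is a placeholder):
--   SELECT ... FROM table_name ORDER BY ... LIMIT number_of_rows;

-- The three most expensive books; book_id breaks ties so the result is predictable.
SELECT title, price
FROM books
ORDER BY price DESC, book_id ASC
LIMIT 3;
```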
● Purpose: The OFFSET clause, used in conjunction with LIMIT, allows you to skip a specified number of
rows before starting to return results. This is essential for pagination – displaying results in pages (e.g.,
"Page 1," "Page 2," etc.).
● Syntax:
Example (Generalized Pagination): Let's say we want page number 4, with 5 books per page:
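In general, the offset for a page is (page_number - 1) * page_size, so page 4 with 5 books per page skips (4 - 1) * 5 = 15 rows. A minimal sketch, assuming the books table and ordering by book_id:

```sql
-- Page 4 with 5 books per page: skip 15 rows, then return the next 5.
SELECT book_id, title
FROM books
ORDER BY book_id
LIMIT 5 OFFSET 15;
```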
● MySQL, PostgreSQL, SQLite: Use the LIMIT and OFFSET keywords as described above.
● SQL Server (newer versions): Uses OFFSET ... FETCH NEXT ... ROWS ONLY.
EXAMPLE
● Oracle (newer versions): Similar to SQL Server, uses OFFSET ... FETCH FIRST ... ROWS ONLY.
● Older database versions might use different methods.
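For SQL Server and Oracle, the same page (page 4, 5 books per page) might be written with the OFFSET ... FETCH form. This is a sketch assuming the books table; SQL Server requires the ORDER BY clause for this syntax.

```sql
-- Skip 15 rows, then return the next 5.
SELECT book_id, title
FROM books
ORDER BY book_id
OFFSET 15 ROWS
FETCH NEXT 5 ROWS ONLY;
```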
Key Considerations
● Performance: Using OFFSET with very large values can be inefficient, especially on large tables. The
database has to read and discard all the skipped rows. For large datasets, consider alternative pagination
strategies (e.g., "keyset pagination") if performance is critical.
● Consistency: If rows are inserted or updated between page requests, rows can shift from one page to
another. Including a unique column such as the primary key in the ORDER BY keeps the ordering deterministic.
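Keyset pagination, mentioned above, can be sketched as follows. Instead of skipping rows with OFFSET, the query starts after the last key seen on the previous page. The value 15 is a hypothetical last-seen book_id that the application would remember between requests; LIMIT is the MySQL, PostgreSQL, and SQLite form.

```sql
-- Assumes the previous page ended at book_id = 15 (hypothetical value).
SELECT book_id, title
FROM books
WHERE book_id > 15     -- continue after the last row already shown
ORDER BY book_id
LIMIT 5;
```

Because the filter uses the primary key, the database can usually jump straight to the starting row through the key's index rather than reading and discarding the skipped rows.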
BEST COURSE RECOMMENDATIONS
These courses are your starting point. Pick ONE that suits your learning style. Don't do all of them at this stage;
you'll get redundant information.
1. SQL for Data Science (University of California, Davis): This is a very popular and well-regarded starting point.
It focuses on SQL within the context of data analysis, which is a great way to learn. It uses SQLite, which is good
for learning but you'll want to branch out later.
● Pros: Excellent introduction to the why of SQL, good pacing, practical exercises, strong focus on
data-related tasks.
● Cons: Limited to SQLite. Doesn't go as deep into database design principles as some others.
2. Databases and SQL for Data Science with Python (IBM): Another very strong option, especially if you're
interested in the intersection of SQL and Python. It uses IBM Db2 (a relational database with its own SQL
dialect), which is closer to what enterprise environments run than SQLite. Includes using SQL within Jupyter Notebooks.
● Pros: Good integration with Python, covers a more "real-world" SQL dialect, includes database
connectivity and working with APIs.
● Cons: Might be slightly faster-paced for complete beginners than the UC Davis course.
3. Querying Data with Transact-SQL (Microsoft): This is a fantastic introduction if you're interested in working
with Microsoft SQL Server (a very common database system in enterprise environments). Focuses on T-SQL, the
Microsoft dialect.
● Pros: Excellent for learning a specific, in-demand dialect (T-SQL), very well-structured, taught by
Microsoft experts.
● Cons: Highly specific to SQL Server. If you aren't sure you'll be using SQL Server, a more general course
might be a better start.
4. A Introduction for querying Databases: Analyze data within a database using SQL. Create a relational
database on Cloud and work with tables. Write SQL statements including SELECT, INSERT, UPDATE, and
DELETE.
Recommendation: If you want a general introduction and are interested in data analysis, start with the SQL for
Data Science (UC Davis) Specialization. If you anticipate working with Microsoft technologies, the Querying Data
with Transact-SQL (Microsoft) course on edX is excellent. If you know you'll be using Python, the Databases and
SQL for Data Science with Python (IBM) is a great choice.
5. Data Analysis with SQL (Part of the Google Data Analytics Professional Certificate): While part of a larger
certificate, this section is excellent for solidifying your SQL skills and applying them to real-world data analysis
problems. Uses BigQuery (Google's cloud-based data warehouse).
● Pros: Very practical, focuses on using SQL for analysis, teaches a cloud-based platform (BigQuery). Good
for building a portfolio.
● Cons: Requires some basic data analysis knowledge (which you'll get in the earlier parts of the certificate
if you choose to take it).
6. Advanced SQL for Data Scientists (LinkedIn Learning): Focuses on more advanced query techniques, window
functions, common table expressions (CTEs), and performance optimization. Uses a variety of SQL dialects.
● Pros: Covers essential advanced topics, good for improving query writing efficiency.
● Cons: Assumes a solid foundation in SQL.
7. Analyzing Big Data with SQL (Cloudera): Excellent for learning how to work with large datasets using SQL and
tools like Apache Hive and Impala. A good choice if you're interested in "big data" technologies.
● Pros: Focuses on big data concepts, introduces relevant tools (Hive, Impala).
● Cons: Requires some understanding of distributed systems (although it provides an introduction). Less
focused on traditional relational databases.
8. Data Science: Querying with SQL (HarvardX): This course, which can be part of the Professional Certificate in
Data Science, covers SQL with a focus on the skills needed for a data science career.
These courses are for those who want to become true SQL experts and/or move into database administration
roles.
9. Database Design and Basic SQL in PostgreSQL (University of Michigan): This is a very strong course for
learning about relational database design principles, normalization, and using PostgreSQL (a popular and
powerful open-source database).
● Pros: Excellent for understanding database design theory, teaches a widely-used open-source database.
Crucial for building efficient and scalable databases.
● Cons: Less focused on data analysis, more on database structure.
10. PostgreSQL for Everybody Specialization (University of Michigan): An extension of the single course that
provides a more in-depth treatment of PostgreSQL. It will cover more complex operations and usage.
11. NoSQL systems: While not strictly SQL, understanding NoSQL databases (like MongoDB, Cassandra) is
increasingly important. This course covers the concepts and how they differ from relational databases.
12. Data Management and Visualization (Wesleyan University): While focusing more broadly on data
management, a significant portion covers database design, SQL, and data modeling.
Learning Path:
1. Beginner: Choose one from:
○ SQL for Data Science (Coursera - UC Davis)
○ Databases and SQL for Data Science with Python (Coursera - IBM)
○ Querying Data with Transact-SQL (edX - Microsoft)
2. Intermediate: Choose at least two from:
○ Data Analysis with SQL (Coursera - Google)
○ Advanced SQL for Data Scientists (Coursera - LinkedIn Learning)
○ Analyzing Big Data with SQL (edX - Cloudera)
○ Data Science: Querying with SQL (edX - HarvardX)
Important Considerations:
● Practice, Practice, Practice: SQL is a practical skill. The best way to learn is to write SQL queries. Use the
exercises provided in the courses, and also try to find your own datasets to work with. Sites like Kaggle,
data.world, and public government data portals are great resources.
● SQL Dialects: Be aware that there are different "dialects" of SQL (e.g., MySQL, PostgreSQL, SQL Server,
Oracle, SQLite). They share the core syntax, but there are differences in functions and features. The
courses above cover various dialects; choose ones that align with your career goals or the tools you're
likely to use.
● Projects: Build projects! Design your own database, populate it with data, and then write queries to
answer questions about it. This is the best way to solidify your skills and build a portfolio.
● Don't Be Afraid to Experiment: Try different queries, break things (in a safe environment!), and learn
from your mistakes. SQL is a very forgiving language to learn.
● Read Documentation: Become comfortable reading the official documentation for the specific SQL
dialect you are using. This is a crucial skill for any database professional.