
The Basic of SQL + List of Courses

The document provides an overview of data in relational databases, focusing on structured data and its organization in tables, using an online bookstore as an example. It covers data types, basic SQL clauses for data retrieval, and the importance of keys in relational databases, including primary, foreign, and unique keys. The document emphasizes the role of SQL in managing and querying data effectively within this structured framework.

Uploaded by New king India
Copyright © All Rights Reserved

Data, Data Types, and Basic SQL Clauses

Let's build a solid foundation for understanding data in relational databases and how SQL interacts with it,
using a running example of an online bookstore.

1. Understanding Data in Relational Databases


Data, in the context of databases, refers to organized facts and figures. We're primarily concerned with structured
data – information arranged in a predefined format, typically a table. Think of it like a well-organized spreadsheet.
This structure is crucial for efficient querying, analysis, and management.

Consider our online bookstore. We need to store information about books. Instead of just listing facts randomly,
we organize them in a table called books:

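book_id | title                  | author         | price | publication_date | in_stock | pages
--------+------------------------+----------------+-------+------------------+----------+------
1       | The Lord of the Rings  | J.R.R. Tolkien | 19.99 | 1954-07-29       | TRUE     | 1178
2       | Pride and Prejudice    | Jane Austen    | 9.99  | 1813-01-28       | TRUE     | 432
3       | 1984                   | George Orwell  | 12.50 | 1949-06-08       | FALSE    | 328
4       | The Hitchhiker's Guide | Douglas Adams  | 15.00 | 1979-10-12       | TRUE     | 224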

This table illustrates several key concepts:

●​ Relation (Table): The entire table, books, is a relation. It represents a collection of related data.
●​ Attributes (Columns): book_id, title, author, price, publication_date, in_stock, and pages
are attributes. They describe the characteristics of each book.
●​ Tuples (Rows): Each row represents a single book – a specific instance of the "book" entity.
●​ Domain: Each attribute has a domain, which is the set of all possible values it can take. For example, the
domain of in_stock is {TRUE, FALSE}. The domain of price is all positive decimal numbers up to a
certain precision. Data types (which we'll discuss next) are how we define these domains in SQL.

This structure is based on the relational model, a formal data model where data is organized into relations (tables).
This model provides a consistent and mathematically sound way to manage and query data.

We can think about data at different levels: we understand conceptually that a book has a title (conceptual level);
we represent this in SQL with a books table and a title column (logical level); and the database system stores
this data on disk in a specific file format (physical level). SQL operates at the logical level.

The distinction between raw data and information is also important. "19.99" is raw data. "The price of 'The Lord of
the Rings' is 19.99" is information – we've added context and meaning. Databases help us transform raw data into
information. Besides structured data, data can also be unstructured (e.g., free text, images, audio) or
semi-structured (e.g., JSON or XML documents, which carry their own labels but have no fixed table schema).

2. Data Types: Defining the Nature of Data


Data types are fundamental. They tell the database what kind of data each column can hold, and what operations
can be performed on it. They are not just about storage; they are about data integrity and meaning.

Let's examine the common SQL data types, with examples from our bookstore:

1. Numeric Types:

● INT (or INTEGER): Represents whole numbers (positive, negative, or zero).
○ pages INT: The number of pages in a book (e.g., 328, 1178). We can perform arithmetic
operations on pages (e.g., find the average number of pages). We wouldn't store fractional page
numbers.
○ Internally, INT is typically stored using a fixed number of bits (e.g., 32 bits), which limits its range
(for a signed 32-bit INT: -2,147,483,648 to 2,147,483,647).
● SMALLINT: A smaller-range integer (typically 16 bits), useful for saving space when you know the values
will be small.
○ rating SMALLINT: A rating on a scale of 1-5.
● BIGINT: A larger-range integer (typically 64 bits), for very large numbers.
○ total_copies_sold BIGINT: If we sell billions of books.
● DECIMAL(p, s) or NUMERIC(p, s): Fixed-point decimal numbers. Crucial for financial data.
○ price DECIMAL(5, 2): Stores prices up to 999.99. The 5 is the precision (total number of
digits), and the 2 is the scale (digits after the decimal point).
○ Why DECIMAL and not FLOAT for price? Because DECIMAL stores the value exactly. FLOAT can
introduce tiny rounding errors, which are unacceptable for financial transactions.
● FLOAT(p), REAL, DOUBLE PRECISION: Floating-point numbers (approximate binary representations).
○ Less common in our bookstore example. Used more in scientific computing, where a wide range of
values is needed and absolute precision is less critical. Most databases implement them with the
IEEE 754 binary floating-point formats.
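Python's number types give a quick way to see the difference between exact and approximate numerics. This is only an analogy. Python's float is an IEEE 754 double, the format most databases use for DOUBLE PRECISION. decimal.Decimal does exact base-10 arithmetic, similar in spirit to DECIMAL. The 19.99 price and the 10% tax come from the bookstore examples in this document. Rounding half up to two places is an assumption about how the result would be stored in a DECIMAL(5, 2) column.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floating point (like FLOAT/DOUBLE PRECISION in most databases)
# cannot represent 0.1 or 0.2 exactly, so a tiny error appears.
float_sum = 0.1 + 0.2
print(float_sum)          # 0.30000000000000004
print(float_sum == 0.3)   # False

# Exact base-10 arithmetic, similar in spirit to DECIMAL/NUMERIC.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum)                    # 0.3
print(decimal_sum == Decimal("0.3"))  # True

# Price plus 10% tax, rounded to two places as a DECIMAL(5, 2) column would hold it
# (the half-up rounding mode is an assumption; databases may round differently).
price = Decimal("19.99")
price_with_tax = (price * Decimal("1.10")).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(price_with_tax)     # 21.99
```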

2. Character String Types:

● CHAR(n): Fixed-length strings. n specifies the exact number of characters.
○ isbn_type CHAR(4): We might store "ISBN" here. It will always be 4 characters. If we insert
'ISB', it gets stored as 'ISB ' (padded with a space). If we have a book code that is always 5 characters,
CHAR(5) would be appropriate.
○ Important: CHAR(n) always reserves room for n characters, even if the string is shorter (it's padded
with spaces). This can waste space if lengths vary significantly, and the padding can affect comparisons
(see below).
● VARCHAR(n): Variable-length strings. n is the maximum length.
○ title VARCHAR(255): Book titles vary in length. "1984" is stored as "1984". "The Lord of the
Rings" is stored as "The Lord of the Rings". If a title exceeds 255 characters, standard SQL (and, for
example, PostgreSQL, or MySQL in strict mode) rejects the value with an error; some databases or modes
silently truncate it instead.
○ CHAR vs. VARCHAR - Key Considerations:
■ Storage: VARCHAR is generally more space-efficient for varying-length strings.
■ Performance: While CHAR can have a slight edge in very specific cases (all values exactly n
characters), this is often negligible. The space savings of VARCHAR usually outweigh this.
Incorrectly using CHAR (e.g., CHAR(255) for short strings) can hurt performance due to
increased disk I/O.
■ Comparisons: Databases differ in how they treat the trailing spaces of CHAR values. Many ignore them
when comparing two CHAR values, but the behavior can change when CHAR is mixed with VARCHAR or TEXT, or
when LIKE is used. This can lead to unexpected results if you're not careful.
■ Recommendation: Use VARCHAR unless you have a very good reason to use CHAR (e.g., a
field that is guaranteed to always be a fixed length).
● TEXT: For very large text strings (e.g., book summaries, reviews).

3. Date and Time Types:

● DATE: Stores a date (year, month, day).
○ publication_date DATE: Stores values like '1954-07-29'. SQL provides functions for
date arithmetic (e.g., calculating the difference between two dates).
● TIME: Stores a time (hour, minute, second).
● DATETIME or TIMESTAMP: Stores both a date and a time. Whether the value carries time zone information
depends on the database and the exact type: standard SQL distinguishes TIMESTAMP WITHOUT TIME ZONE from
TIMESTAMP WITH TIME ZONE, while DATETIME is a vendor-specific name used by systems such as MySQL and SQL Server.
● INTERVAL: Represents a duration of time (e.g., '3 days', '1 year 2 months').

4. Boolean Type:

● BOOLEAN: Represents true/false values.
○ in_stock BOOLEAN: Indicates whether a book is currently in stock.

5. Other Types:

● ENUM: Restricts a column to one value from a fixed list.
○ Example (MySQL syntax): book_condition ENUM('new', 'used', 'damaged')
● UUID: Universally Unique Identifier.
○ Example: book_id UUID
● BLOB: Binary Large Object (for storing images, audio, etc.).
● JSON, JSONB: For storing JSON data (semi-structured). JSONB is PostgreSQL's binary JSON type and is usually
preferred there for its indexing capabilities.

3. Basic SQL Clauses:
1. Retrieving Data

SQL (Structured Query Language) is how we interact with the database. SELECT and FROM are fundamental
clauses for retrieving data.

● FROM Clause:
○ Specifies the table (or tables) we want to query. It's the starting point – where is the data?
○ FROM books: We're getting data from the books table.
● SELECT Clause:
○ Specifies which columns (attributes) we want to retrieve. It's a projection operation in relational
algebra terms.
○ SELECT *: Retrieve all columns.
■ SELECT * FROM books; (This gets all columns and all rows from the books table).
○ SELECT title, author: Retrieve only the title and author columns.
■ SELECT title, author FROM books;
○ SELECT title AS book_title, price AS cost: Retrieve title and price, but rename
the columns in the output. The AS keyword creates an alias.
■ SELECT title AS book_title, price AS cost FROM books;
○ Calculated columns: We can even perform calculations within the SELECT clause and alias the result.
■ SELECT pages * 2 AS double_pages FROM books;

Example Queries:

-- Get all information about all books.
SELECT *
FROM books;

-- Get the title and price of all books.
SELECT title, price
FROM books;

-- Get the title, author, and publication date, renaming the columns.
SELECT title AS book_title, author AS book_author, publication_date AS pub_date
FROM books;

-- Calculate the price plus a 10% tax.
SELECT title, price,
price * 1.10 AS price_with_tax
FROM books;
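You can try these queries without installing a database server. The sketch below loads the four sample rows into an in-memory SQLite database through Python's built-in sqlite3 module. SQLite is only a convenient stand-in. It has no dedicated BOOLEAN or DECIMAL storage: in_stock becomes 1/0 and price becomes a floating-point number. That is why the tax query here wraps the calculation in ROUND(..., 2), a small change from the query above.

```python
import sqlite3

# In-memory SQLite database standing in for the bookstore (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE books (
        book_id          INT PRIMARY KEY,
        title            VARCHAR(255),
        author           VARCHAR(255),
        price            DECIMAL(5, 2),
        publication_date DATE,
        in_stock         BOOLEAN,
        pages            INT
    )
""")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "The Lord of the Rings", "J.R.R. Tolkien", 19.99, "1954-07-29", True, 1178),
        (2, "Pride and Prejudice", "Jane Austen", 9.99, "1813-01-28", True, 432),
        (3, "1984", "George Orwell", 12.50, "1949-06-08", False, 328),
        (4, "The Hitchhiker's Guide", "Douglas Adams", 15.00, "1979-10-12", True, 224),
    ],
)

# Projection: only the title and price columns.
title_price = conn.execute("SELECT title, price FROM books").fetchall()

# Aliases created with AS become the output column names.
cursor = conn.execute(
    "SELECT title AS book_title, author AS book_author, publication_date AS pub_date FROM books"
)
column_names = [description[0] for description in cursor.description]
print(column_names)  # ['book_title', 'book_author', 'pub_date']

# A calculated column; ROUND is added because SQLite stores price as a float.
with_tax = conn.execute(
    "SELECT title, price, ROUND(price * 1.10, 2) AS price_with_tax FROM books"
).fetchall()
for row in with_tax:
    print(row)
```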

When you execute a SELECT statement, the database goes through several steps:

- parsing (checking syntax)
- validation (checking table/column existence and permissions)
- optimization (finding the most efficient way to execute the query)
- execution (retrieving the data)
- result formatting (returning the data to you).

4. Keys in Relational Databases


Keys are a crucial concept in relational databases. They are used to uniquely identify rows within a table and to
establish relationships between tables. Understanding keys is essential for designing well-structured and efficient
databases.

1. What are Keys?

In the context of relational databases, a key is one or more columns whose values are used to identify rows
(tuples) in a table. Think of them as unique identifiers or addresses for each record. Different types of keys serve
different purposes.

2. Types of Keys

●​ Super Key:​

○​ A superkey is any set of columns that can uniquely identify a row. It might have more columns than
strictly necessary for uniqueness.
○​ Example (Bookstore): In our books table, {book_id} is a superkey. But so is {book_id,
title}, {book_id, author}, {book_id, title, author, price}, and even the set of
all columns in the table. Any combination that includes book_id will be a superkey, because
book_id itself guarantees uniqueness. A superkey is a very broad concept.
●​ Candidate Key:​

○​ A candidate key is a minimal superkey. This means it's a set of columns that uniquely identifies
rows, and no subset of those columns can also uniquely identify rows. You can't remove any
columns from a candidate key and still maintain uniqueness.
○​ Example (Bookstore): In our books table, {book_id} is a candidate key. It uniquely identifies
each book, and it's minimal (we can't remove any columns).
○​ Another Example: Let's say we also have an isbn (International Standard Book Number) column,
and we enforce that ISBNs must be unique. Then, {isbn} would also be a candidate key. A table
can have multiple candidate keys.
○​ Finding Candidate Keys: This requires understanding the meaning of the data. You need to know
which combinations of attributes should be unique based on the real-world entities you're
modeling.
●​ Primary Key:
○​ The primary key is one of the candidate keys that you choose to be the main identifier for rows in
the table. It's a design decision.
○​ Example (Bookstore): We would likely choose {book_id} as the primary key for our books
table.
○​ Characteristics of a Primary Key:
■​ Uniqueness: The primary key must be unique for every row. The database will enforce this.
■​ Non-NULL: The primary key cannot contain NULL values. Every row must have a value for
the primary key.
■​ Immutability (Ideally): While not strictly enforced by all databases, it's generally best
practice for primary key values to be immutable (never change). This helps maintain data
integrity and simplifies relationships.
■​ Single Column (Usually): While a primary key can be composed of multiple columns (a
composite primary key), it's often simpler and more efficient to use a single-column primary
key.
○​ Example (Composite Primary Key): Imagine a table order_items that stores the individual
items in each order. It might have columns order_id and product_id. Neither order_id nor
product_id is unique on its own (an order can have multiple products, and a product can be in
multiple orders). But the combination {order_id, product_id} is unique – a specific product
within a specific order. This would be a composite primary key.
●​ Alternate Key (or Secondary Key):
○​ Any candidate key that is not chosen as the primary key is called an alternate key (or sometimes
secondary key).
○​ Example (Bookstore): If book_id is the primary key, and isbn is also unique, then isbn is an
alternate key.
○​ Usefulness: Alternate keys are often used for indexing and enforcing uniqueness constraints on
other columns besides the primary key.
●​ Foreign Key:
○​ A foreign key is a column (or set of columns) in one table that refers to the primary key (or a
candidate key) of another table. This is how relationships between tables are established.​

Example (Bookstore): Let's add an authors table:

CREATE TABLE authors (
author_id INT PRIMARY KEY,
author_name VARCHAR(255)
);

Now we redefine our books table so that, instead of storing the author's name, it stores an author_id that
references the authors table:

CREATE TABLE books (
book_id INT PRIMARY KEY,
title VARCHAR(255),
author_id INT, -- Foreign Key
price DECIMAL(5, 2),
publication_date DATE,
in_stock BOOLEAN,
pages INT,
FOREIGN KEY (author_id) REFERENCES authors(author_id)
);

○ The author_id column in books is a foreign key. It refers to the author_id column (the
primary key) in the authors table.
■ Every non-NULL value in books.author_id must have a matching value in authors.author_id (a NULL
author_id is allowed unless the column is declared NOT NULL).
○ Referential Integrity: Foreign keys enforce referential integrity. By default, and assuming the database
actually enforces foreign keys (SQLite, for example, does so only after PRAGMA foreign_keys = ON), the
database will prevent you from:
■ Inserting a row into the books table with an author_id that doesn't exist in the
authors table.
■ Deleting a row from the authors table if there are still rows in the books table that
reference that author.
■ Updating the author_id in authors if there are dependent rows in books.
○ ON DELETE and ON UPDATE Clauses: You can specify what should happen when a referenced
primary key value is deleted or updated:
■ ON DELETE CASCADE: If an author is deleted from authors, all books by that author are
also deleted from books.
■ ON DELETE SET NULL: If an author is deleted, the author_id in the corresponding
books rows is set to NULL.
■ ON DELETE RESTRICT or ON DELETE NO ACTION: Prevents the deletion of the author if there are
related books. NO ACTION is the default in standard SQL. The two differ mainly in timing: RESTRICT is
checked immediately, while some databases allow a NO ACTION check to be deferred until later in the
transaction.
■ Similar options exist for ON UPDATE (e.g., ON UPDATE CASCADE copies a changed author_id into the
matching books rows).
●​ Unique Key:​

○​ A unique key constraint ensures that all values in a column (or set of columns) are unique. It's
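similar to a primary key, but:
■ A table can have multiple unique keys.
■ Unique keys can allow NULL values. Most databases (e.g., PostgreSQL, MySQL, SQLite) accept several NULLs
in a unique column, because NULLs are not considered equal to each other; SQL Server is a notable exception
that allows only one NULL.
■ Most databases enforce a unique key by automatically creating a unique index on its columns.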
Example:


CREATE TABLE books (
book_id INT PRIMARY KEY,
title VARCHAR(255),
author_id INT,
price DECIMAL(5, 2),
publication_date DATE,
in_stock BOOLEAN,
pages INT,
isbn VARCHAR(20) UNIQUE, -- Enforces uniqueness on ISBN
FOREIGN KEY (author_id) REFERENCES authors(author_id)
);

●​ Composite Key:​

○​ A key that consists of two or more columns. This is necessary when a single column is not sufficient
to guarantee uniqueness.
○ Example (Order Items), assuming orders and products tables with primary keys order_id and product_id
already exist:

CREATE TABLE order_items (
order_id INT,
product_id INT,
quantity INT,
PRIMARY KEY (order_id, product_id), -- Composite primary key
FOREIGN KEY (order_id) REFERENCES orders(order_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);

○ In this case, a single order can have multiple products, and a single product can be part of multiple orders.
The combination of order_id and product_id uniquely identifies each item within an order.
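The superkey and candidate-key definitions can be checked mechanically against sample data. The Python sketch below tests whether a set of columns is unique across the rows, and whether a smaller subset already is. The helper names and the extra "Emma" row are hypothetical, and the isbn values are placeholders, not real ISBNs. Look closely at the result: in this tiny sample, title also passes as a candidate key, even though two different books can share a title. That is why candidate keys must come from the meaning of the data. A sample can disprove a key but never prove one.

```python
from itertools import combinations

# Hypothetical sample rows; the isbn values are placeholders, not real ISBNs.
COLUMNS = ("book_id", "isbn", "title", "author")
ROWS = [
    (1, "ISBN-A", "The Lord of the Rings", "J.R.R. Tolkien"),
    (2, "ISBN-B", "Pride and Prejudice", "Jane Austen"),
    (3, "ISBN-C", "1984", "George Orwell"),
    (4, "ISBN-D", "Emma", "Jane Austen"),
]


def is_superkey(cols):
    """True if no two sample rows agree on every column in `cols`."""
    positions = [COLUMNS.index(c) for c in cols]
    projected = [tuple(row[p] for p in positions) for row in ROWS]
    return len(set(projected)) == len(projected)


def is_candidate_key(cols):
    """True if `cols` is a superkey and no proper subset of it is one (minimality)."""
    if not is_superkey(cols):
        return False
    return not any(
        is_superkey(subset)
        for size in range(1, len(cols))
        for subset in combinations(cols, size)
    )


# Check every non-empty column set and keep only the minimal superkeys.
candidate_keys = [
    cols
    for size in range(1, len(COLUMNS) + 1)
    for cols in combinations(COLUMNS, size)
    if is_candidate_key(cols)
]

print(is_superkey(("book_id", "title")))       # True: it contains book_id
print(is_candidate_key(("book_id", "title")))  # False: book_id alone suffices
print(candidate_keys)  # [('book_id',), ('isbn',), ('title',)]
```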

3. Why are Keys Important?

●​ Data Integrity: Keys enforce uniqueness and referential integrity, preventing data inconsistencies and
errors.
●​ Data Relationships: Foreign keys define relationships between tables, allowing you to link related data.
●​ Query Performance: Keys are often used to create indexes, which dramatically speed up data retrieval.
●​ Data Normalization: Keys are a fundamental part of database normalization, a process of organizing data
to reduce redundancy and improve data integrity.
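These guarantees are enforced by the database itself, not by application code. The sketch below shows them with Python's built-in sqlite3 module. It uses simplified versions of the authors and books tables from this section, with ON DELETE CASCADE added. The ISBN strings, the "Draft" rows, and the try_sql helper are placeholders for illustration. Two SQLite specifics matter here. First, foreign keys are only enforced after PRAGMA foreign_keys = ON. Second, like most databases, SQLite accepts several NULLs in a UNIQUE column. Other systems report violations with their own error types and messages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys without this

conn.executescript("""
    CREATE TABLE authors (
        author_id   INT PRIMARY KEY,
        author_name VARCHAR(255)
    );
    CREATE TABLE books (
        book_id   INT PRIMARY KEY,
        title     VARCHAR(255),
        author_id INT,
        isbn      VARCHAR(20) UNIQUE,
        FOREIGN KEY (author_id) REFERENCES authors(author_id) ON DELETE CASCADE
    );
""")


def try_sql(sql):
    """Run one statement; return None if it succeeds, or the constraint error message."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


results = {
    "add author": try_sql("INSERT INTO authors VALUES (1, 'J.R.R. Tolkien')"),
    "add book": try_sql("INSERT INTO books VALUES (1, 'The Lord of the Rings', 1, 'ISBN-A')"),
    # Primary key: book_id 1 is already taken.
    "duplicate book_id": try_sql("INSERT INTO books VALUES (1, 'Another Book', 1, 'ISBN-B')"),
    # Unique key: isbn 'ISBN-A' is already taken.
    "duplicate isbn": try_sql("INSERT INTO books VALUES (2, 'Another Book', 1, 'ISBN-A')"),
    # Foreign key: there is no author 99.
    "unknown author": try_sql("INSERT INTO books VALUES (3, 'Orphan Book', 99, 'ISBN-C')"),
    # Unique key with NULLs: SQLite accepts more than one NULL isbn.
    "null isbn 1": try_sql("INSERT INTO books VALUES (4, 'Draft A', 1, NULL)"),
    "null isbn 2": try_sql("INSERT INTO books VALUES (5, 'Draft B', 1, NULL)"),
}
for action, error in results.items():
    print(f"{action}: {error or 'OK'}")

# ON DELETE CASCADE: deleting the author also deletes books 1, 4 and 5.
conn.execute("DELETE FROM authors WHERE author_id = 1")
books_left = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
print("books left:", books_left)  # books left: 0
```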
5. Relationships in Relational Databases
Relationships in a database describe how different tables are connected. There are three main types:

●​ One-to-One (1:1)
●​ One-to-Many (1:N) or Many-to-One (N:1)
●​ Many-to-Many (M:N)

The "How Many?" Method for Identifying Relationships

The easiest way to figure out the relationship type is to ask "how many?" questions. Let's use our bookstore and
some new examples to illustrate this. We'll use "A" and "B" to represent the two entities (tables) we're
considering.

1. Ask the "One to?" Question (A to B):
○ "For one A, how many Bs can there be?"
2. Ask the "One to?" Question (B to A):
○ "For one B, how many As can there be?"
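The two questions translate directly into a small classification rule. In this Python sketch, relationship_type and max_partners are hypothetical helper names. relationship_type maps the two answers (1 for "one", anything larger for "many") to 1:1, 1:N, N:1, or M:N. max_partners estimates an answer from example (A, B) pairs. Sample pairs can only suggest the answer, since the real cardinality comes from the business rules being modeled.

```python
from collections import Counter


def relationship_type(max_b_per_a, max_a_per_b):
    """Map the two "how many?" answers (1 = one, >1 = many) to a relationship type."""
    a_has_many_b = max_b_per_a > 1
    b_has_many_a = max_a_per_b > 1
    if a_has_many_b and b_has_many_a:
        return "M:N"
    if a_has_many_b:
        return "1:N"  # one A, many Bs: the foreign key goes on B
    if b_has_many_a:
        return "N:1"  # many As, one B: the foreign key goes on A
    return "1:1"


def max_partners(pairs, side):
    """Most distinct partners any single value on `side` (0 = A, 1 = B) has in `pairs`."""
    counts = Counter(pair[side] for pair in set(pairs))
    return max(counts.values(), default=0)


# A few illustrative (author, book) pairs.
author_book_pairs = [
    ("J.R.R. Tolkien", "The Hobbit"),
    ("J.R.R. Tolkien", "The Lord of the Rings"),
    ("Jane Austen", "Pride and Prejudice"),
]
books_per_author = max_partners(author_book_pairs, 0)  # 2
authors_per_book = max_partners(author_book_pairs, 1)  # 1
print(relationship_type(books_per_author, authors_per_book))  # 1:N
print(relationship_type(2, 2))  # M:N, e.g. books and categories
```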

Examples (Bookstore)

Example 1: Authors and Books (One-to-Many)

● Tables: authors (author_id, author_name), books (book_id, title, author_id, ...)
● Question 1 (Authors to Books): For one author, how many books can there be? Answer: Many (An author
can write many books).
● Question 2 (Books to Authors): For one book, how many authors can there be? Answer: One (We're
assuming a single author per book for simplicity).
● Conclusion: One-to-Many (1:N) relationship from authors to books. The books table will have a
foreign key (author_id) referencing the authors table.

Example 2: Customers and Orders (One-to-Many)

● Tables: customers (customer_id, customer_name, ...), orders (order_id, customer_id, ...)
● Question 1 (Customers to Orders): For one customer, how many orders can there be? Answer: Many (A
customer can place many orders).
● Question 2 (Orders to Customers): For one order, how many customers can there be? Answer: One (An
order belongs to a single customer).
● Conclusion: One-to-Many (1:N) relationship from customers to orders. The orders table will have a
foreign key (customer_id) referencing the customers table.

Example 3: Books and Categories (Many-to-Many)

●​ Tables: books (book_id, ...), categories (category_id, ...), book_categories (book_id, category_id)
●​ Question 1 (Books to Categories): For one book, how many categories can there be? Answer: Many (A
book can belong to multiple categories: Fiction, Mystery, etc.).
●​ Question 2 (Categories to Books): For one category, how many books can there be? Answer: Many (A
category can contain many books).
●​ Conclusion: Many-to-Many (M:N) relationship. We need a junction table (book_categories) to
represent this.

Example 4: Students and Courses (Many-to-Many)

●​ Tables: students (student_id, ...), courses (course_id, ...), student_courses (student_id, course_id)
●​ Question 1 (Students to Courses): For one student, how many courses can they take? Answer: Many
●​ Question 2 (Courses to Students): For one course, how many students can be enrolled? Answer: Many
●​ Conclusion: Many-to-Many (M:N). Requires a junction table (student_courses).

Example 5: Employees and Departments (One-to-Many / Many-to-One)

● Tables: employees (employee_id, ..., department_id), departments (department_id, ...)
● Question 1 (Employees to Departments): For one employee, how many departments can they belong to?
Answer: One (Typically, an employee is in one department).
● Question 2 (Departments to Employees): For one department, how many employees can there be?
Answer: Many (A department can have many employees).
● Conclusion: One-to-Many (1:N) from departments to employees, or Many-to-One (N:1) from
employees to departments. The employees table has a foreign key (department_id).

Example 6: Products and Suppliers (Many-to-Many - Potentially)

● Tables: products, suppliers, product_suppliers (junction table)
● Question 1 (Products to Suppliers): For one product, how many suppliers can there be? Answer:
Many (A product could be sourced from multiple suppliers).
● Question 2 (Suppliers to Products): For one supplier, how many products can they supply?
Answer: Many (A supplier can provide many products).
● Conclusion: Many-to-Many (M:N). Requires a product_suppliers junction table.

Example 7: Books and Book Details (One-to-One - Less Common)

● Tables: books, book_details
● Question 1 (Books to Book Details): For one book, how many detail entries can there be? Answer:
One
● Question 2 (Book Details to Books): For one book detail entry, how many books can it relate to?
Answer: One
● Conclusion: One-to-One (1:1). Often, you'd just combine these into a single table, but there might
be reasons (like very large text fields in book_details) to separate them.

Implementation Summary (SQL)

●​ One-to-Many: Foreign key on the "many" side table, referencing the primary key of the "one" side table.
●​ Many-to-Many: Create a junction table with foreign keys referencing the primary keys of both original
tables. The primary key of the junction table is usually a composite key of these two foreign keys.
●​ One-to-One: Foreign key in one table referencing the primary key of the other. Often, the foreign key
column is also made the primary key of its table.
6. Relationships in Relational Databases & ER Diagrams
We'll cover:

1. ER Diagrams: Visualizing Relationships
2. Relationship Types: Deep Dive with Key Details
3. Best Approach: Identifying Relationships and Keys Quickly

1. ER Diagrams: Visualizing Relationships

Entity-Relationship Diagrams (ERDs) are visual tools for representing the entities (tables) in a database and the
relationships between them. They provide a clear way to understand the database structure at a glance.

Entities: Represented by rectangles. Each rectangle corresponds to a table. The name of the entity (table) is inside
the rectangle.

+-------------+
|  Customers  |
+-------------+

Attributes: Listed inside the entity rectangle (often shown as ovals connected to the rectangle in more detailed
diagrams). Key attributes (primary keys) are usually underlined.

+-------------+
|  Customers  |
+-------------+
| customer_id |  (Underlined - Primary Key)
| name        |
| address     |
+-------------+

Relationships: Represented by lines connecting the entities. In Chen-style diagrams, a diamond labeled with the
relationship name sits on the line; in crow's foot notation, the type of relationship (1:1, 1:N, M:N) is drawn
at the ends of the line. The sketches below combine both, drawing the diamond as < Name >.

One-to-Many (1:N):

+-----------+                    +--------+
| Customers |-----< Places >----<| Orders |
+-----------+                    +--------+

● The single line near Customers indicates "one".
● The "crow's foot" (three-pronged line, drawn here as <) near Orders indicates "many".
● This reads as: "One customer can have many orders, but each order belongs to one customer."

Many-to-Many (M:N):

+-------+                      +------------+
| Books |>----< Related >-----<| Categories |
+-------+                      +------------+

● Crow's feet on both sides indicate "many".
● This reads as: "A book can belong to many categories, and a category can contain many books."
● Important: In an ERD, an M:N relationship implicitly indicates the need for a junction table. Sometimes the
junction table is shown explicitly, and sometimes it's just implied by the M:N relationship.

One-to-One (1:1):

+-------+                     +-------------+
| Books |-----< Relates >-----| BookDetails |
+-------+                     +-------------+

● A single line on both sides indicates "one" on each side.

2. Relationship Types:

Let's revisit each relationship type, focusing on the key structure (primary, foreign, composite) and providing clear
SQL examples.

One-to-Many (1:N) / Many-to-One (N:1)

●​ Key Structure:
○​ Primary Key: Each table has its own primary key.
○​ Foreign Key: The table on the "many" side has a foreign key that references the primary key of the
table on the "one" side.

Example (Customers and Orders):

CREATE TABLE Customers (
customer_id INT PRIMARY KEY, -- Primary key of Customers
name VARCHAR(255)
);

CREATE TABLE Orders (
order_id INT PRIMARY KEY, -- Primary key of Orders
customer_id INT, -- Foreign key referencing Customers
order_date DATE,
FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);

● Customers.customer_id: Primary key.
● Orders.order_id: Primary key.
● Orders.customer_id: Foreign key, referencing Customers.customer_id.
● No Composite Keys are needed in this example.

ERD: (As shown above)

Many-to-Many (M:N)

Key Structure:

●​ Primary Keys: Each of the original tables has its own primary key.
●​ Junction Table: A third table (junction table) is created.
●​ Foreign Keys: The junction table has two foreign keys: one referencing each of the original tables' primary
keys.
●​ Composite Primary Key: The junction table's primary key is usually a composite key consisting of the two
foreign keys.

Example (Books and Categories):

CREATE TABLE Books (
book_id INT PRIMARY KEY,
title VARCHAR(255)
);

CREATE TABLE Categories (
category_id INT PRIMARY KEY,
category_name VARCHAR(255)
);

CREATE TABLE Book_Categories ( -- Junction Table
book_id INT,
category_id INT,
PRIMARY KEY (book_id, category_id), -- Composite Primary Key
FOREIGN KEY (book_id) REFERENCES Books(book_id),
FOREIGN KEY (category_id) REFERENCES Categories(category_id)
);

● Books.book_id: Primary key.
● Categories.category_id: Primary key.
● Book_Categories.book_id: Foreign key referencing Books.
● Book_Categories.category_id: Foreign key referencing Categories.
● Book_Categories.(book_id, category_id): Composite primary key. This ensures that a
book-category combination is unique.

ERD: (As shown above, with the M:N relationship implicitly requiring the junction table). A more detailed ERD
might explicitly show the Book_Categories table.​

One-to-One (1:1)

Key Structure:

● Primary Keys: Each table has its own primary key.
● Foreign Key: One of the tables has a foreign key referencing the primary key of the other table.
● Unique Constraint: The foreign key column must also be unique to enforce the one-to-one rule. Without that,
several rows could reference the same parent row, which would make the relationship 1:N. This is done either
with a UNIQUE constraint on the foreign key column or, as in the example below, by making the foreign key
column the primary key of its table.

Example (Books and BookDetails):

CREATE TABLE Books (
book_id INT PRIMARY KEY,
title VARCHAR(255)
);

CREATE TABLE Book_Details (
book_id INT PRIMARY KEY, -- Also a foreign key
synopsis TEXT,
FOREIGN KEY (book_id) REFERENCES Books(book_id)
);

● Books.book_id: Primary key.
● Book_Details.book_id: Primary key and foreign key referencing Books.book_id. This ensures the
1:1 relationship. (Strictly speaking, a book may also have no details row at all, so this is
one-to-zero-or-one.)

ERD: (As shown above)
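Book_Details.book_id is both the primary key and a foreign key. As a result, the database itself rejects a second details row for the same book, and it also rejects details for a book that does not exist. Here is a minimal check with Python's sqlite3 module. SQLite needs PRAGMA foreign_keys = ON for the foreign-key part. The synopsis strings and the insert_details helper are placeholders for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce foreign keys
conn.executescript("""
    CREATE TABLE Books (
        book_id INT PRIMARY KEY,
        title   VARCHAR(255)
    );
    CREATE TABLE Book_Details (
        book_id  INT PRIMARY KEY,  -- also a foreign key
        synopsis TEXT,
        FOREIGN KEY (book_id) REFERENCES Books(book_id)
    );
    INSERT INTO Books VALUES (1, 'The Lord of the Rings');
    INSERT INTO Book_Details VALUES (1, 'Placeholder synopsis');
""")


def insert_details(book_id, synopsis):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO Book_Details VALUES (?, ?)", (book_id, synopsis))
        return True
    except sqlite3.IntegrityError:
        return False


second_row_ok = insert_details(1, "Another synopsis")  # same book again: primary key violation
unknown_book_ok = insert_details(2, "No such book")    # book 2 does not exist: foreign key violation
print(second_row_ok, unknown_book_ok)  # False False
```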

3. Best Approach: Identifying Relationships and Keys Quickly

1.​ Identify Entities: What are the main "things" you're storing information about? These are your tables.
2.​ Ask the "How Many?" Questions: For each pair of entities, ask the "how many?" questions to determine
the relationship type (1:1, 1:N, M:N).
3.​ Draw the ERD: Sketch a simple ERD to visualize the relationships. Use crow's foot notation.
4.​ Determine Primary Keys: Choose a primary key for each table. This should uniquely identify each row.
5.​ Determine Foreign Keys:
○​ 1:N: The "many" side table gets the foreign key.
○​ M:N: Create a junction table with two foreign keys and a composite primary key.
○​ 1:1: One table gets a foreign key referencing the other. Often, this foreign key is also the primary
key.
6.​ Consider Composite Keys: Only use composite primary keys when a single column is not sufficient for
uniqueness (most commonly in junction tables).

Example (Students and Classes)

1. Entities: Students, Classes
2. "How Many?":
○ One student can take many classes.
○ One class can have many students.
○ Conclusion: Many-to-Many (M:N)
3. ERD:

+----------+                       +---------+
| Students |>----< Enrolled >-----<| Classes |
+----------+                       +---------+

4. Primary Keys:
○ Students: student_id (INT, PRIMARY KEY)
○ Classes: class_id (INT, PRIMARY KEY)
5. Foreign Keys & Junction Table:
○ Create a junction table: Student_Classes
○ Student_Classes:
■ student_id (INT, FOREIGN KEY referencing Students)
■ class_id (INT, FOREIGN KEY referencing Classes)
■ PRIMARY KEY (student_id, class_id) (Composite primary key)
6. Final Tables

CREATE TABLE Students (
student_id INT PRIMARY KEY,
student_name VARCHAR(60)
);

CREATE TABLE Classes (
class_id INT PRIMARY KEY,
class_name VARCHAR(60)
);

CREATE TABLE Student_Classes (
student_id INT,
class_id INT,
PRIMARY KEY (student_id, class_id),
FOREIGN KEY (student_id) REFERENCES Students(student_id),
FOREIGN KEY (class_id) REFERENCES Classes(class_id)
);
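You can sanity-check the finished design by trying a few enrollments. This sketch runs the three tables above in an in-memory SQLite database via Python's sqlite3 module. The student and class rows and the enroll helper are hypothetical. PRAGMA foreign_keys = ON is needed for SQLite to enforce the foreign keys. Many-to-many enrollments succeed, while a repeated (student_id, class_id) pair or an unknown student is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce foreign keys
conn.executescript("""
    CREATE TABLE Students (student_id INT PRIMARY KEY, student_name VARCHAR(60));
    CREATE TABLE Classes  (class_id   INT PRIMARY KEY, class_name   VARCHAR(60));
    CREATE TABLE Student_Classes (
        student_id INT,
        class_id   INT,
        PRIMARY KEY (student_id, class_id),
        FOREIGN KEY (student_id) REFERENCES Students(student_id),
        FOREIGN KEY (class_id)   REFERENCES Classes(class_id)
    );
    INSERT INTO Students VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO Classes  VALUES (10, 'Databases'), (20, 'Statistics');
""")


def enroll(student_id, class_id):
    """Try to add an enrollment; return True if the database accepted it."""
    try:
        conn.execute("INSERT INTO Student_Classes VALUES (?, ?)", (student_id, class_id))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = [
    enroll(1, 10),  # student 1 takes Databases
    enroll(1, 20),  # ...and Statistics: many classes per student
    enroll(2, 10),  # a second student in Databases: many students per class
]
duplicate = enroll(1, 10)  # same pair again violates the composite primary key
unknown = enroll(3, 10)    # student 3 does not exist: foreign key violation
enrolled = conn.execute("SELECT COUNT(*) FROM Student_Classes").fetchone()[0]
print(accepted, duplicate, unknown, enrolled)  # [True, True, True] False False 3
```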

By following this systematic approach, you can quickly and accurately determine the relationships between your
entities and design the appropriate key structure for your database. The "how many?" questions and the ERD
visualization are powerful tools for understanding and designing relational databases.

7. Filtering Data with WHERE Clause: Operators and Logical Conditions

The WHERE clause is a fundamental part of SQL, allowing you to specify conditions that rows must meet to be
included in the result set of a query. It's how you filter your data to retrieve only the information you need.

1. Basic Syntax and Purpose

The WHERE clause comes after the FROM clause in a SELECT statement:

SELECT column1, column2, ...
FROM table_name
WHERE condition;

condition: An expression that evaluates to TRUE, FALSE, or NULL (UNKNOWN) for each row in the table. Only rows
for which the condition is TRUE are included in the results.

2. Comparison Operators

These operators compare values:

●​ = (Equal to): WHERE price = 19.99 (Selects books with a price exactly equal to 19.99).
●​ <> or != (Not equal to): WHERE in_stock <> TRUE (Selects books that are not in stock). <> is the
standard SQL operator; != is also widely supported.
●​ > (Greater than): WHERE pages > 500 (Selects books with more than 500 pages).
●​ < (Less than): WHERE publication_date < '2000-01-01' (Selects books published before the year
2000).
●​ >= (Greater than or equal to): WHERE price >= 10.00 (Selects books with a price of 10.00 or more).
●​ <= (Less than or equal to): WHERE pages <= 200 (Selects books with 200 pages or fewer).

3. Logical Operators (Combining Conditions)

You can combine multiple conditions using logical operators:

● AND: Both conditions must be true.
○ WHERE price > 10.00 AND in_stock = TRUE (Selects books that cost more than $10.00
and are in stock).
● OR: At least one condition must be true.
○ WHERE author = 'J.R.R. Tolkien' OR author = 'Jane Austen' (Selects books by
either Tolkien or Austen).
● NOT: Reverses the truth value of a condition.
○ WHERE NOT (price > 20.00) (Selects books where the price is not greater than 20.00 –
equivalent to WHERE price <= 20.00).

4. Operator Precedence and Parentheses

● Among the logical operators, NOT has the highest precedence, followed by AND, then OR. (Comparison
operators such as = and > are evaluated before all of them.)
● Use parentheses () to control the order of evaluation and make your logic clear. This is highly
recommended for readability and avoiding unexpected results.

Example:

-- Without parentheses, this is hard to read. AND binds more tightly than OR,
-- so the database evaluates it like the first parenthesized query below:
SELECT * FROM books WHERE price > 10 AND in_stock = TRUE OR author = 'Jane Austen';

-- With parentheses, the intent is explicit:
SELECT * FROM books WHERE (price > 10 AND in_stock = TRUE) OR author = 'Jane Austen'; -- (price > 10 AND in_stock) OR author
SELECT * FROM books WHERE price > 10 AND (in_stock = TRUE OR author = 'Jane Austen'); -- price > 10 AND (in_stock OR author)

The two parenthesized queries above can return different results.
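To see the difference concretely, the sketch below runs all three queries against four made-up rows using Python's sqlite3 module, with in_stock stored as 1/0. The unparenthesized query returns the same rows as the first parenthesized one because AND is evaluated before OR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, price REAL, in_stock INTEGER)")
# Hypothetical rows chosen so that the two groupings disagree.
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?)",
    [
        ("The Hobbit", "J.R.R. Tolkien", 12.99, 1),        # passes both groupings
        ("Emma", "Jane Austen", 8.50, 0),                  # Austen, but price <= 10
        ("Persuasion", "Jane Austen", 11.00, 0),           # Austen and price > 10
        ("The Silmarillion", "J.R.R. Tolkien", 24.00, 0),  # fails both groupings
    ],
)


def titles(condition):
    """Return the sorted titles of books matching a fixed WHERE condition."""
    sql = f"SELECT title FROM books WHERE {condition} ORDER BY title"
    return [row[0] for row in conn.execute(sql)]


no_parens = titles("price > 10 AND in_stock = 1 OR author = 'Jane Austen'")
and_first = titles("(price > 10 AND in_stock = 1) OR author = 'Jane Austen'")
or_first = titles("price > 10 AND (in_stock = 1 OR author = 'Jane Austen')")

print(and_first)               # ['Emma', 'Persuasion', 'The Hobbit']
print(or_first)                # ['Persuasion', 'The Hobbit']
print(no_parens == and_first)  # True
```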

5. BETWEEN Operator (Range Check)

● BETWEEN checks if a value falls within a specified range (inclusive).
● WHERE column BETWEEN value1 AND value2 is equivalent to WHERE column >= value1 AND column <= value2.

Example (Bookstore):
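Here is a minimal sketch of BETWEEN, using a made-up books table in an in-memory SQLite database via Python's sqlite3 module. It shows three things:

● Both bounds are included.
● The result matches the equivalent >= / <= condition.
● The smaller bound must come first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, price REAL)")
# Hypothetical prices; two books sit exactly on the range boundaries.
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [
        ("The Hobbit", 12.99),
        ("Emma", 8.50),
        ("Dune", 20.00),         # equal to the upper bound
        ("Neuromancer", 10.00),  # equal to the lower bound
        ("The Silmarillion", 24.00),
    ],
)


def titles(condition):
    """Return the sorted titles of books matching a fixed WHERE condition."""
    sql = f"SELECT title FROM books WHERE {condition} ORDER BY title"
    return [row[0] for row in conn.execute(sql)]


between = titles("price BETWEEN 10.00 AND 20.00")
explicit = titles("price >= 10.00 AND price <= 20.00")
reversed_bounds = titles("price BETWEEN 20.00 AND 10.00")  # means price >= 20 AND price <= 10

print(between)              # ['Dune', 'Neuromancer', 'The Hobbit']
print(between == explicit)  # True
print(reversed_bounds)      # []
```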
6. IN and NOT IN Operators

● IN checks if a value matches any value in a list.
● NOT IN checks if a value does not match any value in a list.

Example (Bookstore):
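A minimal sketch of IN and NOT IN, assuming a made-up books table in SQLite accessed through Python's sqlite3 module. The author list is passed as bound parameters, one ? per element, rather than being written into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [
        ("The Hobbit", "J.R.R. Tolkien"),
        ("Emma", "Jane Austen"),
        ("Dune", "Frank Herbert"),
        ("Neuromancer", "William Gibson"),
    ],
)

authors = ("J.R.R. Tolkien", "Jane Austen")
# One ? placeholder per list element keeps the values out of the SQL text.
placeholders = ", ".join("?" for _ in authors)

in_list = [row[0] for row in conn.execute(
    f"SELECT title FROM books WHERE author IN ({placeholders}) ORDER BY title", authors)]
not_in_list = [row[0] for row in conn.execute(
    f"SELECT title FROM books WHERE author NOT IN ({placeholders}) ORDER BY title", authors)]

print(in_list)      # ['Emma', 'The Hobbit']
print(not_in_list)  # ['Dune', 'Neuromancer']
```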

7. LIKE Operator (Pattern Matching)

● LIKE is used for pattern matching with wildcards.
● %: Represents zero or more characters.
●​ _: Represents exactly one character.
●​ Case sensitivity of LIKE depends on the database system and collation settings. Some databases offer
case-insensitive variations (e.g., ILIKE in PostgreSQL).

Example (Bookstore):
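Here is a minimal sketch of the two wildcards, using made-up titles in SQLite via Python's sqlite3 module. The last query illustrates the case-sensitivity point above. SQLite's LIKE ignores case for ASCII letters by default, whereas the same pattern in PostgreSQL's LIKE would not match "The Hobbit".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT)")
# Hypothetical sample titles.
conn.executemany(
    "INSERT INTO books VALUES (?)",
    [("The Hobbit",), ("Dune",), ("Emma",), ("Persuasion",), ("Neuromancer",)],
)


def titles_like(pattern):
    """Return the sorted titles that match a LIKE pattern (passed as a parameter)."""
    rows = conn.execute(
        "SELECT title FROM books WHERE title LIKE ? ORDER BY title", (pattern,))
    return [row[0] for row in rows]


print(titles_like("The%"))  # ['The Hobbit']   starts with 'The'
print(titles_like("%an%"))  # ['Neuromancer']  contains 'an'
print(titles_like("_une"))  # ['Dune']         one character, then 'une'
print(titles_like("E__a"))  # ['Emma']         'E', exactly two characters, then 'a'
# SQLite's default LIKE is case-insensitive for ASCII letters.
print(titles_like("the%"))  # ['The Hobbit']
```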
8. IS NULL and IS NOT NULL (Checking for Null Values)

●​ NULL represents a missing or unknown value. It's not the same as zero or an empty string.
●​ You cannot use = to check for NULL. You must use IS NULL or IS NOT NULL.

Example (Bookstore): Let's assume we added a subtitle column to the books table, which may be NULL for books that have no subtitle.
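Here is a minimal sketch of that scenario in SQLite via Python's sqlite3 module, with a made-up books table and a nullable subtitle column. It also shows the pitfall described under "Important": a condition written with = NULL matches nothing, because even NULL = NULL is not TRUE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, subtitle TEXT)")
# Hypothetical rows; Python's None is stored as SQL NULL.
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [
        ("The Hobbit", "There and Back Again"),
        ("Frankenstein", "The Modern Prometheus"),
        ("Dune", None),
        ("Emma", None),
    ],
)


def titles(condition):
    """Return the sorted titles of books matching a fixed WHERE condition."""
    sql = f"SELECT title FROM books WHERE {condition} ORDER BY title"
    return [row[0] for row in conn.execute(sql)]


print(titles("subtitle IS NULL"))      # ['Dune', 'Emma']
print(titles("subtitle IS NOT NULL"))  # ['Frankenstein', 'The Hobbit']
print(titles("subtitle = NULL"))       # []  (UNKNOWN for every row)
print(conn.execute("SELECT NULL = NULL").fetchone()[0])  # None, i.e. NULL/UNKNOWN
```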

Important:

● Comparison with NULL using operators like =, <, >, etc. results in UNKNOWN, which does not satisfy the condition.
● A condition such as WHERE price = NULL therefore never matches any row, not even rows whose price is NULL. Write WHERE price IS NULL instead.

9. Combining All Operators

You can create complex filtering conditions by combining all of the above operators and using parentheses to
control precedence.

EXAMPLE

-- Complex example:
-- Get books that (are in stock AND (have a price between 10 and 20 OR are by 'Jane Austen'))
-- AND (do NOT have a title starting with 'The')
SELECT title, author, price, in_stock
FROM books
WHERE (in_stock = TRUE AND (price BETWEEN 10.00 AND 20.00 OR author = 'Jane Austen'))
AND (title NOT LIKE 'The%');

8. Ordering, Limiting, and Pagination (ORDER BY, LIMIT, OFFSET)
This section covers how to control the order in which rows are returned, how to limit the number of rows
returned, and how to implement pagination (displaying results in "pages").

1. ORDER BY Clause: Sorting Results

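● Purpose: The ORDER BY clause sorts the result set of a query based on one or more columns.

● Syntax:

SELECT column1, column2, ...
FROM table_name
ORDER BY column1 [ASC | DESC], column2 [ASC | DESC], ...;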

●​ column1, column2, ...: The columns to sort by. You can sort by multiple columns.
●​ ASC: Sorts in ascending order (smallest to largest, A to Z, earliest to latest). This is the default if you don't
specify ASC or DESC.
●​ DESC: Sorts in descending order (largest to smallest, Z to A, latest to earliest).

Examples (Bookstore):

-- Get all books, sorted by title (alphabetically, A-Z).
SELECT title, author
FROM books
ORDER BY title; -- ASC is implied

-- Get all books, sorted by price (highest to lowest).
SELECT title, author, price
FROM books
ORDER BY price DESC;

-- Get all books, sorted by author (A-Z) and then by title (A-Z)
-- within each author.
SELECT title, author
FROM books
ORDER BY author ASC, title ASC;

-- Get all books, sorted by publication date (latest to earliest) and
-- then by price (lowest to highest).
SELECT title, author, publication_date, price
FROM books
ORDER BY publication_date DESC, price ASC;

●​ Sorting by Multiple Columns: When you sort by multiple columns, the sorting is done hierarchically. The
results are first sorted by the first column specified, then within each group of rows that have the same
value for the first column, they are sorted by the second column, and so on.​

●​ Sorting and NULLs:​

○​ The placement of NULL values in the sorted output depends on the database system. Some
systems place them first, others last. You may be able to control this behavior with
database-specific extensions (e.g., NULLS FIRST or NULLS LAST in PostgreSQL).
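Here is a small sketch of how one system behaves, using made-up rows in an in-memory SQLite database via Python's sqlite3 module. SQLite treats NULL as smaller than any other value, so NULLs come first in ascending order and last in descending order. One common workaround for pushing NULLs to the end is to sort on the expression price IS NULL first. It works in SQLite, MySQL, and PostgreSQL; systems such as SQL Server may need a CASE expression instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, price REAL)")
# Hypothetical rows; Dune has no price yet (NULL).
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("The Hobbit", 12.99), ("Dune", None), ("Emma", 8.50)],
)


def titles(order_by):
    """Return titles in the order produced by a fixed ORDER BY clause."""
    return [row[0] for row in conn.execute(f"SELECT title FROM books ORDER BY {order_by}")]


print(titles("price"))                 # ['Dune', 'Emma', 'The Hobbit']  NULL first
print(titles("price DESC"))            # ['The Hobbit', 'Emma', 'Dune']  NULL last
# price IS NULL is 0 for priced books and 1 for unpriced ones, so NULLs sort last.
print(titles("price IS NULL, price"))  # ['Emma', 'The Hobbit', 'Dune']
```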

2. LIMIT Clause: Restricting the Number of Rows

● Purpose: The LIMIT clause (or its equivalent in some databases) restricts the number of rows returned by a query.

● Syntax:

SELECT column1, column2, ...
FROM table_name
[ORDER BY column1, ...]
LIMIT number_of_rows;

●​ number_of_rows: The maximum number of rows to return.

Examples (Bookstore):
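Here is a minimal sketch of LIMIT combined with ORDER BY, using made-up prices in SQLite via Python's sqlite3 module. Sorting first and then limiting is what turns "any 3 rows" into "the 3 most expensive books".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, price REAL)")
# Hypothetical prices.
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [
        ("The Hobbit", 12.99),
        ("Emma", 8.50),
        ("Dune", 19.99),
        ("The Silmarillion", 24.00),
        ("Neuromancer", 15.50),
    ],
)

# The three most expensive books: sort first, then keep only the first 3 rows.
top_three = [row[0] for row in conn.execute(
    "SELECT title FROM books ORDER BY price DESC LIMIT 3")]

# The single cheapest book.
cheapest = conn.execute(
    "SELECT title FROM books ORDER BY price ASC LIMIT 1").fetchone()[0]

print(top_three)  # ['The Silmarillion', 'Dune', 'Neuromancer']
print(cheapest)   # Emma
```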

●​ LIMIT without ORDER BY: While you can use LIMIT without ORDER BY, the results are not guaranteed
to be in any particular order. The database will simply return any number_of_rows rows that meet the
other criteria (if any). It's generally best practice to use ORDER BY with LIMIT to get predictable results.​

3. OFFSET Clause: Implementing Pagination

● Purpose: The OFFSET clause, used together with LIMIT, skips a specified number of rows before returning results. This is essential for pagination: displaying results in pages (e.g., "Page 1," "Page 2," etc.).

● Syntax:

SELECT column1, column2, ...
FROM table_name
ORDER BY column1, ...
LIMIT number_of_rows
OFFSET number_of_rows_to_skip;

●​ number_of_rows_to_skip: The number of rows to skip before returning results.

Examples (Bookstore - Pagination):
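Here is a minimal sketch of page-by-page retrieval with LIMIT and OFFSET, run through Python's sqlite3 module. It assumes a hypothetical books table with seven placeholder titles and a page size of 3. Every page uses the same ORDER BY so that the pages line up without gaps or overlaps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT)")
# Seven placeholder books: (1, 'Book 1') ... (7, 'Book 7').
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [(i, f"Book {i}") for i in range(1, 8)],
)


def ids(sql):
    """Return the book_id values selected by a fixed query."""
    return [row[0] for row in conn.execute(sql)]


page_1 = ids("SELECT book_id FROM books ORDER BY book_id LIMIT 3 OFFSET 0")  # skip 0 rows
page_2 = ids("SELECT book_id FROM books ORDER BY book_id LIMIT 3 OFFSET 3")  # skip 3 rows
page_3 = ids("SELECT book_id FROM books ORDER BY book_id LIMIT 3 OFFSET 6")  # skip 6 rows

print(page_1)  # [1, 2, 3]
print(page_2)  # [4, 5, 6]
print(page_3)  # [7]  (the last page may be shorter)
```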


General Formula for Pagination:

● To get page number page_number with page_size items per page:
○ LIMIT page_size
○ OFFSET (page_number - 1) * page_size

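Example (Generalized Pagination): Let's say we want page number 4, with 5 books per page. Then LIMIT is 5 and OFFSET is (4 - 1) * 5 = 15, so the query skips the first 15 rows and returns rows 16 through 20:

SELECT title, author
FROM books
ORDER BY title
LIMIT 5 OFFSET 15;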
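The formula can be wrapped in a small helper. This is a sketch, not a library API: fetch_page is a hypothetical name, the books table and its 25 placeholder rows are made up, and it relies on SQLite (through Python's sqlite3 module) accepting ? placeholders in LIMIT and OFFSET.

```python
import sqlite3


def fetch_page(conn, page_number, page_size):
    """Return one page of (book_id, title) rows; page_number starts at 1."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be at least 1")
    offset = (page_number - 1) * page_size  # rows to skip before this page
    return conn.execute(
        "SELECT book_id, title FROM books ORDER BY book_id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [(i, f"Book {i}") for i in range(1, 26)],  # 25 placeholder books
)

page_4 = fetch_page(conn, page_number=4, page_size=5)
print([book_id for book_id, _ in page_4])  # [16, 17, 18, 19, 20]
```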

LIMIT and OFFSET Variations Across Databases:

●​ MySQL, PostgreSQL, SQLite: Use the LIMIT and OFFSET keywords as described above.
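● SQL Server (2012 and later): Uses OFFSET ... ROWS FETCH NEXT ... ROWS ONLY, which requires an ORDER BY clause. For example, page 4 with 5 books per page:

SELECT title, author
FROM books
ORDER BY title
OFFSET 15 ROWS FETCH NEXT 5 ROWS ONLY;

● Oracle (12c and later): Similar to SQL Server, uses OFFSET ... ROWS FETCH FIRST ... ROWS ONLY.
● Older database versions might use different methods (e.g., TOP in SQL Server, ROWNUM in Oracle).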

Key Considerations

●​ Performance: Using OFFSET with very large values can be inefficient, especially on large tables. The
database has to read and discard all the skipped rows. For large datasets, consider alternative pagination
strategies (e.g., "keyset pagination") if performance is critical.
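● Consistency: Rows that are inserted, updated, or deleted between page requests can shift other rows across page boundaries, so a user may see a row twice or miss one. Rows with equal values in the ORDER BY columns also have no guaranteed order. Adding a unique column, such as the primary key, as the last ORDER BY column makes the ordering deterministic.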
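Keyset pagination, mentioned under Performance, can be sketched as follows. Instead of skipping rows with OFFSET, each request seeks past the last row of the previous page using the sort key. The sketch makes these assumptions:

● The sort key is (price, book_id).
● book_id is a unique primary key, so it can break ties (the primary key in the ordering, as the Consistency point recommends).
● price is never NULL.

next_page is a hypothetical helper, and the rows are made up. In a real database, an index on (price, book_id) is what lets the seek avoid reading the skipped rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (book_id INTEGER PRIMARY KEY, price REAL NOT NULL)")
# Hypothetical rows with repeated prices, so book_id has to break ties.
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [(1, 10.0), (2, 5.0), (3, 10.0), (4, 7.5), (5, 5.0), (6, 12.0)],
)


def next_page(conn, page_size, after=None):
    """Return the next page of (book_id, price) rows sorted by (price, book_id).

    `after` is the (price, book_id) key of the last row already shown,
    or None for the first page.
    """
    if after is None:
        return conn.execute(
            "SELECT book_id, price FROM books ORDER BY price, book_id LIMIT ?",
            (page_size,),
        ).fetchall()
    last_price, last_id = after
    return conn.execute(
        "SELECT book_id, price FROM books "
        "WHERE price > ? OR (price = ? AND book_id > ?) "
        "ORDER BY price, book_id LIMIT ?",
        (last_price, last_price, last_id, page_size),
    ).fetchall()


pages = []
key = None
while True:
    rows = next_page(conn, page_size=2, after=key)
    if not rows:
        break
    pages.append([book_id for book_id, _ in rows])
    last_id, last_price = rows[-1]
    key = (last_price, last_id)  # seek position for the next request

print(pages)  # [[2, 5], [4, 1], [3, 6]]
```

Some databases, including PostgreSQL, also accept the shorter row-value form WHERE (price, book_id) > (?, ?) for the same seek condition.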
BEST COURSE RECOMMENDATIONS

I. Foundational SQL (Beginner - No Prior Experience Needed)

These courses are your starting point. Pick ONE that suits your learning style. Don't do all of them at this stage;
you'll get redundant information.

1. SQL for Data Science (University of California, Davis): This is a very popular and well-regarded starting point.
It focuses on SQL within the context of data analysis, which is a great way to learn. It uses SQLite, which is good
for learning but you'll want to branch out later.

●​ Pros: Excellent introduction to the why of SQL, good pacing, practical exercises, strong focus on
data-related tasks.
●​ Cons: Limited to SQLite. Doesn't go as deep into database design principles as some others.

2. Databases and SQL for Data Science with Python (IBM): Another very strong option, especially if you're interested in the intersection of SQL and Python. It uses IBM Db2, a commercial relational database system with its own SQL dialect, which is closer to industry practice than SQLite. It also covers using SQL within Jupyter Notebooks.

●​ Pros: Good integration with Python, covers a more "real-world" SQL dialect, includes database
connectivity and working with APIs.
●​ Cons: Might be slightly faster-paced for complete beginners than the UC Davis course.


3. Querying Data with Transact-SQL (Microsoft): This is a fantastic introduction if you're interested in working
with Microsoft SQL Server (a very common database system in enterprise environments). Focuses on T-SQL, the
Microsoft dialect.

●​ Pros: Excellent for learning a specific, in-demand dialect (T-SQL), very well-structured, taught by
Microsoft experts.
●​ Cons: Highly specific to SQL Server. If you don't know you'll be using SQL Server, a more general course
might be a better start.

4. An Introduction to Querying Databases: Analyze data within a database using SQL, create a relational database in the cloud and work with its tables, and write SQL statements including SELECT, INSERT, UPDATE, and DELETE.

Recommendation: If you want a general introduction and are interested in data analysis, start with the SQL for Data Science (UC Davis) Specialization. If you anticipate working with Microsoft technologies, the Querying Data with Transact-SQL (Microsoft) course on edX is excellent. If you know you'll be using Python, the Databases and SQL for Data Science with Python (IBM) course is a great choice.

II. Intermediate SQL (Building on the Foundations)

Once you have the basics, these courses help you deepen your knowledge and tackle more complex queries. You'll likely want to take at least two courses from this section, possibly more, depending on your goals.

5. Data Analysis with SQL (Part of the Google Data Analytics Professional Certificate): While part of a larger certificate, this course is excellent for solidifying your SQL skills and applying them to real-world data analysis problems. It uses BigQuery (Google's cloud-based data warehouse).

●​ Pros: Very practical, focuses on using SQL for analysis, teaches a cloud-based platform (BigQuery). Good
for building a portfolio.
●​ Cons: Requires some basic data analysis knowledge (which you'll get in the earlier parts of the certificate
if you choose to take it).

6. Advanced SQL for Data Scientists (LinkedIn Learning): Focuses on more advanced query techniques, window
functions, common table expressions (CTEs), and performance optimization. Uses a variety of SQL dialects.

●​ Pros: Covers essential advanced topics, good for improving query writing efficiency.
● Cons: Assumes a solid foundation in SQL.

7. Analyzing Big Data with SQL (Cloudera): Excellent for learning how to work with large datasets using SQL and tools like Apache Hive and Impala. A good choice if you're interested in "big data" technologies.

●​ Pros: Focuses on big data concepts, introduces relevant tools (Hive, Impala).
●​ Cons: Requires some understanding of distributed systems (although it provides an introduction). Less
focused on traditional relational databases.

8. Data Science: Querying with SQL (HarvardX): This course, which can be part of the Professional Certificate in
Data Science, covers SQL with a focus on the skills needed for a data science career.

● Pros: From a prestigious institution, data science focused, solid content.
● Cons: Can be more challenging, and part of a broader series.

III. Advanced SQL and Database Management

These courses are for those who want to become true SQL experts and/or move into database administration
roles.​

9. Database Design and Basic SQL in PostgreSQL (University of Michigan): This is a very strong course for
learning about relational database design principles, normalization, and using PostgreSQL (a popular and
powerful open-source database).

●​ Pros: Excellent for understanding database design theory, teaches a widely-used open-source database.
Crucial for building efficient and scalable databases.
●​ Cons: Less focused on data analysis, more on database structure.

10. PostgreSQL for Everybody Specialization (University of Michigan): An extension of the single course above that provides a more in-depth treatment of PostgreSQL, including more complex operations and usage.

● Pros: Comprehensive PostgreSQL coverage.
● Cons: PostgreSQL specific.

11. NoSQL systems: While not strictly SQL, understanding NoSQL databases (like MongoDB, Cassandra) is
increasingly important. This course covers the concepts and how they differ from relational databases.

● Pros: Introduces a crucial area of modern database technology.
● Cons: Not SQL; requires a shift in thinking from relational databases.

12. Data Management and Visualization (Wesleyan University): While focusing more broadly on data
management, a significant portion covers database design, SQL, and data modeling.

● Pros: Well-rounded perspective on the whole data pipeline.
● Cons: SQL is not the sole focus.

Learning Path:
1.​ Beginner: Choose one from:
○​ SQL for Data Science (Coursera - UC Davis)
○​ Databases and SQL for Data Science with Python (Coursera - IBM)
○​ Querying Data with Transact-SQL (edX - Microsoft)
2.​ Intermediate: Choose at least two from:
○​ Data Analysis with SQL (Coursera - Google)
○ Advanced SQL for Data Scientists (LinkedIn Learning)
○​ Analyzing Big Data with SQL (edX - Cloudera)
○​ Data Science: Querying with SQL (edX - HarvardX)

3. Advanced: Choose based on your specific interests:
○ Database Design and Basic SQL in PostgreSQL (Coursera - UMich) and/or PostgreSQL for Everybody Specialization
○​ Database Systems Concepts & Design (edX - Georgia Tech)
○​ NoSQL systems (Coursera - UCSD) - if you want to broaden your database knowledge beyond
relational databases.

Important Considerations:

●​ Practice, Practice, Practice: SQL is a practical skill. The best way to learn is to write SQL queries. Use the
exercises provided in the courses, and also try to find your own datasets to work with. Sites like Kaggle,
data.world, and public government data portals are great resources.
●​ SQL Dialects: Be aware that there are different "dialects" of SQL (e.g., MySQL, PostgreSQL, SQL Server,
Oracle, SQLite). They share the core syntax, but there are differences in functions and features. The
courses above cover various dialects; choose ones that align with your career goals or the tools you're
likely to use.
●​ Projects: Build projects! Design your own database, populate it with data, and then write queries to
answer questions about it. This is the best way to solidify your skills and build a portfolio.
●​ Don't Be Afraid to Experiment: Try different queries, break things (in a safe environment!), and learn
from your mistakes. SQL is a very forgiving language to learn.
●​ Read Documentation: Become comfortable reading the official documentation for the specific SQL
dialect you are using. This is a crucial skill for any database professional.
