D426 Data Management Foundations Study Guide
Database Roles:
Database administrator: Secures database system against unauthorized users
(Controls user access availability).
Authorization: Limits what each user can access
Rules: Ensures that the database system is consistent with structural and business
rules
Query Processor: Reads and understands a query. Creates plans to determine the
best way to get or change data and sends the results back to the application.
o Query optimization: Finds the best way to run a query to retrieve or modify
data as efficiently as possible. The Query Processor completes this.
Storage Manager: Takes the query processor’s instructions and turns them into file
commands to get or change data. Uses indexes to quickly locate data
Transaction Manager: Makes sure transactions are properly executed. Prevents
conflicts between concurrent transactions and restores the database in the event of
failure.
SQL Datatypes
INT: Stores integer values (positive & negative).
Category Example Data type Storage Notes
Signed: 128 to 127
TINYINT 1 byte Unsigned: 0 to 255
Signed: -32,768 to
SMALLINT 2 bytes 32,767
Unsigned: 0 to 65,535
Signed: -8,388,608 to
8,388,607
Integer 34 and -739448 MEDIUMINT 3 bytes Unsigned: 0 to
16,777,215
Signed: -2,147,483,648
to 2,147,483,647
INT 4 bytes Unsigned: 0 to
4,294,967,295
Signed: -263 to 263 -1
BIGINT 8 bytes Unsigned: 0 to 264 -1
DECIMAL: Stores fractional numeric values
VARCHAR: Stores textual values
DATE: Stores year, month, and day (format: 1999-12-30)
Arithmetic and Comparison Operators
+: Adds two numeric values. Example: 4 + 3 = 7.
- (unary): Reverses the sign of one numeric value. Example: -(-2) = 2.
- (binary): Subtracts one numeric value from another. Example: 11 - 5 = 6.
% (modulo): Divides one numeric value by another and returns the integer remainder.
Example: 5 % 2 = 1.
=: Compares two values for equality. Example: 1 = 2 is FALSE.
>: Compares two values with >. Example: '2019-08-13' > '2021-08-13' is FALSE.
!=: Compares two values for inequality Example: 1 !=2
Database Design Phases:
Analysis: Outlines what the database needs without focusing on any specific system
o Entities: Things or people (customers, products, orders)
o Relationships: How entities are related (Customer places an order)
o Attributes: Details about the entity (Customer name or order date)
Logical Design: Turns database requirements into a specific system format
o Entities = tables Relationships = keys attributes = columns
Physical Design: Adds indexes to speed up data retrieval and table organizations to
optimize performance. Has data independence so it does not change the actual
results of queries only improves speed (Rule 7)
SQL Commands:
CREATE TABLE: Creates a table
INSERT: Inserts rows into a table
SELECT: Retrieves data from a table
UPDATE: Modifies data in a table
DELETE: Deletes rows from a table
ALTER TABLE: adds, deletes, or modifies columns on an existing table
TRUNCATE: Deletes all rows from a table. It does not log individual row deletions making it
quicker
MERGE: Selects data from one table (source) and inserts the data to another table (target).
RESTRICT: rejects an insert, update, or delete that violates referential integrity
SET NULL: sets invalid foreign keys to NULL
SET DEFAULT: Sets invalid foreign keys to the foreign key default value
CASCADE: Propagates primary key changes to foreign keys
SQL Operators
BETWEEN: Provides an alternative way to determine if a value is between two other values
LIKE: Used in a WHERE clause to match text against a pattern in a column. You use %
(multiple characters) or _ for single characters.
Finds all employees name who start with J
Alias: Temporary name assigned to a column or table using the AS keyword.
SQL Functions:
Function Description Example
Returns the absolute SELECT ABS(-5)
ABS(n)
value of n Returns: 5
SELECT LOWER
LOWER(s) Returns the lowercase s (‘MYSQL’)
Returns: ‘mysql’
Returnst he string s w/o
SELECT TRIM (‘ test’);
TRIM(s) leading and trailing
Returns: ‘test’
spaces
SELECT
HOUR('22:11:45');
returns 22
SELECT
HOUR(t) Returns the hour,
MINUTE('22:11:45');
MINUTE(t) minute, or second from
returns 11
SECOND(t) time t
SELECT
SECOND('22:11:45');
returns 45
Data Structure
Tuple: Ordered collection of elements enclosed in parentheses (a,b,c)
Table: has a name, fixed columns, and rows that can change
Column: has a name and specific data type (numbers or text)
Row: holds values that match the columns. Each value fits the column’s data type
Cell: a single column of a single row
A table must have at least one column but any number of rows. Table without rows is called
an empty table
SQL Elements
Literals: Explicit string, numeric, or binary values (‘String’, 123)
Keywords: Words with special meaning (SELECT, FROM, WHERE)
Identifiers: Objects from the database like tables, columns, etc. (City, Name,
Population)
Primary Key: Column or group of columns used to identify a row (typically the first
column)
o Simple Primary Key: Consists of a single column
o Composite primary Key: Consists of multiple columns
o Artificial key: Single column primary key used when there is no suitable single-
column or composite primary key exists. These are integers that are
automatically generated.
Candidate Key: Simple or composite column that is unique and minimal (table may
have several)
Auto-increment column: Numeric column that automatically increases by one each
time a new row is added
Foreign Key: A column or group of columns that refer to a primary key. The data type
must be the same!
o The database will reject insert, update, and delete statements that violate
referential integrity
Third normal form: Every non-key column A depends on a key column B (B must be
unique).
Boyce Codd normal form: Column A depends on B. So B must be unique Very similar
to Third normal for, but stricter because it will remove non-key term.
SQL Sublanguages
Data Definition Language (DDL): Defines and sets up the structure of the database
(Creating tables)
Data Query Language (DQL): Retrieves data from the database (SELECT)
Data Manipulation Language (DML): Changes or manipulates the data in the
database (INSERT, UPDATE, DELETE)
Data Control Language (DCL): Controls user permissions and access (GRANT,
REVOKE)
Data Transaction Language (DTL): manages database transactions (ROLLLBACK,
COMMIT)
Aggregate Functions
HAVING clause filters group results when used with GROUP BY
Used in the SELECT clause to calculate values from multiple rows. If you use a
WHERE clause then they only process rows that meet that condition. No WHERE
means all rows will be processed
o COUNT(): counts the number of rows in the set
o MIN(): finds the minimum value in the set
o MAX(): finds the maximum value in the set
o SUM(): sums all the values in the set
o AVG(): computes the arithmetic mean of all values in the set
JOIN and Union in SQL
JOIN is a SELECT statement that combines data from two tables known as left table
and right table into a single result. The columns must have comparable data types
o INNER JOIN: selects only matching left and right table rows
o FULL JOIN: selects all left and right table rows
o LEFT JOIN: selects all left table rows but only matching right table rows
o RIGHT JOIN: selects all right table rows, but only matching left table rows
o OUTER JOIN: selects unmatched rows, including left, right, and full joins
o UNION: combines the two results into one table
Equijoin: Compares columns of two tables with the =
Non-equijoin: Compares columns with < and >
Self-join: Joins a table to itself
Cross-join: Combines two tables without comparing columns (uses CROSS JOIN, not
ON)
Entity Relationship Model:
Known as ER diagram. It is a high-level diagram that shows how data entities relate
to each other. It focuses on data requirements and relationships without worrying
about how the data will be stored or implemented in a database.
Types usually become tables, foreign keys, and columns
o Entity Type: set of things (All employees in a company)
o Relationship Type: Set of related things (Employee-Manges Department)
o Attribute type: Set of values (Employee salaries)
Instance is an element of a set
o Entity instance: Individual thing (Employee Sam Snead)
o Relationship instance: Statement about entity instances (“Maria Rodriguez
manages Sales”)
o Attribute instance: Individual value (Salary is $35,000)
Steps:
Analysis Steps: Creates an ER model to outline data needs and relationships without
focusing on how it will be implemented
Logical Design: Transforms the ER model into tables, columns, and keys specific to a
database system
Physical Design: Adds indexes and organizes tables on storage media to optimize
performance
Table Structures:
Heap table: No specific order for rows. Good for fast and bulk loading because rows
are stored in the order they are inserted.
Sorted table: Ordered based on a specific column. This is fast
Hash table: Rows are placed into buckets using a hash function. This helps locate
rows quickly
Table cluster: Multiple tables share the same storage area. This can improve access
times for related data.
Other Definitions:
Application Programming Interface (API): Makes it easier to use SQL with other programming
languages. Has tools and rules that lets different programs communicate with each other.
MySQL Command-Line Client: Text-based tool included with MySQL Server. It shows error
codes and shows descriptions of the SQL statement if there are any issues or if it cannot be
executed.
Business Rules: Rules based on business policy and specific to a particular database.
Subquery: Is a query within another SQL query. This is also known as nested query or inner
query
Materialized View: Data is always stored. If a base table changes the corresponding view
tables can also change.
Cardinality: Refers to maxima and minima of relationships and attributes
Subtype entity: subset of a supertype entity. Ex: Managers are a subset of employees, so
Manager is a subtype entity of the Employee supertype entity. On ER diagrams, subtype
entities are drawn within the supertype. Vehicle Supertype entity and “car” “truck”
“motorcycle” subtype
Intangible entity: Documented in the data model, but not tracked with data in the database
Normalization: Eliminates redundancy by decomposing a table into two or more tables
Denormalization: Intentionally introducing redundancy by merging tables
Hit ratio: Measures percentage of table rows that a query retrieves.
Binary search: Database repeatedly splits the index in two until it finds the entry containing
the search value
Dense index: Contains an entry for every table row
Sparse index: contains an entry for every table block