MySQL for Beginners
Table of contents
• Overview
• Types of data
• Structured data
• Unstructured data
• Semi-structured data
• Differences between the data types
• SQL Commands
• Data types
• Constraints
• Install and Server connect
Overview
SQL stands for Structured Query Language, a computer language for storing, manipulating, and retrieving data held in a relational database. It was developed at IBM in the 1970s. Typical SQL operations include creating and deleting databases, fetching rows, and modifying or deleting rows.
RDBMS stands for Relational Database Management System. The RDBMS is the basis for SQL and for relational database systems such as MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access.
The most popular RDBMSs are listed below:
• MySQL
• MS SQL Server
• Oracle
• MS Access
• PostgreSQL
• SQLite
SQL Applications
SQL is one of the most widely used query languages for databases.
SQL provides the following functionality to database programmers:
Executes different database queries against a database.
Defines the data in a database and manipulates that data.
Creates data in a relational database management system.
Accesses data from the relational database management system.
Creates and drops databases and tables.
Creates and maintains database users.
Creates views, stored procedures, functions in a database.
Sets permissions on tables, procedures and views.
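A few of the capabilities listed above can be tried without installing anything. The sketch below is only an illustration: it uses Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for a MySQL server, and the `students` table and `top_students` view are made-up names. The SQL statements themselves are close to what MySQL accepts.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")

# Define data: create a table (hypothetical "students" table).
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, grade INTEGER)"
)

# Create data: insert a few rows.
conn.executemany(
    "INSERT INTO students (name, grade) VALUES (?, ?)",
    [("Asha", 82), ("Ben", 67), ("Chen", 91)],
)

# Create a view that exposes only students with a grade of 80 or more.
conn.execute(
    "CREATE VIEW top_students AS "
    "SELECT name, grade FROM students WHERE grade >= 80"
)

# Access data: a view is queried just like a table.
top = conn.execute(
    "SELECT name, grade FROM top_students ORDER BY grade DESC"
).fetchall()
print(top)

# Drop the view and the table again.
conn.execute("DROP VIEW top_students")
conn.execute("DROP TABLE students")
remaining = conn.execute("SELECT name FROM sqlite_master").fetchall()
print(remaining)
```

Creating users and setting permissions are not shown because SQLite has no user accounts; those features need a server-based RDBMS such as MySQL.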
Types of Data
| Characteristic | Structured data | Semi-structured data | Unstructured data |
| --- | --- | --- | --- |
| Organization | Data is well organized | Data is organized to some extent | Data is not organized at all |
| Basis | Relational database | Partially organized, e.g. by XML/RDF | Character and binary data |
| Transactions | Mature transaction management, multiple concurrency techniques | Transaction management is adapted from the DBMS, but data concurrency can pose problems | Transaction management and data concurrency are difficult but achievable |
| Versioning | Over tuples, rows, and tables | Over tuples or graphs is possible | Usually on the whole data or on chunks |
| Flexibility | Schema dependent and less flexible | More flexible than structured data | The most flexible |
| Query performance | Highest; structured queries can be performed, allowing complex joins | Queries over anonymous nodes are possible | Lowest, because the schema is applied on read |
| Examples | Transactional information, names, dates, and addresses; surveys, questionnaires, tests, claim forms | XML/JSON data, HTML, emails, web pages; invoices, purchase orders, bills of lading, explanations of benefits | Documents (PDFs, text files), video, audio, and image files; contracts, letters, articles, memos |
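To make the three categories concrete, the sketch below holds similar customer information in each form. It is an illustration only: the table, the JSON records, and the note text are invented, and Python's built-in sqlite3 and json modules stand in for a real database and data feed.

```python
import json
import sqlite3

# Structured: a fixed schema; every row has the same typed columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Asha', 'Pune')")
structured_city = conn.execute(
    "SELECT city FROM customers WHERE id = 1"
).fetchone()[0]

# Semi-structured: JSON carries its own field names, and records may differ.
records = json.loads(
    '[{"id": 1, "name": "Asha", "city": "Pune"}, {"id": 2, "name": "Ben"}]'
)
semi_cities = [r.get("city") for r in records]  # second record has no "city"

# Unstructured: free text with no fields, so only text search is possible.
note = "Asha called from Pune about her invoice."
mentions_pune = "Pune" in note

print(structured_city, semi_cities, mentions_pune)
```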
Structured data
• Structured data is organized and easily searchable with a predefined format, like data in a table.
• Definition: Data that conforms to a predefined data model, typically organized in rows and columns, like in a relational database.
• Examples: Dates, numbers, product SKUs, customer information stored in a CRM system.
• Characteristics:
  Well-defined schema and format.
  Easily searchable using SQL (Structured Query Language).
  Data integrity is enforced through validation rules.
  Typically stored in data warehouses.
• Use Cases: Customer relationship management (CRM), inventory management, financial transactions.
• Benefits: Easy to analyze, manage, and retrieve information.
• Drawbacks: Can be inflexible and may require significant effort to adapt to new requirements.
Pros of using structured data
• Easy for the average user to utilize and understand
• Easy for machine learning algorithms to utilize
• A greater number of analytics tools can use the data
• Requires less storage space
Cons of using structured data
• Limited to specific uses
• More limited storage options
• Difficult and expensive to make changes
Use cases for utilizing structured data
• Ecommerce: Product IDs, pricing data and customer account data
• Healthcare: Patient forms, medical insurance data and medical billing data
• Banking: Customer account data and financial transactions
• Customer relationship management (CRM) software: Names, phone numbers and addresses
• Travel industry: Reservation data, ticket pricing information and dates
Most popular tools for the management of structured data
• MySQL: Embedding data in mass-deployed software.
• OLAP (Online Analytical Processing): Data analysis.
• SQLite: Relational database.
• Oracle database: Advanced database management system.
Unstructured data
• Unstructured data lacks a predefined structure and can include various forms like text, images, and audio, making it harder to search and analyze directly. Most of the world's data is unstructured, highlighting the need for specialized tools and techniques to manage and extract insights from it.
• Definition: Data that doesn't conform to a predefined structure and has no easily searchable format.
• Examples: Social media posts, images, videos, audio files, emails, text documents.
• Characteristics:
  Lacks a predefined format or schema.
  More difficult to search and analyze directly.
  Often requires specialized tools and techniques for processing.
  Can be stored in data lakes.
• Use Cases: Sentiment analysis, image recognition, fraud detection, personalized recommendations.
• Benefits: Provides rich, contextual information and can reveal hidden patterns.
• Drawbacks: Requires significant effort to process and extract meaningful insights.
Pros of using unstructured data
• Easier to store due to being in native format
• Collecting and storing are faster
• Cheaper to store unstructured data using data lakes
• Provides more granular information
Cons of using unstructured data
• More complicated to work with
• Requires highly specialized tools for organizing
• Expertise needed
Use cases for utilizing unstructured data
• Ecommerce: Identify spending patterns and customer behavior
• Healthcare: Determine treatment recommendations and forecast changes in a patient
• Finance: Track markets and perform risk analysis
Most popular tools for working with unstructured data include:
• NoSQL (not only Structured Query Language) Database Management System (non-tabular database)
• MongoDB
• Apache Hadoop
• Microsoft Azure
• Amazon DynamoDB
Common unstructured file formats
• Video (WMV, MP4 and MOV)
• Images (JPG, PNG and GIF)
• Audio (MP3, WAV and MPEG)
Semi-structured data
Differences between the data types
| Properties | Structured data | Unstructured data |
| --- | --- | --- |
| Data type | Defined, relational | Undefined, non-relational |
| Data model | Pre-defined, not flexible | Not pre-defined, flexible |
| Databases | SQL, relational databases | NoSQL, non-relational databases |
| Data nature | Quantitative | Qualitative |
| Organization | Organized information | Diverse structure for information |
| Flexibility | Not flexible | Flexible |
| Storage | Data warehouses | Data lakes |
| Storage size | Less storage | More storage |
| Formats | Several formats | A huge variety of formats |
| Uses | Machine learning | Natural language processing and text mining |
| Ease of search | Easy to search | Difficult to search |
| Source location | Online sources; relational and tabular data | Videos, emails, documents, social media, etc. |
| Examples | SQL | JPEG, DOC, PDF, MOV, etc.; videos, images, text |
| Analysis methods | Classification, regression, data clustering | Data stacking, data mining |
| Tools and technologies | RDBMS, CRM, OLAP, OLTP | NoSQL DBMS, AI-driven tools, data storage architectures, data visualization tools |
| Specialists to handle data | Business analysts, software engineers, marketing analysts | Data scientists, engineers, and analysts with deep expertise |
SQL Commands
Types of SQL commands
There are five main types of commands, listed below:
DDL (Data Definition Language) commands
DML (Data Manipulation Language) commands
DCL (Data Control Language) commands
TCL (Transaction Control Language) commands
DQL (Data Query Language) commands
Data Definition Language (DDL)
CREATE: Creates a new table, a view of a table, or other object in the database.
ALTER: Modifies an existing database object, such as a table.
DROP: Deletes an entire table, a view of a table or other objects in the database.
TRUNCATE: Removes all rows from a table in one go.
Data Manipulation Language (DML)
SELECT: Retrieves certain records from one or more tables. SELECT is often classed separately as the Data Query Language (DQL) command.
INSERT: Creates a record.
UPDATE: Modifies records.
DELETE: Deletes records.
Data Control Language (DCL)
GRANT: Gives a privilege to a user.
REVOKE: Takes back privileges granted to a user.
Transaction Control Language (TCL) commands
START TRANSACTION / BEGIN: Starts a transaction. All subsequent DML (Data Manipulation Language) statements belong to it until it is committed or rolled back.
COMMIT: Saves the changes made in the transaction. Once committed, the changes are irreversible and visible to other transactions.
ROLLBACK: Undoes changes made within the current transaction that have not yet been committed.
SAVEPOINT: Marks a point within a transaction. This allows for partial rollbacks, where you can undo changes only up to a specific point.
ROLLBACK TO SAVEPOINT: Rolls back the transaction to the specified savepoint.
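The transaction commands are easiest to understand by watching which changes survive. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL; the `accounts` table and the savepoint name `after_asha` are made up for the example. SQLite spells the first command `BEGIN`, where MySQL also accepts `START TRANSACTION`.

```python
import sqlite3

# isolation_level=None stops the sqlite3 module from opening transactions
# implicitly, so the BEGIN / COMMIT / ROLLBACK below are the only ones issued.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")

conn.execute("BEGIN")                              # start a transaction
conn.execute("INSERT INTO accounts VALUES ('Asha', 100)")
conn.execute("SAVEPOINT after_asha")               # mark a point to return to
conn.execute("INSERT INTO accounts VALUES ('Ben', 50)")
conn.execute("ROLLBACK TO SAVEPOINT after_asha")   # undo only Ben's row
conn.execute("COMMIT")                             # make Asha's row permanent

committed = conn.execute("SELECT name, balance FROM accounts").fetchall()

conn.execute("BEGIN")
conn.execute("DELETE FROM accounts")
conn.execute("ROLLBACK")                           # undo the uncommitted DELETE

after_rollback = conn.execute("SELECT name, balance FROM accounts").fetchall()
print(committed, after_rollback)
```

Only the row inserted before the savepoint is committed, and the later DELETE leaves no trace because it is rolled back before any COMMIT.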
SQL Commands (contd.)
Command Description
SELECT Retrieves data from one or more tables.
INSERT Adds new rows (records) to a table.
UPDATE Modifies existing data in a table.
DELETE Removes specific rows from a table.
CREATE TABLE Creates a new table in the database.
ALTER TABLE Modifies the structure of an existing table (e.g., add or remove columns).
DROP TABLE Permanently deletes a table and its data.
TRUNCATE Removes all rows from a table but keeps its structure intact.
WHERE Filters records based on a condition.
ORDER BY Sorts the result set in ascending or descending order.
GROUP BY Groups rows that have the same values in specified columns.
HAVING Filters grouped data (used with GROUP BY).
JOIN Combines rows from two or more tables based on a related column.
DISTINCT Removes duplicate values from the result set.
IN / BETWEEN / LIKE Used for advanced filtering conditions.
UNION Combines the result of two or more SELECT queries.
GRANT Gives user privileges or permissions.
REVOKE Removes user privileges.
COMMIT Saves all changes made in the current transaction.
ROLLBACK Undoes changes if something goes wrong in a transaction.
SAVEPOINT Sets a point in a transaction to roll back to if needed.
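Several of the query clauses in the table above usually appear together in one statement. The sketch below combines JOIN, WHERE, GROUP BY, HAVING, and ORDER BY, and adds a short DISTINCT / IN query. It assumes Python's built-in sqlite3 module in place of a MySQL server, and the `customers` and `orders` tables with their rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
INSERT INTO orders VALUES (1, 1, 40), (2, 1, 70), (3, 2, 15), (4, 3, 90), (5, 3, 5);
""")

# Total order amount per customer, ignoring orders under 10,
# keeping only customers whose total exceeds 50, largest total first.
query = """
SELECT c.name, SUM(o.amount) AS total
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id   -- combine the two tables
WHERE o.amount >= 10                       -- filter rows before grouping
GROUP BY c.name                            -- one group per customer
HAVING SUM(o.amount) > 50                  -- filter the groups
ORDER BY total DESC                        -- sort the result
"""
totals = conn.execute(query).fetchall()

# DISTINCT removes duplicates; IN matches any value in the list.
buyers = conn.execute(
    "SELECT DISTINCT customer_id FROM orders "
    "WHERE customer_id IN (1, 3) ORDER BY customer_id"
).fetchall()
print(totals, buyers)
```

WHERE filters individual rows before they are grouped, while HAVING filters the groups afterwards, which is why the condition on the sum has to go in HAVING.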
Data types
There are three main categories of SQL data types:
• String data types
• Numeric data types
• Date and time data types
String data types
CHAR(size): Stores fixed-length strings. If the string is shorter than size, it is padded with spaces.
VARCHAR(size) / NVARCHAR(size): Stores variable-length strings up to size. NVARCHAR is for Unicode characters.
TEXT / NTEXT: Stores very large strings. NTEXT is for Unicode (SQL Server).
CAST() or CONVERT(): Convert numeric or date/time data to strings.
String concatenation
CONCAT() function (widely supported): In SQL Server and PostgreSQL, this function treats NULL values as empty strings, which can be an advantage over the operators. In MySQL, CONCAT() returns NULL if any argument is NULL.
|| operator (ANSI SQL standard, used in PostgreSQL, Oracle, SQLite): In MySQL, || means logical OR unless the PIPES_AS_CONCAT SQL mode is enabled.
+ operator (SQL Server).
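The effect of NULL on concatenation is worth seeing once. The sketch below runs on SQLite through Python's built-in sqlite3 module, so it uses the `||` operator; it is an illustration under that assumption, not MySQL output. COALESCE is a standard SQL function that also exists in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CAST converts a number to text so it can be joined to other strings.
label = conn.execute("SELECT 'Order #' || CAST(42 AS TEXT)").fetchone()[0]

# With the || operator, a NULL operand makes the whole result NULL.
with_null = conn.execute("SELECT 'Hello, ' || NULL").fetchone()[0]

# COALESCE substitutes an empty string for NULL before concatenating.
safe = conn.execute("SELECT 'Hello, ' || COALESCE(NULL, '')").fetchone()[0]

print(repr(label), repr(with_null), repr(safe))
```

Wrapping nullable columns in COALESCE is a portable way to get the "NULL as empty string" behaviour regardless of which concatenation operator or function the database offers.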
INTEGER (INT, SMALLINT, BIGINT, TINYINT):
Used for storing whole numbers (integers) without fractional parts. Different variations (SMALLINT, BIGINT, TINYINT) offer varying
ranges and storage sizes to optimize space based on the expected magnitude of the numbers.
DECIMAL (DEC, NUMERIC):
Used for storing exact fixed-point numbers with a specified precision (total number of digits) and scale (number of digits after the
decimal point). This is crucial for maintaining accuracy in calculations involving decimal values.
Approximate numeric data types: These types store approximate values and are often used for scientific or engineering calculations, where a slight loss of precision is acceptable in exchange for a wider range or more efficient storage.
FLOAT (REAL, DOUBLE PRECISION): Used for storing floating-point numbers, which are held as a significand and an exponent, much like scientific notation. These types offer a wider range of values but may introduce minor inaccuracies because many decimal fractions have no exact floating-point representation.
REAL typically refers to single-precision floating-point numbers, while DOUBLE PRECISION (or DOUBLE) refers to double precision. MySQL treats REAL as a synonym for DOUBLE PRECISION unless the REAL_AS_FLOAT SQL mode is enabled.
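The difference between exact and approximate numeric types shows up even in very small sums. The sketch below is an illustration under two assumptions: Python's `decimal.Decimal` stands in for an SQL DECIMAL column (SQLite has no true fixed-point type), and SQLite's floating-point arithmetic stands in for FLOAT/DOUBLE.

```python
import sqlite3
from decimal import Decimal

# Approximate: binary floating point cannot represent 0.1 or 0.2 exactly,
# so their sum is not exactly 0.3.
float_equal = (0.1 + 0.2 == 0.3)

# Exact: decimal arithmetic, as a DECIMAL column would provide.
decimal_equal = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

# The same floating-point comparison evaluated inside SQL (1 = true, 0 = false).
conn = sqlite3.connect(":memory:")
sql_equal = conn.execute("SELECT 0.1 + 0.2 = 0.3").fetchone()[0]

print(float_equal, decimal_equal, sql_equal)
```

This is why monetary amounts are normally stored in DECIMAL columns rather than FLOAT or DOUBLE.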
Constraints
NOT NULL: Ensures that a column cannot have a NULL value.
DEFAULT: Provides a default value for a column when none is specified.
UNIQUE key: Ensures that all the values in a column are different.
PRIMARY key: Uniquely identifies each row/record in a database table.
FOREIGN key: References a row/record in another database table, usually through that table's primary key.
CHECK: Ensures that all values in a column satisfy certain conditions.
INDEX: Used to create and retrieve data from the database very quickly.
Data Integrity
The following categories of data integrity exist with each RDBMS:
Entity integrity: Ensures that there are no duplicate rows in a table.
Domain integrity: Enforces valid entries for a given column by restricting the type, the format, or the range of values.
Referential integrity: Rows that are referenced by other records cannot be deleted.
User-defined integrity: Enforces specific business rules that do not fall into entity, domain or referential integrity.
Database Normalization
Database normalization is the process of efficiently organizing data in a database. The normalization process has two goals:
• Eliminating redundant data, for example, the same data stored in more than one table.
• Ensuring data dependencies make sense.
The first three normal forms are listed below. Third Normal Form is sufficient for most database applications.
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
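The constraints described in this section can be tried out by attempting inserts that break each rule. The following is a sketch under stated assumptions: it uses Python's built-in sqlite3 module rather than MySQL, the `departments` and `employees` tables are invented, and the helper `rejected` is a hypothetical function written for this example. SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK checks
conn.executescript("""
CREATE TABLE departments (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE employees (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    status  TEXT DEFAULT 'active',
    age     INTEGER CHECK (age >= 18),
    dept_id INTEGER REFERENCES departments(id)
);
CREATE INDEX idx_employees_name ON employees(name);
INSERT INTO departments VALUES (1, 'Sales');
INSERT INTO employees (id, name, age, dept_id) VALUES (1, 'Asha', 30, 1);
""")


def rejected(sql):
    """Hypothetical helper: True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


insert = "INSERT INTO employees (id, name, age, dept_id) VALUES "
results = {
    "NOT NULL": rejected(insert + "(2, NULL, 25, 1)"),
    "UNIQUE": rejected("INSERT INTO departments VALUES (2, 'Sales')"),
    "PRIMARY KEY": rejected(insert + "(1, 'Ben', 25, 1)"),
    "CHECK": rejected(insert + "(3, 'Chen', 12, 1)"),
    "FOREIGN KEY": rejected(insert + "(4, 'Dev', 40, 99)"),
    # Referential integrity: a department still in use cannot be deleted.
    "DELETE referenced row": rejected("DELETE FROM departments WHERE id = 1"),
}

# DEFAULT filled in the status column that the first insert left out.
default_status = conn.execute(
    "SELECT status FROM employees WHERE id = 1"
).fetchone()[0]
print(results, default_status)
```

Each entry in `results` is True when the database refused the statement, so the rules are enforced by the database itself rather than by application code.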
Install and Server connect
Thank you