Lecture 2.3.11

Uploaded by

hw941885

APEX INSTITUTE OF TECHNOLOGY

(CSE)
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

Predictive Analytics (21CSH-340)

Faculty: Dr. Jitender Kaushal
Associate Professor
(E14621)
DISCOVER . LEARN . EMPOWER
SQL Server – How to handle NULL data
COURSE OBJECTIVES
• Understand how analytics has provided solutions for industries, using real case studies.
• Explain modeling and relationships; derive and reclassify fields; integrate and collect data.
• Review and explore data to model.

COURSE OUTCOMES
On completion of this course, the students shall be able to:

Unit-2
• CO3: Perform data collection and initial data handling by managing data structures and measurement levels.
• CO4: Analyze and transform data for predictive modeling through various data transformation techniques.

The Problem with NULL
• There is a problem with NULL that has persisted since the Relational Model was proposed in the 1970s.
• “The simple scientific fact is that an SQL table that contains a null isn’t a relation; thus, relational theory doesn’t apply, and all bets are off.” C. J. Date (2014).
• The presence of NULLs in a database ‘breaks’ the relational model of Boolean expressions on which SQL databases rely.
• In ‘real world’ applications of data structures, NULLs are often unavoidable.
• NULL confuses users, and designers and DBAs hate it.

Three Valued Logic
• The SQL language is based on Relational Logic.
• Adding NULL values to a database breaks the TRUE/FALSE relations implicit in the model and leads to three possible outcomes: TRUE, FALSE and UNKNOWN.
• At best this leads to increased complexity: horizontally decomposed WHERE clauses, workaround syntax and inference.
• At worst it leads to incomplete information, returned error codes, interoperability problems and interpretation problems.
• It messes up Reporting, ETL, Business Intelligence and Data Science initiatives.
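
The three outcomes can be seen with a small T-SQL sketch. The variable @age and the value 18 are hypothetical, and the default ANSI_NULLS ON setting is assumed. A CASE expression only takes the WHEN branch when the test is TRUE, so an UNKNOWN result falls through to ELSE just as FALSE does.

```sql
SET ANSI_NULLS ON;

-- Hypothetical variable: NULL stands for an age we do not know.
DECLARE @age INT = NULL;

SELECT
    -- NULL = 18 is UNKNOWN, so the ELSE branch is taken.
    CASE WHEN @age = 18       THEN 'TRUE' ELSE 'not TRUE' END AS equals_test,
    -- NULL <> 18 is also UNKNOWN, not TRUE.
    CASE WHEN @age <> 18      THEN 'TRUE' ELSE 'not TRUE' END AS not_equals_test,
    -- NOT UNKNOWN is still UNKNOWN.
    CASE WHEN NOT (@age = 18) THEN 'TRUE' ELSE 'not TRUE' END AS negated_test,
    -- IS NULL is the only test here that evaluates to TRUE.
    CASE WHEN @age IS NULL    THEN 'TRUE' ELSE 'not TRUE' END AS is_null_test;
```

Note that a test and its negation both fail for the same row, which is where the extra WHERE clause branches come from.
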
Domain Knowledge
• SQL databases are modelled as domains, so the designer needs to be able to define what the domain encompasses by defining the boundaries and identifying components and relationships.
• Almost by definition, the designer will have incomplete information about what is relevant, especially when implementing new systems.
• NULL is stored as a flag, so it is not part of any particular domain or type; query complexity is introduced when assumptions are made about NULLs.
NULL is not a value and it is not zero; it is unknown.
NULL Data Scenarios
• Existence: the attribute does not exist in the domain, or the domain understanding is wrong. E.g. eye colour for a car.
• Missing: the information was not given at the time the row was created. E.g. a customer may decline to give their age.
• Not yet: the data is contingent upon an unknown event in the future. E.g. termination date or date of death.
• Does not apply: the attribute is not applicable for this instance of a record. E.g. hair colour for bald people, number of pregnancies for male patients.
• Placeholders: indicates that we know that a bit of data exists, but we don’t know
Handling NULL in Queries
• NULLIF to return NULL when two expressions are equal.
  - Syntax: NULLIF(expression, expression)
  - Returns NULL if both expressions are equal; otherwise returns the first expression.
• ISNULL to replace NULL with a specified value.
  - Syntax: ISNULL(check_expression, replacement_value)
  - Returns check_expression if it is not NULL; otherwise returns replacement_value, which must be implicitly convertible to the data type of check_expression.
• COALESCE to use the first non-null field.
  - Syntax: COALESCE(exp1, exp2, … expn)
  - Can use multiple input expressions.
  - Returns the data type of the expression with the highest precedence.
  - Can be slower than ISNULL.
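
A minimal sketch of the three functions side by side. The table variable @Customers and its columns and values are hypothetical sample data, not part of the lecture.

```sql
-- Hypothetical sample data held in a table variable.
DECLARE @Customers TABLE (
    CustName  VARCHAR(50) NULL,
    HomePhone VARCHAR(20) NULL,
    WorkPhone VARCHAR(20) NULL,
    Orders    INT NULL,
    Visits    INT NULL
);

INSERT INTO @Customers VALUES
    ('Ann', NULL,       '555-0101', 4,    8),
    ('Bob', '555-0102', NULL,       0,    0),
    (NULL,  NULL,       NULL,       NULL, 5);

SELECT
    -- ISNULL: replace a NULL name with a fixed label.
    ISNULL(CustName, 'Unknown')                AS CustName,
    -- COALESCE: take the first non-NULL value from several inputs.
    COALESCE(HomePhone, WorkPhone, 'No phone') AS ContactPhone,
    -- NULLIF: turn a zero divisor into NULL so the division
    -- yields NULL instead of raising a divide-by-zero error.
    Orders * 1.0 / NULLIF(Visits, 0)           AS OrdersPerVisit
FROM @Customers;
```

ISNULL returns the data type of its first argument, so a replacement value longer than the column can be truncated; COALESCE does not have that restriction.
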
Handling NULL in WHERE Clauses
• SQL uses Three Valued Logic (TRUE, FALSE, UNKNOWN). UNKNOWN is the logical outcome of a comparison and is not the same as NULL.
• To test for NULL we have to use the IS NULL and IS NOT NULL operators in the WHERE clause, not the = operator.
• IS NULL
  - SELECT * FROM Customers WHERE CustName IS NULL
• IS NOT NULL
  - SELECT * FROM Customers WHERE CustName IS NOT NULL
• Use Horizontal Decomposition to add other conditionals.
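
A short sketch of how these rules play out in practice, assuming the default ANSI_NULLS ON. The City column and the sample rows are hypothetical. The last query shows one reading of horizontal decomposition: the NULL rows are handled by their own branch of the WHERE clause.

```sql
SET ANSI_NULLS ON;

-- Hypothetical sample data.
DECLARE @Customers TABLE (CustName VARCHAR(50) NULL, City VARCHAR(50) NULL);
INSERT INTO @Customers VALUES ('Ann', 'Leeds'), ('Bob', NULL), (NULL, 'York');

-- Returns no rows: City = NULL is UNKNOWN for every row.
SELECT * FROM @Customers WHERE City = NULL;

-- Returns only Ann: Bob is dropped because NULL <> 'York' is UNKNOWN.
SELECT * FROM @Customers WHERE City <> 'York';

-- Adding an explicit IS NULL branch brings Bob back.
SELECT * FROM @Customers WHERE City <> 'York' OR City IS NULL;
```
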


Environment and Aggregate Settings
• ANSI_NULLS environment setting.
  - Matters when creating or altering stored procedures or user-defined functions, because the setting in effect at that time is stored with the object.
  - This option specifies how = and <> comparisons with NULL behave. When it is ON, a comparison with NULL evaluates to UNKNOWN, so WHERE column = NULL returns no rows. When it is OFF, WHERE column = NULL returns the rows in which the column is NULL.
  - Keep at the default value of ON.
• “Null value is eliminated by an aggregate or other SET operation”
  - This warning is raised when an aggregate function such as SUM or AVG is applied to fields that contain NULLs.
  - May lead to incomplete information being returned.
  - Use ISNULL or COALESCE to substitute a value before aggregating if skipping NULLs is not the intended result.
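
The effect on aggregates can be sketched as follows. The table variable @Sales and its values are hypothetical. Whether NULL should be skipped or counted as zero is a business decision; the sketch only shows that the two choices give different answers.

```sql
-- Hypothetical sample data: one of four amounts is unknown.
DECLARE @Sales TABLE (Amount DECIMAL(10,2) NULL);
INSERT INTO @Sales VALUES (10.00), (20.00), (NULL), (30.00);

SELECT
    COUNT(*)               AS AllRows,          -- 4: counts every row
    COUNT(Amount)          AS NonNullRows,      -- 3: the NULL is skipped
    AVG(Amount)            AS AvgIgnoringNull,  -- 20: 60 / 3
    AVG(ISNULL(Amount, 0)) AS AvgNullAsZero     -- 15: 60 / 4
FROM @Sales;
```
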
Table Design Guidelines
• Design integrity into your tables.
• Use NOT NULL and CHECK() constraints where possible.
• Do not use a column as a primary key if there is ANY possibility that its value could be NULL.
• Avoid NULLs in FOREIGN KEY relationships.
  - Consider using de-normalised separate tables to get around this.
• Use default field values where appropriate.
• Bear in mind the arithmetic consequences of using 0 or -99 as defaults.
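
A minimal sketch of these guidelines in a table definition. The table dbo.Patient, its columns, the constraint names and the 'ZZ' placeholder code are all hypothetical and chosen only for illustration.

```sql
-- Hypothetical table; names and rules are illustrative only.
CREATE TABLE dbo.Patient (
    PatientId   INT          NOT NULL PRIMARY KEY,
    FullName    VARCHAR(100) NOT NULL,
    -- NOT NULL plus CHECK: an explicit 'U' (unknown) code instead of NULL.
    Sex         CHAR(1)      NOT NULL
        CONSTRAINT CK_Patient_Sex CHECK (Sex IN ('F', 'M', 'U')),
    -- DEFAULT: rows inserted without a country get a placeholder code.
    CountryCode CHAR(2)      NOT NULL
        CONSTRAINT DF_Patient_Country DEFAULT ('ZZ'),
    -- Nullable on purpose: a patient may decline to give their age.
    Age         INT          NULL
        CONSTRAINT CK_Patient_Age CHECK (Age BETWEEN 0 AND 130)
);
```

A CHECK constraint only rejects a row when its condition is FALSE. For a NULL Age the condition is UNKNOWN, so the row is accepted; CHECK alone does not keep NULLs out, which is why NOT NULL is stated separately.
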
App Design Guidelines
• Take steps to avoid NULL values from host programs.
• Initialise variables.
• Use defaults and appropriate auto-filling of variable values.
• Deduce values.
• Track missing data using companion codes.
• Determine the impact of missing data.
• Validate data and prevent audit difficulties.
• Use consistent datatypes and nullability across apps.
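
One way to read "companion codes" is a second column that records why a value is missing. The sketch below assumes that reading; the table variable @Customer, the AgeStatus column and its codes are hypothetical.

```sql
-- Hypothetical table and status codes, for illustration only.
DECLARE @Customer TABLE (
    CustomerId INT     NOT NULL,
    Age        INT     NULL,
    AgeStatus  CHAR(1) NOT NULL   -- K = known, D = declined, N = not yet asked
);

INSERT INTO @Customer VALUES
    (1, 42,   'K'),
    (2, NULL, 'D'),
    (3, NULL, 'N'),
    (4, NULL, 'D');

-- The companion code explains each NULL, so missing data
-- can be reported by cause rather than as one undifferentiated gap.
SELECT AgeStatus, COUNT(*) AS Customers
FROM @Customer
GROUP BY AgeStatus
ORDER BY AgeStatus;
```
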
ETL Guidelines
• Where multiple fields may contain NULL, consider using a check code field to indicate where records need attention.
• Check the NULL status of each field using ISNULL(Field, 0) and build a count of the number of fields that fail validation.
  - Use this as part of the data cleansing process.
• Use in ETL and scrubbing tables.
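
The per-row count of failing fields could be built along these lines. This sketch swaps in CASE WHEN ... IS NULL for the ISNULL(Field, 0) test mentioned above, because substituting 0 cannot tell a genuine 0 apart from a NULL. The staging table @Staging and its columns are hypothetical.

```sql
-- Hypothetical staging table for an ETL load.
DECLARE @Staging TABLE (
    RowId    INT          NOT NULL,
    CustName VARCHAR(50)  NULL,
    Email    VARCHAR(100) NULL,
    Age      INT          NULL
);

INSERT INTO @Staging VALUES
    (1, 'Ann', 'ann@example.com', 34),
    (2, NULL,  'bob@example.com', NULL),
    (3, NULL,  NULL,              NULL);

-- Count the NULL fields in each row; rows with the highest
-- count are listed first as the ones needing most attention.
SELECT
    RowId,
    CASE WHEN CustName IS NULL THEN 1 ELSE 0 END
  + CASE WHEN Email    IS NULL THEN 1 ELSE 0 END
  + CASE WHEN Age      IS NULL THEN 1 ELSE 0 END AS NullFieldCount
FROM @Staging
ORDER BY NullFieldCount DESC;
```

The resulting count can be stored in the check code field and used to route records to cleansing.
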

Master Data Management
Master Data Management is about catching the information before it enters the database and cleaning up what is already there.
• MDS (Master Data Services)
  - Included as part of SQL Server.
  - Allows Models, Entities, Attributes, Rules and Versions to be defined and implemented.
  - Includes an Excel add-in, which allows power users or analysts to define models and rules.
• DQS (Data Quality Services)
  - Helps ensure domain validity and knowledge-driven data quality.
  - Good for data correction, enrichment, standardization and de-duplication.
• Other third-party applications are available.
  - Master Data Maestro, etc.
Performance Considerations
Time spent designing appropriate data quality controls will reduce the cost of maintaining the database, because NULL:
• Slows down the working of indexes.
• Increases query retrieval times.
• Increases search times.
• Increases SQL code complexity.
• Decreases confidence in the information gained from the database.
Data Science
NULLs may be the catalyst for data science work to discover why data is missing, or what data or domain knowledge we are lacking.
• NULL indicates that a value is not known, or that data is missing or incomplete.
• May point to missing entities or uncaptured events.
• May skew the results of data tools that disregard NULL values.
• Known knowns, Unknown knowns and Unknown unknowns. Data discovery and Knowledge begin by examining what it is that is

Summary
• NULL values have always been present in SQL databases, but they degrade the quality of the information that can be obtained from the data source.
• NULLs can have an adverse effect on downstream systems, in particular Reporting, BI, Predictive Analytics or Machine Learning, which rely on the integrity of the data.
• Reduce the impact on your information by:
  - Managing the quality of data going in
  - Designing tables with integrity constraints
  - Designing apps to validate the input
  - Designing queries to ensure correct results are returned
• Use NULLs as clues to pick up where domain knowledge is lacking.

THANK YOU

For queries
Email: [email protected]

