UNIT 5
UNDERSTANDING DATA
DATA COLLECTION
Data collection is the first step in data
processing, involving identifying or
gathering relevant data. In a grocery
store, sales data may need to be
digitized if recorded manually, or
processed directly if already in digital
format.
Data from various sources, like hospitals
or shopping malls, is continuously
generated and analyzed to improve
services, boost sales, or make informed
decisions, such as product placement.
DATA STORAGE
After processing, data should be stored
for future use. Data storage means
keeping data so that it can be retrieved
later, and it has become easier as the
cost of devices such as HDDs, SSDs, and
pen drives has fallen.
While files like images and documents
can be stored on computers, more
advanced storage and management can
be achieved through Database
Management Systems (DBMS).
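As an illustration of storing data in a DBMS and retrieving it later, here is a minimal sketch using SQLite through Python's built-in sqlite3 module. The table name "sales", its columns, and the sample records are hypothetical grocery-store data chosen for this example; an in-memory database is used so nothing is written to disk.

```python
import sqlite3

# In-memory database for illustration; a file path such as "sales.db"
# would keep the data on disk for later use.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table of grocery sales records.
cur.execute("CREATE TABLE sales (item TEXT, quantity INTEGER, price REAL)")
cur.executemany(
    "INSERT INTO sales (item, quantity, price) VALUES (?, ?, ?)",
    [("milk", 2, 1.50), ("bread", 1, 2.25), ("milk", 3, 1.50)],
)
conn.commit()

# Retrieve the stored data: total quantity sold per item.
cur.execute("SELECT item, SUM(quantity) FROM sales GROUP BY item ORDER BY item")
totals = cur.fetchall()
print(totals)  # [('bread', 1), ('milk', 5)]

conn.close()
```

Unlike loose files, the DBMS can answer questions about the stored data (here, a per-item total) directly with a query.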
DATA PROCESSING
Data is valuable for decision-making
but must be processed to be useful.
Raw data alone doesn’t lead to
conclusions; it requires processing and
analysis. Automated processing is used
in tasks such as online bill payments,
complaint registration, and ticket
booking, where data is verified and
results are generated.
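The "verify, then generate results" pattern described above can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the raw records, their "item,quantity,price" layout, and the helper name verify are hypothetical and do not come from any particular billing or booking system.

```python
# Hypothetical raw sales records, as they might arrive from a till or a form.
raw_records = [
    "milk,2,1.50",
    "bread,1,2.25",
    "eggs,abc,3.00",  # invalid quantity: rejected during verification
    "milk,3,1.50",
]

def verify(record):
    """Return (item, quantity, price) if the record is well formed, else None."""
    parts = record.split(",")
    if len(parts) != 3:
        return None
    item, qty, price = parts
    try:
        return item, int(qty), float(price)
    except ValueError:
        return None

# Step 1: keep only the records that pass verification.
valid = [r for r in (verify(rec) for rec in raw_records) if r is not None]

# Step 2: generate a result from the verified data (revenue per item).
revenue = {}
for item, qty, price in valid:
    revenue[item] = revenue.get(item, 0.0) + qty * price

print(revenue)  # {'milk': 7.5, 'bread': 2.25}
```

The raw list on its own says little; only after verification and aggregation does it yield a conclusion such as which item earned the most.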
STATISTICAL TECHNIQUES FOR DATA PROCESSING
Measures of Variability
• Measures of variability, or
dispersion, show how spread out the
values in a data set are (for example,
around the mean) and so indicate the
diversity within the data set.
Common measures include the
range and the standard
deviation.
1. Range
2. Standard deviation
RANGE
The range is the difference between the
maximum and minimum values in a
data set, indicating the spread of data.
It applies only to numerical data and
is calculated by subtracting the
smallest value from the largest.
Although it gives a quick measure of
dispersion, the range depends only on
the two extreme values, so it can be
heavily influenced by outliers.
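A minimal Python sketch of the range calculation follows. The function name data_range and the sample marks are hypothetical, chosen only to show the subtraction and the effect of a single outlier.

```python
def data_range(values):
    """Range = largest value minus smallest value (numerical data only)."""
    if not values:
        raise ValueError("range is undefined for an empty data set")
    return max(values) - min(values)

marks = [45, 52, 48, 50, 47]
print(data_range(marks))         # 52 - 45 = 7

# Adding one outlier changes the range sharply.
print(data_range(marks + [95]))  # 95 - 45 = 50
```

The five original marks barely change, yet the single value 95 raises the range from 7 to 50, which is why the range is sensitive to outliers.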
STANDARD DEVIATION
Standard deviation measures the spread of data within a group by
considering all values in the data set, unlike the range, which only
considers the extremes. It is calculated as the square root of the
average of the squared differences between each value and the
mean. A smaller standard deviation indicates less spread in the data,
while a larger one indicates more spread. The standard deviation is
represented by the Greek letter σ (sigma).
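The description above (square root of the average squared difference from the mean) can be followed step by step in Python. This sketch computes the population standard deviation, dividing by the number of values n; the function name std_dev and the sample data are hypothetical. The result is cross-checked against the standard library's statistics.pstdev, while statistics.stdev (not used here) divides by n - 1 and gives the sample standard deviation instead.

```python
import math
import statistics

def std_dev(values):
    """Population standard deviation: square root of the mean squared deviation."""
    n = len(values)
    if n == 0:
        raise ValueError("standard deviation is undefined for an empty data set")
    mean = sum(values) / n                             # here: 40 / 8 = 5.0
    squared_diffs = [(x - mean) ** 2 for x in values]  # 9, 1, 1, 1, 0, 0, 4, 16
    return math.sqrt(sum(squared_diffs) / n)           # sqrt(32 / 8) = sqrt(4) = 2.0

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std_dev(data))            # 2.0
print(statistics.pstdev(data))  # 2.0, the standard-library equivalent
```

Every value contributes to the result, unlike the range, which for this data set (9 - 2 = 7) looks only at the two extremes.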