Characteristics of The Data: Unprocessed, Unorganised and Discrete
1. Data - Any facts, numbers, letters or special symbols that can be processed by a
computer.
- Data are unprocessed, unorganised and discrete.
2. Structure - The arrangement of and relations between the parts or elements of something
complex.
3. Data structure - The organization of data.
- A specialized format for organizing and storing data.
- A group of data elements gathered together under one name. These data
elements, known as members, can have different types and
different lengths.
4. Algorithm - A procedure to solve a problem.
- A specific, step-by-step set of instructions for carrying out a procedure or solving a problem,
usually with the requirement that the procedure terminate at some point.
5. Synthesis - The combination of ideas to form a theory or system.
6. Analysis - The process of separating something into its constituent elements.
- The process of breaking a complex topic or substance into smaller parts to
gain a better understanding of it.
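Definitions 3 and 4 can be made concrete with a small Python sketch. The `Student` structure, its members and the use of Euclid's greatest-common-divisor algorithm are illustrative choices, not taken from these notes: the class groups members of different types under one name, and `gcd` is a finite, step-by-step procedure that always terminates.

```python
from dataclasses import dataclass


@dataclass
class Student:
    # Hypothetical data structure: members of different types under one name.
    name: str        # text member
    age: int         # whole-number member
    height_m: float  # real-number member


def gcd(a: int, b: int) -> int:
    """Euclid's algorithm for the greatest common divisor (non-negative inputs)."""
    # Each step replaces (a, b) with (b, a mod b). Since a mod b < b,
    # b strictly decreases, so the loop must terminate.
    while b != 0:
        a, b = b, a % b
    return a


s = Student(name="Ada", age=19, height_m=1.65)
print(s)            # Student(name='Ada', age=19, height_m=1.65)
print(gcd(48, 18))  # 6
```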
Characteristics of the Data
(i.) Measurement Level: whether the variables are measured on a metric or non-metric
scale.
(ii.) Number of Variables
- Data, like information, can be qualitative (opinion-based, subjective)
or quantitative (measurement-based, objective). The opinions of 1000 people
about a government policy would be qualitative. Rainfall measurements would
be quantitative.
- Data can be detailed or sampled. Detailed data would record facts about every
occurrence of something (e.g. the weight of every packet of Twisties leaving the
factory). Sampled data would use typical measurements to represent the whole
(e.g. weighing only every 100th packet of Twisties).
- Data can come in various forms: textual (e.g. names,
addresses), numeric (e.g. heights, ages), graphical (e.g. pictures of
faces), aural (e.g. Morse Code dots and dashes) and visual (e.g. fingerprints, or the
individual frames of a movie, which the brain processes into moving-picture
information when the frames are shown at 24 frames per second).
Data need to be understood primarily in terms of two characteristics, viz. central tendency and
dispersion.
Measures of Central Tendency: Different types of data need different measures of central tendency. The most commonly used
measures are as follows:
Mean: This is usually the arithmetic mean, or simply the average, of the data points involved. It
could also be the geometric or harmonic mean, although that is less common. The mean is the most popular
measure of central tendency, and many statistical techniques use it as the
primary measure of the centre of a given set of data points.
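For illustration, Python's standard `statistics` module (version 3.8 or later) computes all three kinds of mean. The data points are hypothetical.

```python
import statistics

data = [2, 4, 8]  # hypothetical data points

arithmetic = statistics.mean(data)           # (2 + 4 + 8) / 3
geometric = statistics.geometric_mean(data)  # (2 * 4 * 8) ** (1/3)
harmonic = statistics.harmonic_mean(data)    # 3 / (1/2 + 1/4 + 1/8)

print(arithmetic, geometric, harmonic)
```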
Median: If all the data points in a data set are arranged in ascending or descending
order, the value in the centre is called the median. When a data set has an odd number of
elements, such as 7, the median is the 4th item because it has 3 data points on each side. When the
number is even, such as 8, the median is the average of the 4th and 5th data points. The median is preferred when
there are outliers, i.e. extreme values that pull the mean away from the bulk of the data and give a misleading picture.
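The median rules above, and the effect of an outlier, can be checked with Python's `statistics` module. All values are hypothetical.

```python
import statistics

odd = [3, 1, 7, 5, 9, 2, 8]      # 7 items: median is the 4th value once sorted
even = [3, 1, 7, 5, 9, 2, 8, 6]  # 8 items: median is the mean of the 4th and 5th

odd_median = statistics.median(odd)    # sorted: 1 2 3 [5] 7 8 9 -> 5
even_median = statistics.median(even)  # sorted: 1 2 3 [5 6] 7 8 9 -> 5.5

# One extreme value pulls the mean far more than the median.
incomes = [30, 32, 35, 38, 40, 1000]   # hypothetical, one outlier
print(odd_median, even_median)         # 5 5.5
print(statistics.mean(incomes), statistics.median(incomes))
```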
Mode: This is the most frequently occurring value in the data set, i.e. the value most likely
to occur if an item is picked from the data set at random.
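A short sketch with Python's `statistics` module (3.8 or later for `multimode`), using hypothetical data. It also shows that the mode works for non-numeric (qualitative) data and that a data set can have more than one mode.

```python
import statistics

shoe_sizes = [7, 8, 8, 9, 8, 10, 7]        # hypothetical numeric data
colours = ["red", "blue", "red", "green"]  # hypothetical qualitative data

print(statistics.mode(shoe_sizes))            # 8 (occurs three times)
print(statistics.mode(colours))               # red
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2] (two modes)
```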
Measures of Dispersion: The degree of spread determines the level of confidence that one can
have in the results obtained from the measures of central tendency: the more widely the data are spread, the less a single central value represents them. Common measures of dispersion are as follows:
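Range: The range is defined by the two endpoints (the minimum and the maximum) between which all the values of a data set fall,
and is often stated as a single number, the maximum minus the minimum. It is useful because it covers every value in the data set, although a single extreme value can inflate it.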
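Computing the range of a small, hypothetical data set in Python:

```python
data = [12, 7, 19, 3, 15]  # hypothetical data

low, high = min(data), max(data)  # the two endpoints
spread = high - low               # range as a single number
print(low, high, spread)          # 3 19 16
```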
Quartiles: The ordered data set is divided into 4 equal parts, each holding a quarter of the data points; the three
cut points are the quartiles Q1, Q2 (the median) and Q3. Similar measures include the deciles (10 parts) and the percentiles (100 parts).
However, quartiles remain the most widely used.
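Python's `statistics.quantiles` (3.8 or later) returns these cut points directly. Textbooks and tools use slightly different interpolation rules; `method="inclusive"` is chosen here as an assumption, and the default `"exclusive"` method would give Q1 = 2.5 and Q3 = 7.5 for the same hypothetical data.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical, already sorted

# Three cut points split the data into four equal parts: Q1, Q2 (= median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
print(q1, q2, q3)    # 3.0 5.0 7.0

# Deciles use n=10 (nine cut points); percentiles use n=100 (99 cut points).
deciles = statistics.quantiles(data, n=10, method="inclusive")
print(len(deciles))  # 9
```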
Standard Deviation: The standard deviation is the square root of the average squared deviation of the data points from the mean.
Just as the mean is the most important measure of central tendency, the standard deviation is the most important measure of dispersion,
and it is used extensively in almost every statistical technique.
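The standard deviation is the square root of the average squared deviation from the mean. The Python sketch below works it out by hand on hypothetical data and compares the result with the standard library. It also shows the sample version, which divides by n - 1 instead of n and is what `statistics.stdev` computes.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

mean = sum(data) / len(data)
squared_devs = [(x - mean) ** 2 for x in data]

# Population SD: square root of the average squared deviation from the mean.
pop_sd = math.sqrt(sum(squared_devs) / len(data))
# Sample SD: divide by n - 1 instead of n (used when the data are a sample).
sample_sd = math.sqrt(sum(squared_devs) / (len(data) - 1))

print(pop_sd)                   # 2.0
print(statistics.pstdev(data))  # 2.0, same as pop_sd
print(statistics.stdev(data))   # about 2.138, same as sample_sd
```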
Data are generally organized in a hierarchy that runs from the
smallest piece of data (a bit) up to a database.
Building blocks of data hierarchy
1. bit - the smallest unit of data; it has only two values, 1 or 0
2. byte - 8 bits make up one byte, which can represent one character such as
the letter A
3. field (or in a database, attribute) - a combination of bytes that
makes up one aspect of a business object (e.g. last name, invoice
number, age)
4. record - a collection of related data fields (e.g. name/address/phone
information for one student)
5. file (or in a database, entity) - a collection of related records (e.g. all
students in MIS213)
6. database - a group of related files (e.g. all students and faculty in
Cameron School of Business)
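The hierarchy can be mirrored in a small Python sketch. It assumes ASCII encoding for the byte, and the record layout, names and e-mail addresses are hypothetical; a Python list stands in for a file and a dictionary for a database.

```python
from dataclasses import dataclass

# Bit and byte: the letter "A" is stored as one byte made of 8 bits.
byte = "A".encode("ascii")     # b'A', one byte
bits = format(byte[0], "08b")  # '01000001'
print(len(byte), bits)         # 1 01000001


@dataclass
class StudentRecord:
    # Each member is a field (attribute); one object is one record.
    last_name: str
    email: str
    phone: str


# File (entity): a collection of related records (hypothetical students).
mis213 = [
    StudentRecord("Lee", "lee@example.edu", "555-0101"),
    StudentRecord("Patel", "patel@example.edu", "555-0102"),
]

# Database: a group of related files.
database = {"students_mis213": mis213, "faculty": []}
print(len(database["students_mis213"]))  # 2
```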
Specific database terms
1. Entity - a collection of related records (e.g. employees, inventory parts)
2. Attribute - a characteristic of an entity (like a field, e.g. an employee's first
name)
3. Primary key - a field or fields that uniquely identify each record (e.g. social
security number, email address)
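A primary key can be demonstrated with Python's built-in `sqlite3` module and an in-memory database. The `employee` table and its columns are hypothetical; the point is that the database rejects a second record with the same key value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE employee ("
    " email TEXT PRIMARY KEY,"  # primary key: unique for every record
    " first_name TEXT"          # ordinary attribute
    ")"
)
conn.execute("INSERT INTO employee VALUES (?, ?)", ("ana@example.com", "Ana"))

try:
    # A second record with the same key value violates the primary key.
    conn.execute("INSERT INTO employee VALUES (?, ?)", ("ana@example.com", "Anna"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
print(conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0])  # 1
```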
1. Hierarchy of Data
Data Hierarchy
refers to the systematic organization of data, often in a hierarchical form. Data organization
involves fields, records, files and so on.
BIT
All data is stored in a computer's memory or storage devices in the form of binary digits
or bits. A bit can be either 'ON' or 'OFF', representing 1 or 0.
BYTE
is a group of 8 bits. One byte can represent one character or, in different contexts, other
data such as a sound, part of a picture etc.
FIELD
is a group of characters. For example, data held about a person may be split into many fields,
including ID Number, Surname, Initials, Title, Street, Town, etc.
RECORD
is a group of fields holding all the information about one person or item.
FILE
a collection of records. A stock file will contain a record for each item of stock, and so
on.
DATABASE
may consist of many different files, linked in such a way that information can be
retrieved from several files at once.
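Linking files can be sketched with Python's `sqlite3` module, where each file is a table. The `student` and `enrolment` tables, their columns and the course code ACC201 are hypothetical; a shared key lets one query pull information from both tables at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (id INTEGER PRIMARY KEY, surname TEXT);
    CREATE TABLE enrolment (student_id INTEGER REFERENCES student(id), course TEXT);
    INSERT INTO student VALUES (1, 'Lee'), (2, 'Patel');
    INSERT INTO enrolment VALUES (1, 'MIS213'), (2, 'MIS213'), (2, 'ACC201');
    """
)

# The shared key (student.id = enrolment.student_id) links the two files,
# so a single query retrieves information from both.
rows = conn.execute(
    "SELECT s.surname, e.course FROM student s "
    "JOIN enrolment e ON e.student_id = s.id "
    "ORDER BY s.surname, e.course"
).fetchall()
print(rows)  # [('Lee', 'MIS213'), ('Patel', 'ACC201'), ('Patel', 'MIS213')]
```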