Applied Statistics - Notes
Classes:
February 3
The exams are worth 8 points each and the exercise lists provided are worth 2
points each. The minimum average for approval is 6.
The exams have 6 questions, all with a weight of 1.6. The student can choose to answer
just 5 or answer all 6. In the latter case, the teacher will disregard the question with the lowest
performance.
The second-bimester exam (N2B) can be replaced by a research project. The project
can be done by teams of up to 3 members. The written part will be worth 3 points and
the presentation 5. The exercise list (2 points) will complete the grade.
Substitute exam: if the minimum average is not reached, the student will take a substitute exam
covering all the content, which will be worth 10 points and will replace the lowest bimonthly grade in the
calculation of the average.
February 7th
Agenda
1. Fundamental Concepts
2. Construction of tables and graphs (a)
3. Frequency distribution of data (b)
4. Measures of central tendency: mean, mode, median (c)
5. Separatrix measures (quantiles)
Observations:
Scientific notation: a coefficient x with 1 ≤ x < 10 multiplied by a power of 10. Ex.: 2.5 x 10^5
(a) Tables: the rules of ABNT (Brazilian Association of Technical Standards) must be observed.
Example: in tables, the sides are not closed, as happens in frames. In
the body of a table no lines are drawn; these are only allowed in the header.
Note: frames and tables are different things: frames are used to present
data that has not been measured. Example: class schedule. The data in a frame
are arranged by criteria of convenience.
Mode: the element or elements that occur most frequently. Ex.: (2, 2, 6, 8, 8, 10, 12). Mode = (2, 8)
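The mode example above can be checked with a short Python sketch. It assumes Python 3.8 or later, where the standard library offers statistics.multimode; the variable names are only illustrative.

```python
from statistics import multimode

# Data set taken from the example in the notes.
data = [2, 2, 6, 8, 8, 10, 12]

# multimode returns every value tied for the highest frequency,
# in the order in which each first appears in the data.
modes = multimode(data)
print(modes)  # [2, 8]
```

Here 2 and 8 both appear twice and every other value appears once, so the data set has two modes.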
February 10
Agenda - Continuation
Observations:
(a) The measures of dispersion describe how the data behave around the average. Example 1: three students
from a class get grades 2, 7, and 10. Calculating the arithmetic mean gives 6.3, that is,
the class average is above the minimum passing grade. However, we see that 1/3 of
the class is doing poorly, despite this good average. Example 2: the per capita income of Brazilians revolves around
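Example 1 can be reproduced numerically. The sketch below is only an illustration: the notes do not say which measure of dispersion will be used, so the population standard deviation (statistics.pstdev) is an assumption chosen for the example, and the passing grade of 6 comes from the course rules stated earlier.

```python
from statistics import mean, pstdev

grades = [2, 7, 10]  # grades from Example 1

average = mean(grades)    # 19 / 3, about 6.33
spread = pstdev(grades)   # population standard deviation (assumed measure)
minimum_passing = 6       # minimum average stated in the course rules

print(f"mean = {average:.2f}, standard deviation = {spread:.2f}")

# The mean alone hides the student who is below the passing grade.
below = [g for g in grades if g < minimum_passing]
print(f"{len(below)} of {len(grades)} students are below {minimum_passing}")
```

A standard deviation of roughly half the mean signals that the grades are far apart, which the average by itself does not show.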
OVERVIEW
1. GENERAL CONSIDERATIONS
1.1 Definition
Statistics is the area of Mathematics applied to the study of collective phenomena, with wide
application in both the applied sciences and market inferences.
It is the collection of methods for planning experiments, obtaining and organizing data, summarizing,
analyzing, and interpreting them, and drawing conclusions from them (Triola, 1999) [1].
The word statistics is associated with status: from Latin, state.
1.2 Method
It is the set of means organized in a way to achieve the desired end.
It is the most effective way to achieve a certain goal.
The methods are classified as: empirical or scientific.
[1] The excerpts in blue are from the booklet 'Basic Statistics in Excel' by Bertolo or from the book
'Easy Statistics' by Antônio Arnot Crespo.
interpretation of coefficients. This part is associated with the calculation of averages and
variances, the study of graphs, tables, etc. It is the best-known part.
Inductive or Inferential Statistics: corresponds to the analysis and interpretation of the
data, associated with a margin of uncertainty, with methods that
are based on probability theory. It includes the Estimation of
Parameters, Hypothesis Testing, Modeling, etc. In other words, it is the part of
statistics that aims to obtain and generalize conclusions for a
population based on a sample, through probability calculation.
1.3.1 Collection
Once the work plan has defined which phenomenon is to be measured and the causes that
influence it, measurement instruments are set up to quantify it.
This part of the research involves data collection that can be direct or indirect.
a) Direct: made on various records (births, marriages, imports,
school records, etc.). It is classified according to the time factor as: continuous,
periodic (censuses) or occasional.
b) Indirect: made based on information already measured through direct collection or by
knowledge of causes related to the phenomenon. Ex.: infant mortality.
1.3.2 Critique
An optional stage, it encompasses the screening of the research instruments
in search of possible errors in their preparation, or else the verification of the
tabulated data. It can be classified as internal or external, depending on the
purpose: errors in the instruments or in the data, respectively.
It aims to find flaws and imperfections in order to avoid errors in the results.
Some common mistakes: leading questions, preservation of self-image (the
interviewee lies, thinking about self-protection), bad samples, and bad questions (whose
wording makes it difficult for the interviewee to fully understand them).
February 17
OVERVIEW
1. GENERAL CONSIDERATIONS - continuation
Summarizing
[Diagram of the stages: Collection, Critique, Organization, Presentation (tables and graphics), and Analysis of the data]
One of the objectives of data analysis and interpretation is to seek a model for the
observations. These models can be essentially deterministic or non-deterministic
(probabilistic or stochastic).
In deterministic models, the conditions under which an experiment is conducted determine
the result of the experiment. Ex.: the current i can be determined by i = U/R (Ohm's
Law) in an elementary resistive electrical circuit.
In non-deterministic models, a Probability Distribution is used. Ex.:
parts are manufactured until perfect pieces are obtained; the total number of
manufactured parts is counted. A distribution is used, in this case the Geometric.
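The contrast between the two kinds of model can be sketched in Python. Everything numeric here is an assumption made for illustration: the 12 V and 4 ohm values, the probability 0.8 that a part is perfect, and the reading of the manufacturing example as "count the parts made until the first perfect one", which is the situation the Geometric distribution describes.

```python
import random

def current(voltage: float, resistance: float) -> float:
    """Deterministic model: Ohm's Law, i = U / R."""
    return voltage / resistance

def parts_until_perfect(p_perfect: float, rng: random.Random) -> int:
    """Probabilistic model: count parts made until the first perfect one."""
    count = 1
    while rng.random() >= p_perfect:  # this part is defective, make another
        count += 1
    return count

# Same inputs always give the same current.
print(current(12.0, 4.0))  # 3.0

# Same conditions give varying counts; only their distribution is known.
rng = random.Random(42)  # fixed seed so the run is repeatable
counts = [parts_until_perfect(0.8, rng) for _ in range(10_000)]
print(sum(counts) / len(counts))  # close to 1 / 0.8 = 1.25
```

In the first case the result is fixed by the conditions; in the second, repeating the experiment gives different counts, and the model only says how often each count is expected.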
decision making.
2. FUNDAMENTAL CONCEPTS
2.1 Variables
A variable is commonly defined as the set of possible outcomes of a phenomenon. It
can be classified as qualitative or quantitative, and also as parametric or
non-parametric.
[Diagram: classification of variables. Non-parametric: Qualitative (Nominal, Ordinal). Quantitative: Discrete, Continuous.]
Variable                               Type
State: perfect or defective            Qualitative, nominal
Quality: 1st, 2nd or 3rd category      Qualitative, ordinal
Number of defective parts              Quantitative, discrete
Diameter of the pieces                 Quantitative, continuous
A quantitative variable is continuous when it can take any value between two limits (e.g., weight,
height, measurements), and discrete when it can only take values belonging to a countable
set (e.g., number of children, counts in general). We denote variables by Latin letters x, y, z,
etc. For example, given a population (or sample) {2,3,4,5,9} and denoting by x the variable related to the
phenomenon that gave rise to these results, we have: x ∈ {2,3,4,5,9}
In general, measurements give rise to continuous variables and counts to discrete variables.
The population is the collection of all potential observations about a certain phenomenon or
about a set of individuals (having at least one common characteristic). The
population is the entire Universe set, which can be finite or infinite.
Finite - presents a limited number of observations, which can be counted.
Infinite - presents an unlimited number of observations, which are impossible to count, and
is generally associated with processes.
February 21
OVERVIEW
2.2.2 Sample
It is a non-empty and finite subset, representative of the population.
The sampled population is the set of data that has actually been observed or extracted. It is on the data from
a sample that studies are carried out, with the aim of making inferences about the population.
The elements must be chosen through appropriate processes that ensure randomness in the selection. The process
of collecting samples is called sampling.
2.2.3 Parameter
It is a numerical characteristic obtained, that is, measured from all the elements of the
population.
2.2.4 Estimator
It is a numerical characteristic obtained, that is, measured from the sampled elements.
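The difference between a parameter and an estimator can be illustrated with the population {2,3,4,5,9} used earlier in these notes. This is a minimal sketch: the choice of the mean as the numerical characteristic, the sample size of 3, and the fixed seed are assumptions made only for the example.

```python
import random
from statistics import mean

population = [2, 3, 4, 5, 9]  # population used as an example in the notes

# Parameter: computed from all the elements of the population.
population_mean = mean(population)

# Estimator: computed only from the sampled elements.
rng = random.Random(0)               # fixed seed, only for repeatability
sample = rng.sample(population, 3)   # hypothetical sample of size 3
sample_mean = mean(sample)

print(population_mean)  # 4.6
print(sample, sample_mean)
```

The parameter is a single fixed number, while the estimator changes with the sample that happens to be drawn.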
2.2.6 Roll
It is the arrangement of raw data in an ordered sequence, either ascending or descending.
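As a small illustration, the ordered arrangement described above is what Python's built-in sorted function produces. The raw data below are hypothetical values in the order in which they might have been observed.

```python
raw_data = [8, 2, 10, 6, 12, 2, 8]  # hypothetical raw data, in the order observed

ascending_roll = sorted(raw_data)
descending_roll = sorted(raw_data, reverse=True)

print(ascending_roll)   # [2, 2, 6, 8, 8, 10, 12]
print(descending_roll)  # [12, 10, 8, 8, 6, 2, 2]
```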
3.2 Estimation
It comprises the evaluation of the phenomenon from an estimator, using the calculation of
probabilities for that purpose. [Related terms: margin of error, confidence level, etc.]
Through the table below we can verify the properties of these survey procedures.
Survey process   Advantages                            Disadvantages
Census           Accepts zero procedural error and     It is expensive, slow, almost always
                 has 100% reliability.                 outdated, and not always viable.
Estimation       It is cheap, fast, up to date, and    Accepts positive procedural error (greater than
                 always viable.                        zero) and has reliability lower than 100%.
Notes:
Statistically, the precision of a numerical value is assessed through the pair:
confidence and procedural error.
A survey is the scientific study of a part of a population with the aim of
studying the attitudes, habits, and preferences of the population regarding events,
circumstances, and matters of common interest.
4. SAMPLING TECHNIQUES
When research is done by estimation, it is necessary to use some procedures in the
selection of the elements from the population that will make up the sample; these procedures
are called Sampling Techniques.
Sampling is the starting point (in practice) for an entire Statistical Study. These techniques
ensure, as much as possible, randomness in the choice of the
elements that will compose the sample. Thus, each element of the population has the
same chance of being chosen, which gives the sample its representative character.
This is very important, since the conclusions regarding the population will be based
on the results obtained from the samples of this population.
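The idea that each element has the same chance of being chosen can be sketched with Python's random.sample, which draws without replacement. The population of 10 numbered elements, the sample size of 3, and the seed are all hypothetical; the sketch illustrates simple random selection only, not the other techniques.

```python
import random
from collections import Counter

population = list(range(1, 11))  # hypothetical population of 10 numbered elements
rng = random.Random(2024)        # fixed seed, only so the run is repeatable

# One simple random sample of 3 elements, drawn without replacement.
print(rng.sample(population, 3))

# Repeating the draw many times shows each element is chosen about equally often.
counts = Counter()
draws = 20_000
for _ in range(draws):
    counts.update(rng.sample(population, 3))

for element in population:
    print(element, round(counts[element] / draws, 3))  # each close to 3/10
```

Each element ends up in about 30% of the samples, which is the equal-chance property the text describes.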
[Diagram - Types of Sampling: Non-Probabilistic, Intentional, Quotas or proportional, Disproportionate,
Simple Random, Conglomerate]
5. STATISTICAL SERIES
It is the name given to tables that present the distribution of the observed data of a
phenomenon as a function of time, place, or category. These are called,
respectively, historical, geographical, or specific series.
One of the objectives of Statistics is to summarize the values that one or more variables can
take, so that we have a global view of the variation of this or these variables. It
can, initially, present these values in tables and graphs.
5.1 Table
It is a frame that summarizes a set of observations.
A table consists of:
There are also the complementary elements of the table to consider, which are the source, the notes,
and the callouts, preferably placed in its footer.
A table is a small board or frame made up of rows and columns, summarizing a set of
observations. The construction of a table depends on the collected data, which will be summarized and
arranged in tabular form, meaning that they are placed in series and presented in
frames or tables. A table is the graphical arrangement of series according to a specific order of
classification.
            (1,000 t)
1991        2.535
1992        2.666      {cell
1993        2.122
1994        3.750      {row
1995        2007
Body: the rows from 1991 to 1995
Source: IBGE           {footer
_____________________________
Specific or categorical series - describe data arranged according to specific items or categories,
at a certain time and place.
Time or chronological series - describe the values of the variable, at a certain location, broken down
according to variable time intervals.
March 3rd
OVERVIEW
6. GRAPH CONSTRUCTION
Graphs are resources used to present statistical series quickly and dynamically, without loss
of scientific rigor.
The main types of graphs are diagrams, cartograms, and pictograms. Next
we will see the main diagrams.
The line graph uses a polygonal line to represent the statistical series. It
is an application of the process of representing functions in a system of
Cartesian coordinates. As we know, this system uses two perpendicular
lines: the lines are the coordinate axes and their intersection point is the origin. The
horizontal axis is called the x-axis (or axis of abscissas) and the vertical, the y-axis (or axis of
ordinates).
Example: [line graph; vertical axis from 0.0 to 80.0, horizontal axis with the years 1987 to 1992]
Example: [chart; vertical axis from 0 to 1,500, horizontal axis STATES: SP, MG, RS, ES, PR, SC]
Bar Chart
[bar chart; categories SC, PR, ES, RS, MG, SP]
Sector Chart
[sector chart; sectors of 119.39°, 197.27°, 18.09°, and 25.24°; labels: Minas Gerais, Espírito Santo,
Rio de Janeiro, São Paulo]
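The four sector angles listed above add up to 359.99°, that is, a full circle apart from rounding. A minimal sketch of how such angles are usually obtained (each sector receives a share of 360° proportional to its value) is given below. The function name and the quantities are hypothetical; they are not the data behind the chart in these notes.

```python
def sector_angles(values: dict[str, float]) -> dict[str, float]:
    """Return the central angle, in degrees, of each sector of a sector chart."""
    total = sum(values.values())
    return {label: 360 * value / total for label, value in values.items()}

# Hypothetical quantities; they are not the data behind the chart in the notes.
production = {"A": 50, "B": 30, "C": 15, "D": 5}

angles = sector_angles(production)
for label, angle in angles.items():
    print(f"{label}: {angle:.2f} degrees")
```

With these values, category A holds half of the total and therefore receives half of the circle.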
Exercises extracted from the book Easy Statistics by Antônio Arnot Crespo.
1. Complete: The experimental method is the most used by sciences such as:...
2. The human and social sciences, to obtain the data they seek, resort to what
method?
3. What is Statistics?
10. Cite three or more activities in business planning in which Statistics is
necessary.
_______________________
Answers
1. Complete: The experimental method is the most used by sciences such as: Chemistry, Physics,
Biology, etc.
2. The human and social sciences, to obtain the data they seek, resort to what
method?
The statistical method, although it is more difficult and less accurate.
3. What is Statistics?
Statistics is a part of Applied Mathematics that provides methods for the collection,
organization, description, analysis, and interpretation of data. Its study aims, among
other things, at decision-making.
Statistics is a set of quantitative methods and processes used to study and
measure collective phenomena.
4. Cite the phases of the statistical method.
Data Collection, Data Critique, Data Compilation, Exposure or Presentation of
Data and Analysis of Results.
5. In your view, what is data collection?
Collecting data is obtaining information from the studied population related to the phenomenon that
one wants to verify.
The collection is direct when carried out on mandatory records
(births, marriages, and deaths, import and export of goods), on records
pertaining to the files of a college's students, or else when the data are
collected by the researcher through surveys and questionnaires, as is the case
with test and exam grades, the demographic census, etc.
Data collection can be classified in relation to the time factor as:
a. continuous (record) - when done continuously, such as that of births and deaths and
student attendance in classes;
b. periodic - when done at constant time intervals, like censuses (every
10 years) and the periodic evaluations of the students;
10. Cite three or more activities of business planning in which Statistics is
necessary.
Through it we can know the geographical and social reality, the natural, human, and
financial resources available, and the community's expectations about the company, and establish
its goals and objectives with a greater likelihood of being achieved in the short, medium, or
long term. It is also necessary in the selection and organization of the strategy to be adopted in the venture
and in the choice of techniques for verifying and evaluating the quantity and quality of
the product, and even possible profits and/or losses.
____________________________
Proposed Exercises
1. Classify the variables as qualitative or quantitative (continuous or discrete)
a. Population (or Universe): students of a school. Variable: hair color
b. Population: couples residing in a city. Variable: number of children
c. Population: the rolls of a die. Variable: the number obtained on each roll
d. Population: pieces produced by a certain machine. Variable: number of pieces produced per
hour
e. Population: pieces produced by a certain machine. Variable: outer diameter
2. State which of the variables below are discrete and which are continuous:
a. Population (or Universe): children of a city. Variable: eye color
b. Population: data from a city's weather station. Variable: rainfall
during the year
c. Population: data from the São Paulo Stock Exchange - IBOVESPA. Variable: number of
shares traded
d. Population: employees of a company. Variable: salaries
e. Population: nails produced by a machine. Variable: length
f. Population: couples residing in a city. Variable: sex of the children
g. Population: agricultural properties in Brazil. Variable: soybean production
h. Population: line segments. Variable: length
i. Population: libraries of Catanduva. Variable: number of volumes
j. Population: devices produced on an assembly line. Variable: number of defects per
unit
k. Population: industries of a city. Variable: return on equity employed
______________________________
Answers
1. Classify the variables as qualitative or quantitative (continuous or discrete)
a. Population (or Universe): students of a school. Variable: hair color - qualitative
b. Population: couples residing in a city. Variable: number of children - quantitative, discrete
c. Population: the rolls of a die. Variable: the number obtained on each roll - quantitative,
discrete
d. Population: pieces produced by a certain machine. Variable: number of pieces produced per
hour - quantitative, discrete
e. Population: pieces produced by a certain machine. Variable: outer diameter - quantitative,
continuous
2. State which of the variables below are discrete and which are continuous:
a. Population (or Universe): children of a city. Variable: eye color - qualitative (neither discrete nor continuous)
b. Population: data from a city's weather station. Variable: rainfall
during the year - continuous
c. Population: data from the São Paulo Stock Exchange - IBOVESPA. Variable: number of
shares traded - discrete
d. Population: employees of a company. Variable: salaries - discrete
e. Population: nails produced by a machine. Variable: length - continuous
f. Population: couples residing in a city. Variable: sex of the children - qualitative (neither discrete nor continuous)
g. Population: agricultural properties in Brazil. Variable: soybean production - discrete
h. Population: line segments. Variable: length - continuous
i. Population: libraries of Catanduva. Variable: number of volumes - discrete
j. Population: devices produced on an assembly line. Variable: number of defects per
unit - discrete
k. Population: industries of a city. Variable: return on equity employed -
continuous
____________________________
Proposed Exercises
1. What is a population? And a sample?
2. What is the difference between parameter and estimator?
3. What is the difference between raw data and a roll?
_______________________________
Answers
1. What is a population? And a sample?
Population is the set of entities that share at least one common characteristic.
A sample is a finite, non-empty, and representative subset of a population.
2. What is the difference between parameter and estimator?
In the parameter, the numerical characteristic is obtained from all the elements of the
population, while in the estimator, only elements of the sample are used.
3. What is the difference between raw data and a roll?
Raw data is the set of unorganized numerical data obtained directly from the
observation of a collective phenomenon. A roll is the arrangement of raw data in ascending or
descending order.