
Chapter One
Introduction to Statistics, Data and Statistical Thinking
What is Statistics?
• In common usage people think of statistics as
numerical data—the unemployment rate last
month, total government expenditure last year, and
so forth.
• Although there is nothing wrong with viewing
statistics in this way, we are going to take a deeper
approach.
• We will view statistics the way professional
statisticians view it—as a methodology for
collecting, classifying, summarizing, organizing,
presenting, analyzing and interpreting numerical
information.
The Use of Statistics in Economics and Other Social
Sciences
• Businesses use statistical methodology and thinking
to make decisions about which products to
produce, how much to spend advertising them,
how to evaluate their employees, and nearly every
aspect of running their operations.
• The motivation for using statistics in the study of
economics and other social sciences is somewhat
different.
• The object of the social sciences and of economics
in particular is to understand how the social and
economic system functions.
• Views and understandings of how things work are called theories.
• They are composed of two parts—a logical structure which is tautological (that is, true by definition), and a set of parameters in that logical structure which gives the theory empirical content (that is, an ability to be consistent or inconsistent with facts or data).
• The logical structure, being true by definition, is uninteresting, except insofar as
it enables us to construct testable propositions about how the economic system
works. If the facts turn out to be consistent with the testable implications of the
theory, then we accept the theory as true until new evidence inconsistent with
it is uncovered.
• A theory is valuable if it is logically consistent both within itself and with other theories established as “true” and is capable of being rejected by, but nevertheless consistent with, available evidence.
• Its logical structure is judged on two grounds—internal consistency and
usefulness as a framework for generating empirically testable propositions. To
illustrate this, consider the statement: “People maximize utility.” This statement is true by definition—behavior is defined as what people do and utility is defined as what people maximize when they choose to do one thing rather than something else.
• These definitions and the associated utility maximizing approach form a useful framework for generating empirically testable propositions.
• One can choose the parameters in this tautological utility maximization
structure so that the marginal utility of a good declines relative to the
marginal utility of other goods as the quantity of that good consumed
increases relative to the quantities of other goods consumed.
Downward sloping demand curves emerge, leading to the empirically
testable statement: “Demand curves slope downward.” This theory of
demand (which consists of both the utility maximization structure and
the proposition about how the individual’s marginal utilities behave)
can then be either supported or falsified by examining data on prices
and quantities and incomes for groups of individuals and commodities.
• The tautologies derived using the concept of utility maximization are valuable because they are internally consistent and generate
empirically testable propositions such as those represented by the
theory of demand. If it didn’t yield testable propositions about the real
world, the logical structure of utility maximization would be of little
interest.
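• To see what confronting such a proposition with data looks like in practice, here is a minimal sketch (in Python, with invented price and quantity figures, not real data) that estimates the slope of a least-squares line of quantity on price; a negative estimate is the kind of evidence consistent with “Demand curves slope downward”:

```python
# Hypothetical price-quantity observations for one commodity.
prices = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
quantities = [95, 88, 80, 74, 69, 61, 55]

# Least-squares slope of quantity on price:
# slope = sum((p - mean_p)(q - mean_q)) / sum((p - mean_p)^2)
n = len(prices)
mean_p = sum(prices) / n
mean_q = sum(quantities) / n
slope = (
    sum((p - mean_p) * (q - mean_q) for p, q in zip(prices, quantities))
    / sum((p - mean_p) ** 2 for p in prices)
)
print(f"estimated slope: {slope:.1f}")  # negative: consistent with the theory
```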
• Alternatively, consider the statement: “Canada is a wonderful country.”
This is not a testable proposition unless we define what we mean by
the adjective “wonderful”.
• Statistics is the methodology we use to confront theories like the theory
of demand and other testable propositions with the facts.
• It is the set of procedures and intellectual processes by which we decide
whether or not to accept a theory as true—the process by which we
decide what and what not to believe. In this sense, statistics is at the
root of all human knowledge.
• Unlike the logical propositions contained in them, theories are never
strictly true. They are merely accepted as true in the sense of being
consistent with the evidence available at a particular point in time and
more or less strongly accepted depending on how consistent they are
with that evidence.
• Given the degree of consistency of a theory with the evidence, it may or
may not be appropriate for governments and individuals to act as though
it were true. A crucial issue will be the costs of acting as if a theory is
true when it turns out to be false as opposed to the costs of acting as
though the theory were not true when it in fact is.
• As evidence against a theory accumulates, it is eventually rejected in favor of other “better” theories—that is, ones more consistent with the evidence.
• Statistics, being the set of analytical tools used to
test theories, is thus an essential part of the
scientific process.
• Theories are suggested either by casual
observation or as logical consequences of some
analytical structure that can be given empirical
content.
• Statistics is the systematic investigation of the
correspondence of these theories with the real
world. This leads either to a wider belief in the
‘truth’ of a particular theory or to its rejection as
inconsistent with the facts.
Descriptive and Inferential Statistics
• The application of statistical thinking involves two sets of processes. First,
there is the description and presentation of data. Second, there is the process
of using the data to make some inference about features of the environment
from which the data were selected or about the underlying mechanism that
generated the data.
• The first is called descriptive statistics and utilizes numerical and graphical methods to find patterns in the data, to summarize the information they reveal and to present that information in a meaningful way. The second, inferential statistics, uses data to make estimates, decisions, predictions, or other generalizations about the environment from which the data were obtained. The sketch below illustrates the contrast.
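• A minimal sketch of the distinction, assuming only Python's standard library and a made-up sample of hourly wages:

```python
import math
import statistics

# Hypothetical sample of hourly wages drawn from a large workforce.
wages = [18.5, 22.0, 19.75, 31.0, 24.5, 20.0, 27.25, 23.0, 21.5, 25.0]

# Descriptive statistics: summarize the data we actually have.
print("sample mean:  ", round(statistics.mean(wages), 2))
print("sample median:", statistics.median(wages))
print("sample stdev: ", round(statistics.stdev(wages), 2))

# Inferential statistics: use the sample to estimate a feature of the
# population it came from. A rough 95% interval for the population mean
# (normal approximation) is the sample mean +/- 1.96 standard errors.
se = statistics.stdev(wages) / math.sqrt(len(wages))
print("population mean estimate:",
      round(statistics.mean(wages), 2), "+/-", round(1.96 * se, 2))
```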
• Statistical inference essentially involves the attempt to acquire information
about a population or process by analyzing a sample of elements from that
population or process.
• A population includes the set of units that we are interested in learning about.
For example, we could be interested in the effects of schooling on earnings in
later life, in which case the relevant population would be all people working.
• A sample is a subset of the units comprising the population. Because it is costly
to examine most populations of interest, and impossible to examine the
entire output of a process, statisticians use samples from populations and
processes to make inferences about their characteristics.
• Obviously, our ability to make correct inferences about a population based
on a sample of elements from it depends on the sample being
representative of the population. So the manner in which a sample is
selected from a population is of extreme importance.
• An example of the importance of representative sampling occurred in the
1948 presidential election in the United States. The Democratic incumbent,
Harry Truman, was being challenged by Republican Governor Thomas Dewey
of New York. The polls predicted Dewey to be the winner but Truman in fact
won.
• To obtain their samples, the pollsters telephoned people at random,
forgetting to take into account that people too poor to own telephones also
vote. Since poor people tended to vote for the Democratic Party, a sufficient
fraction of Truman supporters were left out of the samples to make those
samples unrepresentative of the population. As a result, inferences about the
proportion of the population that would vote for Truman based on the
proportion of those sampled intending to vote for Truman were incorrect.
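• The sampling problem is easy to reproduce. The simulation below is a minimal sketch with invented support and phone-ownership rates (not the actual 1948 figures); it shows how polling only telephone owners can flip the predicted winner even though a random sample of the same size gets it right:

```python
import random

random.seed(1948)

# Hypothetical electorate: 55% support Truman overall, but telephone
# ownership is assumed to be concentrated among wealthier voters who
# lean Dewey. Each voter is a (supports_truman, owns_phone) pair.
electorate = []
for _ in range(100_000):
    supports_truman = random.random() < 0.55
    # Assumed phone-ownership rates: lower among (poorer) Truman voters.
    owns_phone = random.random() < (0.30 if supports_truman else 0.60)
    electorate.append((supports_truman, owns_phone))

def truman_share(sample):
    return sum(t for t, _ in sample) / len(sample)

# Biased sample: poll only people who own telephones.
phone_owners = [v for v in electorate if v[1]]
biased_poll = random.sample(phone_owners, 1000)

# Representative sample: poll voters at random from the whole electorate.
random_poll = random.sample(electorate, 1000)

print("true Truman share:      ", round(truman_share(electorate), 3))
print("telephone-poll estimate:", round(truman_share(biased_poll), 3))
print("random-poll estimate:   ", round(truman_share(random_poll), 3))
```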
• A process is a mechanism that produces output. We might
be interested in the effects of drinking on driving, in which
case the underlying process is the on-going generation of
car accidents as the society goes about its activities. Note
that a process is simply a mechanism which, if it remains
intact, eventually produces an infinite population.
• Finally, when we make inferences about the characteristics
of a population/process based on a sample, we need some
measure of the reliability of our method of inference.
• What are the odds that we could be wrong?
• We need not only a prediction as to the characteristic of the
population of interest (for example, the proportion by which
the salaries of college graduates exceed the salaries of those
that did not go to college) but some quantitative measure of
the degree of uncertainty associated with our inference.
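• One standard way to quantify that uncertainty is a confidence interval. The simulation below is a minimal sketch with an invented normal population: it draws many samples and checks how often a 95% interval built from each sample actually covers the true population mean.

```python
import math
import random
import statistics

random.seed(7)

POP_MEAN, POP_SD, N, TRIALS = 50.0, 10.0, 100, 1000

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    # Approximate 95% interval: mean +/- 1.96 standard errors.
    half_width = 1.96 * statistics.stdev(sample) / math.sqrt(N)
    if mean - half_width <= POP_MEAN <= mean + half_width:
        covered += 1

print(f"intervals covering the true mean: {covered / TRIALS:.1%}")
```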
Data sets
• Data are the facts and figures collected, analyzed, and
summarized for presentation and interpretation.
• All the data collected in a particular study are referred to
as the data set for the study.
• Elements are the entities on which data are collected.
• A variable is a characteristic of interest for the elements.
• Measurements collected on each variable for every
element in a study provide the data set. The set of
measurements obtained for a particular element is
called an observation.
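• These terms map onto a data set naturally in code. A minimal sketch with hypothetical workers as the elements:

```python
# Each dict is one observation: the set of measurements taken on one
# element. Elements here are (hypothetical) individual workers; the
# variables are name, years of education, and hourly wage.
data_set = [
    {"name": "Abebe",  "education_years": 12, "hourly_wage": 18.50},
    {"name": "Chaltu", "education_years": 16, "hourly_wage": 27.00},
    {"name": "Dawit",  "education_years": 10, "hourly_wage": 15.25},
]

variables = list(data_set[0].keys())  # the characteristics of interest
observation = data_set[1]             # all measurements for one element

print("variables:  ", variables)
print("observation:", observation)
```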
Scales of Measurement
• The scale of measurement determines the amount of information
contained in the data and indicates the most appropriate data
summarization and statistical analyses.
• When the data for a variable consist of labels or names used to identify an attribute of the element, the scale of measurement is considered a nominal scale (e.g., sex).
• The scale of measurement for a variable is called an ordinal scale if the data exhibit the properties of nominal data and the order or rank of the data is meaningful (e.g., a rating of service quality as poor, good, or excellent).
• The scale of measurement for a variable is an interval scale if the data have all the properties of ordinal data and the interval between values is expressed in terms of a fixed unit of measure. Interval data are always numeric (e.g., exam marks).
• The scale of measurement for a variable is a ratio scale if the data have all the properties of interval data and the ratio of two values is meaningful. This scale requires that a zero value be included to indicate that nothing exists for the variable at the zero point (e.g., distance or weight).
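• A minimal sketch (with invented values) of what each scale does and does not permit:

```python
# Hypothetical values illustrating the four scales of measurement.
nominal = "female"    # nominal: a label; counting is the main valid operation
ordinal = "good"      # ordinal: ordered labels ("poor" < "good" < "excellent")
interval_c = 20.0     # interval: fixed unit but arbitrary zero (Celsius)
ratio_kg = 5.0        # ratio: true zero, so ratios are meaningful

# Differences are meaningful on an interval scale...
print(25.0 - interval_c, "degrees warmer")   # valid
# ...but ratios are not: 20 C is not "twice as hot" as 10 C,
# because 0 C does not mean "no temperature".

# On a ratio scale both differences and ratios make sense:
print(ratio_kg * 2, "kg is twice as heavy")  # valid
```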
Categorical and Quantitative Data
• Data can be classified as either categorical or quantitative. Data that can be grouped by specific categories are referred to as categorical data. Categorical data use either the nominal or ordinal scale of measurement. Categorical (qualitative) data cannot be measured on a naturally occurring numerical scale but can only be classified into one of a group of categories.
• Data that use numeric values to indicate how much or how many are referred to as quantitative data. Quantitative data are obtained using either the interval or ratio scale of measurement.
• If the variable is categorical, the statistical analysis is limited. We can summarize categorical data by counting the number of observations in each category or by computing the proportion of the observations in each category. However, even when categorical data are identified by a numerical code, arithmetic operations such as addition, subtraction, multiplication, and division do not provide meaningful results. Arithmetic operations do provide meaningful results for quantitative variables.
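• The contrast shows up directly in the operations that make sense. A minimal sketch with invented survey responses, using only the standard library:

```python
import statistics
from collections import Counter

# Hypothetical survey responses.
major = ["econ", "econ", "stats", "math", "econ", "stats"]  # categorical
gpa = [3.1, 3.7, 2.9, 3.5, 3.3, 3.8]                        # quantitative

# Categorical data: counts and proportions are the meaningful summaries.
n = len(major)
for category, count in Counter(major).items():
    print(category, count, f"{count / n:.0%}")

# Quantitative data: arithmetic gives meaningful results.
print("mean GPA:", round(statistics.mean(gpa), 2))
# A "mean major" would be meaningless even if majors were coded 1, 2, 3.
```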
• There are three general kinds of data sets—cross-
sectional, time-series and panel.
• Cross-sectional data are data collected at the same
or approximately the same point in time.
• Time series data are data collected over several
time periods.
• Some data sets are both time-series and cross-sectional. Imagine, for example, a data set containing wage and gender data for a group of individuals in each of a series of years. These are called panel data.
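• A minimal sketch (with invented wage records) of how the three kinds of data sets relate: a panel follows a cross-section of individuals over time, so slicing it by year yields a cross-section and slicing it by person yields a time series.

```python
# Hypothetical wage records: (person_id, year, hourly_wage, gender).
panel = [
    ("p1", 2020, 18.0, "F"), ("p1", 2021, 19.0, "F"),
    ("p2", 2020, 21.5, "M"), ("p2", 2021, 22.0, "M"),
    ("p3", 2020, 17.0, "F"), ("p3", 2021, 17.8, "F"),
]

# Cross-section: every person observed at one point in time.
cross_section_2020 = [row for row in panel if row[1] == 2020]

# Time series: one person followed over several periods.
time_series_p1 = [row for row in panel if row[0] == "p1"]

print(len(cross_section_2020), "observations in the 2020 cross-section")
print(len(time_series_p1), "periods in p1's time series")
```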
Group Assignment

• descriptive statistics
• nominal
• numerical
