Introduction To Econometricsdocx

Uploaded by

nahom595921
Copyright
© © All Rights Reserved
Chapter One

Introduction
1.1 Definition and scope of econometrics
The economic theories we learn in various economics courses suggest many relationships among
economic variables. For instance, in microeconomics we learn demand and supply models in
which the quantities demanded and supplied of a good depend on its price. In macroeconomics,
we study the ‘investment function’, which explains how aggregate investment in the economy
responds to changes in the rate of interest, and the ‘consumption function’, which relates
aggregate consumption to the level of aggregate disposable income.
Each such specification involves a relationship among economic variables. As economists,
we may be interested in questions such as: If one variable changes by a certain magnitude, by
how much will another variable change? Also, given that we know the value of one variable,
can we forecast or predict the corresponding value of another? The purpose of studying the
relationships among economic variables and attempting to answer questions of this type
is to help us understand the real economic world we live in.
However, economic theories that postulate the relationships between economic variables have to
be checked against data obtained from the real world. If empirical data verify the relationship
proposed by economic theory, we accept the theory as valid. If the theory is incompatible with
the observed behavior, we either reject the theory or modify it in the light of the empirical
evidence. To provide a better understanding of economic relationships and better
guidance for economic policy making, we also need to know the quantitative relationships
between the different economic variables. We obtain these quantitative measurements from data
taken from the real world. The field of knowledge which helps us to carry out such an evaluation of
economic theories in empirical terms is econometrics.
With this background in place, we may now formally define what econometrics is.
WHAT IS ECONOMETRICS?
The term “econometrics” is believed to have been coined by Ragnar Frisch (1895-1973) of
Norway, one of the three principal founders of the Econometric Society, first editor of the
journal Econometrica, and co-winner of the first Nobel Memorial Prize in Economic Sciences in
1969.
Literally interpreted, econometrics means “economic measurement”, but the scope of
econometrics is much broader, as described by leading econometricians. Different econometricians
have worded their definitions of econometrics differently. But if we distill the fundamental
features and concepts of all the definitions, we may obtain the following definition.
“Econometrics is the science which integrates economic theory, economic statistics, and
mathematical economics to investigate the empirical support of the general schematic law
established by economic theory. It is a special type of economic analysis and research in which
the general economic theories, formulated in mathematical terms, are combined with empirical
measurements of economic phenomena. Starting from the relationships of economic theory, we
express them in mathematical terms so that they can be measured. We then use specific methods,
called econometric methods in order to obtain numerical estimates of the coefficients of the
economic relationships.”
Measurement is an important aspect of econometrics. However, the scope of econometrics is
much broader than measurement. As D. Intriligator rightly stated, the “metric” part of the word
econometrics signifies ‘measurement’, and hence econometrics is basically concerned with
the measurement of economic relationships. In short, econometrics may be considered as the
integration of economics, mathematics, and statistics for the purpose of providing numerical
values for the parameters of economic relationships and verifying economic theories.
Econometrics is based upon the development of statistical methods for estimating economic
relationships, testing economic theories, and evaluating and implementing government and
business policy. The most common application of econometrics is the forecasting of such
important macroeconomic variables as interest rates, inflation rates, and gross domestic product.
While forecasts of economic indicators are highly visible and are often widely published,
econometric methods can be used in economic areas that have nothing to do with
macroeconomic forecasting. For example, we will study the effects of political campaign
expenditures on voting outcomes. We will consider the effect of school spending on student
performance in the field of education.
Econometrics has evolved as a separate discipline from mathematical statistics because the
former focuses on the problems inherent in collecting and analyzing nonexperimental economic
data. Nonexperimental data are not accumulated through controlled experiments on
individuals, firms, or segments of the economy. (Nonexperimental data are sometimes called
observational data to emphasize the fact that the researcher is a passive collector of the data.)
Experimental data are often collected in laboratory environments in the natural sciences, but
they are much more difficult to obtain in the social sciences. While some social experiments can
be devised, it is often impossible, prohibitively expensive, or morally repugnant to conduct the
kinds of controlled experiments that would be needed to address economic issues.
1.2 Economic models vs. econometric models
i) Economic models:
Any economic theory is an abstraction from the real world. For one thing, the immense
complexity of the real-world economy makes it impossible for us to understand all
interrelationships at once. For another, not all the interrelationships are equally
important for the understanding of the economic phenomenon under study. The sensible
procedure is, therefore, to pick out the important factors and relationships relevant to our problem
and to focus our attention on these alone. Such a deliberately simplified analytical framework is
called an economic model. It is an organized set of relationships that describes the functioning of
an economic entity under a set of simplifying assumptions. All economic reasoning is ultimately
based on models. Economic models consist of the following three basic structural elements.
1. A set of variables
2. A list of fundamental relationships and
3. A number of strategic coefficients
ii) Econometric models:
The most important characteristic of econometric models is that they contain a random element
which is ignored by mathematical economic models which postulate exact relationships between
economic variables.
Example: Economic theory postulates that the demand for a commodity depends on its price, on
the prices of other related commodities, on consumers’ income and on tastes. This is an exact
relationship which can be written mathematically as:
$$Q_x = b_0 + b_1 P_x + b_2 P_0 + b_3 Y + b_4 t$$
where $Q_x$ is the quantity demanded of the commodity, $P_x$ its price, $P_0$ the prices of other
related commodities, $Y$ consumers’ income and $t$ tastes.
The above demand equation is exact. However, many more factors may affect demand. In
econometrics the influence of these ‘other’ factors is taken into account by introducing a
random variable into the economic relationship. In our example, the demand function studied
with the tools of econometrics would be of the stochastic form:
$$Q_x = b_0 + b_1 P_x + b_2 P_0 + b_3 Y + b_4 t + u$$
where u stands for the random factors which affect the quantity demanded.
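The contrast between the exact and the stochastic demand function can be sketched in a few lines of Python. This is only an illustration: the coefficient values, the input values and the normal distribution assumed for u are all hypothetical and are not estimates from any data set.

```python
import random

# Hypothetical coefficients b0..b4, chosen only for illustration.
B = (100.0, -2.0, 0.5, 0.01, 1.0)

def exact_demand(p_x, p_0, y, t):
    """Exact (deterministic) demand: Q_x = b0 + b1*P_x + b2*P_0 + b3*Y + b4*t."""
    b0, b1, b2, b3, b4 = B
    return b0 + b1 * p_x + b2 * p_0 + b3 * y + b4 * t

def stochastic_demand(p_x, p_0, y, t, rng, sigma=5.0):
    """Econometric version: the exact part plus a random disturbance u.

    Assumption: u is drawn from a normal distribution with mean 0 and
    standard deviation sigma.
    """
    u = rng.gauss(0.0, sigma)
    return exact_demand(p_x, p_0, y, t) + u

rng = random.Random(42)  # fixed seed so the run is reproducible
inputs = dict(p_x=10.0, p_0=8.0, y=2000.0, t=3.0)

q_exact = exact_demand(**inputs)
q_draws = [stochastic_demand(rng=rng, **inputs) for _ in range(5)]

print("exact demand:", q_exact)
print("stochastic draws:", [round(q, 1) for q in q_draws])
```

With the same prices, income and tastes, the exact function always returns the same quantity, while each stochastic draw differs from it by the realized value of u.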

Econometrics vs. mathematical economics


Mathematical economics states economic theory in terms of mathematical symbols. There is no
essential difference between mathematical economics and economic theory. Both state the same
relationships, but while economic theory uses verbal exposition, mathematical economics uses mathematical symbols. Both
express economic relationships in an exact or deterministic form. Neither mathematical
economics nor economic theory allows for random elements which might affect the relationship
and would make it stochastic. Furthermore, they do not provide numerical values for the
coefficients of economic relationships.
Econometrics differs from mathematical economics in that, although econometrics presupposes
that economic relationships are expressed in mathematical form, it does not assume exact or
deterministic relationships. Econometrics assumes random relationships among economic
variables. Econometric methods are designed to take into account random disturbances which
create deviations from the exact behavioral patterns suggested by economic theory and mathematical
economics. Furthermore, econometric methods provide numerical values of the coefficients of
economic relationships.
Econometrics vs. statistics
Econometrics differs from both mathematical statistics and economic statistics. An economic
statistician gathers empirical data, records them, tabulates them or charts them, and attempts to
describe the pattern in their development over time and perhaps detect some relationship
between various economic magnitudes. Economic statistics is mainly a descriptive aspect of
economics. It does not provide explanations of the development of the various variables and it
does not provide measurements of the coefficients of economic relationships.

Mathematical (or inferential) statistics deals with methods of measurement which are
developed on the basis of controlled experiments. But such statistical methods of measurement are
not appropriate for a number of economic relationships, because for most economic relationships
controlled or carefully planned experiments cannot be designed, since the
relationships among economic variables are stochastic or random in nature. The fundamental ideas of
inferential statistics are nevertheless applicable in econometrics, but they must be adapted to the
problems of economic life. Econometric methods are adjusted so that they become appropriate for the
measurement of economic relationships which are stochastic. The adjustment consists primarily
in specifying the stochastic (random) elements that are supposed to operate in the real world and
enter into the determination of the observed data.

1.3 Methodology of Econometrics


How do econometricians proceed in their analysis of an economic problem? That is, what is their
methodology? The traditional (classical) econometric methodology proceeds along the following
lines.
1. Statement of theory or Hypothesis
2. Specification of mathematical model of the theory.
3. Specification of econometric model of the theory
4. Obtaining the data
5. Estimation of the parameters of the econometric model
6. Hypothesis testing
7. Forecasting /Prediction
8. Using the model for control or Policy purpose
To illustrate these steps, let us consider the well-known Keynesian theory of consumption.

Step 1: Statement of theory or hypothesis


An economic theory or hypothesis is simply a qualitative expression of economic relationships.
Keynes stated that consumption increases as the disposable income of consumers increases,
but not by as much as the increase in their income. In short, the marginal propensity to consume
(MPC), the rate of change of consumption for a unit (say, a dollar) change in income, is greater
than zero but less than 1.

Step 2: Specification of the mathematical model


Although Keynes postulated a positive relationship between consumption and income, he did not
specify the precise form of the functional relationship between the two. For simplicity a
mathematical economist may suggest the following consumption function.
$$Y = B_1 + B_2 X, \qquad 0 < B_2 < 1 \tag{1}$$
where Y = consumption expenditure
X = income
B1 and B2 are parameters of the model with
B1 = intercept
B2 = slope coefficient.
The slope coefficient, B2, measures the MPC. Thus equation (1) is an example of a mathematical
model of the relationship between consumption and income, which is called the consumption function.
In general, specification of the model involves determining:
• The dependent and explanatory variables
• A priori theoretical expectations about the sign and size of the parameters
• The mathematical form of the model, i.e. the number of equations, the linearity or non-linearity of
the equations, and so on.
If the model has only one equation, it is called a single-equation model; if it has two or more
equations, it is called a simultaneous-equation model. Thus, for the above Keynesian consumption function:
• Consumption expenditure (Y) is the dependent variable
• Income (X) is the independent (explanatory) variable
• The sign of B2 is expected to be positive and its magnitude (size) is expected to lie
between 0 and 1
Step 3: Specification of Econometric model.
The purely mathematical model given in equation (1) is of limited interest to econometricians
because it assumes an exact relationship between income and consumption. But in the real world,
relationships between economic variables are generally inexact. Thus if we obtain the data and
plot them on the X-Y plane, we would not expect all the data points to fall on a straight line.
To allow for the inexact relationship between economic variables, the econometrician would
modify the exact relationship as follows.
$$Y = B_1 + B_2 X + u_i \tag{2}$$
where $u_i$ is known as the disturbance term or error term, and is a random (stochastic) variable.

The disturbance term $u_i$ may well represent all those factors that affect consumption but are not
explicitly shown in this model.
Therefore equation (2) is an example of an ECONOMETRIC MODEL. It hypothesizes that the
dependent variable Y is linearly related to the explanatory variable X, but that the relationship
between the two is not exact; it is subject to individual variation.

Step 4: Obtaining Data


To estimate the econometric model given in equation (2), i.e., to obtain the numerical values of
B1 and B2, we need data. (We use aggregate personal consumption expenditure as Y, and GDP
as a measure of aggregate income, X.)
If we plot the data, it would look as follows:
[Figure: scatter plot of Y (consumption expenditure) against X (income)]
Step 5: Estimation of Econometric model.
Now that we have the data, our next task is to estimate the parameters B1 and B2. The
actual estimation technique is given in the next chapter. For now, note that using regression
analysis the estimates of B1 and B2 are -231.8 and 0.7194, respectively, for the following data:
Year Y X
2004 2447.1 3776.3
2005 2476.9 3843.1
2006 2503.7 3760.3
2007 2619.4 3906.6
2008 2746.1 4148.5
2009 2865.8 4279.8
2010 2969.1 4404.5
2011 3052.2 4539.9
2012 3162.4 4718.6
2013 3223.3 4838.0
2014 3260.4 4877.5
2015 3240.8 4821.0
Thus the estimated consumption function will be:

$$\hat{Y} = -231.8 + 0.7194X \tag{3}$$

The hat on Y indicates that it is an estimate.

From the equation, for the period 2004-2015, the slope coefficient (MPC) was about 0.72,
suggesting that an increase in real income of one birr (dollar) led, on average, to an increase of about
0.72 birr (72 cents) in real consumption expenditure. We say “on average” because the relationship
between consumption expenditure and income is inexact.
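The estimates quoted in Step 5 can be checked with a short Python sketch. It assumes that the "regression analysis" referred to is ordinary least squares for a single explanatory variable (the text defers the technique to the next chapter), and the function name `ols_simple` is a hypothetical helper introduced here for illustration.

```python
# Data from the Step 5 table: Y = consumption expenditure, X = income.
Y = [2447.1, 2476.9, 2503.7, 2619.4, 2746.1, 2865.8,
     2969.1, 3052.2, 3162.4, 3223.3, 3260.4, 3240.8]
X = [3776.3, 3843.1, 3760.3, 3906.6, 4148.5, 4279.8,
     4404.5, 4539.9, 4718.6, 4838.0, 4877.5, 4821.0]

def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares, both in deviations from the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx          # slope coefficient (the MPC in this model)
    b1 = y_bar - b2 * x_bar   # intercept
    return b1, b2

b1_hat, b2_hat = ols_simple(X, Y)
print(f"intercept = {b1_hat:.1f}, slope (MPC) = {b2_hat:.4f}")
```

Under that assumption, the printed intercept and slope should come out close to the values -231.8 and 0.7194 reported in the text.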

Step 6: Hypothesis testing

Once we have estimated the parameters, the next step is to develop a criterion for
checking whether the estimates obtained in step 5 are in accordance with the expectations of the
theory being tested. As noted earlier, Keynes expected the MPC to be positive, falling between 0 and
1.

Thus we check whether 0.72 is statistically less than 1. If it is, the evidence supports Keynes’s
hypothesis. Such confirmation or refutation of economic theories on the basis of
sample evidence is based on the branch of statistical theory known as statistical inference
(hypothesis testing).
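One way to make "statistically less than 1" concrete is a one-sided t test of the null hypothesis B2 = 1 against the alternative B2 < 1. The sketch below is an assumption about how such a check could be done, not the procedure of the text: it relies on the classical regression assumptions (developed in later chapters), a 5% significance level, and the tabulated critical value 1.812 for a t distribution with 10 degrees of freedom.

```python
import math

# Data from the Step 5 table: Y = consumption expenditure, X = income.
Y = [2447.1, 2476.9, 2503.7, 2619.4, 2746.1, 2865.8,
     2969.1, 3052.2, 3162.4, 3223.3, 3260.4, 3240.8]
X = [3776.3, 3843.1, 3760.3, 3906.6, 4148.5, 4279.8,
     4404.5, 4539.9, 4718.6, 4838.0, 4877.5, 4821.0]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in X)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))

b2 = s_xy / s_xx           # estimated slope (MPC)
b1 = y_bar - b2 * x_bar    # estimated intercept

# Residual variance with n - 2 degrees of freedom (two estimated parameters).
residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(X, Y)]
sigma2_hat = sum(e ** 2 for e in residuals) / (n - 2)

# Standard error of the slope and t statistic for H0: B2 = 1.
se_b2 = math.sqrt(sigma2_hat / s_xx)
t_stat = (b2 - 1.0) / se_b2

T_CRIT = 1.812  # one-sided 5% critical value, t distribution with 10 df
reject_h0 = t_stat < -T_CRIT

print(f"se(b2) = {se_b2:.4f}, t = {t_stat:.2f}, reject H0 (B2 = 1): {reject_h0}")
```

A large negative t statistic means the estimated MPC lies many standard errors below 1, which is the sense in which the data support the Keynesian hypothesis.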

Step 7: Forecasting/Prediction

If the chosen model confirms the theory, we may use it to predict the future values of the
dependent variable Y on the basis of known or expected future values of the explanatory variable X. To
illustrate this, suppose income (X) is expected to be $6000 in 2020. What is the forecasted
consumption expenditure in 2020? If we believe that equation (3) will continue to hold in 2020,
we can answer the question simply as:

$$\hat{Y} = -231.8 + 0.7194(6000) = 4084.6$$
Another use of the estimated model in equation (3) is to see the effect of policy changes on income
and thereby on consumption expenditure.
A change in investment expenditure leads to a change in income, and the size of this effect is
given by the income multiplier (M), defined as

$$M = \frac{1}{1 - MPC}$$

If we use the MPC of 0.72, the income multiplier would be

$$M = \frac{1}{1 - 0.72} \approx 3.57$$

This means that a one-dollar change in investment will eventually lead to a more than threefold
change (about 3.57 dollars) in income.

Thus quantitative estimates of MPC provide valuable information for policy purposes. Knowing
MPC, one can predict the future course of income and consumption expenditure following a
change in the government’s fiscal policy.
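The forecast and the multiplier in this step are simple arithmetic on the estimated coefficients, which the following Python sketch reproduces. The function names are hypothetical helpers, and the coefficients are taken as given from equation (3) rather than re-estimated.

```python
# Estimated coefficients from equation (3), taken as given.
B1_HAT = -231.8
B2_HAT = 0.7194

def predict_consumption(income):
    """Point forecast of consumption expenditure for a given income level."""
    return B1_HAT + B2_HAT * income

def income_multiplier(mpc):
    """Income multiplier M = 1 / (1 - MPC), defined for 0 < MPC < 1."""
    if not 0 < mpc < 1:
        raise ValueError("MPC must lie strictly between 0 and 1")
    return 1.0 / (1.0 - mpc)

forecast_2020 = predict_consumption(6000)
multiplier = income_multiplier(0.72)

print(round(forecast_2020, 1))  # 4084.6
print(round(multiplier, 2))     # 3.57
```

The forecast is only as good as the assumption that equation (3) continues to hold outside the sample period.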

Step 8: Using the model for policy or control purpose

Suppose the government believes that a consumption expenditure level of $3052.2 (the 2011 level)
will keep the unemployment rate at its current level of about 6.5 percent (2020). What level of
income will guarantee the target amount of consumption expenditure?

If the consumption function in equation (3) is acceptable, then

$$3052.2 = -231.8 + 0.7194X$$

which gives X = 4565 dollars (approximately).

That is, an income level of 4565, given an MPC of about 0.72, will produce an expenditure of $3052.2.

As the calculation suggests, an estimated model may be used for control or policy purpose. By
using appropriate fiscal and monetary policy mix, the government can manipulate the control
variable X to produce the desired level of the target variable Y.
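The control calculation simply inverts equation (3) to solve for X. A minimal Python sketch, with `required_income` as a hypothetical helper name and the coefficients of equation (3) as default arguments:

```python
def required_income(target_consumption, b1=-231.8, b2=0.7194):
    """Solve target = b1 + b2 * X for the income level X."""
    return (target_consumption - b1) / b2

x_needed = required_income(3052.2)
print(round(x_needed))  # 4565

# Check: plugging the income back into equation (3) recovers the target.
implied_consumption = -231.8 + 0.7194 * x_needed
print(round(implied_consumption, 1))  # 3052.2
```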

Desirable properties of an econometric model


An econometric model is a model whose parameters have been estimated with some appropriate
econometric technique. The ‘goodness’ of an econometric model is judged customarily
according to the following desirable properties.

1. Theoretical plausibility. The model should be compatible with the postulates of
economic theory. It must describe adequately the economic phenomena to which it
relates.
2. Explanatory ability. The model should be able to explain the observations of the actual
world. It must be consistent with the observed behaviour of the economic variables
whose relationship it determines.
3. Accuracy of the estimates of the parameters. The estimates of the coefficients should be
accurate in the sense that they should approximate as best as possible the true parameters
of the structural model. The estimates should if possible possess the desirable properties
of unbiasedness, consistency and efficiency.
4. Forecasting ability. The model should produce satisfactory predictions of future values
of the dependent (endogenous) variables.
5. Simplicity. The model should represent the economic relationships with maximum
simplicity. The fewer the equations and the simpler their mathematical form, the better
the model is considered, ceteris paribus (that is to say provided that the other desirable
properties are not affected by the simplifications of the model).
Goals of Econometrics
Three main goals of Econometrics are identified:
i) Analysis i.e. testing economic theory
ii) Policy making i.e. Obtaining numerical estimates of the coefficients of
economic relationships for policy simulations.

iii) Forecasting i.e. using the numerical estimates of the coefficients in order to
forecast the future values of economic magnitudes.

1.4 The Sources, Types and Nature of Data


In most, if not all, studies we collect data to obtain information about an area of research in
which we have an interest. For example, we might want to know the level of dental caries in our
area. In order to discover this, we might need to observe a number of different variables, which
could include age, sex, number of teeth, cavities, fillings, extractions, pain, sepsis and quality of
life. This information, or data, is normally obtained from a sample of the population, which can
then be summarized and analyzed, and from which conclusions can be drawn. The collection,
summarizing and analysis of data are what statistics and statistical techniques are all about.
TYPES OF DATA
Discrete versus Continuous Data
Although we refer to both gender and height as variables, it is obvious that they differ from
one another with respect to the type and number of values they can assume. One way to
differentiate between types of variables is to decide whether the values are discrete or
continuous.
Discrete data have values that can assume only whole numbers. Discrete variables can take only
one of a limited set of values. This would include variables such as gender, hair and eye color,
political preference, and which treatment a person received. Continuous data may take any
value within a defined range. Height, like weight, blood pressure, time, and many
other variables, is continuous.

Sources of Data
Primary Data vs. Secondary Data
Primary data are originated by a researcher for the specific purpose of addressing the problem
at hand; that is, they are information collected by a researcher specifically for a research
assignment. In other words, primary data are information that a company must gather because no
one has compiled and published the information in a forum accessible to the public. Companies
generally take the time and allocate the resources required to gather primary data only when a
question, issue or problem presents itself that is sufficiently important or unique that it warrants
the expenditure necessary to gather the primary data. Primary data are original in nature,
current, and directly related to the issue or problem. The researcher collects them through
various methods such as interviews, surveys and questionnaires.
Primary data have their own advantages and disadvantages.

Secondary data are data which have already been collected for purposes other than the problem
at hand. These data can be located quickly and inexpensively. Secondary data are
collected by a party not related to the research study, for some other
purpose and at a different time in the past. If the researcher uses these data, then they become
secondary data for the current user. They may be available in written, typed or electronic
form. A variety of secondary information sources is available to the researcher gathering data
on an industry, potential product applications and the marketplace. Secondary data are also used
to gain initial insight into the research problem. Secondary data are classified in terms of their source
as either internal or external. Internal, or in-house, data are secondary information acquired within
the organization where the research is being carried out. External secondary data are obtained from
outside sources. There are various advantages and disadvantages of using secondary data.

Economic data sets come in a variety of types. While some econometric methods can be applied
with little or no modification to many different kinds of data sets, the special features of some
data sets must be accounted for or should be exploited. Three types of data may be available for
empirical analysis.
1. Cross-Sectional Data
A cross-sectional data set consists of a sample of individuals, households, firms, cities, states,
countries, or a variety of other units, taken at a given point in time. An important feature of
cross-sectional data is that we can often assume that they have been obtained by random
sampling from the underlying population. For example, if we obtain information on wages,
education, experience, and other characteristics by randomly drawing 500 people from the
working population, then we have a random sample from the population of all working people.
Random sampling is the sampling scheme covered in introductory statistics courses, and it
simplifies the analysis of cross-sectional data.
Cross-sectional data are widely used in economics and other social sciences. In economics, the
analysis of cross-sectional data is closely aligned with the applied microeconomics fields, such
as labor economics, state and local public finance, industrial organization, urban economics,
demography, and health economics. Data on individuals, households, firms, and cities at a given
point in time are important for testing microeconomic hypotheses and evaluating economic
policies.
For example, suppose we want to measure current obesity levels in a population. We could draw a
sample of 1,000 people randomly from that population (also known as a cross section of that
population), measure their weight and height, and calculate what percentage of that sample is
categorized as obese. Suppose 30% of our sample is categorized as obese. This cross-sectional
sample provides us with a snapshot of that population at that one point in time. Note
that we do not know, based on one cross-sectional sample, whether obesity is increasing or decreasing;
we can only describe the current proportion.
2. Time Series Data
A time series data set consists of observations on a variable or several variables over time.
Examples of time series data include stock prices, money supply, consumer price index, gross
domestic product, annual homicide rates, and automobile sales figures. Because past events can
influence future events and lags in behavior are prevalent in the social sciences, time is an
important dimension in a time series data set. Unlike the arrangement of cross-sectional data, the
chronological ordering of observations in a time series conveys potentially important
information.
A key feature of time series data that makes them more difficult to analyze than cross-sectional data
is the fact that economic observations can rarely, if ever, be assumed to be independent across
time. Most economic and other time series are related, often strongly related, to their recent
histories. For example, knowing something about the gross domestic product from last quarter
tells us quite a bit about the likely range of the GDP during this quarter, since GDP tends to
remain fairly stable from one quarter to the next. While most econometric procedures can be
used with both cross-sectional and time series data, more needs to be done in specifying
econometric models for time series data before standard econometric methods can be justified. In
addition, modifications and embellishments to standard econometric techniques have been
developed to account for and exploit the dependent nature of economic time series and to address
other issues, such as the fact that some economic variables tend to display clear trends over time.
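The dependence of a series on its own recent history can be illustrated by correlating each observation with the one before it. The sketch below uses the annual income series X from the Step 5 table as a stand-in (the text's example is quarterly GDP), and `correlation` is a hypothetical helper. Because the series is trending upward, part of the measured correlation reflects the trend itself.

```python
import math

# Annual income series X from the Step 5 table (2004-2015).
X = [3776.3, 3843.1, 3760.3, 3906.6, 4148.5, 4279.8,
     4404.5, 4539.9, 4718.6, 4838.0, 4877.5, 4821.0]

def correlation(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    s_ab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    s_aa = sum((ai - a_bar) ** 2 for ai in a)
    s_bb = sum((bi - b_bar) ** 2 for bi in b)
    return s_ab / math.sqrt(s_aa * s_bb)

previous = X[:-1]   # X in year t-1
current = X[1:]     # X in year t
r_lag1 = correlation(previous, current)

print(f"correlation between X(t) and X(t-1): {r_lag1:.3f}")
```

A value close to 1 indicates that last period's observation carries a great deal of information about the current one, so the observations cannot reasonably be treated as independent draws.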

Another feature of time series data that can require special attention is the frequency at
which the data are collected. In economics, the most common frequencies are daily, weekly,
monthly, quarterly, and annual. Stock prices are recorded at daily intervals (excluding Saturday
and Sunday). The money supply in the U.S. economy is reported weekly. Many macroeconomic
series are tabulated monthly, including inflation and employment rates. Other macro series are
recorded less frequently, such as every three months (every quarter). Gross domestic product is
an important example of a quarterly series. Other time series, such as infant mortality rates for
states in the United States, are available only on an annual basis. Many weekly, monthly, and
quarterly economic time series display a strong seasonal pattern, which can be an important
factor in a time series analysis. When econometric methods are used to analyze time series data,
the data should be stored in chronological order.

Pooled Cross Sections


Some data sets have both cross-sectional and time series features. For example, suppose that two
cross-sectional household surveys are taken in the United States, one in 1985 and one in 1990. In
1985, a random sample of households is surveyed for variables such as income, savings, family
size, and so on. In 1990, a new random sample of households is taken using the same survey
questions. In order to increase our sample size, we can form a pooled cross section by
combining the two years. Because random samples are taken in each year, it would be a fluke if
the same household appeared in the sample during both years. (The size of the sample is usually
very small compared with the number of households in the United States.) This important factor
distinguishes a pooled cross section from a panel data set.
Pooling cross sections from different years is often an effective way of analyzing the effects of a
new government policy. The idea is to collect data from the years before and after a key policy
change. As an example, consider the following data set on housing prices taken in 1993 and
1995, when there was a reduction in property taxes in 1994. Suppose we have data on 250
houses for 1993 and on 270 houses for 1995. A pooled cross section is analyzed much like a
standard cross section, except that we often need to account for secular differences in the
variables across the time. In fact, in addition to increasing the sample size, the point of a pooled
cross-sectional analysis is often to see how a key relationship has changed over time.
For example, annual labor force surveys are repeated cross-sections, because every year, a new
random sample is taken from the population. In this case, there is a time component, so these are
not cross-sectional data, but every year, new individuals are surveyed, so these are also not panel
data. That's why these are called repeated cross-sections.
3. Panel or Longitudinal Data
A panel data (or longitudinal data) set consists of a time series for each cross-sectional member
in the data set. As an example, suppose we have wage, education, and employment history for a
set of individuals followed over a ten-year period. Or we might collect information, such as
investment and financial data, about the same set of firms over a five-year time period. Panel
data can also be collected on geographical units. For example, we can collect data for the same
set of counties in Ethiopia on immigration flows, tax rates, wage rates, government expenditures,
etc., for the years 2013, 2014, and 2015. The key feature of panel data that distinguishes them from a
pooled cross section is the fact that the same cross-sectional units (individuals, firms, or
counties) are followed over a given time period.
Both pooled cross sectional data and pure panel data collect data over time (this can range from 2
time periods to any large number). The key difference between the two is the "units" we follow.
In pooled cross section, we will take random samples in different time periods, of different units,
i.e. each sample we take, will be populated by different individuals. This is often used to see the
impact of policy or programmes. For example we will take household income data on
households X, Y and Z, in 2010. And then we will take the same income data on households G,
F and A in 2015. Although we are interested in the same data, we are taking different samples
(using different households) in different time periods. In pure panel data, we are following the
same units i.e. the same households or individuals over time. For example we will follow the
same set of households X, Y and Z, for each time period we collect data i.e. in 2010 and we will
also interview the same households in 2015. Therefore the fundamental difference is simply the
units we observe the data for.
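The distinction comes down to whether the same units reappear across periods, which can be checked mechanically. The Python sketch below uses the household labels from the example above; the function `classify` and the extra label "mixed" (for partial overlap between periods) are assumptions introduced for illustration, and the function expects at least two periods.

```python
def classify(samples):
    """Classify a data set by the units observed in each period.

    samples maps each period to the set of unit identifiers observed in it.
    Returns "panel" if exactly the same units appear in every period,
    "pooled cross section" if no unit appears in more than one period,
    and "mixed" if there is only partial overlap.
    """
    unit_sets = list(samples.values())
    if all(units == unit_sets[0] for units in unit_sets):
        return "panel"
    seen = set()
    for units in unit_sets:
        if seen & units:      # some unit was already observed in an earlier period
            return "mixed"
        seen |= units
    return "pooled cross section"

# Different households in each year: a pooled cross section.
pooled = {2010: {"X", "Y", "Z"}, 2015: {"G", "F", "A"}}
# The same households followed over time: a panel.
panel = {2010: {"X", "Y", "Z"}, 2015: {"X", "Y", "Z"}}

print(classify(pooled))  # pooled cross section
print(classify(panel))   # panel
```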
Observational Data
A common econometric question is to quantify the impact of one set of variables on another
variable. For example, a concern in labor economics is the returns to schooling — the change in
earnings induced by increasing a worker’s education, holding other variables constant. Another
issue of interest is the earnings gap between men and women. Ideally, we would use
experimental data to answer these questions. To measure the returns to schooling, an experiment
might randomly divide children into groups, mandate different levels of education to the
different groups, and then follow the children’s wage path after they mature and enter the labor
force. The differences between the groups would be direct measurements of the effects of
different levels of education. However, experiments such as this would be widely condemned as
immoral! Consequently, in economics non-laboratory experimental data sets are typically narrow
in scope.
Instead, most economic data are observational. To continue the above example, through data
collection we can record the level of a person’s education and their wage. With such data we can
measure the joint distribution of these variables, and assess the joint dependence. But from
observational data it is difficult to infer causality, as we are not able to manipulate one variable
to see the direct effect on the other. For example, a person’s level of education is (at least
partially) determined by that person’s choices. These choices are likely to be affected by their
personal abilities and attitudes towards work. The fact that a person is highly educated suggests a
high level of ability, which suggests a high relative wage. This is an alternative explanation for
an observed positive correlation between educational levels and wages. High ability individuals
do better in school, and therefore choose to attain higher levels of education, and their high
ability is the fundamental reason for their high wages. The point is that multiple explanations are
consistent with a positive correlation between schooling levels and wages. Most economic
data sets are observational, not experimental. This means that all variables must be treated as
random and possibly jointly determined. Furthermore, if the data are randomly gathered, it is
reasonable to model each observation as a random draw from the same probability distribution.
In this case we say that the data are independent and identically distributed, or iid. We call this a
random sample.
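The ability story can be made concrete with a small simulation. Everything in the sketch below is a hypothetical assumption chosen for illustration: the data-generating process, the "true" return to schooling of 1.0, and the normal distributions. In the simulation ability is observed, so its effect can be removed; with real observational data it usually is not, which is exactly the difficulty described above.

```python
import random

def mean(values):
    return sum(values) / len(values)

def slope(x, y):
    """Least-squares slope from regressing y on x (with an intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

def residuals(x, y):
    """Residuals from regressing y on x (with an intercept)."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

rng = random.Random(0)   # fixed seed for reproducibility
N = 10_000
TRUE_RETURN = 1.0        # hypothetical causal effect of schooling on the wage

# Ability raises both schooling and wages (all coefficients are made up).
ability = [rng.gauss(0, 1) for _ in range(N)]
education = [12 + 2 * a + rng.gauss(0, 1) for a in ability]
wage = [TRUE_RETURN * e + 3 * a + rng.gauss(0, 1)
        for e, a in zip(education, ability)]

# Naive estimate: regress wage on education only.
naive = slope(education, wage)

# Controlled estimate: partial ability out of both variables first,
# then regress the wage residuals on the education residuals.
controlled = slope(residuals(ability, education), residuals(ability, wage))

print(f"true return: {TRUE_RETURN}, naive: {naive:.2f}, controlled: {controlled:.2f}")
```

The naive slope overstates the true return because it also picks up the effect of ability, while the estimate that removes the influence of ability from both variables should land close to the true value.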
