PGIS Unit 1 Crash Course Contents
UNIT-I
Chapter 01
A gentle introduction to GIS
Q. Define GIS.
A GIS is a computer-based system that provides the following four sets of
capabilities to handle georeferenced data:
1. Data capture and preparation
2. Data management, including storage and maintenance
3. Data manipulation and analysis
4. Data presentation
This implies that a GIS user can expect support from the system to enter
(georeferenced) data, to analyse it in various ways, and to produce
presentations (including maps and other types) from the data. This would include
support for various kinds of coordinate systems and transformations between them,
options for analysis of the georeferenced data, and obviously a large degree of
freedom of choice in the way this information is presented (such as colour scheme,
symbol set, and medium used).
Data capture and preparation: In the El Niño case, data capture refers to the
collection of sea water temperatures and wind speed measurements. This is
achieved by placing buoys with measuring equipment at various places in the ocean.
Each buoy measures a number of things: wind speed and direction; air temperature
and humidity; and sea water temperature at the surface and at various depths down
to 500 metres. For the sake of our example we will focus on sea surface temperature
(SST) and wind speed (WS).
A typical buoy is illustrated in Figure, which shows the placement of various sensors
on the buoy. For monitoring purposes, some 70 buoys were deployed at strategic
places within 10◦ latitude of the Equator, between the Galapagos Islands and
Papua New Guinea.
Data management:
For our example application, data management refers to the storage and
maintenance of the data transmitted by the buoys via satellite communication. This
phase requires a decision to be made on how best to represent our data, both in
terms of their spatial properties and the various attribute values which we need to
store.
Data manipulation and analysis:
It appears that the following steps took place for the upper two figures:
1. For each buoy, the average SST for each month was computed, using the daily
SST measurements for that month. This is a simple computation.
2. For each buoy, the monthly average SST was taken together with the geographic
location, to obtain a georeferenced list of averages.
3. From this georeferenced list, through a method of spatial interpolation, the
estimated SST of other positions in the study area were computed. This step
was performed as often as needed, to obtain a fine mesh of positions with measured
or estimated SSTs from which the maps of Figure were eventually derived.
4. We assume that previous to the above steps we had obtained data about average
SST for the month of December for a series of years. This too may have been
spatially interpolated to obtain a ‘normal situation’ December data set of a fine
resolution.
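The analysis steps above can be sketched in a few lines of Python. This is a minimal illustration, not the actual El Niño processing chain: the buoy identifiers, coordinates and SST readings are invented, and the monthly averaging of steps 1 and 2 is reduced to a dictionary comprehension.

```python
# Hypothetical sketch of steps 1-2 above: compute each buoy's monthly
# average SST and attach its geographic location. All data is invented
# illustration data, not real buoy measurements.
from statistics import mean

# daily SST readings (degrees C) for one month, keyed by buoy id
daily_sst = {
    "buoy_01": [26.1, 26.4, 26.0, 26.3],
    "buoy_02": [28.2, 28.5, 28.1, 28.4],
}

# buoy locations as (longitude, latitude)
locations = {
    "buoy_01": (-110.0, 2.0),
    "buoy_02": (-140.0, -2.0),
}

# step 1: monthly average per buoy; step 2: georeferenced list of averages
georeferenced = [
    (locations[bid], round(mean(readings), 2))
    for bid, readings in daily_sst.items()
]
print(georeferenced)
# [((-110.0, 2.0), 26.2), ((-140.0, -2.0), 28.3)]
```

Step 3, the spatial interpolation, would then take this georeferenced list as input and estimate SST values on a fine mesh of positions between the buoys.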
Data presentation:
After the data manipulations discussed above, our data is prepared for producing
output. The data presentation phase deals with putting it all together into a format
that communicates the result of data analysis in the best possible way.
Steps:
• The message we wanted to portray is what the El Niño and La Niña events are,
both in absolute figures, but also in relative figures, i.e. as differences from a normal
situation.
• The audience for this data presentation had to be taken into account.
• The medium was this book (printed matter of A4 size) and possibly a website.
• The rules of aesthetics demanded many things: the maps should be printed
north-up, with clear georeferencing, with intuitive use of symbols, et cetera.
• The techniques that we used included the use of a colour scheme and isolines, plus
a number of other techniques.
GIS software can (generically) be applied to many different applications. When there
is no risk of ambiguity, people sometimes do not make the distinction between a
‘GIS’ and a ‘GIS application’. Project-based GIS applications usually have a clear-cut
purpose, and these applications can be short-lived: the research is carried out by
collecting data, entering data in the GIS, analysing the data, and producing
informative maps. An example is rapid earthquake damage assessment. Institutional
GIS applications, on the other hand, usually have as their goal the continued
administration of spatial change and the sustained availability of spatial base data.
1. Maps: Maps have been used for thousands of years to represent information
about the real world, and continue to be extremely useful for many applications in
various domains. Their conception and design has developed into a science with a
high degree of sophistication. A disadvantage of the traditional paper map is that it
is generally restricted to two-dimensional static representations, and that it is always
displayed in a fixed scale. Cartography, as the science and art of map making,
functions as an interpreter, translating real world phenomena (primary data) into
correct, clear and understandable representations for our use. Maps also become a
data source for other applications, including the development of other maps. With
the advent of computer systems, analogue cartography developed into digital
cartography, and computers play an integral part in modern cartography. Alongside
this trend, the role of the map has also changed accordingly, and the dominance of
paper maps is eroding in today’s increasingly ‘digital’ world.
2. Databases
A database is a repository for storing large amounts of data. It comes with a number
of useful functions:
1. A database can be used by multiple users at the same time—i.e. it allows
concurrent use,
2. A database offers a number of techniques for storing data and allows the use of
the most efficient one—i.e. it supports storage optimization,
3. A database allows the imposition of rules on the stored data; rules that will be
automatically checked after each update to the data—i.e. it supports data integrity,
4. A database offers an easy to use data manipulation language, which allows the
execution of all sorts of data extraction and data updates—i.e. it has a query facility,
5. A database will try to execute each query in the data manipulation language in the
most efficient way—i.e. it offers query optimization.
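Points 3 and 4 above can be demonstrated with Python's built-in sqlite3 module: a CHECK constraint acts as an automatically enforced integrity rule, and SQL serves as the data manipulation language. The table and column names below are invented for illustration; this is a minimal sketch, not a recommended GIS database design.

```python
# A minimal sketch of data integrity (point 3) and the query facility
# (point 4) using Python's built-in sqlite3. Table, column names and
# the valid SST range are invented illustration values.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE buoy_reading (
        buoy_id TEXT,
        sst     REAL CHECK (sst BETWEEN -2.0 AND 40.0)  -- integrity rule
    )
""")
con.execute("INSERT INTO buoy_reading VALUES ('buoy_01', 26.2)")

# the rule is checked on every update: this insert is rejected
try:
    con.execute("INSERT INTO buoy_reading VALUES ('buoy_01', 99.9)")
except sqlite3.IntegrityError:
    print("rejected: SST out of range")

# query facility: extract data declaratively, in SQL
rows = con.execute("SELECT buoy_id, sst FROM buoy_reading").fetchall()
print(rows)  # [('buoy_01', 26.2)]
```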
In order to better understand both our representation of the phenomena, and our
eventual output from any analysis, we can use the GIS to create visualizations from
the computer representation, either on-screen, printed on paper, or otherwise. It is
crucial to understand the fundamental differences between these notions. The real
world, after all, is a completely different domain than the ‘GIS’ world, in which we
build models or simulations of the real world.
Given the complexity of real world phenomena, our models can by definition never
be perfect. We have limitations on the amount of data that we can store, limits on
the amount of detail we can capture, and (usually) limits on the time we have
available for a project.
Any geographic phenomenon can usually be represented in various ways; the choice
of which representation is best depends mostly on two issues. Firstly, what original,
raw data (from sensors or otherwise) is available, and secondly, what sort of data
manipulation is required or will be undertaken.
Essentially, these two types of fields differ in the type of cell values. A discrete field
like land use type will store cell values of the type ‘integer’. Therefore it is also called
an integer raster. Discrete fields can be easily converted to polygons, since it is
relatively easy to draw a boundary line around a group of cells with the same value.
A continuous raster is also called a ‘floating point’ raster. A field-based model
consists of a finite collection of geographic fields: we may be interested in elevation,
barometric pressure, mean annual rainfall, and maximum daily evapotranspiration,
and thus use four different fields to model the relevant phenomena within our study
area.
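The integer-raster versus floating-point-raster distinction above can be made concrete with two small nested lists. This is a pure-Python sketch with invented cell values; real GIS packages store rasters in dedicated binary formats.

```python
# Sketch of the discrete/continuous field distinction above: a discrete
# field (land use) stores integer codes per cell, a continuous field
# (elevation) stores floating-point values. All cell values are invented.
land_use = [          # integer raster: codes, e.g. 1 = forest, 2 = urban
    [1, 1, 2],
    [1, 2, 2],
]
elevation = [         # floating-point raster: metres above sea level
    [120.5, 121.3, 119.8],
    [118.2, 117.9, 116.4],
]

assert all(isinstance(v, int) for row in land_use for v in row)
assert all(isinstance(v, float) for row in elevation for v in row)

# a discrete field converts easily to polygons: group equal-valued cells
forest_cells = [(r, c) for r, row in enumerate(land_use)
                for c, v in enumerate(row) if v == 1]
print(forest_cells)  # [(0, 0), (0, 1), (1, 0)]
```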
Q. Which are the different Data types used in GIS?
1. Nominal data values are values that provide a name or identifier so that we can
discriminate between different values, but that is about all we can do. Specifically,
we cannot do true computations with these values. An example is the names of
geological units. This kind of data value is called categorical data when the values
assigned are sorted according to some set of non-overlapping categories. For
example, we might identify the soil type of a given area to belong to a certain (pre-
defined) category.
2. Ordinal data values are data values that can be put in some natural sequence but
that do not allow any other type of computation. Household income, for instance,
could be classified as being either ‘low’, ‘average’ or ‘high’. Clearly this is their natural
sequence, but this is all we can say—we cannot say that a high income is twice as
high as an average income.
3. Interval data values are quantitative, in that they allow simple forms of
computation like addition and subtraction. However, interval data has no arithmetic
zero value, and does not support multiplication or division. For instance, a
temperature of 20°C is not twice as warm as 10°C, and thus centigrade temperatures
are interval data values, not ratio data values.
4. Ratio data values allow most, if not all, forms of arithmetic computation.
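The interval/ratio distinction above can be illustrated with temperatures: a ratio of Celsius values is meaningless because Celsius has no arithmetic zero, while the same temperatures expressed in Kelvin (a ratio scale with a true zero) give the physically meaningful ratio. The values below are just an example.

```python
# Illustration of the interval/ratio distinction above. Dividing Celsius
# values gives a number, but not a meaningful one; Kelvin is a ratio
# scale, so dividing Kelvin values is meaningful.
c1, c2 = 10.0, 20.0                 # degrees Celsius (interval scale)
print(c2 / c1)                      # 2.0 -- but 20°C is NOT twice as warm

k1, k2 = c1 + 273.15, c2 + 273.15   # Kelvin (ratio scale, true zero)
print(round(k2 / k1, 4))            # 1.0353 -- the meaningful ratio
```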
Shape is usually important because one of its factors is dimension. This relates to
whether an object is perceived as a point feature, or a linear, area or volume feature.
The petrol stations mentioned above apparently are zero-dimensional, i.e. they are
perceived as points in space; roads are one-dimensional, as they are considered to
be lines in space. In another use of road information—for instance, in multi-purpose
cadastre systems where precise location of sewers and manhole covers matters—
roads might well be considered to be two-dimensional entities, i.e. areas within
which a manhole cover may fall.
In all regular tessellations, the cells are of the same shape and size, and the
field attribute value assigned to a cell is associated with the entire area occupied by
the cell. The square cell tessellation is by far the most commonly used, mainly
because georeferencing a cell is so straightforward. These tessellations are known
under various names in different GIS packages, but most frequently as rasters.
A raster is a set of regularly spaced (and contiguous) cells with associated (field)
values. The associated values represent cell values, not point values. This means that
the value for a cell is assumed to be valid for all locations within the cell.
The size of the area that a single raster cell represents is called the raster’s resolution.
Sometimes, the word grid is also used, but strictly speaking, a grid refers to values at
the intersections of a network of regularly spaced horizontal and perpendicular lines.
The field value of a cell can be interpreted as one for the complete tessellation cell, in
which case the field is discrete, not continuous or even differentiable.
To improve on this continuity issue, we can do two things:
•Make the cell size smaller, so as to make the ‘continuity gaps’ between the cells
smaller, and/or
•Assume that a cell value only represents elevation for one specific location in the
cell, and to provide a good interpolation function for all other locations that has the
continuity characteristic.
The location associated with a raster cell is fixed by convention, and may be
the cell centroid (mid-point) or, for instance, its left lower corner. Values for
other positions than these must be computed through some form of interpolation
function, which will use one or more nearby field values to compute the value at the
requested position. This allows us to represent continuous, even differentiable,
functions.
An important advantage of regular tessellations is that we know how they partition
space, and we can make our computations specific to this partitioning. This leads to
fast algorithms.
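The claim above that georeferencing a square-cell raster is straightforward comes down to simple integer arithmetic: given the raster's origin and resolution, the cell containing any map coordinate follows directly. The origin and resolution values below are invented.

```python
# Sketch of why georeferencing a square-cell raster is straightforward.
# Origin and resolution are invented illustration values in map units.
origin_x, origin_y = 100000.0, 500000.0   # upper-left corner of the raster
resolution = 25.0                         # cell size (map units)

def cell_for(x, y):
    """Return (row, col) of the cell containing map coordinate (x, y)."""
    col = int((x - origin_x) // resolution)
    row = int((origin_y - y) // resolution)  # rows count downward from the top
    return row, col

print(cell_for(100060.0, 499940.0))  # (2, 2)
```

Because this mapping is a fixed formula, algorithms over square-cell rasters can exploit the known partitioning of space, which is exactly what makes them fast.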
Q. What are irregular tessellations? How will you create a quad tree? Explain with
an example.
These are partitions of space into mutually disjoint cells, but now the cells may vary
in size and shape, allowing them to adapt to the spatial phenomena that they
represent.
A well-known data structure in this family—upon which many more variations have
been based—is the region quad tree. It is based on a regular tessellation of square
cells, but takes advantage of cases where neighbouring cells have the same field
value, so that they can together be represented as one bigger cell. A simple
illustration is provided in Figure. It shows an 8×8 raster with three possible field
values: white, green and blue. The quad tree that represents this raster is
constructed by repeatedly splitting up the area into four quadrants, which are
called NW, NE, SE, SW for obvious reasons. This procedure stops when all the
cells in a quadrant have the same field value. The procedure produces an
upside-down, tree-like structure, known as a quad tree.
Quadtrees are adaptive because they apply the spatial autocorrelation principle, i.e.
that locations that are near in space are likely to have similar field values. When a
conglomerate of cells has the same value, they are represented together in the
quadtree, provided boundaries coincide with the predefined quadrant
boundaries.
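The splitting procedure described above can be written as a short recursive function. This is a minimal sketch, not a production quadtree: it works on a small square raster of invented field values ('W' = white, 'G' = green) and stops splitting as soon as a quadrant is uniform.

```python
# Minimal region-quadtree builder for the procedure described above:
# split into NW, NE, SW, SE quadrants until each quadrant is uniform.
# The 4x4 raster is invented illustration data.
def build_quadtree(raster, r=0, c=0, size=None):
    if size is None:
        size = len(raster)
    cells = [raster[r + i][c + j] for i in range(size) for j in range(size)]
    if all(v == cells[0] for v in cells):
        return cells[0]                      # uniform quadrant: a single leaf
    h = size // 2                            # otherwise split into four
    return {
        "NW": build_quadtree(raster, r,     c,     h),
        "NE": build_quadtree(raster, r,     c + h, h),
        "SW": build_quadtree(raster, r + h, c,     h),
        "SE": build_quadtree(raster, r + h, c + h, h),
    }

raster = [
    ["W", "W", "G", "G"],
    ["W", "W", "G", "G"],
    ["W", "W", "W", "G"],
    ["W", "W", "G", "G"],
]
tree = build_quadtree(raster)
print(tree)
# {'NW': 'W', 'NE': 'G', 'SW': 'W',
#  'SE': {'NW': 'W', 'NE': 'G', 'SW': 'G', 'SE': 'G'}}
```

Note how the three uniform quadrants collapse to single leaves, while only the mixed SE quadrant is split further: this is the adaptivity that spatial autocorrelation makes worthwhile.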
Some tessellations are better than others, in the sense that they make smaller errors
of elevation approximation. For instance, if we base our elevation computation for
location P on the left hand shaded triangle, we will get another value than from the
right hand shaded triangle.
A TIN clearly is a vector representation: each anchor point has a stored
georeference. Yet, we might also call it an irregular tessellation, as the chosen
triangulation provides a partitioning of the entire study space. However, in this case,
the cells do not have an associated stored value as is typical of tessellations, but
rather a simple interpolation function that uses the elevation values of its three
anchor points.
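The interpolation function mentioned above can be sketched directly: the three anchor points of a TIN triangle determine a plane z = ax + by + c, which is evaluated at the requested location. The anchor coordinates and elevations below are invented illustration values.

```python
# Sketch of TIN interpolation as described above: fit the plane through
# the triangle's three (x, y, z) anchor points and evaluate it at (x, y).
# Anchor coordinates and elevations are invented.
def plane_elevation(p1, p2, p3, x, y):
    """Interpolate elevation at (x, y) from three (x, y, z) anchors."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = p1, p2, p3
    # normal vector of the plane: cross product of two triangle edges
    ux, uy, uz = x2 - x1, y2 - y1, z2 - z1
    vx, vy, vz = x3 - x1, y3 - y1, z3 - z1
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    # solve nx*(x - x1) + ny*(y - y1) + nz*(z - z1) = 0 for z
    return z1 - (nx * (x - x1) + ny * (y - y1)) / nz

anchors = ((0.0, 0.0, 100.0), (10.0, 0.0, 110.0), (0.0, 10.0, 120.0))
print(plane_elevation(*anchors, 5.0, 5.0))  # 115.0
```

This also shows why different triangulations give different approximation errors: a point P on the boundary between two triangles is interpolated from different anchor triples depending on which triangle is used.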
Collections of (connected) lines may represent phenomena that are best viewed as
networks. With networks, specific types of interesting questions arise that have to do
with connectivity and network capacity. These relate to applications such as
traffic monitoring and watershed management. With network elements—i.e. the
lines that make up the network—extra values are commonly associated like distance,
quality of the link, or carrying capacity.
Area representations:
When area objects are stored using a vector approach, the usual technique is
to apply a boundary model. This means that each area feature is represented by
some arc/node structure that determines a polygon as the area’s boundary.
Common sense dictates that area features of the same kind are best stored
in a single data layer, represented by mutually non-overlapping polygons. In
essence, what we then get is an application-determined (i.e. adaptive) partition of
space.
Observe that a polygon representation for an area object is yet another example of a
finite approximation of a phenomenon that inherently may have a curvilinear
boundary.
A simple but naive representation of area features would be to list for each polygon
simply the list of lines that describes its boundary. Each line in the list would, as
before, be a sequence that starts with a node and ends with one, possibly with
vertices in between.
The line that makes up the boundary between them is the same, which means that
using the above representation the line would be stored twice, namely once for each
polygon. This is a form of data duplication—known as data redundancy—which is (at
least in theory) unnecessary, although it remains a feature of some systems. There is
another disadvantage to such polygon-by-polygon representations. If we want to
find out which polygons border the bottom left polygon, we have to do a rather
complicated and time-consuming analysis comparing the vertex lists of all boundary
lines with that of the bottom left polygon.
The boundary model is an improved representation that deals with these
disadvantages. It stores parts of a polygon’s boundary as non-looping arcs and
indicates which polygon is on the left and which is on the right of each arc. A simple
example of the boundary model is provided in Figure. It illustrates which additional
information is stored about spatial relationships between lines and polygons.
Obviously, real coordinates for nodes (and vertices) will also be stored in another
table.
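The advantage of the boundary model described above can be sketched with a small dictionary of arcs: because each arc records the polygon on its left and right, finding a polygon's neighbours is a simple scan rather than a vertex-list comparison. The arc and polygon identifiers below are invented, and node/vertex coordinates (which would live in another table) are omitted.

```python
# Minimal sketch of the boundary model described above. Each arc stores
# its left and right polygon; None stands for the outside world. All
# identifiers are invented illustration data.
arcs = {
    "a1": {"left": "P1", "right": "P2"},
    "a2": {"left": "P2", "right": "P3"},
    "a3": {"left": "P1", "right": "P3"},
    "a4": {"left": "P1", "right": None},   # boundary with the outside world
}

def neighbours(polygon):
    """Polygons sharing a boundary arc with the given polygon."""
    result = set()
    for arc in arcs.values():
        sides = {arc["left"], arc["right"]}
        if polygon in sides:
            result |= sides - {polygon, None}
    return result

print(sorted(neighbours("P1")))  # ['P2', 'P3']
```

Each shared boundary is stored exactly once, which removes the data redundancy of the polygon-by-polygon representation.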
Q. What do you mean by topology and spatial relationships?
General spatial topology:
Topology deals with spatial properties that do not change under certain
transformations. For example, features drawn on a sheet of rubber can be made to
change in shape and size by stretching and pulling the sheet. However, some
properties of these features do not change:
•Area E is still inside area D,
•The neighbourhood relationships between A,B,C,D, and E stay intact, and their
boundaries have the same start and end nodes, and
•The areas are still bounded by the same boundaries, only the shapes and lengths of
their perimeters have changed.
Topological relationships are built from simple elements into more complex
elements: nodes define line segments, and line segments connect to define lines,
which in turn define polygons.
In what follows below, we will look at aspects of topology in two ways. Firstly, using
simplices, we will look at how simple elements (points) can be combined to define
more complex ones (lines and polygons). Secondly, we will examine the logical
aspects of topological relationships using set-theory.
Topological relationships:
The mathematical properties of the geometric space used for spatial data can be
described as follows:
•The space is a three-dimensional Euclidean space where for every point we can
determine its three-dimensional coordinates as a triple (x,y,z) of real numbers. In this
space, we can define features like points, lines, polygons, and volumes as geometric
primitives of the respective dimension. A point is zero-dimensional, a line one-
dimensional, a polygon two-dimensional, and a volume is a three-dimensional
primitive.
•The space is a metric space, which means that we can always compute the distance
between two points according to a given distance function. Such a function is also
known as a metric.
•The space is a topological space, of which the definition is a bit complicated.
In essence, for every point in the space we can find a neighbourhood around it that
fully belongs to that space as well.
•Interior and boundary are properties of spatial features that remain invariant under
topological mappings. This means, that under any topological mapping, the interior
and the boundary of a feature remains unbroken and intact.
Figure shows all eight spatial relationships: disjoint, meets, equals, inside, covered by,
contains, covers, and overlaps. These relationships can be used in queries against a
spatial database, and represent the ‘building blocks’ of more complex spatial queries.
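A few of the eight relationships above can be illustrated on axis-aligned rectangles, where the interior/boundary tests reduce to coordinate comparisons. This is a deliberately simplified sketch: real GIS software evaluates these predicates on arbitrary geometries, and the rectangles below are invented illustration data.

```python
# Simplified sketch of four of the eight spatial relationships above,
# for axis-aligned rectangles given as (xmin, ymin, xmax, ymax).
def disjoint(a, b):
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]

def meets(a, b):
    # boundaries touch but the interiors do not intersect
    touch = a[2] == b[0] or b[2] == a[0] or a[3] == b[1] or b[3] == a[1]
    return touch and not disjoint(a, b)

def contains(a, b):
    # b lies fully inside a's interior
    return a[0] < b[0] and a[1] < b[1] and b[2] < a[2] and b[3] < a[3]

def overlaps(a, b):
    interiors_cross = (a[0] < b[2] and b[0] < a[2] and
                       a[1] < b[3] and b[1] < a[3])
    return interiors_cross and not contains(a, b) and not contains(b, a)

big   = (0, 0, 10, 10)
small = (2, 2, 4, 4)
east  = (10, 0, 15, 5)    # shares big's right edge
far   = (20, 20, 25, 25)

print(contains(big, small), meets(big, east), disjoint(big, far))
# True True True
```

Predicates like these are exactly the building blocks a spatial database combines into more complex queries.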
Q. Write about the set of rules that defines the topological consistency.
Q. Write a note on the temporal dimension.
Geographic phenomena are also dynamic; they change over time.
Examples of the kinds of questions involving time include:
• Where and when did something happen?
• How fast did this change occur?
• In which order did the changes happen?
Representing time in GIS:
• Spatiotemporal data models are ways of organizing representations of space
and time in a GIS.
• The most common of these is a ‘snapshot’ state that represents a single point
in time of an ongoing natural or man-made process.
• We may store a series of these snapshot states to represent change
Different ‘concepts’ of time:
• Discrete and continuous time : Time can be measured along a discrete or
continuous scale.
• Discrete time is composed of discrete elements (seconds, minutes, hours,
days, months, or years).
• In continuous time, no such discrete elements exist; for any two different
points in time, there is always another point in between. Temporal
relationships between events and periods, such as ‘before’, ‘overlap’ and
‘after’, can be derived.
• Valid time and transaction time: Valid time (or world time) is the time when an
event really happened, or a string of events took place. Transaction time (or
database time) is the time when the event was stored in the database or GIS.
• Linear, branching and cyclic time: Time can be considered to be linear,
extending from the past to the present (‘now’), and into the future. Branching
time—in which different time lines from a certain point in time onwards are
possible—and cyclic time—in which repeating cycles such as seasons or days
of a week are recognized.
• Time granularity: When measuring time, granularity is the precision of a time
value in a GIS or database (e.g. year, month, day, second, etc.). Different
applications may obviously require different granularity.
• Absolute and relative time: Time can be represented as absolute or relative.
Absolute time marks a point on the time line where events happen (e.g. ‘6 July
1999 at 11:15 p.m.’). Relative time is indicated relative to other points in time
(e.g. ‘yesterday’, ‘last year’, ‘tomorrow’, which are all relative to ‘now’, or ‘two
weeks later’).
• Change detection: Studies of this type are usually based on some ‘model of
change’, which includes knowledge and hypotheses of how change occurs for
the specific phenomena being studied. It includes knowledge about speed of
tree growth.
• Spatiotemporal analysis: we consider changes of spatial and thematic
attributes over time. We can keep the spatial domain fixed and look only at
the attribute changes over time for a given location in space.
• On the other hand, we can keep the attribute domain fixed and consider the
spatial changes over time for a given thematic attribute.
• This may lead to notions of object motion, a subject receiving increasing
attention in the literature. Applications of moving object research include
traffic control, mobile telephony, wildlife tracking, vector-borne disease
control, and weather forecasting.
• Object identity: When does a change or movement cause an object to
disappear and become a new one? With wildlife this is quite obvious.
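The temporal relationships mentioned earlier (‘before’, ‘overlap’, ‘after’) can be derived from periods stored as (start, end) pairs. The sketch below uses invented periods in decimal years; a full treatment would distinguish all of Allen's interval relations, not just these three.

```python
# Sketch of deriving the temporal relationships 'before', 'overlap' and
# 'after' between two periods given as (start, end) pairs. The periods
# are invented illustration data in decimal years.
def relation(a, b):
    """Classify how period a relates to period b on the time line."""
    if a[1] <= b[0]:
        return "before"
    if b[1] <= a[0]:
        return "after"
    return "overlap"

drought = (1997.5, 1998.3)
el_nino = (1997.0, 1998.0)

print(relation(drought, el_nino))            # overlap
print(relation((1995.0, 1996.0), el_nino))   # before
```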