Lecture Note 317
Surv. Okoli, F. U
So, what is GIS? What can it do? To give you some idea, consider an example in natural resources
management. Assume that you have been given the following tasks for a particular region (e.g. a local
government area, state, country, etc.)
The more you think about it, the more complex it becomes. Just imagine what you may need: lots (I mean
lots!) of data, access to a range of departments and agencies, various software and hardware, many
personnel, etc. Well...it can be done - you guessed it - using GIS!
System:
• decision-oriented reporting
• effective processing of data
• effective management of data
• adequate flexibility
• a satisfying user environment
"An organised collection of computer hardware, software, geographic data, and personnel designed to
efficiently capture, store, update, manipulate, analyse, and display all forms of geographically referenced
data."
The advantages of GIS are many and relate to the fact that GIS is an integrating technology - one that
brings together many different applications, data and users. One word that can be used to describe the
benefit of GIS is synergy. In particular, the following can be cited as advantages of GIS:
• Integrates spatial and other (aspatial) data across a diverse range of applications
• Identifies connections between activities based on geographic proximity
• Manipulates and displays geographic knowledge
• Provides access to administrative records
• A tool for enhancing decision making
• Increases ability to model science and management problems
• A catalyst to further development
The applications of GIS technology can be categorised into four broad areas:
Infrastructure
GIS have developed over time across a wide range of disciplines. As a matter of fact, the whole
foundational concept of GIS is multi-disciplinary.
Disciplines involved:
Computer science
Remote sensing
Cartography
Statistics
Geodesy
Photogrammetry
Surveying
Geography
Geosciences - geology, geophysics, minerals and petroleum, etc.
Mathematics: geometry, graph theory
Information systems
Urban and regional planning, etc.
• CAD:
Primarily used for creating and modifying 2D and 3D drawings and models, CAD software is
essential for the design and planning of various projects.
• BIM:
Building Information Modelling goes beyond CAD by creating digital representations of physical
and functional characteristics of a facility. It allows for collaboration, analysis, and simulation of
design, construction, and operational phases.
• GIS:
Geographic Information Systems provide a framework for managing and analysing spatial data,
including maps, imagery, and other geospatial information. GIS enables integration of project
data with its location and surroundings, facilitating spatial analysis and decision-making.
Integration Benefits:
• Integrating CAD, BIM, and GIS allows for:
• Improved design coordination and collaboration between stakeholders.
• Enhanced spatial analysis and visualization for better decision-making.
• Streamlined workflows and reduced errors throughout the project lifecycle.
• Creation of a "digital twin" of a physical structure for improved design, construction, and
operations.
IV. Methods/Procedure
Method: A successful GIS operates according to a well-designed plan, that is, the models and operating
practices unique to each task. Various techniques are used to create maps and put them to further use in
a project. Maps can be created by automated raster-to-vector conversion or by manual vectorization
(digitizing) of scanned images. The source of these digital maps can be either maps prepared by a
survey agency or satellite imagery.
V. People or Expertise
GIS users range from technical specialists who design and maintain the system to those who use it to help
them perform their everyday work. GIS operators solve real-time spatial problems. They plan, implement
and operate the system to draw conclusions for decision making.
VI. Network
Networks in GIS can be represented using different data structures, such as vector-based or graph-based
representations. Vector-based networks use lines or polylines to represent edges and points for nodes,
while graph-based networks use nodes and edges to form a graph structure. Graph-based representations
are often more flexible and efficient for network analysis tasks.
GIS software provides tools and functionalities to build, manage, and analyze networks. These tools allow
users to define network connectivity rules, assign attributes to nodes and edges, calculate distances and
routes, perform network-based queries, and visualize network relationships.
Almost everything that happens, happens somewhere. Largely, we humans are confined in our activities
to the surface and near-surface of the Earth. These activities take place above, on or just beneath the
surface. Keeping track of all of this activity is important, and
knowing where it occurs can be the most convenient basis for tracking. Knowing where something
happens is of critical importance if we want to go there ourselves or send someone there, to find other
information about the same place, or to inform people who live nearby. In addition, most (perhaps all)
decisions have geographic consequences, e.g., adopting a particular funding formula creates geographic
winners and losers, especially when the process entails zero sum gains. Therefore, geographic location is
an important attribute of activities, policies, strategies, and plans. Geographic information systems are a
special class of information systems that keep track not only of events, activities, and things, but also of
where these events, activities, and things happen or exist.
Since real-world problems are geographic in nature, how do we distinguish one geographic
problem from another? There are several ways to do that, but we will focus on three major categories.
1. Spatial scale:
An engineer's design of a building can present geographic problems, as in disaster management, but only
at a very detailed or local scale. The information needed to service the building is also local: the size and
shape of the parcel, the vertical and subterranean extent of the building, the slope of the land, and its
accessibility using normal and emergency infrastructure. The global diffusion of COVID-19 from 2019, or of
bird flu in 2004, were problems at a much broader and coarser scale, involving information about entire
national populations and global transport patterns.
The complexity of the real world, as well as the broad spectrum of its interpretations, suggests that GIS
system designs will vary according to the capabilities and preferences of their creators. This human factor
can introduce an element of constraint, as data compiled for a particular application may be less useful
elsewhere. Using GIS to solve problems in the real world requires interaction between the real world, the
GIS and the users. The real-world problems can be described only in terms of models that delineate the
concepts and procedures needed to translate real world observations into data that are meaningful in GIS.
The real-world problem needs to be represented within a GIS. Users perceive the real world in a
manner related to their problem, and hence need to be able to communicate with the GIS in terms related
to their problem (i.e. data, functionality, etc.) when modelling the real problem in the GIS environment.
The GIS then solves the modelled problem and presents the result in various forms, such as maps, for
decision making.
Geographic features in the real world can be represented in a number of ways as follows:
1. Analog map
2. Digital map
3. GIS
• A geographic database involves much more than a cartographic database (i.e. much more than
simply a map or maps)
• The emphasis is on the structure and management of data and their relationships
• Based on the analytical paradigm - focus is on analysis
• The concepts of GIS extend far beyond the map!
The process for obtaining a representation of the real world follows the cartographic process for
abstraction and generalization. The process involves the steps of selection, classification, simplification
and symbolization.
The process for obtaining a GIS representation must consider the purpose, content and detail of the
database. This is similar to the cartographic map-making process in which the purpose, content,
cartographic scale and presentation must be considered in producing a map.
The steps of the process of abstraction and generalization are described as follows:
• Selection. Involves decisions regarding the geographic space to be mapped, map scale, map
coordinates and projection, data variables to be mapped, data gathering/sampling techniques.
• Classification. Process in which objects are placed in groups according to similar properties.
This reduces the complexity and improves the organization of a map.
• Simplification. Map features can be simplified by smoothing curves and straightening paths
to eliminate unnecessary detail. For example, a straight line between two cities could indicate
the connectivity between cities rather than the exact positional location of a road which may
be irrelevant for a particular application.
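The simplification step can be illustrated with a short sketch. The notes do not name a specific algorithm; the example below uses the Douglas-Peucker method, one widely used way of removing vertices that lie within a chosen tolerance of the general trend of a line. The function names, the sample road coordinates and the tolerance value are all assumptions made for illustration.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def simplify(points, tolerance):
    """Douglas-Peucker: drop vertices closer than `tolerance` to the trend line."""
    if len(points) < 3:
        return list(points)
    # Find the vertex farthest from the chord joining the two end points.
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [points[0], points[-1]]
    # Keep the farthest vertex and simplify the two halves separately.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right

# Hypothetical road centreline (map units are arbitrary).
road = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
print(simplify(road, tolerance=1.0))
```

A larger tolerance removes more detail; with a very large tolerance only the two end points remain, which corresponds to the straight line between two cities mentioned above.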
In many ways, GIS have retained the notion of the map, and many map concepts reappear in GIS.
However, the manner in which GIS handle and analyse data is very different from that for maps. This is
despite the fact that much data input into GIS is derived from maps.
Within GIS, data is often structured in a layered fashion representing the way in which maps have
traditionally been handled. Each layer, also known as a coverage, contains some specific data such as a
theme (e.g. roads, vegetation cover, soils, etc.), time period (e.g. years 1970, 1980, 1990) or vertical slice
(e.g. ground floor, first floor, etc. of a building).
• data
• functionality, and
• a user interface.
The database is the heart of the GIS. It must be structured so that the data can be accessed by functions
initiated by users. In the following sections, we will consider the structure of the data as well as the
functions that operate on the data.
The spatial component consists of locational information (i.e. absolute or relative X, Y coordinates),
geometry (i.e. the shape of point, line and polygon features, or raster cells) and topology (i.e. relationships
between points, lines and polygons: adjacency, connectivity, and containment). Attribute data can consist
of both descriptive data and cartographic attributes (e.g. line color and thickness, point symbol, etc.). A
third component is temporal data, which is sometimes considered a further dimension (e.g. a fourth
dimension) but is often included as another attribute of the data. Never forget metadata.
WHAT
Phenomenon
In relation to geographic location, the ‘what’ becomes the object (phenomenon), the ‘where’ becomes
space (location) and the ‘when’ becomes time (date). How these are defined and measured depends on the
view adopted, which is based on either the field-based or the object-based model.
Object-Based Model:
An object is a spatial feature with characteristics such as a spatial boundary, relevance to the application and
a feature description (attributes). Spatial objects represent discrete features with well-defined or identifiable
boundaries, for example, buildings, parks, forest lands, geomorphological boundaries, soil types, etc. In
this model, data can be obtained by field surveying methods (chain-tape, theodolite and total station
surveying, GPS/DGPS survey) or laboratory methods (aerial photo interpretation, remote sensing image
analysis and onscreen digitization). Depending on the nature of the spatial objects we may represent them
as graphical elements of points, lines and polygons.
Field-Based Model:
Spatial phenomena are real-world features that vary continuously over space with no specific boundary.
Data for spatial phenomena may be organized as fields, which are obtained from direct or indirect sources.
Direct sources include aerial photos, remote sensing imagery, scanned hard copy maps, and field
investigations made at selected sample locations. Indirect sources generate the data by applying
mathematical functions such as interpolation, sampling or reclassification to values at selected sample
locations. For example, a Digital Elevation Model (DEM) can be generated from topographic data such as
spot heights and contours, which are themselves usually obtained by indirect measurement.
A spatial database may be organized on either the object-based model or the field-based model.
In object-based databases, the spatial units are discrete objects, which can be obtained from field-based
data by means of object recognition and mathematical interpolation. In the object-based model, spatial
data is mostly represented in the form of coordinate lists (i.e. vector lines), and the representation is
generally called the vector data model. When a spatial phenomenon database is structured on the
field-based model in the form of a grid of square or rectangular cells, the representation is generally
called the raster data model. A geospatial database has two distinct components: locations and attributes.
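As a rough illustration of how a field can be generated indirectly from sample locations, the sketch below interpolates a small elevation grid from a handful of spot heights. Inverse distance weighting (IDW) is used only because it is simple; the notes do not prescribe a particular interpolation method, and the sample coordinates, heights, cell size and function names are assumptions.

```python
import math

def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y) from (sx, sy, value) samples."""
    numerator = denominator = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # the query point coincides with a sample
        w = 1.0 / d ** power
        numerator += w * value
        denominator += w
    return numerator / denominator

def interpolate_grid(samples, x0, y0, cell, n_rows, n_cols):
    """Build a raster (list of rows) by estimating a value at each cell centre.

    (x0, y0) is the lower-left corner; row 0 is the southernmost row here.
    """
    return [
        [idw(samples, x0 + (c + 0.5) * cell, y0 + (r + 0.5) * cell)
         for c in range(n_cols)]
        for r in range(n_rows)
    ]

# Hypothetical spot heights: (x, y, elevation in metres).
spot_heights = [(0, 0, 100.0), (10, 0, 120.0), (0, 10, 110.0), (10, 10, 140.0)]
dem = interpolate_grid(spot_heights, x0=0, y0=0, cell=5, n_rows=2, n_cols=2)
for row in dem:
    print([round(v, 1) for v in row])
```

The result is a field-based (raster) representation derived from discrete point observations; nearer samples influence each cell more strongly than distant ones.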
Geographical features in the real world are very difficult to capture and may require a large-scale database.
GIS can organize reality through the data models. Each model tends to fit certain types of data and
applications better than others. All spatial data models fall into two basic categories: raster and vector.
Sphere and Ellipsoid
Assuming that the earth is a perfect sphere greatly simplifies mathematical calculations and works
well for small-scale maps (maps that show a large area of the earth). However, when working at
larger scales, an ellipsoid representation of earth may be desired if accurate measurements are
needed. An ellipsoid is defined by two radii: the semi-major axis (the equatorial radius) and the
semi-minor axis (the polar radius).
The reason the earth has a slightly ellipsoidal shape has to do with its rotation, which induces a
centrifugal effect that is strongest along the equator. This results in an equatorial radius that is
roughly 21 km longer than the polar radius.
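The relationship between the two radii can be checked numerically. The sketch below uses the WGS84 defining constants (semi-major axis and inverse flattening) purely as an illustration; the notes do not say which ellipsoid is intended, and the function name is hypothetical.

```python
# WGS84 defining values; used here only as an example ellipsoid.
SEMI_MAJOR_A = 6378137.0            # equatorial radius in metres
INVERSE_FLATTENING = 298.257223563  # 1/f

def ellipsoid_parameters(a, inv_f):
    """Derive the semi-minor axis and first eccentricity squared from a and 1/f."""
    f = 1.0 / inv_f
    b = a * (1.0 - f)   # polar radius (semi-minor axis)
    e2 = f * (2.0 - f)  # first eccentricity squared
    return b, e2

b, e2 = ellipsoid_parameters(SEMI_MAJOR_A, INVERSE_FLATTENING)
print(f"polar radius b = {b:.3f} m")
print(f"a - b = {(SEMI_MAJOR_A - b) / 1000:.1f} km")
```

For this ellipsoid the difference between the equatorial and polar radii comes to about 21.4 km, consistent with the figure quoted above.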
Geoid
Representing the earth’s true shape, the geoid, as a mathematical model is crucial for a GIS
environment. However, the earth’s shape is not a perfectly smooth surface. It has undulations
resulting from changes in gravitational pull across its surface. These undulations may not be
visible with the naked eye, but they are measurable and can influence locational measurements.
Fig. Geoid
Datums
Datums provide a reference surface for measuring positions on the Earth and are the basis for
coordinate systems. So how are we to reconcile our need to work with a (simple) mathematical
model of the earth’s shape with the undulating nature of the earth’s surface (i.e. its geoid)? The
solution is to align the geoid with the ellipsoid (or sphere) representation of the earth and to map
the earth’s surface features onto this ellipsoid/sphere. The alignment can be local where the
ellipsoid surface is closely fit to the geoid at a particular location on the earth’s surface (such as
the state of Kansas) or geocentric where the ellipsoid is aligned with the centre of the earth. How
one chooses to align the ellipsoid to the geoid defines a datum.
A planar projection is often used in mapping polar regions but can be used for any location on the
earth’s surface (in which case it is called an oblique planar projection).
Cylindrical Projection
A cylindrical map projection maps the earth surface onto a map rolled into a cylinder (which can
then be flattened into a plane). The cylinder can touch the surface of the earth along a single line
of tangency (a tangent case), or along two lines of tangency (a secant case).
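To make the tangent case concrete, the sketch below uses the spherical Mercator projection, one well-known cylindrical projection whose cylinder touches the globe along the equator. It is chosen here only as an example; the spherical earth radius and the function names are assumptions.

```python
import math

EARTH_RADIUS = 6371000.0  # metres; a spherical earth is assumed for simplicity

def mercator(lon_deg, lat_deg, radius=EARTH_RADIUS):
    """Project longitude/latitude (degrees) to spherical Mercator x, y (metres)."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = radius * lam
    y = radius * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y

def scale_factor(lat_deg):
    """Linear scale distortion of the tangent Mercator projection at a latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))

# Distortion is 1 on the line of tangency (the equator) and grows away from it.
for lat in (0, 30, 60):
    print(lat, round(scale_factor(lat), 3))
```

The scale factor shows the general behaviour of tangent projections: features are true to scale on the line of tangency and are increasingly stretched with distance from it.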
Distortion is minimized along the tangent or secant lines and increases as the distance from these
lines increases. When distance or area measurements are needed for the contiguous 48 states, use
one of the conical projections such as Equidistant Conic (distance preserving) or Albers Equal
Area Conic (area preserving).
Conical projections are also popular projected coordinate systems (PCSs) for European maps, such
as Europe Albers Equal Area Conic and Europe Lambert Conformal Conic.
Spatial Properties
All projections distort real-world geographic features to some degree. The four spatial properties
that are subject to distortion are: shape, area, distance and direction. A map that preserves shape
is called conformal; one that preserves area is called equal-area; one that preserves distance is
called equidistant; and one that preserves direction is called azimuthal.
Data formats
In Geographic Information Systems (GIS), data formats refer to the structured ways in which
spatial and attribute data are stored, processed, and exchanged. These formats can be broadly
categorized into vector, raster, and non-spatial/tabular formats, each serving different types of
spatial analysis and visualization.
1. Vector Formats
Vector data represents geographic features using points, lines, and polygons.
• Shapefile (.shp, .shx, .dbf) – A popular Esri format; stores geometry and attribute data
separately.
• GeoJSON (.geojson) – A lightweight, web-friendly format using JavaScript Object
Notation (JSON).
• GPKG (GeoPackage) – An open standard based on SQLite; stores vector and raster data in
one file.
• KML/KMZ (.kml/.kmz) – Used in Google Earth for sharing geospatial data with styling
and metadata.
• File Geodatabase (.gdb) – Esri’s modern format that supports advanced data management
and large datasets.
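As an illustration of how lightweight a vector format can be, the sketch below builds a small GeoJSON feature collection with Python's standard `json` module. The borehole names, depths and coordinates are invented for the example; the structure (a `FeatureCollection` of `Feature` objects, each with `geometry` and `properties`, and coordinates ordered longitude then latitude) follows the GeoJSON specification.

```python
import json

def point_feature(lon, lat, properties):
    """Build a GeoJSON Feature for a point; coordinates are [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }

# Hypothetical boreholes: geometry and attributes travel together in one object.
collection = {
    "type": "FeatureCollection",
    "features": [
        point_feature(7.5, 6.4, {"name": "Borehole A", "depth_m": 45}),
        point_feature(7.6, 6.5, {"name": "Borehole B", "depth_m": 60}),
    ],
}

text = json.dumps(collection, indent=2)  # what would be written to a .geojson file
restored = json.loads(text)              # reading it back gives the same structure
print(text)
```

Unlike a shapefile, which splits geometry and attributes across several files, the GeoJSON text holds both in a single document.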
2. Raster Formats
Raster data is used to represent continuous surfaces like elevation, temperature, or satellite
imagery.
• TIFF/GeoTIFF (.tif) – Stores raster images with embedded spatial reference information.
• JPEG (.jpg) – Compressed raster image; often used for backgrounds or base maps (less
accurate).
Lecture Note on SGI 317 Principles of GIS by Surv. Okoli, F.U
• IMG (.img) – A format used by ERDAS Imagine for storing remote sensing imagery.
• GRID – An Esri raster format, either in binary or ASCII version.
3. Non-Spatial/Tabular Formats
These store attribute data or metadata, which can be linked to spatial features.
• CSV (.csv) – Comma-separated values; commonly used for attribute tables and coordinate
lists.
• XLS/XLSX (.xls/.xlsx) – Excel spreadsheet formats that may include spatial data or links
to it.
• TXT (.txt) – Plain text files with tabular or coordinate data.
• DBF (.dbf) – A database file used with shapefiles to store attribute data.
4. Web-Based Formats
These formats are optimized for sharing and visualizing data online.
• Web Feature Service (WFS) – Provides vector features over the web.
• Web Map Service (WMS) – Serves map images generated from spatial data.
• Mapbox Vector Tiles (.mvt) – Compressed vector data for fast web rendering.
Importance of Topology
Topology requires additional data files to store the spatial relationships. This naturally raises the
question: What are the advantages of having topology built into a data set?
1. It ensures data quality and integrity.
That is, topology enables the detection of lines that do not meet and polygons that do not
close properly.
2. Topology can enhance GIS analysis.
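The data-quality checks in point 1 can be sketched in a few lines. The example assumes that coordinates must match exactly for two lines to "meet"; real GIS software normally applies a snapping tolerance instead. The function names and sample geometries are hypothetical.

```python
from collections import Counter

def is_closed(ring):
    """A polygon ring is closed when it has 4+ vertices and first == last."""
    return len(ring) >= 4 and ring[0] == ring[-1]

def dangling_nodes(lines):
    """Return end points used by only one line, i.e. places where lines do not meet."""
    counts = Counter()
    for line in lines:
        counts[line[0]] += 1
        counts[line[-1]] += 1
    return sorted(node for node, n in counts.items() if n == 1)

good_ring = [(0, 0), (4, 0), (4, 3), (0, 0)]
bad_ring = [(0, 0), (4, 0), (4, 3), (0, 0.1)]  # last vertex misses the start
print(is_closed(good_ring), is_closed(bad_ring))

# Three connected road segments; only the two outer ends are unconnected.
roads = [[(0, 0), (1, 0)], [(1, 0), (1, 1)], [(1, 1), (2, 1)]]
print(dangling_nodes(roads))
```

A dangling node is not always an error (a cul-de-sac is a legitimate dead end), so such checks usually flag candidates for review rather than prove a fault.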
In Geographic Information Systems (GIS), spatial relationships refer to the ways in which
geographic features interact with, relate to, or are positioned relative to each other in space.
Understanding these relationships is fundamental to spatial analysis, as it allows users to model,
query, and interpret real-world phenomena based on location, proximity, and arrangement.
During the buffering process, the growth of buffers can be restricted by:
• Barriers
o prevent or restrict movement through the barrier
▪ two types exist:
• absolute barrier - prevents movement entirely (e.g. cliff, lake, fence,
forest, etc.)
• relative barrier - restricts movement at particular locations or times
(e.g. narrow bridge, dried-up salt lake in summer, shallow streams,
etc.)
• Friction surfaces
o movement is restricted across a surface representing "cost of movement"
o a cost is incurred for movement, in effect slowing or restricting movement
while not preventing it entirely (e.g. up-hill or down-hill slopes, swamps,
sandy soils, etc. may all contribute to reducing or increasing the buffer size)
o a layer of impedance values (providing cost of movement) can be used to
represent the friction surface.
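A friction surface and an absolute barrier can be combined in a simple cost-distance calculation. The sketch below is one possible way to do this, not a description of any particular GIS package: it spreads outwards from a source cell using Dijkstra's algorithm with four-neighbour moves, treats the impedance value of a cell as the cost of entering it, and marks absolute barriers with `None`. The grid values and the cost budget are invented.

```python
import heapq

def cost_distance(friction, source):
    """Accumulated movement cost from `source` to every cell (4-neighbour moves).

    friction[r][c] is the cost of entering a cell; None marks an absolute barrier.
    """
    rows, cols = len(friction), len(friction[0])
    cost = [[float("inf")] * cols for _ in range(rows)]
    sr, sc = source
    cost[sr][sc] = 0.0
    queue = [(0.0, sr, sc)]
    while queue:
        current, r, c = heapq.heappop(queue)
        if current > cost[r][c]:
            continue  # stale queue entry
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and friction[nr][nc] is not None:
                candidate = current + friction[nr][nc]
                if candidate < cost[nr][nc]:
                    cost[nr][nc] = candidate
                    heapq.heappush(queue, (candidate, nr, nc))
    return cost

def buffer_mask(cost, budget):
    """Cells reachable within `budget` cost units form the buffer."""
    return [[value <= budget for value in row] for row in cost]

# Hypothetical impedance layer: 1 = easy ground, 5 = swamp, None = lake.
friction = [
    [1, 1, 5, 1],
    [1, None, 5, 1],
    [1, 1, 1, 1],
]
cost = cost_distance(friction, source=(0, 0))
mask = buffer_mask(cost, budget=3)
for row in mask:
    print(row)
```

The resulting buffer is no longer a circle around the source: it stops at the lake and grows less far across the high-impedance cells than across easy ground.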
Spatial overlay
An overlay operation combines the geometries and attributes of two feature layers to create the output.
The geometry of the output represents the geometric intersection of features from the input layers. The
spatial overlay function involves combining information (cells) from two or more layers to form a new
layer. Of course, the layers (and cells) must be geographically aligned (georeferenced) with each other in
order for the overlay to take place. Two types of overlay operations exist:
Weighted Overlay
The weighted overlay operation allows values other than binary to be included as cell values. The cell
values are then combined using arithmetic, statistical or merge operators (as already indicated). For
example, consider the problem: Can we predict our crop yield based on fertiliser rates and last year's crop
yields?
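A weighted overlay of this kind can be sketched as a cell-by-cell weighted sum. The example below is only an illustration of the arithmetic operator: the two input layers, their scores and the weights are invented, and a real yield model would need weights derived from data.

```python
def weighted_overlay(layers, weights):
    """Cell-by-cell weighted sum of aligned raster layers (lists of rows)."""
    if len(layers) != len(weights):
        raise ValueError("one weight per layer is required")
    rows, cols = len(layers[0]), len(layers[0][0])
    for layer in layers:
        if len(layer) != rows or any(len(row) != cols for row in layer):
            raise ValueError("layers must share the same grid")
    return [
        [sum(w * layer[r][c] for layer, w in zip(layers, weights))
         for c in range(cols)]
        for r in range(rows)
    ]

# Hypothetical scores on the same 2 x 2 grid (layers must be georeferenced alike).
fertiliser_rate = [[1, 2], [3, 4]]
last_year_yield = [[4, 3], [2, 1]]
predicted = weighted_overlay([fertiliser_rate, last_year_yield], [0.75, 0.25])
for row in predicted:
    print(row)
```

The grid check reflects the requirement stated earlier that layers be geographically aligned before an overlay can take place.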
Display
Pattern Analysis
Feature Measurement
Feature manipulation in vector analysis refers to the range of operations used to alter, combine,
or extract spatial features represented as points, lines, and polygons in a vector dataset. This
process is fundamental in the management and analysis of geospatial data within Geographic
Information Systems (GIS). It supports effective decision-making in areas such as urban planning,
infrastructure development, environmental management, and resource allocation. Feature
manipulation operations are routinely performed using GIS software platforms such as ArcGIS and
QGIS, as well as spatial database extensions like PostGIS and other tools including GRASS GIS.
Vector data typically represents real-world phenomena in three forms: points for discrete locations
such as GPS stations or boreholes, lines for linear features like roads and rivers, and polygons for
area features such as land parcels and lakes. Through feature manipulation, these data types can
be modified or analyzed to reveal spatial patterns, support decisions, or integrate with other
datasets.
One basic form of feature manipulation is selection, which involves extracting features based on
specific criteria from the attribute table or spatial relationship. For example, selecting all road
features within a defined administrative region enables focused analysis. Another common
operation is clipping, which restricts the extent of features to a given boundary, such as limiting
land cover data to a study area. Union operations allow for the combination of two polygon layers,
preserving all input features and their attributes, often used in integrating thematic datasets like
land use and administrative zones.
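Selection by attribute and by location can be sketched with plain Python data structures. The example assumes point features stored as dictionaries with an `xy` coordinate pair and an `attributes` record, which is a simplification invented for illustration. The spatial selection uses a rectangular extent only; a true clip against an arbitrary boundary would also cut line and polygon geometries.

```python
def select_by_attribute(features, predicate):
    """Keep features whose attribute record satisfies `predicate`."""
    return [f for f in features if predicate(f["attributes"])]

def select_within_extent(features, xmin, ymin, xmax, ymax):
    """Keep point features that fall inside a rectangular boundary."""
    return [
        f for f in features
        if xmin <= f["xy"][0] <= xmax and ymin <= f["xy"][1] <= ymax
    ]

# Hypothetical borehole layer.
boreholes = [
    {"xy": (2.0, 3.0), "attributes": {"id": "BH1", "depth_m": 45}},
    {"xy": (8.0, 1.0), "attributes": {"id": "BH2", "depth_m": 80}},
    {"xy": (4.0, 4.5), "attributes": {"id": "BH3", "depth_m": 60}},
]
deep = select_by_attribute(boreholes, lambda a: a["depth_m"] >= 60)
in_study_area = select_within_extent(boreholes, 0, 0, 5, 5)
print([f["attributes"]["id"] for f in deep])
print([f["attributes"]["id"] for f in in_study_area])
```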
Network Analysis
Network analysis requires a network that is vector-based and topologically connected. Perhaps
the most common network analysis is shortest path analysis, which is used, for example, in in-
vehicle navigation systems to help drivers find the shortest route between an origin and a
destination. Network analysis also includes the traveling salesman problem, vehicle routing
problem, closest facility, allocation, and location-allocation.
A network is a system of linear features that has the appropriate attributes for the flow of objects.
For example, a road system is a familiar network. Other networks include railways, public transit
lines, bicycle paths, and streams. A network is typically topology-based: lines meet at
intersections, lines cannot have gaps, and lines have directions.
A network with the appropriate attributes can be used for a variety of applications. Some
applications are directly accessible through GIS tools. Others require the integration of GIS and
specialized software in operations research and management science. Shortest path analysis finds
the path with the minimum cumulative impedance between nodes on a network. Because the link
impedance can be measured in distance or time, a shortest path may represent the shortest route
or fastest route. Shortest path analysis typically begins with an impedance matrix in which a value
represents the impedance of a direct link between two nodes on a network and an ∞ (infinity)
means no direct connection. A link, also called an edge or arc, is a road segment defined by two
end points. Links are the basic geometric features of a network. A link impedance is the cost
of traversing a link, which may be measured by the physical length or the travel time.
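The impedance matrix described above can be fed directly into a shortest path routine. The sketch below uses Dijkstra's algorithm, which the notes do not name but which is a standard choice for this task; the four-node network and its impedance values are invented, and `float("inf")` plays the role of ∞.

```python
import heapq

INF = float("inf")

def shortest_path(impedance, origin, destination):
    """Dijkstra's algorithm on an impedance matrix; INF means no direct link."""
    n = len(impedance)
    best = [INF] * n
    previous = [None] * n
    best[origin] = 0.0
    queue = [(0.0, origin)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best[node]:
            continue  # stale queue entry
        if node == destination:
            break
        for neighbour in range(n):
            link = impedance[node][neighbour]
            if neighbour != node and link != INF and cost + link < best[neighbour]:
                best[neighbour] = cost + link
                previous[neighbour] = node
                heapq.heappush(queue, (best[neighbour], neighbour))
    if best[destination] == INF:
        return INF, []
    # Walk back from the destination to recover the sequence of nodes.
    path, node = [], destination
    while node is not None:
        path.append(node)
        node = previous[node]
    return best[destination], path[::-1]

# Hypothetical network of four nodes; values could be kilometres or minutes.
impedance = [
    [0,   4,   2,   INF],
    [4,   0,   1,   5],
    [2,   1,   0,   8],
    [INF, 5,   8,   0],
]
print(shortest_path(impedance, 0, 3))
```

Whether the result is the shortest route or the fastest route depends only on whether the matrix holds distances or travel times; the algorithm is the same.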
A. Rasterization is the process of converting vector data (points, lines, and polygons) into
raster format (a grid of cells or pixels). In this process, vector features are overlaid on a
grid, and each cell of the raster is assigned a value based on whether it intersects with the
vector feature and what attribute is selected for conversion. For example, a polygon map
of land use can be rasterized so that each pixel carries a code representing a land use type
such as agriculture, forest, or urban. This method is especially useful when preparing data
for analysis techniques that require raster input, such as terrain modelling, hydrological
modelling, or image classification.
Rasterization requires careful consideration of cell size (resolution), as it affects the
precision and detail of the output raster. A small cell size gives a more detailed raster but
increases file size and processing time, while a larger cell size may result in a loss of detail
or misrepresentation of narrow or small features.
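The effect of cell size can be seen in a minimal rasterization sketch. It assumes one simple assignment rule, namely that a cell takes the polygon's code when the cell centre falls inside the polygon; GIS packages offer other rules as well. The point-in-polygon test uses ray casting, and the parcel coordinates, land use code and grid layout are invented.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # The edge crosses the horizontal line through y; find where.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize(polygon, value, x0, y0, cell, n_rows, n_cols, nodata=0):
    """Assign `value` to every cell whose centre falls inside the polygon.

    (x0, y0) is the upper-left corner of the grid; row 0 is the top row.
    """
    grid = []
    for r in range(n_rows):
        cy = y0 - (r + 0.5) * cell
        grid.append([
            value if point_in_polygon(x0 + (c + 0.5) * cell, cy, polygon) else nodata
            for c in range(n_cols)
        ])
    return grid

forest = [(1, 1), (3, 1), (3, 3), (1, 3)]  # a 2 x 2 parcel, land use code 2
fine = rasterize(forest, 2, x0=0, y0=4, cell=1, n_rows=4, n_cols=4)
coarse = rasterize(forest, 2, x0=0, y0=4, cell=4, n_rows=1, n_cols=1)
for row in fine:
    print(row)
print(coarse)
```

With a cell size of 1 the parcel occupies four cells, matching its true area of 4 square units. With a cell size of 4 the single cell is coded as forest, so the raster represents 16 square units of forest: the misrepresentation that a coarse resolution can cause.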
An error is a discrepancy between the encoded and actual value of a particular attribute for a given
entity. “Actual value” implies the existence of an objective, observable reality. However, reality
may be:
• Unobservable (e.g., historical data)
• Impractical to observe (e.g., too costly)
• Perceived rather than real (e.g., subjective entities such as “neighbourhoods”)
In fact, it is not necessary to posit an objective reality in order to assess accuracy, since all
geographical data are collected with the aid of a model that specifies, implicitly or explicitly,
the required level of abstraction and generalization.
• This is the database “specification” and is closely related to the “terrain nominal” concept
of perceived reality (Salgé, 1995).
• The specification serves as the standard against which accuracy is assessed. Thus the
“actual” value is the value we would expect based on the specification (Brassel et al., 1995).
• Accuracy is always a relative measure, since it is always measured relative to the
specification.
• To judge fitness-for-use, one must judge the data relative to the specification, and also
consider the limitations of the specification itself (CEN, 1995).
Efficient data management typically requires the use of a computer database. A database is a
collection of data that contains information relevant to an enterprise. It is also a shared,
integrated computer structure that stores a collection of:
i) End-user data, that is, raw facts of interest to the end user.
ii) Metadata, or data about data, through which the end-user data are integrated and
managed.
Underlying the structure of a database is the data model: a collection of conceptual tools for
describing data, data relationships, data semantics, and consistency constraints. A data model
provides a way to describe the design of a database at the physical, logical, and view levels. The
quest for better data management has led to several models that attempt to overcome the
disadvantages of the file system. These models represent different schools of thought
as to what a database is, what its purpose is, the types of structures that it should employ, and the
technology that would be used to implement these structures.
A number of different data models are covered in the text. We discuss each briefly but explain
the relational and ER data models in more detail.
Hierarchical model.
This model was first implemented by IBM in 1966 in a system designed for the Apollo program.
The hierarchical structure contains levels, or segments. A segment is the equivalent of a
file system’s record type. Within the hierarchy, a higher layer is perceived as the parent of the
segment directly beneath it, which is called the child. The hierarchical model depicts a set of
one-to-many (1:M) relationships between a parent and its child segments. It can also be easily
stored on tape media. The hierarchical data model has its drawbacks: it does not depict
many-to-many (N:M) relationships and has no data independence.
Relational Model
This model was published by Edgar F. “Ted” Codd in 1970, after several years of work with IBM.
The relational model uses a collection of tables to represent both data and the relationships among
those data. Each table has multiple columns, and each column has a unique name. The relational
model is an example of a record-based model. Record-based models are so named because the
database is structured in fixed-format records of several types. Each table contains records of a
particular type. Each record type defines a fixed number of fields, or attributes. The columns of
the table correspond to the attributes of the record type. The relational data model is the most
widely used data model, and a vast majority of current database systems are based on the relational
model.
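The idea of tables with named columns, related through shared values, can be demonstrated with Python's built-in `sqlite3` module. The two tables, their column names and the sample records below are invented for illustration; any relational DBMS would express the same idea in a similar way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a throwaway in-memory database
conn.executescript("""
    CREATE TABLE owner (
        owner_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    CREATE TABLE parcel (
        parcel_id INTEGER PRIMARY KEY,
        land_use  TEXT NOT NULL,
        area_ha   REAL NOT NULL,
        owner_id  INTEGER REFERENCES owner(owner_id)
    );
""")
conn.executemany("INSERT INTO owner VALUES (?, ?)", [(1, "Ada"), (2, "Chidi")])
conn.executemany(
    "INSERT INTO parcel VALUES (?, ?, ?, ?)",
    [(101, "agriculture", 2.5, 1), (102, "forest", 10.0, 1), (103, "urban", 0.4, 2)],
)

def parcels_by_owner(connection, name):
    """Join the two tables on the shared owner_id column."""
    rows = connection.execute(
        """SELECT p.parcel_id, p.land_use, p.area_ha
           FROM parcel AS p JOIN owner AS o ON p.owner_id = o.owner_id
           WHERE o.name = ? ORDER BY p.parcel_id""",
        (name,),
    )
    return rows.fetchall()

print(parcels_by_owner(conn, "Ada"))
```

Each row is a fixed-format record and each column an attribute, as described above. The relationship between parcels and owners is itself stored as data (the `owner_id` column) rather than as a physical pointer.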
Hybrid DBMS
Hybrid DBMSs are an emerging trend: they retain the advantages of the relational model and at the
same time provide programmers with an object-oriented view of the underlying data. These types
of databases preserve the performance characteristics of the relational model and the semantically
rich programmatic support of the object-oriented model. Examples of the hybrid model include
ARC/INFO and the ESRI shapefile.