Lecture 3

This document discusses GIS data sources, data input methods, data quality, and data structures. It describes primary and secondary data sources, methods for digitizing spatial data including manual and automated digitizing, factors that influence data quality such as accuracy and precision, and considerations for choosing appropriate data sources based on cost, time, and accuracy requirements. Remote sensing is presented as a primary data source that allows rapid collection of up-to-date geographical data over large areas.

Uploaded by

Qateel Jutt
Copyright
© All Rights Reserved
Lecture 3

GIS Data Sources, Data Input Methods, Data Quality and Data Structures

GIS Data Sources
 Data are commonly defined as raw, unorganized facts (such as letters,
numbers, or symbols) that refer to, or represent, conditions, ideas, or
objects.
 GIS data sources are the origins of the data required to meet the needs
of a specific GIS application.
 In GIS, data may be textual or numeric values that refer to, or
represent, the type or condition of real-world geographical phenomena.
 Examples include the number of floors in an academic building, the size
of a parking lot, and the dominant rock type of a geological formation.

Classification of Data Sources
 Data sources can be classified into primary and secondary data sources.
 Primary data sources are datasets obtained directly from the real world
for a specific GIS application. The defining property is that data on
real-world features are obtained by direct measurement of the feature
itself.
 Data on geographical phenomena can be obtained by direct measurement or
observation in the real world in the following ways:
 Remote sensing: digital satellite images and digital aerial photographs.
 GPS measurements.
 Survey measurements.

Classification of Data Sources
 Secondary data sources are digital and analogue datasets that were
originally captured for another purpose and need to be converted into a
suitable digital format for a specific GIS application.
 Examples of secondary data sources include:
 Maps: hardcopy maps and scanned maps.
 Tabular data: standard lists such as census reports. The data may be
typed into the GIS or captured with a scanner.
 Textual data: text discussions may not be easily reduced to a GIS
format, but the information can be important and may require
translation by a user.
 Digital products: processed datasets, sometimes complete GIS databases
and coverages compiled by another organization.

Suitability/appropriateness of Data Sources
 Data can be obtained from primary or secondary sources, so a project
may have to choose between the two. Three dimensions can be used to
evaluate the suitability of a data source for a particular GIS project:
cost, time, and accuracy.
 Cost: the cost of obtaining data should stay within the budget. When
choosing a data source, it is crucial to consider the costs associated
with primary and secondary data sources.
 Time: GIS projects are conducted within a specified time frame. The
choice of data source should allow the project to be completed within
that time.
 Accuracy: accuracy is paramount if the data are to be used to create
useful information products. To choose between primary and secondary
data sources, one should evaluate the accuracy of the secondary data
already available. A field survey can be used to check accuracy.

Geoportal and its types
 A geoportal is a type of web portal used to find and access geographic
information via the Internet. Geoportals provide a single point of access
for searching and downloading GIS data from a multitude of sources.
They provide capabilities to query metadata records for relevant data and
services, and then link directly to the on-line content services
themselves.
 There are three basic types of spatial portal:
 Catalogue portals
They create and maintain indexes or ‘catalogues’ of metadata that describe the
nature and location of resources.
 Application portals
They provide more structured interfaces that include specific tools and
applications relevant to the user’s domain interests.
 Enterprise portals

Obtaining data from Geoportal
 Costs and requirements relate to:
 Availability
Open/free data can be downloaded from geoportals free of charge, whereas some data
require payment before they can be downloaded.
 Distribution costs
Some organizations do not transfer data over the Internet. In such cases the costs of
obtaining data include postage and freight.
 Registration
Many sites require a user to sign up before downloading data.
 The methods of obtaining data include:
 Immediate download
 Ordered and sent via disk or tape
 Received from business partners of data catalogue
Geoportals in Pakistan (Limited)

Metadata
 Metadata is information about data. It is important when searching for
secondary data, because it can be used to evaluate the usefulness of
geospatial data obtained from the Internet. Metadata files include the
following:
 general descriptions of the contents of the file
 spatial reference information
 definitions of the terms used to identify records (rows) and
fields (columns)
 the range of values for fields
 the quality or reliability of the data and measurements
 how the data were collected, when they were collected, and who
collected them
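
The metadata items listed above can be pictured as a simple record that is checked before a dataset is used. The sketch below is only an illustration in Python: the key names, the example values, and the helper `missing_metadata` are assumptions made for this example and do not follow any particular metadata standard.

```python
# Hypothetical metadata record. The keys mirror the items listed above and
# are not taken from any metadata standard; all values are illustrative.
REQUIRED_KEYS = {
    "description", "spatial_reference", "field_definitions",
    "value_ranges", "quality", "lineage",
}

roads_metadata = {
    "description": "Road centrelines for a campus (illustrative)",
    "spatial_reference": "WGS 84 / UTM zone 43N",
    "field_definitions": {"road_id": "unique identifier", "lanes": "number of lanes"},
    "value_ranges": {"lanes": (1, 6)},
    "quality": "positional accuracy not assessed",
    "lineage": {"how": "GPS survey", "when": "unknown", "who": "survey team"},
}


def missing_metadata(record):
    """Return the required metadata items that are absent or empty."""
    return sorted(key for key in REQUIRED_KEYS if not record.get(key))


print(missing_metadata(roads_metadata))        # complete record
print(missing_metadata({"description": "x"}))  # everything else is missing
```

A dataset whose record returns a non-empty list would need more investigation before being trusted as a secondary source.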

Remote Sensing As A Source Of Geographical Data
 Compared with other methods such as field survey, remote sensing
allows rapid collection of up-to-date data over large geographical areas.
 Remote sensing is also the best data source for regional
phenomena such as geological structures and forests.
 Because data are acquired by remotely operated sensors, remote
sensing is the only practical way of obtaining data from inaccessible
regions such as Antarctica and the Amazon.
 Most remote sensing data are acquired from space-based satellite
platforms. A satellite revisits the same location at a regular interval
(its temporal resolution), so data for a given location are acquired on
a regular basis.
 Remote sensing images are acquired over a broad range of wavelengths,
allowing detection of phenomena that cannot be seen with
the human eye.

Data Input Methods
 Data input methods are procedures that create digital spatial data. In
other words, they are the mechanisms by which digital
representations of spatial features such as roads, academic buildings, and
parking lots on campus are created. The digital representation of these
features is then stored in a GIS database.

Digitizing
 Digitizing is the process by which features on a map or image are converted
into digital format for use by a GIS. The primary methods for
digitizing spatial information are:
 Manual methods: tablet digitizing and on-screen (heads-up)
digitizing.
 Automated method: scanning.
 To ensure that maps are digitized efficiently and accurately, four steps
need to be followed:
 Use good base maps (reliable and current).
 Define your procedures (digitization, naming).
 Prepare your maps (minimize digitization frequency and edits).
 Digitize your maps.

Advantages and Disadvantages of Manual Digitizing
 Advantages:
 The ability to correct errors or distortions in the original maps at the time
of data capture.
 Highly reliable human recognition of map objects.
 The ability to interpret ambiguous or incomplete information and select
the relevant required information at the time of data capture.
 Can be performed on inexpensive equipment and requires little training.
 Disadvantages:
 Manual digitizing is labor intensive and therefore very time-consuming and
costly.
 The quality of results is highly dependent on the operator's experience.
 The results may be inconsistent due to varying operator conditions,
stress, and fatigue.

Data Quality
 The cost of data reflects its accuracy and precision.
 Because lower-quality data tend to be cheaper and more readily available, a very
common problem in GIS is the inappropriate use of data.
 A critical step in developing a GIS is deciding “what is accurate enough?” This is
a function of needs, cost, accessibility, and time. User needs determine the
required accuracy and, in general, accuracy determines price.
 What is accuracy? “The degree to which information on a map or in a
digital database matches true or accepted values.”
 It also reflects how closely a measurement represents the actual
quantity measured.
 Accuracy is a reflection of the number and severity of errors in a dataset
or map.

Data Quality
 Quality is also a function of “precision”.
 Precision is the level of exactness of a
measurement. The more precise a measurement is, the smaller the unit
in which it is recorded.
 Hence, a measurement recorded to a fraction of a centimeter is more precise than a
measurement recorded to the nearest centimeter.
 However, data with a high level of precision can still be inaccurate
because of errors.
 Each application requires a different level of precision.
 Engineering and surveying applications typically require high levels of
precision; they may measure to the millimeter.
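
The difference between precision and accuracy can be shown with two small sets of readings. This is a sketch in Python with made-up numbers: the true length, both lists of readings, and the helper `mean_error` are assumptions chosen only to illustrate the point.

```python
# Illustrative values only: a true length and two sets of repeated readings (cm).
TRUE_LENGTH_CM = 104.2

# Recorded to 0.001 cm (high precision) by an instrument that reads too long.
precise_but_biased = [104.731, 104.729, 104.732, 104.730]
# Recorded to the nearest cm (low precision) by a well-calibrated instrument.
coarse_but_unbiased = [104.0, 104.0, 105.0, 104.0]


def mean_error(readings, true_value):
    """Average signed difference between the readings and the true value."""
    return sum(r - true_value for r in readings) / len(readings)


print(round(mean_error(precise_but_biased, TRUE_LENGTH_CM), 3))
print(round(mean_error(coarse_but_unbiased, TRUE_LENGTH_CM), 3))
```

The first set has more digits but a larger average error, so it is more precise and less accurate than the second.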

Data Quality
 Positional Accuracy and Precision
 One of the primary types of error in GIS is positional error, that is,
error in 2D position (x, y) and in the third dimension (height).
 Positional accuracy and precision are functions of the scale at which the
digital layer was created.
 If the layer was created by digitizing a paper map, the minimum usable scale of the
digital layer is considered to be the scale of that map.
 Scale is a function of the map’s resolution.
 Positional accuracy standards specify that acceptable positional error
varies with scale.
 Data can have a high level of precision but still be positionally inaccurate.
 Positional error is inversely related to precision and to the amount of
processing.

Data Quality
 Measurement of Accuracy
 Accuracy is often stated as a confidence interval: e.g. 104.2 cm ± 0.01 cm
means the true value lies between 104.19 cm and 104.21 cm.
 One of the key measures of positional accuracy is root mean
squared error (RMSE): the square root of the mean of the squared
differences between the observed and expected values,
$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(o_i - e_i)^2}$,
where $o_i$ and $e_i$ are the observed and expected values for observation $i$
and $n$ is the total number of observations.
 This is a standardized measure of error: how close the predicted
values are to the observed values.
 Different agencies have different standards for positional error. Example:
USGS horizontal positional requirements state that 90% of all points tested
must be within 1/30th of an inch, measured on the map, for maps at scales
larger than 1:20,000, and within 1/50th of an inch for maps at scales of
1:20,000 or smaller.

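
The two ideas above, RMSE and a scale-dependent map tolerance, can be sketched in a few lines of Python. The function names and the check-point coordinates are assumptions made for this example; the only fixed fact used is that 1 inch equals 0.0254 m.

```python
import math


def rmse(observed, expected):
    """Root mean squared error between paired observed and expected values."""
    if len(observed) != len(expected) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(o - e) ** 2 for o, e in zip(observed, expected)]
    return math.sqrt(sum(squared) / len(squared))


def map_tolerance_on_ground_m(map_inches, scale_denominator):
    """Convert an allowable error measured on the map (inches) to metres on the ground."""
    return map_inches * scale_denominator * 0.0254  # 1 inch = 0.0254 m


# Hypothetical check points: digitized vs surveyed x-coordinates in metres.
digitized = [100.0, 205.0, 298.0, 404.0]
surveyed = [101.0, 203.0, 300.0, 400.0]

print(rmse(digitized, surveyed))
print(map_tolerance_on_ground_m(1 / 50, 24_000))
```

Converting the map tolerance to ground units shows why acceptable positional error grows as the map scale gets smaller.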
Data Quality
 A critical point to remember is that “zooming” in on a digital map does not
increase its level of accuracy.
 The accuracy and precision are based on the scale of the digital layer’s
original parent source.
 Attribute accuracy and precision refer to the quality of non-spatial, attribute
data.
 Precision for numeric data means more digits.
 Example: recording income down to cents, rather than just dollars.
 Conceptual accuracy: misclassification results from differences in
judgment or in the automated classification tools.
 The accuracy of classifications depends on their precision. The less
precise the classifications, the less likely there will be errors. Classifying
only as “land” and “water” is not very precise, and so is not likely to
result in an error.

Data Quality
 Logical consistency: do the data follow the rules of logic?
 Attribute example: is something classified both as water and as
commercially zoned land?
 Geospatial example: do lines intersect when they should not?
 Completeness: is a data layer complete or lacking in coverage?
 Examples: does a layer of roads leave out some roads? If so, does it do so
systematically or randomly?
 Does a database of buildings in a city leave out some buildings?
 Conflation: used when one layer is better in one way and another is better in
another, and you wish to get the best of both.
 It is a way of reconciling the best geometric and attribute features from two layers
into a new one.
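
Both logical consistency examples above can be written as simple rule checks. The Python sketch below is one possible formulation, not a GIS package's method: the function names, the record fields `cover` and `zoning`, and the sample coordinates are all assumptions for illustration.

```python
def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)


def segments_cross(a, b, c, d):
    """True if segment a-b properly crosses segment c-d.

    Shared endpoints and collinear overlaps are deliberately not counted.
    """
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)


def contradictory(record):
    """Attribute rule: a feature cannot be both water and commercially zoned land."""
    return record["cover"] == "water" and record["zoning"] == "commercial"


print(segments_cross((0, 0), (4, 4), (0, 4), (4, 0)))  # the two diagonals of a square
print(segments_cross((0, 0), (1, 1), (2, 2), (3, 3)))  # collinear and not touching
print(contradictory({"cover": "water", "zoning": "commercial"}))
```

Running such checks over every pair of lines or every record flags the features that break a rule, which can then be inspected by hand.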
Data Source Errors
 Error can be systematic or random.
 Systematic error can be rectified if discovered, because its source is
understood.
 A common example is a remote sensing instrument that consistently
measures data erroneously because of bad calibration.
 If the problem in calibration can be understood and accounted for, then
that error is called systematic. Another example: projecting map data
using the wrong zone would result in consistently wrong data.
 Random error cannot be controlled for because its source is not
understood. Random errors are often introduced in small amounts at each
stage of data collection and processing.
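
A small simulation can show why systematic error is correctable and random error is not. This Python sketch uses assumed values throughout: the true value, the calibration offset, and the noise level are invented for the illustration.

```python
import random
import statistics

rng = random.Random(42)      # fixed seed so the sketch is repeatable
TRUE_VALUE = 250.0           # illustrative "true" elevation in metres
CALIBRATION_OFFSET = 3.0     # assumed systematic error of the instrument
NOISE_SD = 0.5               # assumed spread of the random error

# Each reading = truth + constant bias + unpredictable noise.
readings = [TRUE_VALUE + CALIBRATION_OFFSET + rng.gauss(0, NOISE_SD)
            for _ in range(1000)]

# Once the offset is known, the systematic part can be subtracted.
corrected = [r - CALIBRATION_OFFSET for r in readings]

mean_error_before = statistics.mean(readings) - TRUE_VALUE
mean_error_after = statistics.mean(corrected) - TRUE_VALUE
spread_after = statistics.stdev(corrected)  # the random part remains

print(mean_error_before, mean_error_after, spread_after)
```

After correction the average error is close to zero, but the spread of individual readings is unchanged because the random component has no known source to subtract.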

Data Source Errors
 Error propagation and cascading
 Errors can accumulate and cascade through processing steps; each
succeeding layer that uses the erroneous processing method compounds
the error.
 Propagation: one error leads to another.
 Cascading: errors are allowed to propagate unchecked
from one layer to the next and on to the final set of products or
recommendations.
 Cascading error can be managed to a certain extent by conducting
“sensitivity analysis” on different data layers to see how slight changes
in one or several layers would affect the final outcome or product.
 Cascading can occur with positional as well as with attribute errors.
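
A sensitivity analysis of the kind mentioned above can be sketched by perturbing one input layer and counting how many output cells change. The Python example below is a toy model: the two layers, the weights 0.6 and 0.4, the threshold, and the function names are all assumptions for illustration.

```python
def suitability(slope, distance, threshold=0.5):
    """Classify each cell as suitable (True) from two input layers scaled 0..1.

    The weights and threshold are illustrative assumptions, not a standard model.
    """
    return [0.6 * s + 0.4 * d >= threshold for s, d in zip(slope, distance)]


def changed_cells(layer_a, layer_b, perturbation):
    """Count cells whose class flips when layer_a is shifted by +/- perturbation."""
    baseline = suitability(layer_a, layer_b)
    flipped = set()
    for delta in (-perturbation, perturbation):
        shifted = [min(1.0, max(0.0, v + delta)) for v in layer_a]
        for i, (old, new) in enumerate(zip(baseline, suitability(shifted, layer_b))):
            if old != new:
                flipped.add(i)
    return len(flipped)


slope_score = [0.10, 0.48, 0.52, 0.90, 0.50]
distance_score = [0.20, 0.50, 0.50, 0.80, 0.45]
print(changed_cells(slope_score, distance_score, 0.05))
```

Cells close to the threshold flip with a small change in one layer, so an error in that layer would cascade into the final product there.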

Data Source Errors
 Field data: the skills and motivation of the surveyor influence the
errors in spatial data collected through a field survey. In addition, errors
in field survey data can be caused by miscalibrated data
collection instruments. Instruments should be calibrated
to ensure a standard scale of recordings.
 Remote sensing: raw digital images usually contain geometric
distortions so significant that they cannot be used as maps. The sources of
these distortions range from variations in the altitude, attitude, and
velocity of the sensor platform to factors such as panoramic distortion,
earth curvature, atmospheric refraction, relief displacement, and
nonlinearities in the sweep of a sensor’s instantaneous field of view (IFOV).

Data Source Errors
 Remote sensing: atmospheric constituents affect the electromagnetic
radiation recorded by sensors through scattering and absorption.
Each sensor has detectors that record
electromagnetic radiation reflected by earth surface features. Remote
sensing data can contain errors caused by malfunction of these detectors.
 Errors in remote sensing data are corrected by various image
preprocessing procedures. Geo-rectification procedures correct geometric
errors; atmospheric correction and radiometric correction procedures
correct for defects due to atmospheric effects and detector
malfunction, respectively.

Data Input Errors
 Scanned maps and images can contain errors that arise from the
properties of the scanner and the skills of the operating personnel. The size of the
scanner may also have negative effects.
 In most cases, inaccuracies in georeferencing are a result of human
error. To georeference an image or map accurately, the operator
must correctly identify control points.
 Attribute data errors are more difficult to identify than locational errors
because they are not apparent until later in data processing and analysis.
Attribute data errors may include:
 Incorrect assignment of features’ unique identifiers.
 Missing data records or too many records.
 Missing attributes.
 Incorrect attribute values.
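
The attribute errors listed above can be screened for automatically before analysis. The following Python sketch is one way to do so under stated assumptions: the record layout, the field names, the expected identifiers, and the valid range are hypothetical.

```python
from collections import Counter


def audit_attributes(records, expected_ids, required_fields, valid_ranges):
    """Report common attribute errors in a list of feature records (dicts)."""
    ids = [r["id"] for r in records]
    report = {
        "duplicate_ids": sorted(i for i, n in Counter(ids).items() if n > 1),
        "missing_records": sorted(set(expected_ids) - set(ids)),
        "extra_records": sorted(set(ids) - set(expected_ids)),
        "missing_attributes": [],
        "out_of_range": [],
    }
    for r in records:
        for field in required_fields:
            if r.get(field) is None:
                report["missing_attributes"].append((r["id"], field))
        for field, (low, high) in valid_ranges.items():
            value = r.get(field)
            if value is not None and not low <= value <= high:
                report["out_of_range"].append((r["id"], field))
    return report


# Hypothetical building records with deliberate errors.
buildings = [
    {"id": 1, "floors": 3, "use": "academic"},
    {"id": 2, "floors": None, "use": "parking"},
    {"id": 2, "floors": 120, "use": "academic"},
    {"id": 5, "floors": 2, "use": "academic"},
]
report = audit_attributes(buildings, expected_ids=[1, 2, 3],
                          required_fields=["floors", "use"],
                          valid_ranges={"floors": (1, 60)})
print(report)
```

A range check only catches implausible values; a wrong but plausible value still needs comparison against the source.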
Representing Geographic Features:
How do we describe geographical features?
• by recognizing two types of data:
– Spatial data which describes location (where)
– Attribute data which specifies characteristics at that location
(what, how much, and when)
How do we represent these digitally in a GIS?
• by grouping into layers based on similar characteristics (e.g. hydrography,
elevation, water lines, sewer lines, grocery sales) and using either:
– vector data model (coverage in ARC/INFO, shapefile in ArcView)
– raster data model (GRID or Image in ARC/INFO & ArcView)
• by selecting appropriate data properties for each layer with respect to:
– projection, scale, accuracy, and resolution
How do we incorporate into a computer application system?
• by using a relational Data Base Management System (DBMS)
GIS Data Structures
• Spatial data types and attribute data types
• Relational database management systems (RDBMS): basic concepts
• DBMS and tables
• Relational DBMS
• Raster data structures: represent geography via grid cells
– tessellations
– run length compression
– quad tree representation
– BSQ/BIP/BIL
– DBMS representation
– File formats
• Vector data structures: represent geography via coordinates
– whole polygon
– point and polygon
– node/arc/polygon
– TINs
– File formats
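
Run length compression, listed above among the raster data structures, stores each run of identical cell values as a value and a count. A minimal Python sketch follows; the function names and the sample row are chosen for this illustration.

```python
from itertools import groupby


def rle_encode(row):
    """Compress one raster row into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(row)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original row."""
    return [value for value, count in pairs for _ in range(count)]


row = ["W", "W", "W", "L", "L", "W", "W", "W", "W"]  # W = water, L = land
encoded = rle_encode(row)
print(encoded)
print(rle_decode(encoded) == row)
```

The saving is largest when neighboring cells share the same value, as in large homogeneous areas; a row in which every cell differs would not shrink.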
Spatial Data Types
• continuous: elevation, rainfall, ocean salinity
• areas:
– unbounded: land use, market areas, soils, rock type
– bounded: city/county/state boundaries, ownership parcels, zoning
– moving: air masses, animal herds, schools of fish
• networks: roads, transmission lines, streams
• points:
– fixed: wells, street lamps, addresses
– moving: cars, fish, deer
Attribute data types
Categorical (name):
– nominal
  • no inherent ordering
  • land use types, county names
– ordinal
  • inherent order
  • road class; stream class
• often coded to numbers, e.g. SSN, but can’t do arithmetic

Numerical (known difference between values):
– interval
  • no natural zero
  • can’t say ‘twice as much’
  • temperature (Celsius or Fahrenheit)
– ratio
  • natural zero
  • ratios make sense (e.g. twice as much)
  • income, age, rainfall
• may be expressed as integer [whole number] or floating point [decimal fraction]

Attribute data tables can contain locational information, such as addresses or a list of X,Y coordinates.
ArcView refers to these as event tables. However, these must be converted to true spatial data (a shapefile), for
example by geocoding, before they can be displayed as a map.
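
The conversion from a table of X,Y coordinates to spatial features can be illustrated without any GIS software. The Python sketch below builds GeoJSON-style point features from table rows. It is not how ArcView does it, and the function name, field names, and coordinates are made up for the example.

```python
def event_table_to_points(rows, x_field="x", y_field="y"):
    """Turn table rows holding X,Y coordinates into GeoJSON-style point features."""
    features = []
    for row in rows:
        attributes = {k: v for k, v in row.items() if k not in (x_field, y_field)}
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point",
                         "coordinates": [float(row[x_field]), float(row[y_field])]},
            "properties": attributes,
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical event table; the coordinates are invented.
wells = [{"well_id": "W1", "x": "73.05", "y": "33.68"},
         {"well_id": "W2", "x": "73.10", "y": "33.70"}]
collection = event_table_to_points(wells)
print(collection["features"][0]["geometry"])
```

Each row becomes one feature whose geometry comes from the coordinate columns and whose remaining columns become its attributes.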
Data Base Management Systems (DBMS)
Parcel Table
Parcel #   Address        Block   $ Value
8          501 N Hi       1       105,450
9          590 N Hi       2       89,780
36         1001 W. Main   4       101,500
75         1175 W. 1st    12      98,000

Figure labels: each row is an entity, Parcel # is the key field, and each column is an attribute.

A DBMS contains tables or feature classes in which:
– rows: entities, records, observations, features
  • ‘all’ information about one occurrence of a feature
– columns: attributes, fields, data elements, variables, items (ArcInfo)
  • one type of information for all features
The key field is an attribute whose values uniquely identify each row.
Relational DBMS:
Tables are related, or joined, using a common record identifier
(column variable), present in both tables, called a secondary (or
foreign) key, which may or may not be the same as the key field.

Goal: produce a map of values by district/neighborhood.
Problem: no district code is available in the Parcel Table.
Solution: join the Parcel Table, containing values, with the Geography Table,
containing location codings, using Block as the key field.

Parcel Table (Block is the secondary or foreign key)
Parcel #   Address        Block   $ Value
8          501 N Hi       1       105,450
9          590 N Hi       2       89,780
36         1001 W. Main   4       101,500
75         1175 W. 1st    12      98,000

Geography Table
Block   District   Tract   City
1       A          101     Dallas
2       B          101     Dallas
4       B          105     Dallas
12      E          202     Garland
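
The join described above can be tried with Python's built-in `sqlite3` module. This is a sketch: the data are the rows from the two tables above, while the table and column names (`parcel`, `geography`, `parcel_no`, and so on) are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE parcel (parcel_no INTEGER PRIMARY KEY, address TEXT,
                         block INTEGER, value INTEGER);
    CREATE TABLE geography (block INTEGER PRIMARY KEY, district TEXT,
                            tract INTEGER, city TEXT);
""")
conn.executemany("INSERT INTO parcel VALUES (?, ?, ?, ?)", [
    (8, "501 N Hi", 1, 105450),
    (9, "590 N Hi", 2, 89780),
    (36, "1001 W. Main", 4, 101500),
    (75, "1175 W. 1st", 12, 98000),
])
conn.executemany("INSERT INTO geography VALUES (?, ?, ?, ?)", [
    (1, "A", 101, "Dallas"),
    (2, "B", 101, "Dallas"),
    (4, "B", 105, "Dallas"),
    (12, "E", 202, "Garland"),
])

# Block is the foreign key in parcel and the key field in geography.
value_by_district = conn.execute("""
    SELECT g.district, SUM(p.value)
    FROM parcel AS p
    JOIN geography AS g ON p.block = g.block
    GROUP BY g.district
    ORDER BY g.district
""").fetchall()
conn.close()

print(value_by_district)
```

The district totals come from neither table alone; they exist only because Block links each parcel to its district.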
Advantages/Disadvantages of Raster Data Structures
 Advantages:
 Simple data structures
 Overlay and combination of maps and remotely sensed images is easy
 Some spatial analysis methods are simple to perform
 Simulation is easy, because cells have the same size and shape
 The technology is cheap
 Disadvantages:
 The use of large cells to reduce data volumes means that
phenomenologically recognizable structures can be lost and there can be
a serious loss of information
 Crude raster maps are considerably less attractive than line maps
 Network linkages are difficult to establish
 Projection transformations are time consuming unless special algorithms
or hardware are used

Advantages/Disadvantages of Vector Data Structures
 Advantages:
 Good representation of phenomenology
 Compact
 Topology can be completely described
 Accurate graphics
 Retrieval, updating, and generalization of graphics and attributes are possible
 Disadvantages:
 Complex data structures
 Combination of several vector polygon maps through overlay creates
difficulties
 Simulation is difficult because each unit has a different topological form
 Display and plotting can be expensive, particularly for high-quality color
 The technology is expensive, particularly for the more sophisticated
software and hardware
 Spatial analysis and filtering within polygons are impossible

Reading Assignment No. 1

 How is a GIS database different from a conventional database?

 Is there a difference between CAD and AutoCAD?