Digital Soil Mapping /
Pedometrics
3 November 2020
Kostiantyn Viatkin
FAO
Global Soil Partnership
What is
Digital Soil Mapping (DSM) /
Pedometrics?
Soil mapping – creation of cartographic models of soils and
soil properties
Pedometrics – application of mathematical and statistical
methods for the study of the distribution and genesis of soils
Digital Soil Mapping (pedometric mapping, predictive soil
mapping) is the data-driven generation of soil property and
class maps based on statistical methods
(T. Hengl, 2003)
Spatial phenomena
Spatial phenomena can generally be thought of as either:
● discrete objects with clear boundaries (e.g. river, road,
town)
● or continuous phenomena that can be observed
everywhere but have no natural boundaries
(e.g. elevation, temperature, and air quality).
A lake is a discrete object: it has clear boundaries.
Elevation is a continuous phenomenon: it exists at every point.
Spatial phenomena
Question:
● Is soil a discrete or continuous
phenomenon?
● Are soil properties (carbon, texture, pH,
etc.) discrete or continuous?
● Are soil measurements discrete or
continuous?
Spatial phenomena
Answers:
● Soil is a continuous
phenomenon. It covers
almost the entire land surface.
● However, we classify soils
into discrete soil types
that have boundaries.
Spatial phenomena
Answers:
● Most soil properties
are continuous. They exist
at every point of the soil.
● Soil measurements are
discrete. We sample soil
and measure soil properties
at discrete locations.
We have:
● Discrete soil observations as point data at sampling locations
We need:
● Continuous estimates of soil properties at every point of
the land surface
The task of predictive soil mapping:
● To predict continuous soil properties (or soil classes) at every
point of the land surface from discrete measurements taken at
soil sampling locations.
But how do we do it?
Digital Mapping of Soil Properties
Vector Soil Maps
Gridded maps (rasters)
[Figure: example raster grid of 2 x 2 cells with the values 1, 2 (top row) and 3, 1 (bottom row)]
• Gridded maps (rasters) are well suited to representing
continuous data, such as soil properties.
• A raster is a regular grid of cells (pixels) with a
value of the soil property in each cell.
• Raster resolution defines the cell size and the level of
spatial detail.
• Higher resolution = smaller cell size = finer spatial
detail = bigger files = higher computational
demands
• E.g. a cell size of 1x1 km is good for global and
national mapping.
• Most common raster data format:
GeoTIFF (.tif, .tiff)
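The trade-off between cell size and file size can be illustrated with a small calculation. The following is a minimal sketch in Python, not part of the original slides: the study area of 100 km x 100 km, the 4 bytes per cell (one 32-bit value), and the function name `raster_cells` are all assumptions chosen for illustration.

```python
def raster_cells(width_m, height_m, cell_size_m):
    """Return (columns, rows, total cells) needed to cover a rectangular area."""
    cols = -(-width_m // cell_size_m)   # ceiling division on integers
    rows = -(-height_m // cell_size_m)
    return cols, rows, cols * rows

# Hypothetical study area: 100 km x 100 km, expressed in metres
WIDTH_M, HEIGHT_M = 100_000, 100_000
BYTES_PER_CELL = 4  # assumption: one 32-bit value per cell, no compression

for cell_size_m in (1000, 250, 100):
    cols, rows, n_cells = raster_cells(WIDTH_M, HEIGHT_M, cell_size_m)
    size_mb = n_cells * BYTES_PER_CELL / 1e6
    print(f"{cell_size_m} m cells: {cols} x {rows} = {n_cells} cells, about {size_mb:.2f} MB")
```

Halving the cell size quadruples the number of cells, which is why finer rasters quickly become more demanding to store and process.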
How to create digital maps
of soil properties?
Soil Property Data
V. Dokuchaev (1883):
The genesis and evolution of soils are
the result of the interaction of a
number of environmental
parameters:
● Climate
● Organisms
● Parent Material
● Relief
● Time
Drivers of Soil Formation
State equation of soil formation
Hans JENNY (1941):
● Conceptualization of soil as a state equation of soil
formation.
● Soil and soil properties are a function of a number of
environmental parameters named soil forming factors:
S = ƒ(cl, o, r, p, t) CLORPT MODEL
where cl = climate, o = organisms, r = relief, p = parent
material, t = time
Digital Soil Mapping (DSM)
Definition of Digital Soil Mapping (DSM)
The creation and population of spatial soil information systems by numerical
models inferring the spatial and temporal variations of soil types and soil properties
from soil observations and knowledge and from related environmental variables
(Lagacherie and McBratney, 2007).
McBRATNEY, 2012: Conceptualization of forming factors. Soil and
soil properties are a function of a number of environmental
parameters named soil forming factors:
S = ƒ(s, c, o, r, p, a, n) + ε SCORPAN MODEL
where S = soil attribute to predict, ƒ = function, s = soil
properties, c = climate, o = organisms, r = relief, p = parent
material, a = age, n = location, ε = residuals (errors)
Digital Soil Mapping Workflow
Source: McBratney, 2015
Digital Soil Mapping Workflow
Soil Data
Predictors
How to do it in practice?
R – a powerful and versatile tool for DSM
• R is a programming language that
allows everyone to develop scripts
with maximum flexibility;
• It is free, which enables scientific
work even in budget-limited
organizations;
• Full access to algorithms;
• Possibility to modify existing
functions and packages;
• Developed by a huge community of
experts in many different fields;
• More than 10,000 R packages
available for download;
• Lots of free learning material
R packages (examples)
aqp ● Algorithms for quantitative pedology
● We will use it to restructure our soil dataset into
a soil profile collection
GSIF ● Tools, functions and sample datasets for digital
soil mapping, e.g. depth harmonization
raster ● Reading, writing, manipulating, analyzing and
modeling of gridded spatial data (raster data)
rgdal ● Provides bindings to the 'Geospatial' Data
Abstraction Library ('GDAL') and access to
projection/transformation operations from the
'PROJ.4' library
soilassessment ● Functions used in digital mapping of soil
properties
Digital Soil Mapping in R
1. Prepare Predictors
• Relief
• Climate
• Vegetation
• Geology
• Remote sensing data
Digital Soil Mapping in R
2. Harmonize Predictors
Removing collinearity using Principal Components Analysis
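How principal components remove collinearity can be shown with just two predictors. The sketch below is an illustration in Python rather than the R workflow of this lesson; the elevation and temperature values are invented, and the helper names (`standardize`, `pca_2d`) are hypothetical. It standardizes both predictors, rotates them onto the principal axes of their 2 x 2 covariance matrix, and shows that the resulting component scores are uncorrelated.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(a, b):
    """Sample covariance (n - 1 denominator)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def standardize(values):
    """Centre to mean 0 and scale to unit variance."""
    m, sd = mean(values), math.sqrt(covariance(values, values))
    return [(v - m) / sd for v in values]

def pca_2d(x, y):
    """Rotate two centred variables onto their principal axes."""
    a, b, c = covariance(x, x), covariance(x, y), covariance(y, y)
    theta = 0.5 * math.atan2(2 * b, a - c)  # angle of the first principal axis
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    pc1 = [u * cos_t + v * sin_t for u, v in zip(x, y)]
    pc2 = [-u * sin_t + v * cos_t for u, v in zip(x, y)]
    return pc1, pc2

# Invented example: temperature falls as elevation rises (strong collinearity)
elevation_m = [120, 340, 560, 780, 910, 1150]
temperature_c = [24.1, 22.8, 21.0, 19.9, 18.7, 17.2]

z_elev, z_temp = standardize(elevation_m), standardize(temperature_c)
pc1, pc2 = pca_2d(z_elev, z_temp)

print("covariance of standardized predictors:", round(covariance(z_elev, z_temp), 3))
print("covariance of principal components:", round(covariance(pc1, pc2), 3))
```

With many predictors the same idea is applied to the full covariance matrix, and usually only the first few components are kept.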
Digital Soil Mapping in R
3. Prepare soil data
• Identifiers
• Coordinates
• Depth layers (horizons)
• Measured soil properties
See Lesson 2 – Data Organization and Software installation
Digital Soil Mapping in R
4. Harmonize soil data
● Profile data contain soil parameters
measured for every horizon (depth
layer)
● We need to estimate the mean value for
a target depth interval, e.g. 0-30 cm, 30-100 cm
● For that we can use equal-area
splines. This technique is based on
fitting continuous depth functions that
model the variability of soil
properties with depth.
Depth harmonization
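The goal of depth harmonization can be sketched with a simpler stand-in for equal-area splines: a thickness-weighted mean of the horizons that overlap the target interval. This Python sketch is not the spline method named above, and the profile values and the function name `weighted_mean_for_depth` are invented for illustration.

```python
def weighted_mean_for_depth(horizons, top_cm, bottom_cm):
    """Thickness-weighted mean of a property over a target depth interval.

    horizons: list of (upper_cm, lower_cm, value) tuples for one profile.
    Returns None when no horizon overlaps the target interval.
    """
    weighted_sum = 0.0
    covered_cm = 0.0
    for upper, lower, value in horizons:
        overlap = min(lower, bottom_cm) - max(upper, top_cm)
        if overlap > 0:
            weighted_sum += value * overlap
            covered_cm += overlap
    return weighted_sum / covered_cm if covered_cm > 0 else None

# Invented profile: (upper depth, lower depth, soil organic carbon in %)
profile = [(0, 12, 2.4), (12, 35, 1.1), (35, 80, 0.5), (80, 120, 0.2)]

soc_0_30 = weighted_mean_for_depth(profile, 0, 30)
soc_30_100 = weighted_mean_for_depth(profile, 30, 100)
print("SOC 0-30 cm:", round(soc_0_30, 2))
print("SOC 30-100 cm:", round(soc_30_100, 2))
```

Unlike a spline, this approach treats each horizon as uniform, so it cannot represent gradual change within a horizon.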
Statistical distribution
● Normal distribution, also known as the Gaussian
distribution, is a probability distribution that is
symmetric about the mean, showing that data
near the mean are more frequent in occurrence
than data far from the mean.
The Normal Distribution has:
● mean = median = mode
● symmetry about the center
● 50% of values less than the mean and 50% greater
than the mean
In statistical analyses it is often assumed that the data follow a normal
distribution. If they do not, it may be useful to transform the data.
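As an illustration of such a transformation, the Python sketch below compares the skewness of a right-skewed sample before and after a natural-log transform. The carbon values are invented, and the log transform is only one common choice, assumed here for illustration; the slides do not prescribe a specific transformation.

```python
import math

def skewness(values):
    """Moment-based sample skewness: 0 for symmetric data, > 0 for a long right tail."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Invented soil organic carbon values (%): many low values, a few very high ones
soc_percent = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0, 3.0, 5.0, 9.0, 20.0]
log_soc = [math.log(v) for v in soc_percent]

print("skewness of raw values:", round(skewness(soc_percent), 2))
print("skewness of log-transformed values:", round(skewness(log_soc), 2))
```

Predictions made on the transformed scale have to be back-transformed before they are reported in the original units.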
Digital Soil Mapping in R
5. Overlay soil data and predictors
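Overlaying means reading, for each sampling point, the value of every predictor raster at that location. The Python sketch below shows the underlying coordinate arithmetic for a single north-up raster. The grid values, the origin, the cell size, and the function name `extract_value` are all invented for illustration.

```python
def extract_value(grid, x_min, y_max, cell_size, x, y):
    """Return the raster value at map coordinate (x, y), or None if outside the grid.

    grid: list of rows, top row first (north-up raster).
    x_min, y_max: coordinates of the upper-left corner of the raster.
    """
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None

# Invented 3 x 3 elevation raster with 1000 m cells, upper-left corner at (0, 3000)
elevation = [
    [310, 295, 280],
    [305, 290, 270],
    [300, 285, 260],
]

# Invented sampling points: (id, x, y)
samples = [("P1", 500, 2500), ("P2", 1500, 500), ("P3", 3500, 100)]

for sample_id, x, y in samples:
    value = extract_value(elevation, 0, 3000, 1000, x, y)
    print(sample_id, value)
```

The result is a table with one row per soil observation and one column per predictor, which is the input for the regression step.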
Digital Soil Mapping in R
6. Select a regression model for mapping
Regression
● Regression is a statistical method that allows us to
summarize and study relationships between two
variables:
● variable X is regarded as the predictor,
explanatory, or independent variable.
● variable Y is regarded as the response,
outcome, or dependent variable.
The goal is to build a mathematical formula that defines
Y as a function of the X variable.
Once we have built a statistically significant model, we
can use it to predict outcomes on the
basis of new X values.
Example: linear regression
The mathematical formula of linear regression can be written as follows:
y = b0 + b1*x + e
In the accompanying plot:
● the best-fit regression line is in blue
● the intercept (b0) and the slope (b1) are
shown in green
● the residuals (errors) e are represented
by vertical red lines
Multiple linear regression can have several predictors (X variables):
y = b0 + b1*x1 + b2*x2 + ... + bn*xn + e
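A minimal Python sketch of simple linear regression fitted by ordinary least squares is given here as an illustration. The data are invented, and `fit_line` is a hypothetical helper name, not a function from any of the R packages listed earlier.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (intercept b0, slope b1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Invented example: clay content (%) as predictor X, soil organic carbon (%) as response Y
clay = [10, 15, 20, 25, 30, 35]
soc = [0.9, 1.2, 1.4, 1.9, 2.1, 2.6]

b0, b1 = fit_line(clay, soc)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(clay, soc)]

print("intercept b0:", round(b0, 3))
print("slope b1:", round(b1, 3))
print("prediction for clay = 22%:", round(b0 + b1 * 22, 2))
```

The residuals are the vertical distances between the observations and the fitted line; least squares chooses b0 and b1 so that the sum of their squares is as small as possible.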
Regression
However, most relationships in nature are not linear!
Relationships between soil
properties and environmental
factors can be very complicated
and require a complex model.
Digital Soil Mapping in R
8. Validate the map
Calculate map quality measures
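The slide does not name specific quality measures. The Python sketch below assumes three that are commonly used for this purpose: mean error (bias), root mean squared error, and the coefficient of determination, computed from observed values at validation points and the corresponding map predictions. The numbers and the function name `map_quality` are invented.

```python
import math

def map_quality(observed, predicted):
    """Return mean error (bias), RMSE and R-squared for paired values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_error = sum(errors) / n
    rmse = math.sqrt(sum(e ** 2 for e in errors) / n)
    mean_obs = sum(observed) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r_squared = 1 - ss_res / ss_tot
    return mean_error, rmse, r_squared

# Invented validation set: observed vs. predicted soil organic carbon (%)
observed = [1.2, 0.8, 2.5, 1.9, 3.1]
predicted = [1.0, 1.1, 2.2, 2.0, 2.7]

me, rmse, r2 = map_quality(observed, predicted)
print("ME:", round(me, 3), "RMSE:", round(rmse, 3), "R2:", round(r2, 3))
```

For an honest assessment the validation points should not have been used to fit the model, for example by holding them out or by cross-validation.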
Digital Soil Mapping in R
9. Estimate uncertainty
● Uncertainty is an acknowledgement of error: we
are aware that our representation of reality may
differ from reality and express this by being
uncertain
● In the presence of uncertainty, we cannot identify
a single true value for each pixel of the map.
● But we can identify all possible values and a
probability for each one, that is, characterise the
uncertain variable with a probability distribution.
● If the distribution is normal, it is easy to construct a
confidence interval: for example, we can state
with 95% confidence that the true value lies
within about 2 standard deviations of the mean
(the prediction)
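For a normally distributed prediction error, the interval for one pixel can be computed directly from the prediction and its standard deviation. The Python sketch below uses the standard library's `statistics.NormalDist`; the prediction and standard deviation are invented, and `normal_interval` is a hypothetical helper name.

```python
from statistics import NormalDist

def normal_interval(prediction, sd, level=0.95):
    """Symmetric interval around a prediction, assuming normally distributed error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    return prediction - z * sd, prediction + z * sd

# Invented pixel: predicted soil organic carbon 2.0% with a standard deviation of 0.5%
lower, upper = normal_interval(2.0, 0.5)
print("95% interval:", round(lower, 2), "to", round(upper, 2))
```

The "2 standard deviations" rule is a rounding of the exact multiplier, which is approximately 1.96 for a 95% interval.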
Digital Soil Mapping in R
9. Create uncertainty maps
If the distribution is not
normal, a confidence interval
can still be constructed
through bootstrapping.
95% confidence interval:
we expect 95% of the
unknown values to lie
within the prediction interval.
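The idea of bootstrapping can be sketched in Python as a percentile interval for the mean of a small sample. This is a deliberate simplification: in a mapping workflow one would typically refit the model on each resample and take percentiles of the predictions for every pixel. The data, the number of resamples, the seed, and the function name `bootstrap_interval` are invented for illustration.

```python
import random

def bootstrap_interval(values, n_resamples=2000, level=0.95, seed=42):
    """Percentile bootstrap interval for the mean of a sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = sorted(
        sum(rng.choices(values, k=n)) / n  # resample with replacement
        for _ in range(n_resamples)
    )
    lower_idx = round((1 - level) / 2 * n_resamples)
    upper_idx = round((1 + level) / 2 * n_resamples) - 1
    return means[lower_idx], means[upper_idx]

# Invented, right-skewed soil organic carbon values (%)
soc_percent = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0, 3.0, 5.0, 9.0, 20.0]

lower, upper = bootstrap_interval(soc_percent)
sample_mean = sum(soc_percent) / len(soc_percent)
print("sample mean:", round(sample_mean, 2))
print("95% bootstrap interval:", round(lower, 2), "to", round(upper, 2))
```

Because the interval is read from the resampled distribution itself, it does not have to be symmetric around the estimate.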
Thank you for your
attention!
