ANALYSIS OF LARGE SCALE SOIL SPECTRAL LIBRARIES
Antoine Stevens(1), Marco Nocita(1,2), & Bas van Wesemael(1)
1 Georges Lemaître Centre for Earth and Climate Research, Earth and Life Institute, UCLouvain, Place Louis Pasteur, 3,
1348 Louvain-la-Neuve, Belgium
2 SOIL Action, Land Resource Management Unit, Institute for Environment and Sustainability, Joint Research Centre of the
European Commission, Via E. Fermi 2749, 21027 Ispra (VA), Italy
PART I:
Large scale soil spectral libraries
State of the Art
• Shepherd & Walsh (2002): 1,000 samples from
eastern and southern Africa (305 citations!)
• Brown et al. (2006): 3,768 samples in the US and 400 in the
rest of the world (top 10 in terms of citations!)
• ICRAF-ISRIC: 4,436 samples from 785 soil
profiles distributed across the five continents
• Viscarra Rossel & Webster (2012): 21,500
samples from 4,000 profiles in Australia
• Stevens et al. (2013): LUCAS database
containing 20,000 samples collected over 23
countries of the EU
Large Spectral Libraries: State of the Art
• Rapid Carbon Assessment (2013): 144,833 samples in
6,017 locations across the conterminous US
• Africa Soil Information Service (2013): 17,000 samples so far
from 60 sentinel sites of 100 km² in sub-Saharan Africa
• National/Regional Spectral libraries:
– France (Goge et al., 2012): 2,200 samples
– Denmark (Knadel et al., 2012): 2,851 samples
– Czech Republic (Brodsky et al., 2011): 500+ samples
– Florida (Vasques et al., 2010): 7,120 samples
– Many others ….
• Local spectral libraries and spectral libraries made
for a specific research objective: impossible to count!
Large Spectral Libraries: State of the Art
• Most samples have been scanned with an
ASD spectrometer
• Some libraries are based on legacy soil
databases; others have been built from
scratch for the purpose
• Soil analytical measures have been obtained
with different methods
• Big spectral libraries are useful to build robust
predictions over large areas
Large Spectral Libraries: State of the Art
• RPD values are often high (~2) for properties
directly linked to soil chromophores
• However, RMSE values are often too high for most
applications:
– World: RMSE = 7.9-9.9 g C kg⁻¹ for OC
– Europe: RMSE = 4-15 g C kg⁻¹ for OC
– Florida: RMSE = 6-7 g C kg⁻¹ for OC
• … compared to a standard error of laboratory (SEL) of
1-2 g C kg⁻¹ (dry combustion)
• So, which factors influence the model performance
of BIG libraries?
Large Spectral Libraries: Prediction Performance
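As a rough illustration of how RPD and RMSE relate, the sketch below computes both for a set of predictions. It assumes the common definition RPD = standard deviation of the observed values / RMSE; the function names and the OC values are hypothetical and are not taken from any of the libraries cited above.

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


def rpd(observed, predicted):
    """Ratio of performance to deviation (assumed here: sample SD / RMSE)."""
    observed = np.asarray(observed, dtype=float)
    return float(np.std(observed, ddof=1) / rmse(observed, predicted))


# Hypothetical OC contents in g C kg-1, for illustration only.
oc_observed = np.array([5.0, 12.0, 20.0, 33.0, 47.0])
oc_predicted = np.array([8.0, 10.0, 25.0, 30.0, 41.0])

print("RMSE:", rmse(oc_observed, oc_predicted))
print("RPD: ", rpd(oc_observed, oc_predicted))
```

Because RPD is scaled by the spread of the validation set, a library covering a wide OC range can reach a high RPD while its RMSE stays well above the SEL.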
1/ Laboratory conditions!
2/ Reference measurements
Brown et al., 2005
Ben Dor et al. (1999)
3/ Nature of soil spectra
Figure annotations: difference in albedo due to OM; absorption features of OM, H2O, mineralogy, CaCO3 and Fe oxides
3/ Nature of soil spectra
3/ Nature of soil spectra
Soil samples in the LUCAS database having 2 % C
3/ Nature of soil spectra
3/ Nature of soil spectra
Spectroscopic models that rely on cross-correlation with other
soil properties will be highly unstable!
4/ A problem of sampling density?
Example for the LUCAS database: the 250 nearest spectral
neighbours of a sample located in France
4/ A problem of sampling density?
Soil spectral library of the Walloon region (Genot et al., 2011):
selecting neighbours with sufficient correlation
Genot et al. (2011)
Reported root mean square error (RMSE) of vis–NIR based predictions
against the standard deviation (of the soil attribute) in the validation sets.
4/ A problem of sampling density?
• Factors affecting the model performance of large spectral
databases:
– Variations in measuring conditions within a library
– Variations in soil analytical methods
– Complexity of the relationship between soil spectra and
soil properties at large scale
– Poor representation of soil diversity
• All these factors can be better controlled in small
scale databases!
• Are there any solutions?
(1) better protocols: garbage in, garbage out!
(2) appropriate data mining tools
(3) let’s share!
Part I : summary
PART II:
Modeling a complex soil spectral
Library
Predicting OC content in the LUCAS dataset
Collected in the framework of the Land Use/Cover Area frame Statistical Survey (LUCAS)
under the supervision of the JRC to assess the state of soil across Europe.
Current status:
– 23 European countries
– ~20,000 spectral readings in the vis-NIR region (400-2500 nm)
Metadata:
– Clay, silt, sand, OC, pH, CEC, CaCO3 content
– Geographical coordinates, land use, etc.
Modeling a complex Soil Spectral Library
One of the largest, most diverse
and most complete soil spectral libraries
Modeling a complex Soil Spectral Library
Spectrometer: FOSS XDS Rapid Content
Analyzer
Modeling a complex Soil Spectral Library
Description of the soil properties
Modeling a complex Soil Spectral Library
Loadings
Modeling a complex Soil Spectral Library
Scores of the first three PCs across Europe
Modeling a complex Soil Spectral Library
Model performance as a function of the multivariate
calibration method
Modeling a complex Soil Spectral Library
Model performance as a function of the variables used
Modeling a complex Soil Spectral Library
Effect of sand
content
Here, we used measured sand content to improve prediction accuracy.
When not available, legacy data or digital soil maps could be used to
assign sand content ranges to the soil samples
Texture Land use Mineralogy
Modeling a complex Soil Spectral Library
Modeling a complex Soil Spectral Library
Predicted-observed plot
$\mathrm{RMSEP}^2 \approx \mathrm{bias}^2 + \mathrm{SEP}_{-b}^2$
Modeling a complex Soil Spectral Library
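The decomposition of RMSEP into bias and SEP-b can be checked numerically. This is a minimal sketch, assuming that bias is the mean residual and that SEP-b is the standard deviation of the residuals with an n - 1 denominator; the function name is illustrative. Under these assumptions the exact identity is RMSEP² = bias² + (n - 1)/n · SEP-b², which is why the relation is written with "≈" and becomes tight for large validation sets.

```python
import numpy as np


def error_decomposition(observed, predicted):
    """Return RMSEP, bias and bias-corrected SEP (SEP-b) of a prediction set."""
    residuals = np.asarray(predicted, dtype=float) - np.asarray(observed, dtype=float)
    n = residuals.size
    bias = residuals.mean()                      # systematic error
    rmsep = np.sqrt(np.mean(residuals ** 2))     # total error
    # Assumed definition: SD of the residuals around the bias, n - 1 denominator.
    sep_b = np.sqrt(np.sum((residuals - bias) ** 2) / (n - 1))
    return {"rmsep": float(rmsep), "bias": float(bias), "sep_b": float(sep_b), "n": n}


# Hypothetical observed and predicted OC values (g C kg-1).
stats = error_decomposition([5.0, 12.0, 20.0, 33.0, 47.0],
                            [8.0, 10.0, 25.0, 30.0, 41.0])
n = stats["n"]
lhs = stats["rmsep"] ** 2
rhs = stats["bias"] ** 2 + (n - 1) / n * stats["sep_b"] ** 2
print(lhs, rhs)  # the two sides agree up to floating-point rounding
```

A large bias term relative to SEP-b points to a systematic offset, which is worth checking before blaming the calibration method.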
Pseudo-code of a local regression algorithm:
$(X_r, Y_r) = \{x_{r_i}, y_{r_i}\}_{i=1}^{n}$ (spectral library)
$(X_p, Y_p) = \{x_{p_i}, y_{p_i}\}_{i=1}^{m}$ (samples to predict)
1. for each sample to predict $p_i$, $i = 1, 2, \dots, m$ do
2.   Compute $d_i$, the distance vector between $x_{p_i}$ and $X_r$
3.   Find the most similar samples in $X_r$ as the k ones
     minimizing $d_i$, i.e. the k nearest neighbours
4.   [Optional] Assign weights to the k nearest
     neighbours
5.   Fit a multivariate model with the k nearest
     neighbours
6.   Choose the optimal model parameters for prediction
     of $p_i$, e.g. the appropriate number of latent variables
     (LV) for a PLSR model
7.   Predict sample $p_i$ and compute the squared error
8. end
Modeling a complex Soil Spectral Library
Local regression approach
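The steps of the pseudo-code can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the implementation used for the LUCAS analysis: Euclidean distance stands in for the spectral distance, tricube weights are one possible choice for step 4, and a weighted ridge regression is swapped in for PLSR to keep the example dependency-free. Step 6 (tuning the model parameters per sample) is left out, so the ridge penalty is fixed. The function name `local_predict` and its parameters are hypothetical.

```python
import numpy as np


def local_predict(X_r, y_r, X_p, k=50, ridge=1e-3, weighted=True):
    """Predict each row of X_p from a model fitted on its k nearest library samples."""
    X_r = np.asarray(X_r, dtype=float)
    y_r = np.asarray(y_r, dtype=float)
    X_p = np.asarray(X_p, dtype=float)
    predictions = np.empty(X_p.shape[0])
    for i, x in enumerate(X_p):                      # step 1
        d = np.linalg.norm(X_r - x, axis=1)          # step 2: distance to the library
        nn = np.argsort(d)[:k]                       # step 3: k nearest neighbours
        if weighted:                                 # step 4: tricube weights (assumed)
            u = d[nn] / (d[nn].max() + 1e-12)
            w = np.maximum((1.0 - u ** 3) ** 3, 1e-6)
        else:
            w = np.ones(nn.size)
        # Step 5: weighted ridge regression with an unpenalised intercept.
        Xk, yk = X_r[nn], y_r[nn]
        x_mean = np.average(Xk, axis=0, weights=w)
        y_mean = np.average(yk, weights=w)
        Xc = (Xk - x_mean) * np.sqrt(w)[:, None]
        yc = (yk - y_mean) * np.sqrt(w)
        A = Xc.T @ Xc + ridge * np.eye(Xk.shape[1])
        beta = np.linalg.solve(A, Xc.T @ yc)
        predictions[i] = y_mean + (x - x_mean) @ beta  # step 7: predict p_i
    return predictions


# Synthetic stand-in for a spectral library: 200 samples, 5 variables.
rng = np.random.default_rng(0)
X_lib = rng.normal(size=(200, 5))
y_lib = X_lib @ np.array([1.0, -2.0, 0.5, 0.0, 3.0])
X_new = rng.normal(size=(10, 5))
y_hat = local_predict(X_lib, y_lib, X_new, k=30)
```

With real spectra the number of wavelengths usually exceeds k, so a latent-variable method such as PLSR, or a prior compression of the spectra, would be needed in step 5.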
Effect of combining spectral + covariate distance
without sand….
Modeling a complex Soil Spectral Library
with sand….
Effect of combining spectral + covariate distance
Modeling a complex Soil Spectral Library
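One way to combine a spectral distance with a covariate distance when selecting neighbours is sketched below. The slides do not state how the two distances were combined, so the rescaling to [0, 1] and the mixing weight `alpha` are assumptions made for illustration, as are the function name and the example values.

```python
import numpy as np


def combined_distance(x_p, X_r, cov_p, cov_r, alpha=0.5):
    """Mix a spectral distance with a covariate (e.g. sand content) distance.

    alpha = 0 uses the spectra only; alpha = 1 uses the covariate only.
    """
    X_r = np.asarray(X_r, dtype=float)
    d_spec = np.linalg.norm(X_r - np.asarray(x_p, dtype=float), axis=1)
    d_cov = np.abs(np.asarray(cov_r, dtype=float) - float(cov_p))

    def scale(d):
        # Rescale to [0, 1] so that both distances are comparable (assumption).
        d_max = d.max()
        return d / d_max if d_max > 0 else d

    return (1.0 - alpha) * scale(d_spec) + alpha * scale(d_cov)


# Hypothetical library of three samples with two spectral variables each.
X_lib = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
sand_lib = np.array([80.0, 10.0, 12.0])   # sand content in %
x_new = np.array([0.0, 0.0])
sand_new = 10.0

nearest_spectral = int(np.argmin(combined_distance(x_new, X_lib, sand_new, sand_lib, alpha=0.0)))
nearest_combined = int(np.argmin(combined_distance(x_new, X_lib, sand_new, sand_lib, alpha=0.5)))
```

In this toy case the spectrally closest sample is very sandy, so adding the covariate term moves the nearest neighbour to a sample with similar sand content.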
Modeling a complex Soil Spectral Library
Model performance as a function of predictors
• The relationship between spectra and soil properties is
scale-dependent and inherently local
• Metadata are crucial to partition the data into sub-groups
in which the relationship between spectra and soil properties
is less complex.
• The accuracy of the models may be acceptable for a
rough screening of soil properties but is still insufficient for
most applications, in particular the spatial or temporal
monitoring of SOC.
• Possible ways for improvement:
– Develop data mining tools capable of
identifying local patterns of spectral variation with the help of
readily available covariates linked to pedogenetic factors
such as mineralogy, climate and land cover.
– Local modeling approach
– Increase sampling density?
Part II: Summary
Contact details
Antoine Stevens
Postdoctoral Researcher
Georges Lemaître Centre for Earth and Climate
Research, Earth and Life Institute
UCLouvain
Place Pasteur, 3
1348 Louvain-La-Neuve, Belgium
antoine.stevens@uclouvain.be
