High-Resolution Satellite Imagery Analysis For Terrain and Surface Data Extraction Techniques and Applications
Pimpri Chinchwad College of Engineering (PCCOE), Pune, India. Aug 18-19, 2023
1st Sameeksha Sukate, Computer Science and Engineering, MIT ADT University, Pune, India
2nd Sakshi Wajage, Computer Science and Engineering, MIT ADT University, Pune, India
3rd Yogesh Verma, Computer Science and Engineering, MIT ADT University, Pune, India
Authorized licensed use limited to: Somaiya University. Downloaded on April 17,2025 at 17:20:24 UTC from IEEE Xplore. Restrictions apply.
In recent years, remote sensing technology has made it possible to extract valuable information from satellite images. One such application is the generation of digital elevation models (DEMs) using satellite images and Shuttle Radar Topography Mission (SRTM) DEM data [1]. As shown in Fig. 3, we used the SWISSIMAGE [3] and SRTM data catalogs, both restricted to the area of Switzerland, from Google Earth Engine to generate the images used to train our U-Net model. The U-Net model was able to learn from these images and accurately extract DEM information.
The SWISSIMAGE catalog is a mosaic of orthophotos created from new color digital images that cover the whole of Switzerland, with a ground resolution of 10 cm in flat areas and the main alpine valleys and 25 cm in the Alps. These images are updated every 3 years and are provided by the Federal Office of Topography swisstopo. Orthophotos are distortion-free images with a uniform scale throughout, and they have numerous applications in fields such as geology, forestry, environmental protection, spatial planning, and natural hazards. SWISSIMAGE provides the latest orthophotos by default, while older photos are archived and remain easily accessible. This allows for a detailed and up-to-date background map for various purposes.
Along with this, we use the Shuttle Radar Topography Mission (SRTM), a global project that provides a high-quality, high-resolution digital elevation model (DEM) of the Earth free of charge. The project was a collaboration between the National Aeronautics and Space Administration (NASA), the National Geospatial-Intelligence Agency (NGA), and the space agencies of Germany and Italy. SRTM produced high-resolution digital topographic radar data of the Earth, which was used to generate the first worldwide set of Earth's topographic data. Additionally, the SRTM data is used to fill in areas with missing data to provide accurate digital elevation data. The SRTM instrument was mounted on a space shuttle and used remote sensing to acquire Earth's surface data, which was then converted into height data and used to generate a three-dimensional map of large areas of the Earth. This high-resolution digital topography map can be used in various fields and is publicly available.
Using the Earth Engine library in Google Colab, we mapped the SRTM data to 9 distinct colours in our project. The SWISSIMAGE dataset comprises a total of 2000 images with various tile sizes, covering over 70 percent of Switzerland's total area. To correspond with these 2000 images, we generated 2000 masks (SRTM data) of the corresponding SWISSIMAGE tiles using the Google Earth Engine library. These masks are known as DEM tiles and are displayed in Fig. 4.
Fig. 4. SWISSIMAGE and SRTM datasets, 2000 images each, Switzerland.
Google Earth Engine and Google Colab are powerful tools that enable data scientists and researchers to work on large-scale satellite imagery analysis projects. In this project, Google Earth Engine was used to collect the data, and the SWISSIMAGE and SRTM data catalogs were used to extract information. The data from Google Earth Engine was pre-processed on Google Colab, which enabled the team to write code and perform the necessary tasks on the data. Google Colab was chosen for its ability to leverage Google's infrastructure, which allowed for the handling of large datasets and GPU computation.
Google Earth Engine (GEE) allows individuals to visualize and examine satellite images of Earth for various purposes, including remote sensing [4]. It is a Google initiative aimed at archiving all spatial images captured by satellites for research purposes. Its archive holds such data going back to 1972 and was made available for public research in 2008. Earth Engine uses cloud technology to store massive datasets and to supply the substantial computational resources needed to handle vast geospatial data. Its data catalog stores and pre-processes these datasets. Moreover, the datasets come with pre-built code snippets that users can run in the Earth Engine code editor to explore and analyze the data. Additionally, GEE offers an application programming interface (API) for prototyping and visualizing results.
Google Colab is a cloud-based platform that provides free access to a Jupyter Notebook environment for writing and executing Python code [5]. It provides a virtual machine with access to powerful GPUs, TPUs, and other resources, enabling users to train machine learning models, perform data analysis, and more. Google Colab also supports popular libraries like TensorFlow and PyTorch.
The project's success was demonstrated by the accuracy of the extracted DEM, which was verified against reference data. The results showed that the project achieved high accuracy and demonstrated the power of using Google Earth Engine and Google Colab for large-scale satellite image analysis projects. The combination of these tools offers a cost-effective and efficient solution for researchers and data scientists working on similar projects.
B. U-Net
The U-Net model architecture, introduced in 2015 by the Computer Science Department of the University of Freiburg, Germany [2], is characterized by its "U" shape. Primarily designed for biomedical image segmentation, U-Net employs an encoder-decoder architecture with skip connections. The encoder section of U-Net compresses the spatial dimensions and increases the channel count in each layer, while the decoder section expands the spatial dimensions and reduces the channel count in each layer. Neural networks can "forget"
certain features during the training process, and skip connections are used in U-Net to make sure details are not lost. Unlike a conventional CNN, U-Net has no fully connected layers at the end, since the output must be a map of the desired size rather than a single label. The encoder section, also referred to as the contracting path, consists of conventional convolutional layers. The decoder section, also known as the expanding path, consists of transposed 2D convolutional layers, commonly known as deconvolutional layers. Through layer-wise concatenation of the encoder's output with the decoder's input, every deconvolutional layer in the decoder is connected to a convolutional layer of the encoder.
During training, the model learns to segment the input image into distinct regions, such as terrain and non-terrain areas, based on the learned features. This makes the U-Net architecture an effective tool for DEM extraction from satellite images. In this context, U-Net can be used to identify and extract the elevation information of the terrain from the input image. The encoder part of the network can be trained to identify various features of the image, such as vegetation, water bodies, and buildings, that can affect the elevation of the terrain. The decoder part of the network can then use the features identified by the encoder to reconstruct the elevation map of the terrain. By incorporating skip connections between the encoder and decoder sections, the U-Net architecture effectively captures both local and global information. These connections enable the decoder to access the features extracted by the encoder at various spatial resolutions, which preserves both kinds of information in the final output.
To summarize, U-Net is a robust and adaptable CNN architecture, well-suited for satellite image-based Digital Elevation Model (DEM) extraction. Its capacity to handle both local and global information within the input image makes it suitable for a diverse array of image segmentation tasks. With its high accuracy and efficiency, the U-Net architecture has the potential to revolutionize the field of remote sensing and satellite image analysis.
C. CNN (Convolutional Neural Network)
CNNs (Convolutional Neural Networks) have a broad range of uses, from voice recognition and facial recognition to image processing and nearly all computer vision tasks [8]. These networks learn feature maps by backpropagating errors through various layers, including convolutional layers, pooling layers, and fully connected layers.
Goodfellow et al. [7] identify three key ideas behind CNNs. The first, "sparse interactions," uses small feature detectors to identify edges within an image, which is particularly useful when the image is large, as with satellite imagery. The second, "parameter sharing," limits the number of weights required to process an input image. By reducing the number of parameters, this technique can significantly decrease the computational power needed. Lastly, "equivariant representation" means that when an object changes position in the input, its internal representation shifts in the same way, so detection does not depend on where the object appears.
Together, these ideas help overcome the challenge of processing large satellite images that are difficult to manage in their original dimensions. Convolution involves only a small number of parameters, which streamlines training and speeds up the network.
A CNN uses multiple convolutional layers to extract features from input images. Each convolutional layer contains a set of learnable filters that are applied to local regions of the input image. Through convolution, these filters produce a set of feature maps that capture various attributes of the original image. Following each convolutional layer, a pooling layer downsizes the feature maps while retaining vital information. This improves network efficiency and helps prevent overfitting. The output of a CNN is typically passed through one or more fully connected layers, which use the features derived from the convolutional layers to classify the input image.
CNNs have been remarkably successful in a broad spectrum of applications, including image classification, object detection, and image segmentation. They have revolutionized computer vision and become an indispensable tool across machine learning and artificial intelligence. To summarize, CNNs have emerged as a critical component of deep learning, particularly within computer vision. By effectively extracting image features and discerning patterns, CNNs have found applications ranging from object recognition to medical image analysis. However, there is still much to be explored and improved, such as optimizing the architecture and improving interpretability.
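The building blocks described in Sections B and C can be illustrated with a minimal pure-Python sketch. It is not the model used in this work: it handles a single channel, uses one fixed 3x3 kernel instead of learned filters, replaces the transposed convolution with nearest-neighbour upsampling, and assumes even image dimensions. All function names are hypothetical. It shows parameter sharing (the same 9 weights are reused at every pixel), pooling in the contracting path, and a skip connection that stacks the encoder map with the decoder map.

```python
# Illustrative sketch only: single channel, fixed kernel, no training.
# Function names are hypothetical and not taken from the paper's code.

def conv2d_same(image, kernel):
    """Slide one shared square kernel over a 2D grid with zero padding.

    As in most deep learning libraries, this is a cross-correlation
    (the kernel is not flipped). Output has the same size as the input.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:  # outside the image counts as zero
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def max_pool2x2(fmap):
    """Halve height and width by keeping the maximum of each 2x2 block."""
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, w - 1, 2)]
        for i in range(0, h - 1, 2)
    ]


def upsample2x(fmap):
    """Nearest-neighbour upsampling, a stand-in for a transposed convolution."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def tiny_unet_forward(image, kernel):
    """One encoder step, one decoder step, and one skip connection."""
    enc = conv2d_same(image, kernel)        # encoder features at full resolution
    down = max_pool2x2(enc)                 # contracting path: H and W halved
    bottleneck = conv2d_same(down, kernel)  # same 9 weights reused (parameter sharing)
    up = upsample2x(bottleneck)             # expanding path: H and W restored
    return [enc, up]                        # skip connection: stack as two channels


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # kernel that copies its input
channels = tiny_unet_forward(image, identity)
```

With the identity kernel, `channels[0]` reproduces the input at full detail, while `channels[1]` holds the coarser pooled-then-upsampled values. This makes visible why the skip connection is needed: the fine detail lost by pooling is only available to the decoder through the encoder channel.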
Fig. 5. U-Net for DEM extraction via satellite imagery: a model architecture
D. Flow Chart
In Fig. 6, we propose a flowchart for the extraction of the Digital Elevation Model (DEM) from satellite images using U-Net. The first step is to perform preprocessing on Google Earth Engine to reduce the number of labels in the mask image from 256 to 9. Next, a dataset consisting of 2000 images and their corresponding masks is extracted from the preprocessed data. The dataset is then divided into training, validation, and test sets. Subsequently, the U-Net model is trained on the training set and evaluated on the validation set. Finally, the test set is used to generate the result, which is the DEM extracted from the satellite images. This result is then compared to the ground truth DEM to assess the accuracy of the model. The accompanying research paper provides a comprehensive explanation of each step depicted in the flowchart, along with the methodology employed for DEM extraction from satellite images using U-Net.
IV. RESULTS
The application of U-Net to the extraction of digital elevation models (DEMs) from a dataset of 2000 satellite images has produced encouraging results. Through pixel-level classification, U-Net was able to accurately identify and differentiate object borders, resulting in the successful extraction of elevation data from the images. The encoder-decoder architecture of U-Net effectively used the feature maps generated by the encoder network to produce highly detailed outputs, leading to a high degree of accuracy in the resulting DEM. Overall, the application of U-Net to a large dataset of satellite imagery has demonstrated its potential as a reliable and effective tool for DEM extraction. In this project, a U-Net model was trained on a dataset of 2000 satellite images and their corresponding digital elevation models (DEMs) to
Fig. 8. Graphs showing U-Net's accuracy and loss in DEM extraction from satellites.
training loss and validation loss exhibit a consistent decline, signifying the model's progressive reduction in prediction errors. The U-Net model attained a training accuracy of 0.90 and a validation accuracy of 0.89. Furthermore, the training loss was measured at 0.92, while the validation loss was recorded at 0.922. These outcomes indicate that the U-Net model predicts Digital Elevation Model (DEM) contour maps from satellite images with a high degree of accuracy. We evaluated our approach on a dataset of satellite images and compared the resulting DEMs to those obtained using traditional methods. Our experimental results demonstrate that the U-Net-based approach can produce high-quality DEMs that are comparable to those obtained using traditional methods.
V. CONCLUSION AND FUTURE SCOPE
In conclusion, DEM extraction from satellite imagery is a crucial process for creating highly accurate digital elevation models that can be used in a variety of applications. Advances in satellite technology have made it possible to collect vast amounts of high-resolution imagery, making it easier to create detailed DEMs. Various methods, such as stereo-photogrammetry and LiDAR, have been used to extract elevation data from satellite imagery. However, these methods can be time-consuming and expensive. Advanced techniques such as U-Net have shown great potential for improving the accuracy and efficiency of DEM extraction from satellite imagery, in particular for identifying and segmenting terrain features from the background, which is a critical step in DEM extraction. The derived DEMs are useful in a range of applications, including environmental monitoring, land use planning, and hazard mitigation. Therefore, the continued development and use of advanced techniques for DEM extraction from satellite imagery is essential for improving our understanding of the Earth's surface and for making informed decisions about its management and conservation.
Extracting DEMs with U-Net models can currently be time-consuming, especially for large-scale projects. Future research can focus on developing faster processing methods to allow for real-time DEM extraction. This would open up a range of new applications, such as monitoring changes in terrain in real time. DEM extraction using U-Net models can also be combined with other technologies to create more powerful applications. For example, the integration of DEMs with Geographic Information Systems (GIS) could enable the creation of highly accurate and detailed maps. Overall, the extraction of DEMs from satellite imagery using U-Net models is a rapidly evolving field with many exciting future possibilities.
REFERENCES
[1] Jarvis, A., H.I. Reuter, A. Nelson, E. Guevara. 2008. Hole-filled SRTM for the globe Version 4, available from the CGIAR-CSI SRTM 90m Database: https://srtm.csi.cgiar.org.
[2] Ronneberger, O., Fischer, P. and Brox, T. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015 (Cham, 2015), 234–241.
[3] Swisstopo SWISSIMAGE10cm. Available online: https://www.swisstopo.admin.ch/en/geodata/images/ortho/swissimage10.html (accessed on 26 April 2023).
[4] Noel Gorelick, Matt Hancher, Mike Dixon, Simon Ilyushchenko, David Thau, Rebecca Moore, Google Earth Engine: Planetary-scale geospatial analysis for everyone, Remote Sensing of Environment, Volume 202, 2017, Pages 18-27, ISSN 0034-4257, https://doi.org/10.1016/j.rse.2017.06.031.
[5] Google Colaboratory [Computer software]. (2023). Retrieved from https://colab.research.google.com/
[6] Y. Lecun, L. Bottou, Y. Bengio and P. Haffner, "Gradient-based learning applied to document recognition," in Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998, doi: 10.1109/5.726791.
[7] Goodfellow, I., Bengio, Y., Courville, A. (2016). Deep learning (Vol. 1). MIT Press.
[8] Cohen, C.J., 2000. Early history of remote sensing, in: Proceedings 29th Applied Imagery Pattern Recognition Workshop. Washington, DC, USA, pp. 3–9. doi:10.1109/AIPRW.2000.953595