From Pixels to Predictions: Leveraging Geospatial Data for Smart Analysis

Analysing maps, satellite imagery, and spatial datasets requires integrating diverse data into a structured form. This data must then be prepared for accurate spatial computation and analysis.

In this article, we will show how GDAL (Geospatial Data Abstraction Library), the spatial databases PostGIS and/or SQL Server Spatial, and NetTopologySuite make complex geospatial analysis easier through data engineering.

Key takeaways
  • GDAL transforms maps and satellite imagery into structured, georeferenced numerical data that can be used for spatial analysis and visualisation.
  • Temporal analysis and structured workflows turn geospatial data into actionable insights for urban planning, traffic modelling, and strategic decision-making.

Why spatial data analysis is challenging

Imagine you are a construction developer and need to analyse how a city will evolve over the next several years. You examine a map, study areas of heavy traffic, count the schools, and analyse the population by age. You must process vast amounts of information from Excel or other datasets and visualise it on a map.

This is one example where GDAL shows its strengths. You can take a snapshot of a map, whether a raster file (.tiff) or a vector file (.shp, .kml), and transform the objects on it into shapes with borders and coordinates, turning a nearly impossible task into a mathematical problem.

Real-world example: analysing traffic patterns

Let us see how spatial data can be used to answer practical questions. Given the number of schools and registered people under the age of 15, we might want to estimate traffic from 8 a.m. to 10 a.m. or calculate other relevant statistics.

Using PostGIS or SQL Server, both of which can store, calculate, and process spatial data, we can answer such questions efficiently. Large-scale computation often benefits from cloud computing, especially when datasets are too big to handle locally.

Processing this information requires significant compute resources (and, for some workloads, GPUs) to perform complex calculations quickly, but the results can be transformed into a browser-friendly format. A GDAL-based workflow follows a logic loosely similar to that of LLMs: simplify, transform, and interpret.

  • Simplify: To identify objects, you can work with a greyscale image.
  • Transform: Cut data into tiles or quadkeys, which specify visualised chunks of a map.
  • Interpret: Detect the needed objects on the prepared image.
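
The "Transform" step can be sketched in a few lines. The example below is a minimal illustration, assuming the widely used Web Mercator tiling scheme (the article does not say which scheme its tiles follow); the function names are hypothetical. It finds the tile that contains a point and derives the quadkey for that tile.

```python
import math


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) index of the Web Mercator tile containing a point."""
    n = 2 ** zoom  # number of tiles per axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_to_quadkey(x: int, y: int, zoom: int) -> str:
    """Interleave the bits of x and y into a base-4 quadkey string."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = 0
        if x & mask:
            digit += 1
        if y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)
```

Each extra zoom level appends one digit, so a quadkey is always a prefix of the quadkeys of the tiles nested inside it, which makes it a convenient key for storing and looking up map chunks.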

If you are a .NET developer and need to process data, NetTopologySuite comes in handy. It provides a geometry model and spatial operations, enables spatial querying, and, through its companion IO libraries, reads and writes many types of GIS data.

Understanding coordinate systems and projections

Projections and datums

In the context of geography and mapping, the terms "projection" and "datum" refer to different aspects of the coordinate system used to represent the Earth’s surface.

  • A projection is a mathematical method for representing the curved surface of the Earth on a flat map. There are many different map projections, each with its own set of strengths and limitations. Some common map projections include the Mercator projection and the Robinson projection.
  • A datum, on the other hand, is a reference frame: a set of parameters that ties the coordinates used by a map to the Earth itself. A datum consists of an origin point, a set of orientation parameters, and a set of scale factors. The choice of datum can affect the accuracy and precision of a map, as different datums may result in slight shifts or deformations in the map's coordinates.

WGS84/EPSG:4326 coordinates

WGS84 (World Geodetic System 1984) comprises a standard coordinate frame for the Earth and a datum/reference ellipsoid for raw altitude data. In practice, a reference to WGS84 usually means the ellipsoid only. EPSG:4326 defines a full coordinate reference system, giving spatial meaning to otherwise meaningless pairs of numbers: it specifies latitude and longitude coordinates on the WGS84 reference ellipsoid.
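
To make the difference between geographic coordinates and a projection concrete, here is a minimal sketch that projects an EPSG:4326 longitude/latitude pair onto a flat plane using the spherical Mercator formulas (the projection behind EPSG:3857, commonly called Web Mercator). The function name and the longitude-first argument order are choices made for this example; a production workflow would normally delegate this to GDAL or a database.

```python
import math

# WGS84 semi-major axis in metres, used here as the radius of the sphere.
EARTH_RADIUS_M = 6378137.0


def wgs84_to_web_mercator(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Project longitude/latitude in degrees to spherical Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4.0 + math.radians(lat_deg) / 2.0)
    )
    return x, y


# Hypothetical sample point, given as (longitude, latitude).
print(wgs84_to_web_mercator(24.03, 49.84))
```

The same pair of numbers means degrees on an ellipsoid in one system and metres on a plane in the other, which is why every dataset has to carry its reference system with it.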

Working with geospatial data formats: raster and vector data

When working with different datasets, we must ensure that they all use the same coordinate reference system, so it is crucial to keep track of the projection and datum specified in each raster and vector dataset.

A GIS interface showing coordinate system settings. Source: epsg.io

One well-known challenge when working with projections is handling the antimeridian, or 180th meridian.

World map highlighting the Antimeridian (±180°). Source: Wikipedia

If your city is located in that area, calculations that work for London or New York will fail. The reason is that the data spans both the -180° and 180° meridians: even a naive calculation of the centre of the vector or raster data yields a point near the Greenwich (London) meridian rather than on the antimeridian, because (-180 + 180)/2 = 0.
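
The pitfall is easy to reproduce. The sketch below compares a naive arithmetic mean of longitudes with a circular mean, which treats each longitude as an angle on a circle. The circular mean is one common workaround and not necessarily what a particular GIS library does internally; the function names are hypothetical.

```python
import math


def naive_mean_longitude(lons_deg: list[float]) -> float:
    """Arithmetic mean; breaks when the points straddle the antimeridian."""
    return sum(lons_deg) / len(lons_deg)


def circular_mean_longitude(lons_deg: list[float]) -> float:
    """Average the longitudes as unit vectors, then convert back to degrees."""
    sin_sum = sum(math.sin(math.radians(lon)) for lon in lons_deg)
    cos_sum = sum(math.cos(math.radians(lon)) for lon in lons_deg)
    return math.degrees(math.atan2(sin_sum, cos_sum))


# Two points a few degrees apart, on either side of the antimeridian.
lons = [178.0, -176.0]
print(naive_mean_longitude(lons))     # lands near the Greenwich meridian
print(circular_mean_longitude(lons))  # lands next to the antimeridian
```

Another common approach is to shift longitudes into a 0 to 360 range before computing, or to split geometries at the antimeridian, depending on what the downstream tools expect.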

Depending on the scale of your calculations, you might prefer to use raster files (*.tiff) or vector files (*.shp).

One of the key issues when handling these files is their size. Larger files typically hold more detailed data or cover a larger area. Each file carries its own datum and reference system.

Raster data represent information at the pixel level, so they might be stored as a huge matrix of data under the hood.

Pixel-based raster data structure. Source: NEON
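
As a minimal sketch of that matrix view, the example below stores a tiny raster as a list of rows and converts a pixel position into map coordinates using a six-element affine geotransform in the order GDAL uses. The pixel values and geotransform numbers are made up for illustration.

```python
# A 2 x 3 single-band raster: each inner list is one row of pixel values.
raster = [
    [10, 12, 11],
    [9, 14, 13],
]

# Geotransform in GDAL order:
# (origin_x, pixel_width, row_rotation, origin_y, column_rotation, pixel_height)
# pixel_height is negative for a north-up image. Values here are hypothetical.
geotransform = (30.0, 0.5, 0.0, 50.0, 0.0, -0.5)


def pixel_to_coords(gt: tuple, col: int, row: int) -> tuple[float, float]:
    """Return the map coordinates of the top-left corner of a pixel."""
    origin_x, pixel_w, row_rot, origin_y, col_rot, pixel_h = gt
    x = origin_x + col * pixel_w + row * row_rot
    y = origin_y + col * col_rot + row * pixel_h
    return x, y


row, col = 1, 2
print(raster[row][col], pixel_to_coords(geotransform, col, row))
```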

Vector data, by contrast, are sets of points, lines, or polygons with attributes attached to them.

Vector data types: points, lines, and polygons. Source: NEON

Handling large vector and raster geospatial datasets

  • Vector datasets are collections of geometric shapes. Splitting them into subsets that you can handle without issues is the key to achieving fast and efficient results.
  • Raster datasets require different techniques to provide efficient results. As with vector datasets, you can split a file into sub-datasets, but in both cases, the number of resulting files can work against you.

By dividing data into separate files, you can process them in parallel and avoid file locking or long processing times. Just remember to dispose of files properly when they are no longer needed. Even with an efficient approach, when working with large datasets, you may encounter many Out of Memory exceptions while trying to process the data, so be patient.
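
A minimal sketch of this idea, assuming a raster that is split into fixed-size windows and a placeholder per-window task. The function names are hypothetical, and `process_window` stands in for real work such as reading the window with GDAL and computing statistics.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_windows(width: int, height: int, block: int) -> list[tuple]:
    """Return (xoff, yoff, xsize, ysize) windows that cover the raster."""
    windows = []
    for yoff in range(0, height, block):
        for xoff in range(0, width, block):
            xsize = min(block, width - xoff)   # edge windows may be smaller
            ysize = min(block, height - yoff)
            windows.append((xoff, yoff, xsize, ysize))
    return windows


def process_window(window: tuple) -> int:
    """Placeholder task: report how many pixels the window contains."""
    _xoff, _yoff, xsize, ysize = window
    return xsize * ysize


def process_in_parallel(width: int, height: int, block: int, max_workers: int = 4) -> list[int]:
    """Process every window on a thread pool and collect results in order."""
    windows = split_into_windows(width, height, block)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_window, windows))
```

Smaller windows lower peak memory use but produce more pieces to track, which is the trade-off described above; for CPU-bound work in Python, a process pool may be a better fit than threads.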

Pyramid computation precomputes lower-resolution versions of the data, which lets you serve results suited to each zoom level, as shown in the picture below.

Pyramid computation illustration. Source: Taylor and Francis
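
The sizes of the pyramid levels follow from simple arithmetic. The sketch below assumes each level halves the resolution of the previous one and stops once the whole image fits into a single tile; the 256-pixel tile size and the function name are assumptions for this example.

```python
import math


def overview_levels(width: int, height: int, tile_size: int = 256) -> list[tuple]:
    """Return (factor, width, height) for each reduced-resolution level."""
    if max(width, height) <= tile_size:
        return []  # the image already fits into one tile
    levels = []
    factor = 2
    while True:
        w = math.ceil(width / factor)
        h = math.ceil(height / factor)
        levels.append((factor, w, h))
        if max(w, h) <= tile_size:
            break
        factor *= 2
    return levels


for factor, w, h in overview_levels(4096, 2048):
    print(f"1/{factor} resolution: {w} x {h} pixels")
```

The resulting factors (2, 4, 8, 16 in this case) are the kind of values typically passed to GDAL's gdaladdo utility when building overviews.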

Temporal analysis and spatial data management

1. Temporal analysis with satellite imagery

By collecting data files over a period, you can analyse and adjust plans for building roads or schools. Iteratively refining these plans using new tools and models enables better analysis and provides more options for solving urban issues. This approach helps balance the needs of stakeholders while proactively avoiding potential risks.

Temporal satellite analysis. Source: archdaily

2. Viewing spatial data in databases

Here is an example of how you can see different areas while using SQL Server:

Spatial dataset view within SQL server. Source: Microsoft Learn

3. Methods to manipulate spatial data

Here is a set of methods you can use to manipulate data:

  • Property methods: Functions to retrieve information about a spatial object.
      • STArea(): Returns the area of a geometry or geography instance.
      • STLength(): Returns the length of a geometry or geography instance.
      • STNumPoints(): Returns the number of points in the instance.
      • STX, STY: Return the X and Y coordinates of a Point instance (these are properties of the geometry type, so they are used without parentheses).
      • STSrid: Returns the Spatial Reference Identifier (also a property).
  • Relationship and comparison methods: Functions to determine the relationship between two spatial objects.
      • STIntersects(): Checks if two spatial objects intersect.
      • STDistance(): Returns the shortest distance between two spatial objects.
      • STEquals(): Checks if two spatial objects are spatially equal.
      • STWithin(): Checks if one spatial object is entirely within another.
      • STContains(): Checks if one spatial object contains another.
  • Manipulation and transformation methods: Functions to modify or transform spatial objects.
      • STIntersection(): Returns a spatial object representing the intersection of two spatial objects.
      • STUnion(): Returns a spatial object representing the union of two spatial objects.
      • STDifference(): Returns a spatial object representing the difference between two spatial objects.
      • STBuffer(): Returns a spatial object representing the buffer area around an instance.
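
To give a feel for what a few of these methods compute, here is a plain-Python sketch of the planar maths behind an area, a length, and a containment check. It is an illustration only and not how SQL Server implements them: the database handles SRIDs, geography (ellipsoidal) calculations, holes, and edge cases that this sketch ignores. The function names are hypothetical, and rings are assumed to be closed (first point repeated at the end).

```python
import math


def ring_area(ring: list[tuple]) -> float:
    """Planar polygon area via the shoelace formula (compare STArea)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def ring_length(ring: list[tuple]) -> float:
    """Sum of segment lengths along the ring (compare STLength)."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:]))


def ring_contains_point(ring: list[tuple], point: tuple) -> bool:
    """Ray-casting test for a point strictly inside (compare STContains)."""
    px, py = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > py) != (y2 > py):  # edge crosses the horizontal ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


district = [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]  # hypothetical 4 x 3 block
print(ring_area(district), ring_length(district))
print(ring_contains_point(district, (1, 1)), ring_contains_point(district, (5, 1)))
```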

Tools for geospatial data processing

The information must not only be read properly but also transformed so that you can work with it using ordinary tools.

1. GDAL

GDAL is a translator library for raster and vector geospatial data formats that is released under an MIT-style open-source licence by the Open Source Geospatial Foundation. As a library, it presents a single raster abstract data model and a single vector abstract data model to the calling application for all supported formats. It also comes with a variety of useful command-line utilities for data translation and processing.

Cutting raster tiff into tiles with GDAL

The command reads the input raster, applies options such as a latitude/longitude bounding box, format conversion, and compression settings, and then creates a new file with the given set of parameters. Polygonising (for example, with GDAL's gdal_polygonize utility) transforms the raster data into the polygons described earlier. Once you have read and normalised the data, you can either save it to a database or keep it in files and perform the required manipulations there.
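
Since the original command is only available as an image, the sketch below shows how such a call could be assembled from Python. It builds a gdal_translate command line with a bounding box (-projwin), an output format (-of), and creation options (-co); the file names, the bounding box, and the helper name are hypothetical, and the command is printed rather than executed.

```python
import shlex


def build_translate_command(src: str, dst: str, bbox: tuple, compress: str = "DEFLATE") -> list[str]:
    """Build a gdal_translate call that cuts a bounding box out of a raster.

    bbox is (upper-left x, upper-left y, lower-right x, lower-right y)
    in the coordinate system of the source raster.
    """
    ulx, uly, lrx, lry = bbox
    return [
        "gdal_translate",
        "-of", "GTiff",                      # output format
        "-projwin", str(ulx), str(uly), str(lrx), str(lry),
        "-co", f"COMPRESS={compress}",       # GeoTIFF compression
        "-co", "TILED=YES",                  # store the output in internal tiles
        src,
        dst,
    ]


command = build_translate_command("city.tif", "city_tile.tif", (30.0, 50.0, 30.5, 49.5))
print(shlex.join(command))
# To run it for real (requires GDAL to be installed):
#   import subprocess; subprocess.run(command, check=True)
```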

2. NetTopologySuite

NetTopologySuite is a direct port of all the functionality offered by JTS Topology Suite: NTS exposes JTS in a '.NET way', for example, using properties, indexers, etc. It is a fast and reliable GIS solution for the .NET platform.

It works with the JSON data format, so it is convenient for reading data processed by GDAL and for manipulating spatial data in much the same way as SQL Server does. It can intersect geometries, evaluate distances, and customise a given geometry.

Here are some recommendations for working with geometry:

  • Minimise the amount of data that needs to be processed.
  • Use an R-tree index for indexing and clustering objects, or use the existing algorithms that NetTopologySuite provides for that purpose.
  • Use caching to provide the best reading experience with complex objects.
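
The sketch below illustrates the indexing and caching recommendations in plain Python. It uses a uniform grid index, which is a simpler stand-in for an R-tree and not the structure NetTopologySuite ships; the class, the sample objects, and the cache size are all hypothetical. Bounding boxes are (min_x, min_y, max_x, max_y) tuples.

```python
from collections import defaultdict
from functools import lru_cache


def boxes_overlap(a: tuple, b: tuple) -> bool:
    """True if two (min_x, min_y, max_x, max_y) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class GridIndex:
    """Bucket bounding boxes into square cells for fast candidate lookup."""

    def __init__(self, cell_size: float):
        self.cell_size = cell_size
        self.cells = defaultdict(set)
        self.boxes = {}

    def _cells_for(self, box: tuple) -> list[tuple]:
        min_x, min_y, max_x, max_y = box
        c = self.cell_size
        xs = range(int(min_x // c), int(max_x // c) + 1)
        ys = range(int(min_y // c), int(max_y // c) + 1)
        return [(i, j) for i in xs for j in ys]

    def insert(self, item_id: str, box: tuple) -> None:
        self.boxes[item_id] = box
        for cell in self._cells_for(box):
            self.cells[cell].add(item_id)

    def query(self, box: tuple) -> list[str]:
        candidates = set()
        for cell in self._cells_for(box):
            candidates |= self.cells.get(cell, set())
        # Exact check on the few candidates instead of on every object.
        return sorted(i for i in candidates if boxes_overlap(self.boxes[i], box))


index = GridIndex(cell_size=10.0)
index.insert("school", (1.0, 1.0, 2.0, 2.0))
index.insert("park", (25.0, 25.0, 40.0, 40.0))
index.insert("road", (0.0, 5.0, 30.0, 6.0))


@lru_cache(maxsize=256)
def cached_query(box: tuple) -> tuple:
    """Memoise repeated queries; valid only while the index is unchanged."""
    return tuple(index.query(box))


print(cached_query((0.0, 0.0, 10.0, 10.0)))
```

The cache must be cleared (`cached_query.cache_clear()`) whenever objects are added to the index, otherwise stale results are returned.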

3. Browser-friendly map data formats

To visualise objects on a map, you can use .png images or work with .json or .kml data formats. deck.gl is a GPU-powered framework for visual exploratory data analysis of large datasets, and it allows you to efficiently render and explore multiple layers of geospatial data.

The browser has its own limitations, and you need to be aware of them to provide the best user experience: do not use more than a dozen layers, and keep the payload size below ~2 MB.
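
One way to stay within such a budget is to trim the payload before it leaves the server. The sketch below serialises features as compact GeoJSON, rounds coordinates to a fixed number of decimals, and checks the result against the ~2 MB figure from the text. The helper names, the sample feature, and the choice of five decimals (roughly metre-level precision in degrees) are assumptions for this example.

```python
import json

MAX_PAYLOAD_BYTES = 2 * 1024 * 1024  # the ~2 MB budget mentioned above


def round_coords(coords, digits: int):
    """Recursively round every number in a nested coordinate array."""
    if isinstance(coords, (int, float)):
        return round(coords, digits)
    return [round_coords(c, digits) for c in coords]


def to_compact_geojson(features: list[dict], digits: int = 5) -> bytes:
    """Serialise features as a FeatureCollection without extra whitespace."""
    slim = []
    for feature in features:
        geometry = dict(feature["geometry"])
        geometry["coordinates"] = round_coords(geometry["coordinates"], digits)
        slim.append({**feature, "geometry": geometry})
    collection = {"type": "FeatureCollection", "features": slim}
    return json.dumps(collection, separators=(",", ":")).encode("utf-8")


def fits_budget(payload: bytes, limit: int = MAX_PAYLOAD_BYTES) -> bool:
    return len(payload) <= limit


features = [{
    "type": "Feature",
    "properties": {"name": "school"},  # hypothetical attribute
    "geometry": {"type": "Point", "coordinates": [30.123456789, 50.987654321]},
}]
payload = to_compact_geojson(features)
print(len(payload), fits_budget(payload))
```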

Visualisation of GDAL virtual file system using deck.gl

Orchestration of batch GDAL jobs: automating geospatial data processing

Initially, GDAL was used with files stored locally or on servers. Its documentation describes the prefixes you can add to a file path to indicate how the file is delivered (for example, zipped or not).

Virtual filesystem abstraction layer in GDAL

Nevertheless, nowadays a lot of people are using cloud technologies, and GDAL supports them too.

A generic /vsicurl/ file system handler exists for online resources that do not require particular signed authentication schemes. It is specialised into sub-filesystems for commercial cloud storage services, such as /vsis3/, /vsigs/, /vsiaz/, /vsioss/ or /vsiswift/.
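
In practice these handlers are just prefixes on the path you hand to GDAL, and they can be chained. The helpers below are a small sketch of how such paths are composed; the helper names, URL, bucket, and file names are hypothetical, and access to private cloud storage additionally requires credentials configured for GDAL.

```python
def vsicurl(url: str) -> str:
    """Path for a file served over HTTP(S) without signed authentication."""
    return f"/vsicurl/{url}"


def vsis3(bucket: str, key: str) -> str:
    """Path for an object in an S3 bucket."""
    return f"/vsis3/{bucket}/{key}"


def vsizip(archive_path: str, inner_path: str) -> str:
    """Path for a file inside a zip archive; archive_path may itself be a /vsi path."""
    return f"/vsizip/{archive_path}/{inner_path}"


print(vsicurl("https://example.com/city.tif"))
print(vsis3("my-bucket", "rasters/city.tif"))
# Chaining: a GeoTIFF inside a zip archive that is fetched over HTTPS.
print(vsizip(vsicurl("https://example.com/data.zip"), "city.tif"))
```

A path built this way can be passed wherever GDAL expects a file name, for example as the source argument of a command-line utility.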

As a result, GDAL fits into your architecture whether you choose a containerised solution in the cloud or any other approach. Be aware that processing larger files takes time, so it is better to preprocess them upfront to avoid delays, and to cache the responses you need so that reads stay fast.

Summary

When properly engineered, geospatial data unlocks insights for businesses and urban planning. Although geospatial workflows involve many technical constraints, strong engineering practices keep this complexity out of sight, allowing companies to focus on better decision-making and improved user experiences.

Knowing how to work with spatial analysis can drastically transform your analytics: it provides a global view and lets you build multiple hypotheses and test multiple scenarios in minutes.

FAQs

What is a geospatial dataset?

A geospatial dataset is a structured collection of data that includes detailed location information, often represented as coordinates, maps, or geometric shapes tied to real-world positions, and can include additional attributes describing those locations.

What is geospatial computing?