4  Geospatial artificial intelligence (GeoAI) for remote sensing data

4.1 Background and simple examples of machine learning

Machine learning (ML) methods have gained popularity in geospatial research due to their ability to model nonlinear relationships from high-dimensional data. One recent development is GPBoost, which integrates tree boosting with Gaussian processes to model both fixed effects and spatial (or other structured) random effects.
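
Formally, GPBoost assumes a model of the form y = F(X) + b(s) + ε, where F(X) is an ensemble of regression trees learned by boosting, b(s) is a zero-mean Gaussian process over the spatial coordinates s, and ε is independent noise; the tree ensemble and the covariance parameters of the Gaussian process are estimated jointly.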

4.1.1 Example 1: Land cover classification from remote sensing imagery

Satellite images contain multispectral reflectance values across space. GPBoost can be used to classify land cover by combining decision tree boosting (e.g., LightGBM) with a Gaussian Process that accounts for spatial autocorrelation between nearby pixels.
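
A minimal sketch of this idea on simulated data (all variable names are hypothetical; the Bernoulli likelihood and binary objective follow the gpboost classification interface):

Code
library(gpboost)

set.seed(42)
n = 500
coords = cbind(runif(n), runif(n))                # pixel locations
bands = matrix(runif(n * 4), ncol = 4)            # simulated reflectance in 4 bands
colnames(bands) = paste0("band", 1:4)
landcover = rbinom(n, 1, plogis(3 * bands[, 1] - 3 * bands[, 2]))  # 0/1 class label

# Bernoulli likelihood for binary classification; the exponential covariance
# captures spatial autocorrelation between nearby pixels
gp_model = GPModel(gp_coords = coords, cov_function = "exponential",
                   likelihood = "bernoulli_probit")
bst_lc = gpboost(data = bands, label = landcover, gp_model = gp_model,
                 nrounds = 50, objective = "binary", verbose = 0)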

4.1.2 Example 2: Spatial prediction of vegetation index

Normalized Difference Vegetation Index (NDVI) values derived from remote sensing data often show spatial dependence. GPBoost allows prediction by combining covariates (e.g., elevation, temperature) with spatial random effects.
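
A minimal sketch on simulated data (the covariates and coefficients are hypothetical), using the same regression interface applied later in this chapter:

Code
library(gpboost)

set.seed(1)
n = 500
coords = cbind(runif(n), runif(n))                # sample locations
X = cbind(elev = runif(n, 0, 1000), temp = runif(n, 10, 30))
ndvi = 0.2 + 0.0003 * X[, "elev"] + 0.01 * X[, "temp"] + rnorm(n, sd = 0.05)

gp_model = GPModel(gp_coords = coords, cov_function = "exponential")
bst_ndvi = gpboost(data = X, label = ndvi, gp_model = gp_model,
                   objective = "regression_l2", verbose = 0)

# Predict NDVI at new locations from covariates plus spatial coordinates
new_X = cbind(elev = runif(10, 0, 1000), temp = runif(10, 10, 30))
pred = predict(bst_ndvi, data = new_X,
               gp_coords_pred = cbind(runif(10), runif(10)),
               pred_latent = FALSE)
pred$response_mean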

4.2 GeoAI for spatial prediction of future bushfires

4.2.1 Future climate data collection

Aim

We collected future climate datasets of temperature and precipitation (downloaded separately) for the period 2021–2040 from WorldClim. The 2024 data prepared in the previous sections will be used for model training, and the trained model will then predict bushfire burned areas for the period 2021–2040.

Description of steps
  1. Download the WorldClim future climate datasets of temperature and precipitation for the period 2021–2040.
  2. Preprocess the downloaded future climate data and organize them into vector format for the subsequent predictions.

The data are downloaded from the WorldClim website (https://www.worldclim.org/data/cmip6/cmip6_clim30s.html) and stored in the ./data/4. GeoAI Modeling/ directory (period: 2021–2040; GCM: ACCESS-CM2; scenario: ssp585).
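
If you prefer a scripted download, the geodata package wraps the same WorldClim CMIP6 archive; a minimal sketch (geodata::cmip6_world() offers coarser resolutions than the 30 s data used here, so treat this as an alternative rather than an exact replacement):

Code
library(geodata)

# Future monthly precipitation and maximum temperature
# (GCM: ACCESS-CM2, scenario: ssp585) at 2.5-minute resolution
prec_f = cmip6_world(model = "ACCESS-CM2", ssp = "585", time = "2021-2040",
                     var = "prec", res = 2.5, path = "./data/4. GeoAI Modeling/")
tmax_f = cmip6_world(model = "ACCESS-CM2", ssp = "585", time = "2021-2040",
                     var = "tmax", res = 2.5, path = "./data/4. GeoAI Modeling/")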

4.2.2 GeoAI modelling

Aim

This step fits a gpboost model to the 2024 temperature, precipitation, and burned-area data; the fitted model is used for spatial prediction in the next step.

Description of steps
  1. Load necessary libraries and data.
  2. Build the GPBoost model using the data processed in Section 3.
  3. You can also explore traditional machine learning models using the caret package.

Example: Machine learning using caret

Code
library(sf)

d24 = read_sf('./data/3. Spatial analysis/burnedarea_2024.shp')
# Rename attributes to short names; the order must match the
# shapefile's attribute order
names(d24) = c("pre","tem","burned","geometry")
d24
# Simple feature collection with 644 features and 3 fields
# Geometry type: POLYGON
# Dimension:     XY
# Bounding box:  xmin: 129 ymin: -14.9 xmax: 131 ymax: -13.3
# Geodetic CRS:  WGS 84
# # A tibble: 644 × 4
#     pre   tem burned                                        geometry
#   <dbl> <dbl>  <dbl>                                   <POLYGON [°]>
# 1  28.0 1893.   1250 ((130 -13.3, 130 -13.3, 130 -13.3, 130 -13.3, …
# 2  28.0 1856.   1529 ((130 -13.3, 130 -13.3, 130 -13.3, 130 -13.3, …
# 3  28.0 1888.   2203 ((130 -13.3, 130 -13.3, 130 -13.3, 130 -13.3, …
# 4  28.0 1884.   1289 ((131 -13.3, 131 -13.3, 131 -13.3, 131 -13.3, …
# 5  28.0 1865.   4247 ((131 -13.3, 131 -13.3, 131 -13.3, 131 -13.3, …
# 6  28.0 1873.   1991 ((131 -13.3, 131 -13.3, 131 -13.3, 131 -13.3, …
# # ℹ 638 more rows

library(caret)

# Prepare non-spatial data
df = sf::st_drop_geometry(d24)

# Train linear model with 5-fold CV
ctrl = trainControl(method = "cv", number = 5)

model_caret = train(
  burned ~ pre + tem, 
  data = df, 
  method = "lm", 
  trControl = ctrl
)

model_caret
# Linear Regression 
# 
# 644 samples
#   2 predictor
# 
# No pre-processing
# Resampling: Cross-Validated (5 fold) 
# Summary of sample sizes: 516, 516, 514, 515, 515 
# Resampling results:
# 
#   RMSE  Rsquared  MAE 
#   4648  0.0657    3922
# 
# Tuning parameter 'intercept' was held constant at a value of TRUE
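
With a cross-validated R² of only about 0.066, the non-spatial linear model explains very little of the variation in burned area, which motivates models such as GPBoost that additionally account for spatial dependence.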

See the GPBoost documentation (https://github.com/fabsig/GPBoost) for more details about applying the gpboost model to spatio-temporal data.

Code
library(sf)

d24 = read_sf('./data/3. Spatial analysis/burnedarea_2024.shp')
names(d24) = c("pre","tem","burned","geometry")

library(gpboost)

# Gaussian process over the polygon centroid coordinates,
# with an exponential covariance function
gp_model = GPModel(gp_coords = sdsfun::sf_coordinates(d24), 
                   cov_function = "exponential")
# Training: tree boosting on the two covariates plus the spatial GP
bst = gpboost(data = as.matrix(sf::st_drop_geometry(d24)[,1:2]), 
              label = as.matrix(sf::st_drop_geometry(d24)[,3,drop = FALSE]), 
              gp_model = gp_model, objective = "regression_l2", verbose = 0)
bst
# <gpb.Booster>
#   ... (R6 field and method listing omitted)
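
After training, the estimated covariance parameters of the spatial random effect (error-term variance, GP variance, and range) can be inspected:

Code
summary(gp_model)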

4.2.3 Model validation

Why model validation?

Model validation ensures that your model performs well on unseen data and helps avoid overfitting.

Common validation methods

  • Train/test split
  • K-fold cross-validation
  • Leave-one-out cross-validation (LOOCV)

Common evaluation metrics

  • RMSE (Root Mean Squared Error)
  • MAE (Mean Absolute Error)
  • R² (coefficient of determination)
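
These metrics can be computed directly from observed and predicted values; a minimal sketch using the linear model fitted above (in-sample, for illustration only):

Code
obs = df$burned
pred_vals = predict(model_caret, newdata = df)

rmse = sqrt(mean((obs - pred_vals)^2))                          # Root Mean Squared Error
mae  = mean(abs(obs - pred_vals))                               # Mean Absolute Error
r2   = 1 - sum((obs - pred_vals)^2) / sum((obs - mean(obs))^2)  # coefficient of determination
c(RMSE = rmse, MAE = mae, R2 = r2)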

Example: Random forest model with RMSE

Code
set.seed(123)
model_rf = train(
  burned ~ pre + tem, 
  data = df, 
  method = "rf", 
  trControl = ctrl,
  metric = "RMSE"
)
# note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .

model_rf$results
#   mtry RMSE Rsquared  MAE RMSESD RsquaredSD MAESD
# 1    2 4449    0.175 3545    193     0.0224   131
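
The caret workflow above validates only the non-spatial models. The GPBoost model can be validated in the same spirit with a simple train/test split; a minimal sketch (the 80/20 split, seed, and RMSE computation are illustrative choices, not part of the original workflow):

Code
set.seed(123)
idx = sample(nrow(d24), size = round(0.8 * nrow(d24)))
train = d24[idx, ]
test = d24[-idx, ]

# Refit the GPBoost model on the training subset only
gp_cv = GPModel(gp_coords = sdsfun::sf_coordinates(train),
                cov_function = "exponential")
bst_cv = gpboost(data = as.matrix(sf::st_drop_geometry(train)[, 1:2]),
                 label = sf::st_drop_geometry(train)$burned,
                 gp_model = gp_cv, objective = "regression_l2", verbose = 0)

# Predict on the held-out polygons and compute RMSE
pred_cv = predict(bst_cv, data = as.matrix(sf::st_drop_geometry(test)[, 1:2]),
                  gp_coords_pred = sdsfun::sf_coordinates(test),
                  pred_latent = FALSE)
sqrt(mean((sf::st_drop_geometry(test)$burned - pred_cv$response_mean)^2))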

4.2.4 Spatial prediction

Aim

In this step, the gpboost model constructed in the previous step is used to predict the burned area of 2030.

Description of steps
  1. Read the future climate data.
  2. Process the data into the format required by the gpboost model.
  3. Perform spatial prediction with the gpboost model.
Code
library(terra)

pref = terra::rast('./data/4. GeoAI Modeling/future_prec.tif')
pref
# class       : SpatRaster 
# size        : 193, 154, 12  (nrow, ncol, nlyr)
# resolution  : 0.00833, 0.00833  (x, y)
# extent      : 129, 131, -14.9, -13.3  (xmin, xmax, ymin, ymax)
# coord. ref. : lon/lat WGS 84 (EPSG:4326) 
# source      : future_prec.tif 
# names       : wc2.1~ec_01, wc2.1~ec_02, wc2.1~ec_03, wc2.1~ec_04, wc2.1~ec_05, wc2.1~ec_06, ...
# Sum the 12 monthly layers to obtain annual precipitation
pref = terra::app(pref, fun = "sum", na.rm = TRUE)
names(pref) = "pre"
pref
# class       : SpatRaster 
# size        : 193, 154, 1  (nrow, ncol, nlyr)
# resolution  : 0.00833, 0.00833  (x, y)
# extent      : 129, 131, -14.9, -13.3  (xmin, xmax, ymin, ymax)
# coord. ref. : lon/lat WGS 84 (EPSG:4326) 
# source(s)   : memory
# name        :  pre 
# min value   : 1109 
# max value   : 1750

temf = terra::rast('./data/4. GeoAI Modeling/future_tmax.tif')
temf
# class       : SpatRaster 
# size        : 193, 154, 12  (nrow, ncol, nlyr)
# resolution  : 0.00833, 0.00833  (x, y)
# extent      : 129, 131, -14.9, -13.3  (xmin, xmax, ymin, ymax)
# coord. ref. : lon/lat WGS 84 (EPSG:4326) 
# source      : future_tmax.tif 
# names       : wc2.1~ax_01, wc2.1~ax_02, wc2.1~ax_03, wc2.1~ax_04, wc2.1~ax_05, wc2.1~ax_06, ...
# Average the 12 monthly layers to obtain the mean annual maximum temperature
temf = terra::app(temf, fun = "mean", na.rm = TRUE)
names(temf) = "tem"
temf
# class       : SpatRaster 
# size        : 193, 154, 1  (nrow, ncol, nlyr)
# resolution  : 0.00833, 0.00833  (x, y)
# extent      : 129, 131, -14.9, -13.3  (xmin, xmax, ymin, ymax)
# coord. ref. : lon/lat WGS 84 (EPSG:4326) 
# source(s)   : memory
# name        :  tem 
# min value   : 33.9 
# max value   : 37.4

# Stack the two covariate layers, convert each raster cell to a polygon,
# and drop cells with missing values
d30 = c(pref,temf)
d30 = d30 |> 
  terra::as.polygons(aggregate = FALSE) |> 
  sf::st_as_sf() |> 
  dplyr::filter(dplyr::if_all(1:2,\(.x) !is.na(.x)))
d30
# Simple feature collection with 16430 features and 2 fields
# Geometry type: POLYGON
# Dimension:     XY
# Bounding box:  xmin: 129 ymin: -14.9 xmax: 131 ymax: -13.3
# Geodetic CRS:  WGS 84
# First 10 features:
#     pre  tem                       geometry
# 1  1739 34.6 POLYGON ((130 -13.3, 130 -1...
# 2  1735 34.6 POLYGON ((130 -13.3, 130 -1...
# 3  1745 34.6 POLYGON ((130 -13.3, 130 -1...
# 4  1733 34.6 POLYGON ((130 -13.3, 130 -1...
# 5  1732 34.6 POLYGON ((130 -13.3, 130 -1...
# 6  1731 34.7 POLYGON ((130 -13.3, 130 -1...
# 7  1725 34.8 POLYGON ((130 -13.3, 130 -1...
# 8  1750 34.8 POLYGON ((130 -13.3, 130 -1...
# 9  1741 34.8 POLYGON ((130 -13.3, 130 -1...
# 10 1725 34.7 POLYGON ((130 -13.3, 130 -1...

# Predict with the trained GPBoost model: covariates plus
# centroid coordinates for the spatial GP component
pred = predict(bst, data = as.matrix(sf::st_drop_geometry(d30)[,1:2]), 
               gp_coords_pred = sdsfun::sf_coordinates(d30), 
               predict_var = TRUE, pred_latent = FALSE)

d30$burned = pred$response_mean
d30
# Simple feature collection with 16430 features and 3 fields
# Geometry type: POLYGON
# Dimension:     XY
# Bounding box:  xmin: 129 ymin: -14.9 xmax: 131 ymax: -13.3
# Geodetic CRS:  WGS 84
# First 10 features:
#     pre  tem                       geometry burned
# 1  1739 34.6 POLYGON ((130 -13.3, 130 -1...   3307
# 2  1735 34.6 POLYGON ((130 -13.3, 130 -1...   3270
# 3  1745 34.6 POLYGON ((130 -13.3, 130 -1...   3095
# 4  1733 34.6 POLYGON ((130 -13.3, 130 -1...   3089
# 5  1732 34.6 POLYGON ((130 -13.3, 130 -1...   3078
# 6  1731 34.7 POLYGON ((130 -13.3, 130 -1...   3055
# 7  1725 34.8 POLYGON ((130 -13.3, 130 -1...   3017
# 8  1750 34.8 POLYGON ((130 -13.3, 130 -1...   2874
# 9  1741 34.8 POLYGON ((130 -13.3, 130 -1...   2905
# 10 1725 34.7 POLYGON ((130 -13.3, 130 -1...   2929
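
Because predict_var = TRUE was set above, pred also contains pointwise predictive variances (in the response_var element); these can be attached as a simple uncertainty layer:

Code
# Predictive standard deviation as a per-polygon uncertainty measure
d30$burned_sd = sqrt(pred$response_var)
summary(d30$burned_sd)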

4.3 Analysing future bushfire patterns

Aim

In this step, the burned area predicted for 2030 is used to analyse future bushfire patterns.

Description of steps
  1. Visualise the predicted burned area of 2030 (see the sketch below).
  2. Analyse the future bushfire patterns.
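
A minimal visualisation sketch using ggplot2 (the colour scale and titles are illustrative choices):

Code
library(ggplot2)

ggplot(d30) +
  geom_sf(aes(fill = burned), color = NA) +        # fill polygons by predicted burned area
  scale_fill_viridis_c(name = "Burned area") +
  labs(title = "Predicted burned area, 2030") +
  theme_minimal()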