4 Geospatial artificial intelligence (GeoAI) for remote sensing data
4.1 Background and Simple Examples of Machine Learning
Machine learning (ML) methods have gained popularity in geospatial research due to their ability to model nonlinear relationships from high-dimensional data. One recent development is GPBoost, which integrates tree boosting with Gaussian processes to model both fixed effects and spatial (or other structured) random effects.
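As a compact statement of this idea, a GPBoost-type model is commonly written as a mixed-effects model whose fixed-effects part is a tree ensemble. The symbols below (F, Z, b, Σ, σ²) are not defined elsewhere in this chapter and are introduced here only for illustration:

```latex
y = F(X) + Z b + \varepsilon, \qquad
b \sim \mathcal{N}(0, \Sigma), \qquad
\varepsilon \sim \mathcal{N}(0, \sigma^2 I)
```

Here F(X) is the sum of boosted trees (the fixed effects), b is the Gaussian process or other structured random effect with covariance Σ, and ε is independent error.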
4.1.1 Example 1: Land Cover Classification from Remote Sensing Imagery
Satellite images contain multispectral reflectance values across space. GPBoost can be used to classify land cover by combining decision tree boosting (e.g., LightGBM) with a Gaussian Process that accounts for spatial autocorrelation between nearby pixels.
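To make "spatial autocorrelation between nearby pixels" concrete, the sketch below computes an exponential covariance matrix for a handful of pixel centres in base R. The coordinates, the variance `sigma2`, and the range `rho` are made-up values for illustration; GPBoost estimates such parameters from the data rather than taking them as given.

```r
# Hypothetical pixel centres (x, y) on a regular grid; values are illustrative only
coords <- cbind(x = c(0, 1, 2, 0, 1, 2),
                y = c(0, 0, 0, 1, 1, 1))

# Exponential covariance: sigma2 * exp(-d / rho), where d is Euclidean distance
exp_cov <- function(coords, sigma2 = 1, rho = 1.5) {
  d <- as.matrix(dist(coords))  # pairwise distances between pixel centres
  sigma2 * exp(-d / rho)
}

K <- exp_cov(coords)

# Covariance of the first pixel with every pixel: it decays with distance
round(K[1, ], 3)
```

Pixels that are close together receive a high covariance and distant pixels a low one, which is the structure the Gaussian process part of the model exploits.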
4.1.2 Example 2: Spatial Prediction of Vegetation Index
Normalized Difference Vegetation Index (NDVI) values derived from remote sensing data often show spatial dependence. GPBoost allows prediction by combining covariates (e.g., elevation, temperature) with spatial random effects.
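For reference, NDVI is computed from the near-infrared (NIR) and red reflectance bands as (NIR - Red) / (NIR + Red). A minimal base R sketch, using made-up reflectance values rather than real imagery:

```r
# NDVI = (NIR - Red) / (NIR + Red); inputs are per-pixel reflectance values
ndvi <- function(nir, red) {
  (nir - red) / (nir + red)
}

# Illustrative reflectances for three pixels (not real data)
nir <- c(0.50, 0.30, 0.10)
red <- c(0.10, 0.20, 0.10)

ndvi(nir, red)
```

Values close to 1 indicate dense green vegetation, while values near 0 indicate bare or sparsely vegetated surfaces.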
4.2 GeoAI for spatial prediction of future bushfire
4.2.1 Future climate data collection
Aim
We collect future climate datasets of temperature and precipitation for the period 2021–2040 from WorldClim. The 2024 data processed in Section 3 are used for model training, and the trained model is then used to predict bushfire burned areas for 2021–2040.
Description of steps
1. Download WorldClim future climate datasets of temperature and precipitation for the time period 2021–2040.
2. Preprocess the downloaded future climate data and organize it into vector format for subsequent predictions.
The data are downloaded from the WorldClim website (https://www.worldclim.org/data/cmip6/cmip6_clim30s.html) and stored in the ./data/4. GeoAI Modeling/ directory. The settings are: period 2021–2040, GCM ACCESS-CM2, scenario ssp585.
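The preprocessing step listed above can be sketched on a small synthetic raster. WorldClim climate files of this kind hold one layer per month, so the sketch builds a made-up 12-layer raster, collapses it to an annual total, and converts the cells to polygons. The raster dimensions, extent, and values are assumptions for illustration and do not come from the downloaded files.

```r
library(terra)
library(sf)

set.seed(1)

# Synthetic stand-in for a monthly precipitation raster (12 layers, 3 x 3 cells)
prec <- terra::rast(nrows = 3, ncols = 3, nlyrs = 12,
                    xmin = 129, xmax = 131, ymin = -14.9, ymax = -13.3,
                    crs = "EPSG:4326")
values(prec) <- matrix(runif(9 * 12, min = 0, max = 300), ncol = 12)

# Collapse the 12 monthly layers into one annual total per cell
pre <- terra::app(prec, fun = "sum", na.rm = TRUE)
names(pre) <- "pre"

# Convert every raster cell to its own polygon and hand the result to sf
pre_sf <- pre |>
  terra::as.polygons(aggregate = FALSE) |>
  sf::st_as_sf()

pre_sf
```

Keeping one polygon per cell (`aggregate = FALSE`) preserves the cell-level values, so each row of the resulting table can later be passed to a model as one prediction location.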
4.2.2 GeoAI modelling
Aim
This step fits a gpboost model that relates the 2024 burned area to temperature and precipitation. The fitted model is used for spatial prediction in the next step.
Description of steps
1. Load necessary libraries and data.
2. Build the GPBoost model using the data processed in Section 3.
3. Optionally, explore traditional machine learning models using the caret package (https://topepo.github.io/caret/).
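The model-building step can be tried without the project data. The sketch below assumes the gpboost package is installed. It simulates a small dataset (the coordinates, covariates, and response are invented for illustration), fits a GPBoost model with an exponential covariance, and predicts at a few new locations. The `nrounds` value is an arbitrary choice, not a recommendation.

```r
library(gpboost)

set.seed(42)
n <- 200

# Simulated locations and two covariates (stand-ins for precipitation and temperature)
coords <- cbind(x = runif(n), y = runif(n))
X <- cbind(pre = runif(n), tem = runif(n))

# Simulated response: nonlinear covariate effect + smooth spatial signal + noise
y <- sin(4 * X[, "pre"]) + X[, "tem"]^2 +
  cos(3 * coords[, "x"]) * coords[, "y"] + rnorm(n, sd = 0.1)

# Gaussian process random effect defined on the coordinates
gp_model <- GPModel(gp_coords = coords, cov_function = "exponential")

# Tree boosting for the covariates, combined with the Gaussian process
bst <- gpboost(data = X, label = y, gp_model = gp_model,
               nrounds = 50, objective = "regression_l2", verbose = 0)

# Predict at five new locations with new covariate values
coords_new <- cbind(x = runif(5), y = runif(5))
X_new <- cbind(pre = runif(5), tem = runif(5))
pred <- predict(bst, data = X_new, gp_coords_pred = coords_new,
                predict_var = TRUE, pred_latent = FALSE)

pred$response_mean
```

The prediction combines the tree ensemble evaluated at the new covariates with the Gaussian process evaluated at the new coordinates, so both `data` and `gp_coords_pred` must be supplied.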
Example: Machine learning using caret
Code
```r
library(sf)

d24 = read_sf('./data/3. Spatial analysis/burnedarea_2024.shp')
names(d24) = c("pre", "tem", "burned", "geometry")
d24
# Simple feature collection with 644 features and 3 fields
# Geometry type: POLYGON
# Dimension:     XY
# Bounding box:  xmin: 129 ymin: -14.9 xmax: 131 ymax: -13.3
# Geodetic CRS:  WGS 84
# # A tibble: 644 × 4
#     pre   tem burned geometry
#   <dbl> <dbl>  <dbl> <POLYGON [°]>
# 1  28.0 1893.   1250 ((130 -13.3, 130 -13.3, 130 -13.3, 130 -13.3, …
# 2  28.0 1856.   1529 ((130 -13.3, 130 -13.3, 130 -13.3, 130 -13.3, …
# 3  28.0 1888.   2203 ((130 -13.3, 130 -13.3, 130 -13.3, 130 -13.3, …
# 4  28.0 1884.   1289 ((131 -13.3, 131 -13.3, 131 -13.3, 131 -13.3, …
# 5  28.0 1865.   4247 ((131 -13.3, 131 -13.3, 131 -13.3, 131 -13.3, …
# 6  28.0 1873.   1991 ((131 -13.3, 131 -13.3, 131 -13.3, 131 -13.3, …
# # ℹ 638 more rows

library(caret)

# Prepare non-spatial data
df = sf::st_drop_geometry(d24)

# Train linear model with 5-fold CV
ctrl = trainControl(method = "cv", number = 5)
model_caret = train(burned ~ pre + tem, data = df, method = "lm", trControl = ctrl)
model_caret
# Linear Regression
#
# 644 samples
#   2 predictor
#
# No pre-processing
# Resampling: Cross-Validated (5 fold)
# Summary of sample sizes: 516, 516, 514, 515, 515
# Resampling results:
#
#   RMSE  Rsquared  MAE
#   4648  0.0657    3922
#
# Tuning parameter 'intercept' was held constant at a value of TRUE
```
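The three metrics in the caret output can be reproduced by hand. The sketch below runs 5-fold cross-validation of a linear model in base R on simulated data (the data frame `df_sim` and its coefficients are invented for illustration). It computes RMSE, MAE, and R², with R² taken as the squared correlation between observed and predicted values, which is how caret's default regression summary defines it.

```r
set.seed(123)

# Simulated stand-in for the burned-area table (not the real data)
n <- 200
df_sim <- data.frame(pre = runif(n, 20, 40), tem = runif(n, 1500, 2000))
df_sim$burned <- 50 * df_sim$pre + 2 * df_sim$tem + rnorm(n, sd = 500)

# Assign each row to one of 5 folds at random
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))

# Fit on k - 1 folds, evaluate on the held-out fold
metrics <- sapply(seq_len(k), function(i) {
  train_set <- df_sim[fold != i, ]
  test_set  <- df_sim[fold == i, ]
  fit  <- lm(burned ~ pre + tem, data = train_set)
  pred <- predict(fit, newdata = test_set)
  c(RMSE     = sqrt(mean((test_set$burned - pred)^2)),
    Rsquared = cor(test_set$burned, pred)^2,
    MAE      = mean(abs(test_set$burned - pred)))
})

# Average each metric over the folds
rowMeans(metrics)
```

Because every observation is held out exactly once, the averaged metrics describe performance on data the model did not see during fitting.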
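See https://towardsdatascience.com/tree-boosting-for-spatial-data-789145d6d97d/?sk=4f9924d378dbb517e883fc9c612c34f1 for more details about the gpboost model on spatio-temporal data.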
```{r , cache = FALSE, message = FALSE, echo=!knitr::is_latex_output()}
#| code-fold: true
library(sf)

d24 = read_sf('./data/3. Spatial analysis/burnedarea_2024.shp')
names(d24) = c("pre","tem","burned","geometry")
d24

library(gpboost)

# Gaussian process on the feature coordinates with an exponential covariance
gp_model = GPModel(gp_coords = sdsfun::sf_coordinates(d24),
                   cov_function = "exponential")

# Training: columns 1-2 are the covariates, column 3 is the response
bst = gpboost(data = as.matrix(sf::st_drop_geometry(d24)[,1:2]),
              label = as.matrix(sf::st_drop_geometry(d24)[,3,drop = FALSE]),
              gp_model = gp_model,
              objective = "regression_l2",
              verbose = 0)
bst
```

### Model validation

::: {.callout-note title="Why model validation?"}
Model validation estimates how well your model performs on unseen data and helps detect overfitting.
:::

#### Common validation methods

- **Train/test split**
- **K-fold cross-validation**
- **Leave-one-out cross-validation (LOOCV)**

#### Common evaluation metrics

- **RMSE** (Root Mean Squared Error)
- **MAE** (Mean Absolute Error)
- **R²** (Coefficient of determination)

#### Example: Random forest model with RMSE

```{r , cache = FALSE, message = FALSE, echo=!knitr::is_latex_output()}
#| code-fold: true
set.seed(123)

# Reuses df and ctrl from the caret example above
model_rf = train(
  burned ~ pre + tem, data = df, method = "rf",
  trControl = ctrl, metric = "RMSE"
)
model_rf$results
```

### Spatial prediction

::: {.callout-tip title="Aim"}
In this step, the gpboost model constructed in the previous step is used to predict the burned area of 2030.
:::

::: {.callout-caution title="Description of steps"}
1. Read the future climate data.
2. Process the data into the form required by the gpboost model.
3. Perform spatial prediction with the gpboost model.
:::

```{r , cache = FALSE, message = FALSE, echo=!knitr::is_latex_output()}
#| code-fold: true
library(terra)

# Future precipitation: sum the layers into a single total
pref = terra::rast('./data/4. GeoAI Modeling/future_prec.tif')
pref
pref = terra::app(pref, fun = "sum", na.rm = TRUE)
names(pref) = "pre"
pref

# Future maximum temperature: average the layers into a single mean
temf = terra::rast('./data/4. GeoAI Modeling/future_tmax.tif')
temf
temf = terra::app(temf, fun = "mean", na.rm = TRUE)
names(temf) = "tem"
temf

# Stack both layers, convert cells to polygons, and drop cells with missing values
d30 = c(pref, temf)
d30 = d30 |>
  terra::as.polygons(aggregate = FALSE) |>
  sf::st_as_sf() |>
  dplyr::filter(dplyr::if_all(1:2, \(.x) !is.na(.x)))
d30

# Predict burned area from the future covariates and the polygon coordinates
pred = predict(bst,
               data = as.matrix(sf::st_drop_geometry(d30)[,1:2]),
               gp_coords_pred = sdsfun::sf_coordinates(d30),
               predict_var = TRUE,
               pred_latent = FALSE)
d30$burned = pred$response_mean
d30
```

## Analysing future bushfire patterns

::: {.callout-tip title="Aim"}
In this step, the predicted burned area of 2030 is used to analyse the future bushfire patterns.
:::

::: {.callout-caution title="Description of steps"}
1. Visualise the predicted burned area of 2030.
2. Analyse the future bushfire patterns.
:::
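The two steps above can be sketched as follows. Because the predicted layer is not reproduced here, the sketch builds a hypothetical grid of polygons with a made-up `burned` column over the study-area bounding box, maps it with sf's plot method, and summarises the distribution. The object name `d_pred`, the grid size, and the values are assumptions for illustration only.

```r
library(sf)

set.seed(2030)

# Hypothetical 4 x 4 polygon grid over the study-area bounding box
bb <- sf::st_bbox(c(xmin = 129, ymin = -14.9, xmax = 131, ymax = -13.3),
                  crs = sf::st_crs(4326))
grid <- sf::st_make_grid(sf::st_as_sfc(bb), n = c(4, 4))

# Made-up predicted burned area per cell (stand-in for the model output)
d_pred <- sf::st_sf(burned = runif(length(grid), min = 0, max = 5000),
                    geometry = grid)

# Step 1: map the predicted burned area
plot(d_pred["burned"], main = "Predicted burned area (illustrative values)")

# Step 2: summarise the distribution and keep the cells in the top 10%
summary(d_pred$burned)
hotspots <- d_pred[d_pred$burned >= quantile(d_pred$burned, 0.9), ]
nrow(hotspots)
```

With the real prediction, the same pattern applies to the polygon layer that carries the predicted `burned` column. The 90% threshold is an arbitrary choice and would need to be justified for a real analysis.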