
Function for determining the optimal spatial data discretization based on SPADE q-statistics.

Usage

cpsd_disc(
  formula,
  data,
  wt,
  discnum = 3:22,
  discmethod = "quantile",
  strategy = 2L,
  increase_rate = 0.05,
  cores = 1,
  return_disc = TRUE,
  seed = 123456789,
  ...
)

Arguments

formula

A formula giving the response variable and the explanatory variables to discretize, e.g. y ~ xa + xb + xc.

data

A data.frame or tibble of observation data.

wt

The spatial weight matrix.

discnum

(optional) A vector of candidate numbers of classes for discretization. Default is 3:22.

discmethod

(optional) The discretization methods to try. Default is quantile. Note that robust uses robust_disc() and rpart uses rpart_disc(); all other methods use st_unidisc(). Call unidisc_methods() to see the methods you can try.

strategy

(optional) Discretization strategy. When strategy is 1L, the parameters with the highest SPADE model q-statistic are chosen as the optimal spatial data discretization parameters. When strategy is 2L, the optimal discretization parameters are selected with the help of a LOESS model.

increase_rate

(optional) The critical increase rate for the number of discretization classes. Default is 0.05 (5%).

cores

(optional) A positive integer (default is 1). If cores > 1, a 'parallel' package cluster with that many cores is created and used. You can also supply a cluster object.

return_disc

(optional) Whether to return the data discretized with the optimal parameters. Default is TRUE.

seed

(optional) Random seed number. Default is 123456789. Setting a random seed is useful when the sample size is greater than 3000 (the default value for largeN), in which case the data are discretized from a 10% sample (the default value for samp_prop in st_unidisc()).

...

(optional) Other arguments passed to st_unidisc(), robust_disc() or rpart_disc().
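
The two values of strategy can be pictured with a small base-R sketch. It is not the package's implementation. The q-statistics are made-up numbers, and the stopping rule for strategy 2 (take the last class number before the relative gain of the LOESS-smoothed curve falls below increase_rate) is an assumed reading of the argument descriptions above.

```r
# Hypothetical q-statistics for candidate class numbers (made-up values).
k <- 3:12
q <- c(0.20, 0.30, 0.37, 0.42, 0.45, 0.47, 0.48, 0.485, 0.49, 0.492)

# Strategy 1 (as documented): take the k with the highest q-statistic.
k_max_q <- k[which.max(q)]

# Strategy 2 (assumed reading): smooth q against k with LOESS, then keep the
# last k before the relative gain in smoothed q drops below increase_rate.
increase_rate <- 0.05
fit <- stats::loess(q ~ k)
q_smooth <- as.numeric(stats::predict(fit, newdata = data.frame(k = k)))
gain <- diff(q_smooth) / q_smooth[-length(q_smooth)]  # gain[i]: k[i] -> k[i + 1]
below <- which(gain < increase_rate)
k_loess <- if (length(below) > 0) k[below[1]] else k[length(k)]

c(strategy_1 = k_max_q, strategy_2 = k_loess)
```

Under this reading, strategy 1 favours the largest q even when the curve has flattened, while strategy 2 tends to settle on a smaller number of classes once extra classes stop paying off.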

Value

A list holding the optimal parameters selected from the provided parameter combinations. The discretized data are included only when return_disc is TRUE.

x

names of the discretized variables

k

optimal number of discretization classes for each variable

method

optimal spatial data discretization method

disv

the result of optimal spatial data discretization
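
As an illustration of what the returned k means for the quantile method, the base-R sketch below cuts a numeric vector into k quantile classes labelled 1 to k. The data and the value of k are stand-ins, and st_unidisc() may compute its breaks differently, so treat this as an approximation rather than a reproduction of the returned result.

```r
set.seed(42)                      # illustrative data only
x <- rnorm(200)
k <- 5                            # stand-in for one element of the returned k

# Quantile breaks split x into k classes of roughly equal size.
breaks <- stats::quantile(x, probs = seq(0, 1, length.out = k + 1))
cls <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)

table(cls)                        # class sizes, one column per class
```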

Note

Setting discmethod to robust makes the function run significantly slower, so robust discretization is not advised.

References

Yongze Song & Peng Wu (2021) An interactive detector for spatial associations, International Journal of Geographical Information Science, 35:8, 1676-1701, DOI:10.1080/13658816.2021.1882680

Author

Wenbo Lv lyu.geosocial@gmail.com

Examples

data('sim')
wt = inverse_distance_weight(sim$lo,sim$la)
cpsd_disc(y ~ xa + xb + xc,
          data = sim,
          wt = wt)
#> $x
#> [1] "xa" "xb" "xc"
#> 
#> $k
#> [1]  9  8 17
#> 
#> $method
#> [1] "quantile" "quantile" "quantile"
#> 
#> $disv
#> # A tibble: 80 × 3
#>       xa    xb    xc
#>    <int> <int> <int>
#>  1     1     6     7
#>  2     6     5    13
#>  3     2     6     7
#>  4     2     4     4
#>  5     6     5    12
#>  6     3     6     8
#>  7     6     5    13
#>  8     2     3     3
#>  9     6     4     8
#> 10     8     1    12
#> # ℹ 70 more rows
#>
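
The role of seed for large inputs can be sketched in base R as well. The helper sampled_breaks() below is hypothetical and is not part of the package. It only mimics the documented idea of estimating breaks from a 10% sample when there are more than 3000 observations, to show why fixing the seed makes the result repeatable.

```r
# Hypothetical helper: quantile breaks estimated from a random sample of x.
sampled_breaks <- function(x, k, seed, samp_prop = 0.1) {
  set.seed(seed)
  idx <- sample.int(length(x), size = ceiling(length(x) * samp_prop))
  stats::quantile(x[idx], probs = seq(0, 1, length.out = k + 1), names = FALSE)
}

set.seed(1)
x <- rexp(5000)                   # more than 3000 observations

b1 <- sampled_breaks(x, k = 5, seed = 123456789)
b2 <- sampled_breaks(x, k = 5, seed = 123456789)
b3 <- sampled_breaks(x, k = 5, seed = 2024)

identical(b1, b2)                 # same seed: same sample, hence same breaks
identical(b1, b3)                 # different seed: a different sample is drawn
```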