Determines the best univariate discretization based on the geodetector q-statistic.

Usage

gd_bestunidisc(
  formula,
  data,
  discnum = 3:22,
  discmethod = c("sd", "equal", "pretty", "quantile", "fisher", "headtails", "maximum",
    "box"),
  cores = 1,
  return_disc = TRUE,
  seed = 123456789,
  ...
)

Arguments

formula

A formula for the best univariate discretization, e.g. y ~ xa + xb + xc.

data

A data.frame or tibble of observation data.

discnum

(optional) A vector of candidate numbers of classes for discretization. Default is 3:22.

discmethod

(optional) A vector of discretization methods. Default is c("sd", "equal", "pretty", "quantile", "fisher", "headtails", "maximum", "box") in gdverse.

cores

(optional) A positive integer (default is 1). If cores > 1, a 'parallel' package cluster with that many cores is created and used. You can also supply a cluster object.

return_disc

(optional) Whether to return the discretized result obtained with the optimal parameters. Default is TRUE.

seed

(optional) Random seed number. Default is 123456789. Setting a random seed is useful when the sample size is greater than 3000 (the default value for largeN), because the data are then discretized by sampling 10% (the default value for samp_prop in st_unidisc()).

...

(optional) Other arguments passed to st_unidisc().
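
The role of seed can be illustrated with a minimal base R sketch. It is not the gdverse implementation: the names large_n and samp_prop merely stand in for largeN and samp_prop, the data are simulated, and quantile breaks are used only as an example method. Break points are estimated from a random subsample, so fixing the seed makes them reproducible.

```r
# Illustrative sketch only: NOT the gdverse implementation.
set.seed(123456789)            # same value as the default seed in the usage above
x <- rnorm(5000)               # hypothetical variable with more than 3000 observations

large_n   <- 3000              # stands in for largeN (assumed name)
samp_prop <- 0.1               # stands in for samp_prop (assumed name)

# Estimate break points from a random subsample when the data are large
xs <- if (length(x) > large_n) {
  sample(x, size = ceiling(samp_prop * length(x)))
} else {
  x
}
k <- 5
breaks <- quantile(xs, probs = seq(0, 1, length.out = k + 1), names = FALSE)

# Widen the outer breaks so every observation in the full data falls into a class
breaks[1] <- -Inf
breaks[k + 1] <- Inf

# Classify the full data with the breaks estimated from the subsample
disc <- cut(x, breaks = breaks, labels = FALSE)
table(disc)
```

Without a fixed seed, a different subsample is drawn on each run, so the break points and the resulting classes may change slightly between runs.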

Value

A list containing the optimal parameters among the provided parameter combinations: x, k, method, and disc (when return_disc is TRUE).

x

the name of the variable that needs to be discretized

k

optimal discretization number

method

optimal discretization method

disc

optimal discretization results
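
What "optimal" means here can be sketched in base R under simplifying assumptions. The helpers q_stat() and discretize() below are hypothetical and written only for this illustration, the data are simulated, and only two of the methods ("equal" and "quantile") are imitated. The sketch evaluates each combination of class number and method and keeps the one with the largest q-statistic; it is not how gdverse implements the search.

```r
# Illustrative sketch only: NOT the gdverse implementation.

# Geodetector q-statistic: 1 - (within-stratum sum of squares / total sum of squares)
q_stat <- function(y, strata) {
  ssw <- sum(tapply(y, strata, function(v) sum((v - mean(v))^2)))
  sst <- sum((y - mean(y))^2)
  1 - ssw / sst
}

# Two simple discretizers built on base R's cut() (hypothetical helper)
discretize <- function(x, k, method) {
  breaks <- switch(method,
    equal    = seq(min(x), max(x), length.out = k + 1),
    quantile = quantile(x, probs = seq(0, 1, length.out = k + 1), names = FALSE),
    stop("unsupported method: ", method)
  )
  cut(x, breaks = unique(breaks), include.lowest = TRUE, labels = FALSE)
}

# Hypothetical data: y depends nonlinearly on x
set.seed(42)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.2)

# Evaluate every (k, method) combination and keep the one with the largest q
grid <- expand.grid(k = 3:6, method = c("equal", "quantile"),
                    stringsAsFactors = FALSE)
grid$q <- mapply(function(k, m) q_stat(y, discretize(x, k, m)),
                 grid$k, grid$method)
best <- grid[which.max(grid$q), ]
best
```

The row stored in best plays the role of the k and method entries in the returned list, and discretize(x, best$k, best$method) would correspond to the discretized result.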

Author

Wenbo Lv lyu.geosocial@gmail.com

Examples

data('sim')
gd_bestunidisc(y ~ xa + xb + xc, data = sim,
               discvar = paste0('x',letters[1:3]),
               discnum = 3:6)
#> $x
#> [1] "xa" "xb" "xc"
#> 
#> $k
#> [1] 5 6 6
#> 
#> $method
#> [1] "equal"   "maximum" "maximum"
#> 
#> $disv
#> # A tibble: 80 × 3
#>       xa    xb    xc
#>    <int> <int> <int>
#>  1     1     3     2
#>  2     3     3     3
#>  3     1     3     3
#>  4     1     2     2
#>  5     2     3     3
#>  6     1     3     3
#>  7     3     3     3
#>  8     1     2     2
#>  9     2     2     3
#> 10     4     2     3
#> # ℹ 70 more rows
#>