best univariate discretization based on geodetector q-statistic
Source: R/discretization.R
gd_bestunidisc.Rd
Function for determining the best univariate discretization based on the geodetector q-statistic.
Usage
gd_bestunidisc(
  formula,
  data,
  discnum = 3:22,
  discmethod = c("sd", "equal", "pretty", "quantile", "fisher", "headtails", "maximum",
    "box"),
  cores = 1,
  return_disc = TRUE,
  seed = 123456789,
  ...
)
Arguments
- formula
A formula of best univariate discretization.
- data
A data.frame or tibble of observation data.
- discnum
(optional) A vector of number of classes for discretization. Default is
3:22
.- discmethod
(optional) A vector of methods for discretization,default is using
c("sd","equal","pretty","quantile","fisher","headtails","maximum","box")
ingdverse
.- cores
(optional) A positive integer(default is 1). If cores > 1, a 'parallel' package cluster with that many cores is created and used. You can also supply a cluster object.
- return_disc
(optional) Whether or not return discretized result used the optimal parameter. Default is
TRUE
.- seed
(optional) Random seed number, default is
123456789
. Setting random seed is useful when the sample size is greater than3000
(the default value forlargeN
) and the data is discretized by sampling10%
(the default value forsamp_prop
inst_unidisc()
).- ...
(optional) Other arguments passed to
st_unidisc()
.
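The search over discnum and discmethod can be pictured as a small grid search: discretize the variable for each (number of classes, method) pair, compute the q-statistic of the response against the resulting strata, and keep the pair with the highest q. The following is a minimal base R sketch of that idea, not the gdverse implementation. The helper names q_stat, disc_equal, disc_quantile and best_disc are hypothetical, only two simple methods are shown, and the toy data are simulated for illustration.

```r
# Geodetector q-statistic: share of the variance of y explained by strata,
# q = 1 - (within-stratum sum of squares) / (total sum of squares).
q_stat <- function(y, strata) {
  sst <- sum((y - mean(y))^2)
  ssw <- sum(tapply(y, strata, function(v) sum((v - mean(v))^2)))
  1 - ssw / sst
}

# Two illustrative discretizers (base R only; hypothetical helpers).
disc_equal <- function(x, k) {
  cut(x, breaks = seq(min(x), max(x), length.out = k + 1),
      include.lowest = TRUE, labels = FALSE)
}
disc_quantile <- function(x, k) {
  brk <- unique(quantile(x, probs = seq(0, 1, length.out = k + 1),
                         names = FALSE))
  cut(x, breaks = brk, include.lowest = TRUE, labels = FALSE)
}

# Hypothetical helper: try every (k, method) pair and keep the highest q.
best_disc <- function(y, x, discnum = 3:6,
                      methods = list(equal = disc_equal,
                                     quantile = disc_quantile)) {
  grid <- expand.grid(k = discnum, method = names(methods),
                      stringsAsFactors = FALSE)
  grid$q <- mapply(function(k, m) q_stat(y, methods[[m]](x, k)),
                   grid$k, grid$method)
  best <- grid[which.max(grid$q), ]
  list(k = best$k, method = best$method, q = best$q,
       disc = methods[[best$method]](x, best$k))
}

set.seed(42)  # toy data, for illustration only
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.2)
res <- best_disc(y, x)
res[c("k", "method", "q")]
```

With a single explanatory variable, a finer discretization often yields a higher q, so the chosen number of classes tends to sit near the upper end of the candidate range in a sketch like this one.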
Value
A list of the optimal parameters among the provided parameter combinations, with k, method and disc (when return_disc is TRUE).
x
the name of the variable that needs to be discretized
k
optimal discretization number
method
optimal discretization method
disc
optimal discretization results
Author
Wenbo Lv lyu.geosocial@gmail.com
Examples
data('sim')
gd_bestunidisc(y ~ xa + xb + xc, data = sim,
               discvar = paste0('x',letters[1:3]),
               discnum = 3:6)
#> $x
#> [1] "xa" "xb" "xc"
#>
#> $k
#> [1] 5 6 6
#>
#> $method
#> [1] "equal" "maximum" "maximum"
#>
#> $disv
#> # A tibble: 80 × 3
#>       xa    xb    xc
#>    <int> <int> <int>
#>  1     1     3     2
#>  2     3     3     3
#>  3     1     3     3
#>  4     1     2     2
#>  5     2     3     3
#>  6     1     3     3
#>  7     3     3     3
#>  8     1     2     2
#>  9     2     2     3
#> 10     4     2     3
#> # ℹ 70 more rows
#>
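
The x, k and method components of a result like the one printed above line up element by element, so they can be gathered into one row per variable. A minimal sketch, assuming a hand-built list res that copies those three components from the output rather than calling the package; the names res and best are only illustrative.

```r
# Hand-built stand-in for the printed result: x, k and method are copied
# from the output above; no package call is made here.
res <- list(
  x      = c("xa", "xb", "xc"),
  k      = c(5, 6, 6),
  method = c("equal", "maximum", "maximum")
)

# One row per discretized variable.
best <- data.frame(variable = res$x, k = res$k, method = res$method)
best

# Look up the chosen setting for a single variable.
best[best$variable == "xb", ]
```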