Title: Feature Selection and Ranking by Simultaneous Perturbation Stochastic Approximation
Description: An implementation of feature selection and ranking via simultaneous perturbation stochastic approximation (SPSA-FSR) based on works by V. Aksakalli and M. Malekipirbazari (2015) <arXiv:1508.07630> and Zeren D. Yenice et al. (2018) <arXiv:1804.05589>. The SPSA-FSR algorithm searches for a locally optimal set of features that yield the best predictive performance using a specified error measure such as mean squared error (for regression problems) or accuracy rate (for classification problems). This package requires an object of class 'task' and an object of class 'Learner' from the 'mlr' package.
Authors: Vural Aksakalli [aut, cre], Babak Abbasi [aut, ctb], Yong Kai Wong [aut, ctb], Zeren D. Yenice [ctb]
Maintainer: Vural Aksakalli <[email protected]>
License: GPL-3
Version: 1.0.0
Built: 2024-11-10 04:18:26 UTC
Source: https://github.com/yongkai17/spfsr
Returns a fitted model which uses the best performing feature subset generated by spFSR. It inherits all methods and functions that apply to WrappedModel objects; for example, the predict function can be used on the fitted model. If it is a classification model, a confusion matrix can be obtained by calling the calculateConfusionMatrix function. See spFeatureSelection for a more detailed example.
getBestModel(x)
x: an spFSR object as returned by spFeatureSelection.
A WrappedModel object fitted on the best performing features.
Yong Kai Wong [email protected]
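A minimal usage sketch, assuming spsaMod is an spFSR object returned by spFeatureSelection (as in the example under spFeatureSelection):

  # retrieve the mlr WrappedModel fitted on the best performing features
  bestMod <- getBestModel(spsaMod)

  # the returned object supports the usual WrappedModel functions,
  # e.g. prediction on the reduced task stored in the spFSR object
  pred <- predict(bestMod, task = spsaMod$task.spfs)

  # for a classification task, summarise predictions in a confusion matrix
  calculateConfusionMatrix(pred)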
Returns importance ranks of the best performing features. See spFeatureSelection for a more detailed example.
getImportance(x)
x: an spFSR object as returned by spFeatureSelection.
A data.frame of features and their importance values.
Yong Kai Wong [email protected]
See also: plotImportance and spFeatureSelection.
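A minimal sketch, again assuming spsaMod is an spFSR object returned by spFeatureSelection:

  # data.frame of the best performing features and their importance values
  featImportance <- getImportance(spsaMod)
  featImportance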
Plot for an spFSR object. It returns a scatterplot of measure values vs. iteration. The error bar of measure values at each iteration can be included. It also allows the user to identify the iteration which yields the best measure value. See spFeatureSelection for a more detailed example.
## S3 method for class 'spFSR'
plot(x, errorBar = FALSE, annotateBest = FALSE, se = FALSE, ...)
x: an spFSR object as returned by spFeatureSelection.
errorBar: if TRUE, an error bar of measure values at each iteration is included. Default value is FALSE.
annotateBest: if TRUE, the iteration which yields the best measure value is annotated. Default value is FALSE.
se: if TRUE, the error bar shows the standard error rather than the standard deviation. Default value is FALSE.
...: additional plot parameters that can be passed into the plot function.
Plot of error measure values vs. iterations of an spFSR object, with an error bar if included.
Yong Kai Wong [email protected]
See also: plotImportance and spFeatureSelection.
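A minimal sketch, assuming spsaMod is an spFSR object returned by spFeatureSelection:

  # measure value vs. iteration with error bars and the best iteration annotated
  plot(spsaMod, errorBar = TRUE, annotateBest = TRUE)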
Returns a vertical bar chart of features vs. feature importance. See spFeatureSelection for a more detailed example.
plotImportance(x, low = "darkblue", high = "black")
x: an spFSR object as returned by spFeatureSelection.
low: color for the lowest importance. The default is darkblue.
high: color for the highest importance. The default is black.
a ggplot
object: a vertical bar chart of features and feature importance.
Yong Kai Wong [email protected]
See also: getImportance, spFSR.default, and spFeatureSelection.
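A minimal sketch with non-default colors, assuming spsaMod is an spFSR object returned by spFeatureSelection (the color choices below are illustrative, not package defaults):

  # vertical bar chart of feature importance with a custom color gradient
  plotImportance(spsaMod, low = "lightblue", high = "darkred")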
Searches for the best performing set of features, either automatically or for a given number of features, and ranks them by their importance via the simultaneous perturbation stochastic approximation (SPSA) algorithm for a given task, wrapper, and performance criterion. The task, the wrapper, and the performance criterion are defined using the mlr package.
spFeatureSelection(task, wrapper, measure, norm.method = "standardize",
                   num.features.selected, ...)
task: An mlr package task object such as one created by makeClassifTask or makeRegrTask.
wrapper: An mlr package Learner object created by makeLearner.
measure: A performance measure supported by the task, such as mmce for classification or mse for regression.
norm.method: Normalization method for features. NULL value is allowed. Supported methods are 'standardize', 'range', 'center', and 'scale'. Default value is 'standardize'.
num.features.selected: Number of features to be selected. Must be between zero and the total number of features in the task. A value of zero results in automatic feature selection.
...: Additional arguments. For more details, see spFSR.default.
spFSR returns an object of class "spFSR". An object of class "spFSR" consists of the following:
task.spfs: An mlr package task object defined on the best performing features.
wrapper: An mlr package Learner object as specified by the user.
measure: An mlr package performance measure as specified by the user.
best.model: An mlr package WrappedModel object trained on the best performing features.
iter.results: A data.frame of results for each iteration.
features: Names of the best performing features.
num.features: The number of best performing features.
importance: A vector of importance ranks of the best performing features.
total.iters: The total number of iterations executed.
best.iter: The iteration where the best performing feature subset was encountered.
best.value: The best measure value encountered during execution.
best.std: The standard deviation corresponding to the best measure value encountered.
run.time: Total run time in minutes.
rdesc.feat.eval: Resampling specification used for feature evaluation.
call: The matched call.
Vural Aksakalli [email protected]
Babak Abbasi [email protected], [email protected]
Yong Kai Wong [email protected]
V. Aksakalli and M. Malekipirbazari (2015) Feature Selection via Binary Simultaneous Perturbation Stochastic Approximation, Pattern Recognition Letters, Vol. 75, 41 – 47. See https://doi.org/10.1016/j.patrec.2016.03.002
See also: makeClassifTask, makeRegrTask, makeLearner, and spFSR.default.
data(iris)     # load the data
library(mlr)   # load the mlr package

if( requireNamespace('class', quietly = TRUE) ){

  # load class so that a knn classifier can be defined
  library('class')

  # define classification task on 20 random samples
  task <- makeClassifTask(data = iris[sample(150, 20), ], target = 'Species')

  # define a wrapper (1-KNN classifier)
  wrapper <- makeLearner('classif.knn', k = 1)

  # run spsa with 2 iterations to select 1 out of 4 features
  spsaMod <- spFeatureSelection(
                task = task,
                wrapper = wrapper,
                measure = mmce,
                num.features.selected = 1,
                num.cores = 1,
                iters.max = 2)

  # obtain summary
  summary(spsaMod)

  # plot with error bars
  plot(spsaMod, errorBar = TRUE)

  # obtain the wrapped model with the best performing features
  bestMod <- getBestModel(spsaMod)

  # predict using the best model
  pred <- predict(bestMod, task = spsaMod$task.spfs)

  # obtain confusion matrix
  calculateConfusionMatrix(pred)

  # get the importance ranks of best performing features
  getImportance(spsaMod)
  plotImportance(spsaMod)
}
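The same workflow applies to regression problems by defining a regression task and learner together with an error measure such as mse. A minimal sketch, assuming the 'regr.lm' learner from the mlr package is installed (this learner and the mtcars data are illustrative choices, not part of the example above):

  library(mlr)

  # regression task on the built-in mtcars data
  regTask <- makeRegrTask(data = mtcars, target = 'mpg')

  # linear regression wrapper
  regWrapper <- makeLearner('regr.lm')

  # automatic feature selection (num.features.selected = 0) with a short run
  regMod <- spFeatureSelection(task = regTask, wrapper = regWrapper,
                               measure = mse, num.features.selected = 0,
                               iters.max = 2, num.cores = 1)

  summary(regMod)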
This is the default function of spFeatureSelection. See spFeatureSelection for more details.
## Default S3 method:
spFSR(task, wrapper, measure, norm.method = "standardize",
      num.features.selected = 0L, features.to.keep = NULL, iters.max = 100L,
      stall.limit = 20L, stall.tolerance = 10^(-7), num.grad.avg = 3L,
      num.gain.smoothing = 3L, perturb.amount = 0.05, gain.min = 0.01,
      gain.max = 1, perf.eval.method = "cv", num.cv.folds = 5L,
      num.cv.reps.grad.avg = 3L, num.cv.reps.feat.eval = 3L,
      cv.stratify = TRUE, run.parallel = TRUE, num.cores = NULL,
      show.info = TRUE, print.freq = 1L)
task: An mlr package task object such as one created by makeClassifTask or makeRegrTask.
wrapper: An mlr package Learner object created by makeLearner.
measure: A performance measure within the mlr package supported by the task, such as mmce for classification or mse for regression.
norm.method: Normalization method for features. NULL value is allowed. Supported methods are 'standardize', 'range', 'center', and 'scale'. Default value is 'standardize'.
num.features.selected: Number of features selected. It must be a nonnegative integer and must not exceed the total number of features in the task. A value of 0 results in automatic feature selection. Default value is 0L.
features.to.keep: Names of features to keep in addition to the selected features. Default value is NULL.
iters.max: Maximum number of iterations to execute. The minimum value is 2L. Default value is 100L.
stall.limit: Number of iterations to stall, that is, to continue without at least stall.tolerance improvement in the measure value. Default value is 20L.
stall.tolerance: Value of stall tolerance. It must be strictly positive. Default value is 1/10^7.
num.grad.avg: Number of gradients to average for gradient approximation. It must be a positive integer. Default value is 3L.
num.gain.smoothing: Number of most recent gains to use in gain smoothing. It must be a positive integer. Default value is 3L.
perturb.amount: Perturbation amount for feature importances during gradient approximation. It must be a value between 0.01 and 0.1. Default value is 0.05.
gain.min: The minimum gain value. It must be greater than or equal to 0.001. Default value is 0.01.
gain.max: The maximum gain value. It must be greater than or equal to gain.min. Default value is 1.
perf.eval.method: Performance evaluation method. It must be either 'cv' for cross-validation or 'resub' for resubstitution. Default is 'cv'.
num.cv.folds: The number of cross-validation folds when 'cv' is selected as perf.eval.method. It must be a positive integer. Default value is 5L.
num.cv.reps.grad.avg: The number of cross-validation repetitions for gradient averaging. It must be a positive integer. Default value is 3L.
num.cv.reps.feat.eval: The number of cross-validation repetitions for feature subset evaluation. It must be a positive integer. Default value is 3L.
cv.stratify: Logical argument. Stratify cross-validation? Default value is TRUE.
run.parallel: Logical argument. Perform cross-validations in parallel? Default value is TRUE.
num.cores: Number of cores to use in case of a parallel run. It must be less than or equal to the total number of cores on the host machine. If set to NULL, the number of cores to use is determined automatically. Default value is NULL.
show.info: If set to TRUE, iteration information is printed at the frequency given by print.freq. Default value is TRUE.
print.freq: Iteration information printing frequency. It must be a positive integer. Default value is 1L.
spFSR returns an object of class "spFSR". An object of class "spFSR" consists of the following:
task.spfs: An mlr package task object defined on the best performing features.
wrapper: An mlr package Learner object as specified by the user.
measure: An mlr package performance measure as specified by the user.
best.model: An mlr package WrappedModel object trained on the best performing features.
iter.results: A data.frame of results for each iteration.
features: Names of the best performing features.
num.features: The number of best performing features.
importance: A vector of importance ranks of the best performing features.
total.iters: The total number of iterations executed.
best.iter: The iteration where the best performing feature subset was encountered.
best.value: The best measure value encountered during execution.
best.std: The standard deviation corresponding to the best measure value encountered.
run.time: Total run time in minutes.
call: The matched call.
Vural Aksakalli [email protected]
Babak Abbasi [email protected], [email protected]
Yong Kai Wong [email protected], [email protected]
V. Aksakalli and M. Malekipirbazari (2015) Feature Selection via Binary Simultaneous Perturbation Stochastic Approximation, Pattern Recognition Letters, Vol. 75, 41 – 47. See https://doi.org/10.1016/j.patrec.2016.03.002
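These control parameters can also be supplied through spFeatureSelection, whose additional arguments are passed on to spFSR.default. A minimal sketch, assuming task and wrapper have been defined with the mlr package as in the example under spFeatureSelection:

  # shorter run with an earlier stall limit, sequential execution,
  # and 5-fold cross-validation for performance evaluation
  spsaMod <- spFeatureSelection(task = task, wrapper = wrapper, measure = mmce,
                                num.features.selected = 0L,
                                iters.max = 10L, stall.limit = 5L,
                                perf.eval.method = 'cv', num.cv.folds = 5L,
                                run.parallel = FALSE, show.info = TRUE,
                                print.freq = 2L)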
Summarising an spFSR object
## S3 method for class 'spFSR'
summary(object, ...)
object: An spFSR object as returned by spFeatureSelection.
...: Additional arguments.
Summary of an spFSR object consisting of the number of features selected, the wrapper type, the total number of iterations, the best performing features, and the descriptive statistics of the best iteration result (the iteration where the best performing features are found).
Yong Kai Wong [email protected]
See also: getImportance, spFSR.default, and spFeatureSelection.
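A minimal sketch, assuming spsaMod is an spFSR object returned by spFeatureSelection:

  # number of selected features, wrapper type, total iterations, best
  # performing features, and statistics of the best iteration result
  summary(spsaMod)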