Abstract
Sorted L-One Penalized Estimation (SLOPE) is a relatively new convex
optimization procedure which allows for adaptive selection of regressors under
sparse high-dimensional designs. Here we extend the idea of SLOPE to deal with
the situation when one aims at selecting whole groups of explanatory variables
instead of single regressors. Such groups can be formed by clustering strongly
correlated predictors, or can consist of dummy variables corresponding to
different levels of the same qualitative predictor. We formulate the respective convex
optimization problem, gSLOPE (group SLOPE), and propose an efficient algorithm
for its solution. We also define a notion of the group false discovery rate
(gFDR) and provide a choice of the sequence of tuning parameters for gSLOPE so
that gFDR is provably controlled at a prespecified level if the groups of
variables are orthogonal to each other. Moreover, we prove that the resulting
procedure adapts to unknown sparsity and is asymptotically minimax with respect
to the estimation of the proportions of variance of the response variable
explained by regressors from different groups. We also provide a method for the
choice of the regularizing sequence when variables in different groups are not
orthogonal but statistically independent and illustrate its good properties
with computer simulations. Finally, we illustrate the advantages of gSLOPE in
the context of genome-wide association studies. The R package grpSLOPE, which
implements our method, is available on CRAN.
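
As a rough illustration of the two central objects named above, the following Python sketch computes a sorted-L1 penalty applied to group-wise coefficient norms and a group-level false discovery proportion for a single selection. It is a minimal sketch under simplifying assumptions, not the grpSLOPE implementation: it uses plain Euclidean norms of the coefficient subvectors (the exact norm and group weights used by gSLOPE are not stated in this abstract), and all function names (group_norms, sorted_l1_group_penalty, group_fdp) are hypothetical. gFDR would be the expectation of the proportion computed by group_fdp.

```python
import numpy as np


def group_norms(beta, groups):
    """Return the group labels and the Euclidean norm of each group's coefficients.

    Assumption: plain (unweighted) Euclidean norms, chosen for illustration.
    """
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)  # sorted unique group labels
    norms = np.array([np.linalg.norm(beta[groups == g]) for g in labels])
    return labels, norms


def sorted_l1_group_penalty(beta, groups, lambdas):
    """Sorted-L1 penalty on group norms: the largest norm is paired with the largest lambda."""
    _, norms = group_norms(beta, groups)
    lambdas = np.asarray(lambdas, dtype=float)
    if lambdas.shape != norms.shape:
        raise ValueError("need exactly one tuning parameter per group")
    if np.any(lambdas < 0) or np.any(np.diff(lambdas) > 0):
        raise ValueError("lambdas must be nonnegative and nonincreasing")
    sorted_norms = np.sort(norms)[::-1]  # group norms in decreasing order
    return float(np.sum(lambdas * sorted_norms))


def group_fdp(selected, truly_relevant):
    """Proportion of selected groups that are not truly relevant (0 if none selected)."""
    selected = set(selected)
    false_discoveries = selected - set(truly_relevant)
    return len(false_discoveries) / max(len(selected), 1)


# Hypothetical toy data: five coefficients split into three groups.
beta = [3.0, 4.0, 0.0, 0.0, 1.0]
groups = [0, 0, 1, 1, 2]
lambdas = [2.0, 1.0, 0.5]

penalty = sorted_l1_group_penalty(beta, groups, lambdas)
fdp = group_fdp(selected={0, 2}, truly_relevant={0})
```

In this toy example the group norms are 5, 0 and 1, so the penalty pairs 5 with the largest tuning parameter and 1 with the second largest. Because the penalty depends on a group only through its norm, a group enters or leaves the model as a whole, which is what makes a group-level error rate the natural quantity to control.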