### Lucas Janson - Knockoffs as Post-Selection Inference

### 4 Answers

2

I note a parallelism between your Ws and the Wilcoxon signed-rank statistics. Did this inform your intuition about the problem?

0

A related paper: https://arxiv.org/abs/1610.02351

An earlier related paper: http://arxiv.org/abs/1505.06549

0

Explanatory variables here have to have a known distribution. How does this apply to a designed experiment?

If you are in low dimensions and are willing to assume a linear model, the original knockoff filter by Barber and Candès (2015) fully conditions on the covariates, so there's no problem there (and all the adaptivity stuff has an analogue for that method as well). For model-free knockoffs, if there is a designed randomized experiment, this is the best-case scenario because the covariates are random and their distribution is known (because it's chosen by the experimenter). If all the covariate values are deterministic, then model-free knockoffs won't apply, but I think this is rarely the case (I am not aware of such applications) in high-dimensional applications where variable selection is most relevant. I think the most common setup is observational studies, where by definition the observations are sampled from some distribution/population.

written
12 months ago by
Lucas Janson
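
As an illustration of the designed-experiment case, here is a minimal sketch under a strong simplifying assumption: every factor is randomized independently as a Bernoulli coin flip with a probability chosen by the experimenter. Under that assumption, a fresh independent draw from the same known marginals, generated without looking at the response, is a valid model-free knockoff copy. The function names and the probabilities are hypothetical, and dependent designs need a more careful construction than this.

```python
import random


def sample_design(n, probs, rng):
    """Draw an n-by-p 0/1 design; column j is Bernoulli(probs[j]), all independent."""
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(n)]


def sample_independent_knockoffs(n, probs, rng):
    """Knockoff copy for an independent design (assumption: factors are independent).

    The copy is a fresh draw from the same known marginals. It never touches
    the response, so it is independent of the response given the real design.
    """
    return sample_design(n, probs, rng)


rng = random.Random(0)
probs = [0.5, 0.3, 0.8]  # hypothetical randomization probabilities
X = sample_design(100, probs, rng)
X_knockoff = sample_independent_knockoffs(100, probs, rng)
```

The real and knockoff columns are then fed together into whatever importance measure is used, and the selection step compares each real column's importance with that of its knockoff.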

0

Have you thought about how your variable importance approach compares to the LOCO (leave one covariate out) approach in the conformal prediction framework? That is, the paper by Lei, G'Sell, Rinaldo, Tibshirani & Wasserman, "Distribution-free predictive inference for regression" (forthcoming in JASA).

arXiv version: https://arxiv.org/abs/1604.04173

1

Absolutely, and I enjoyed that paper very much. Their LOCO definition of an important variable is of course a bit different than ours: theirs depends on a particular prediction method, while ours is a fundamental property only of the underlying data-generating distribution. But it is certainly capturing a similar idea, and in fact it can be used as the variable importance measure Z_j inside model-free knockoffs, in which case our results guarantee that the resulting variable selection controls the FDR with respect to our definition of null/non-null variables. I haven't tried it, but it would be interesting to see how its power compares to that of the other variable importance measures we've tried.

written
12 months ago by
Lucas Janson
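
For readers who want to see where an importance measure such as LOCO would plug in, here is a minimal sketch of the selection step. It assumes the importances Z_j (real variables) and their knockoff counterparts have already been computed by some method, takes W_j as their difference (one common antisymmetric choice, not the only one), and applies the knockoff+ threshold of Barber and Candès (2015). The function names and the numbers are hypothetical.

```python
import math


def knockoff_w(z, z_tilde):
    """W_j = Z_j - Z~_j: large positive values suggest a genuinely important variable."""
    return [a - b for a, b in zip(z, z_tilde)]


def knockoff_plus_threshold(w, q):
    """Smallest t among the nonzero |W_j| whose estimated FDP is at most q.

    The estimate is (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}).
    Returns math.inf when no candidate threshold qualifies.
    """
    for t in sorted({abs(x) for x in w if x != 0}):
        n_neg = sum(1 for x in w if x <= -t)
        n_pos = sum(1 for x in w if x >= t)
        if (1 + n_neg) / max(n_pos, 1) <= q:
            return t
    return math.inf


def knockoff_select(w, q):
    """Indices of the variables selected at target FDR level q."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Made-up importances for 11 variables and their knockoffs.
z = [6.0, 5.0, 4.0, 0.0, 3.0, 7.0, 1.5, 8.0, 9.0, 10.0, 11.0]
z_tilde = [1.0] * 11
selected = knockoff_select(knockoff_w(z, z_tilde), q=0.25)
```

Swapping in a different importance measure only changes how `z` and `z_tilde` are computed; the thresholding step stays the same.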

dependent covariates, since if the covariates are independent, the variable selection problem is not too hard to solve.