### Rina Foygel Barber - Distribution-free inference for estimation and classification

12 months ago by
Community: WHOA-PSI 2017

12 months ago by
Here the black is the population quantity that you are trying to estimate. How is it defined in population terms? It seems like it is the expectation of some quantity conditioned on an order statistic from some training sample, but I can't figure out what it is more precisely.
Yes, you can definitely think of it as a population quantity! We want to think about something like E[Y|\hat{p}(X)=t] for t\in[0,1]. If we just think about the isotonic regression problem, it's E[Y|T=t] where we are given a training set of size n consisting of T_1,...,T_n, drawn iid from some distribution on [0,1], and Y_1,...,Y_n where Y_i = f(T_i) + noise, and f is some monotone function. A lot of the asymptotic work on isotonic regression is in this setting, since if we think of n tending to infinity then it makes sense to think of n data points drawn from this continuous function on [0,1]. A few references for this version of the problem are in our preprint (linked in Todd's comment).
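To make the isotonic regression setup above concrete, here is a minimal sketch in Python. It simulates T_i iid on [0,1] and Y_i = f(T_i) + noise, then fits a nondecreasing function with the pool-adjacent-violators algorithm. The uniform distribution for T, the choice f(t) = t^2, the Gaussian noise level, and the helper names `pava` and `simulate` are all assumptions made for illustration; none of them come from the talk or the preprint, and this is only the plain least-squares fit, not the inference procedure discussed in the thread.

```python
import numpy as np

def pava(y):
    """Least-squares nondecreasing fit to y via pool-adjacent-violators."""
    blocks = []  # each block is [mean, count]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the last two block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    # Expand the block means back to one fitted value per observation.
    return np.concatenate([np.full(c, m) for m, c in blocks])

def simulate(n, f, noise_sd, rng):
    """Draw T_i iid Uniform[0,1] (sorted) and Y_i = f(T_i) + Gaussian noise."""
    t = np.sort(rng.uniform(0.0, 1.0, size=n))
    y = f(t) + noise_sd * rng.normal(size=n)
    return t, y

rng = np.random.default_rng(0)
f = lambda t: t ** 2          # assumed monotone truth, for illustration only
t, y = simulate(500, f, 0.2, rng)
fit = pava(y)                 # estimate of E[Y | T = t] at the sorted T_i
print("mean squared error of fit vs f:", np.mean((fit - f(t)) ** 2))
```

Because the true f is itself monotone, projecting Y onto the monotone cone can never move it further from f, so the fit's error against f is at most that of the raw observations.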

When we think about the post-selection inference problem, it actually may get a bit trickier because, after observing (X_i, corrupted Y_i) for i=1,...,n, we now know something about the X_i's. So at stage 2, where we want to do inference on the p_i vector (which now becomes the function f(t)), it's no longer safe to assume that the X_i's are iid draws from some population: we've already used this part of the data, so it's no longer completely random to us. We are not sure how to think about the problem in this setting, i.e. how to formulate the n data points as a discrete version of some underlying continuous process.
written 12 months ago by Rina Barber
12 months ago by
My intuition for this problem is built around estimation of a survival function, which may be far enough from your problem to be misleading, but, following that analogy, I would expect that you won't do much better pointwise, as the monotonicity forces pointwise coverage to hold uniformly as well. I wouldn't expect that you'll get a much narrower interval pointwise.
The overall convergence property is going to hold uniformly as n goes to infinity, but we are optimistic that the width of the interval might change substantially in our finite sample calculations. It might be just by a constant factor, although our current intuition is that we might swap a log(n) for a log(log(n)). Asymptotically it's probably all the same (i.e. if each point converges, then convergence is uniform), but in finite samples we are hoping to be less conservative; even a constant factor would help us.
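For a rough sense of what swapping a log(n) for a log(log(n)) could mean at finite n, the sketch below simply tabulates the two quantities and their ratio. This is pure arithmetic under the assumption that the factor enters the width as stated; it is not the width of any interval from the talk or the preprint, and the helper name `log_factors` and the sample sizes are chosen only for illustration.

```python
import math

def log_factors(n):
    """Return (log n, log log n, ratio) for a sample size n > e."""
    ln = math.log(n)
    lln = math.log(ln)  # positive because n > e
    return ln, lln, ln / lln

for n in (100, 10_000, 1_000_000):
    ln, lln, ratio = log_factors(n)
    print(f"n={n:>9,d}  log(n)={ln:6.2f}  log(log(n))={lln:5.2f}  ratio={ratio:5.2f}")
```

At n = 100 the ratio is already about 3, and it grows only slowly with n, which is consistent with the point that the gain would matter in finite samples even though the asymptotic picture is unchanged.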
written 12 months ago by Rina Barber
12 months ago by
A link to her paper related to this talk: https://arxiv.org/abs/1706.01852