Larry Brown - Mallows $C_{p}$ for Realistic Out-of-sample Prediction

24 months ago by

Saturday 1st October 9:00-9:30am

Brown Slides

Very interesting work!


Following up on the two questions I asked after the talk:

1) I said that an advantage of CV over Cp was that one didn't have to know the error variance sigma^2.

   Larry answered that the new Cpu criterion doesn't require sigma^2. This is true, but it instead uses a sandwich estimator that is a function of the model residuals. Like the estimation of sigma^2 in Cp, this will be problematic if p is close to n, and unavailable if p > n.


2) My other point was that Cp and Cpu are unbiased for the prediction error (PE) of a fixed model of size p, but not for the best model of size p (e.g., from all-subsets regression). The penalty factor needs to be modified to make it unbiased for the best model, and Ryan Tibshirani showed in a recent paper that the appropriate factor for Cp is, unfortunately, a function of the true underlying mean mu.
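For reference, one standard textbook form of Mallows' Cp on the prediction-error scale is given below (not necessarily the exact form used in the slides). Here $\hat\mu_p$ is the least-squares fit on a set of $p$ columns chosen in advance, $y^{*}$ is an independent copy of $y$ at the same design points, and $\hat\sigma^2$ is, by the usual convention, taken from the largest model considered, with $p_{\max}$ columns. The second line holds for any such fixed projection with homoskedastic errors, which is where the factor 2 comes from; it fails once the subset is chosen by looking at $y$. The third line (assuming the largest model is correct and the errors are Gaussian) shows why $\hat\sigma^2$ degrades as $p_{\max}$ approaches $n$ and is undefined once $p_{\max} \ge n$.

$$C_p = \frac{1}{n}\left(\mathrm{RSS}_p + 2p\,\hat\sigma^2\right), \qquad \hat\sigma^2 = \frac{\mathrm{RSS}_{\max}}{n - p_{\max}}$$

$$\mathbb{E}\left[\mathrm{RSS}_p + 2p\,\sigma^2\right] = \mathbb{E}\,\lVert y^{*} - \hat\mu_p \rVert^2 \quad \text{(subset fixed in advance)}$$

$$\operatorname{Var}(\hat\sigma^2) = \frac{2\sigma^4}{n - p_{\max}}$$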

This again suggests another advantage of CV over Cp, and perhaps over Cpu.
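A small Monte Carlo sketch of point 2, under assumptions chosen purely for illustration (not taken from the talk or the paper): an orthogonal design $X = I_n$ with $n = 10$, so each coordinate is its own predictor, known $\sigma = 1$, and subsets of size 1. In this sequence model the best subset of size 1 keeps the coordinate with the largest $|y_i|$, and Cp on the RSS scale is $\mathrm{RSS} + 2\sigma^2$. The helper names (`cp_and_risk`, `fixed_first`, `best_single`, `mean_gap`) are hypothetical. The sketch averages (expected prediction error minus Cp) over many draws.

```python
import random
import statistics


def cp_and_risk(mu, choose_subset, sigma=1.0, rng=random):
    """One draw from the sequence model y_i = mu_i + sigma * z_i (design X = I_n).

    The fit keeps y_i on the chosen coordinates and sets the others to 0,
    so RSS is the sum of squares of the dropped coordinates.
    Returns (Cp on the RSS scale, expected prediction error given y).
    """
    n = len(mu)
    y = [m + sigma * rng.gauss(0.0, 1.0) for m in mu]
    chosen = choose_subset(y)
    k = len(chosen)
    rss = sum(y[i] ** 2 for i in range(n) if i not in chosen)
    cp = rss + 2 * k * sigma ** 2  # Mallows-type Cp with sigma assumed known
    mu_hat = [y[i] if i in chosen else 0.0 for i in range(n)]
    # E||y_new - mu_hat||^2 given y: fresh noise plus squared estimation error
    risk = n * sigma ** 2 + sum((m - f) ** 2 for m, f in zip(mu, mu_hat))
    return cp, risk


def fixed_first(y):
    """Subset chosen before seeing the data: always coordinate 0."""
    return {0}


def best_single(y):
    """Best subset of size 1: minimizing RSS keeps the largest |y_i|."""
    return {max(range(len(y)), key=lambda i: abs(y[i]))}


def mean_gap(mu, choose_subset, reps=10000, seed=0):
    """Monte Carlo average of (prediction error - Cp); near 0 means Cp is unbiased."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(reps):
        cp, risk = cp_and_risk(mu, choose_subset, rng=rng)
        gaps.append(risk - cp)
    return statistics.mean(gaps)


if __name__ == "__main__":
    null_mu = [0.0] * 10
    strong_mu = [10.0] + [0.0] * 9
    print("fixed subset, mu = 0:     ", mean_gap(null_mu, fixed_first))
    print("best subset, mu = 0:      ", mean_gap(null_mu, best_single))
    print("best subset, one big mu_i:", mean_gap(strong_mu, best_single))
```

With these settings the first average should come out near zero, the second clearly positive (Cp is too optimistic for the selected model), and the third near zero again, because a single dominant signal makes the selection essentially fixed. That the size of the gap changes with mu is the dependence on the true mean described above.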

written 23 months ago by Rob Tibshirani

3 Answers

23 months ago by

Can you elaborate on the normalization to the covariate for the cholesterol improvement example?

Brad worked with the data this way, so I followed him. Doing this is really a 'cheat' that doesn't fit our assumption-lean setting, but I did it anyway.

written 23 months ago by Larry Brown 

Thanks for both responses.

written 23 months ago by John Kolassa 
23 months ago by

Is this definition of Cpu (subscript p, superscript U) new?


Colin certainly knows this, and knew it from the start. I don't remember a specific notation or specific comment about this in his paper. But see also the reference to Gilmour, who made Colin's original form into something exactly unbiased.

written 23 months ago by Larry Brown 
23 months ago by

In your asymptotic argument, you say on slide 16 that you are using the delta method. Where exactly is that used? I'm wondering whether you're aware of the paper by Kasy (2015, "Uniformity and the delta method"). Essentially, continuous differentiability on a compact domain gives the delta method uniform validity, but in many cases of practical interest this does not hold. I'm just wondering whether the claim that you can disregard the $O(n^{-1})$ term in computing Cp is affected by non-uniformity of the delta method in some cases.

Thanks for the reference (and for the careful reading). I'll look at it. There's no such subtlety involved in the derivation on my p. 16, and I could have left out the phrase "delta method". The actual derivation is on p. 17.

written 23 months ago by Larry Brown 