### Larry Brown - Mallows $C_{p}$ for Realistic Out-of-sample Prediction

### 3 Answers

Can you elaborate on the normalization of the covariate in the cholesterol improvement example?

Brad worked with the data this way. So I followed him. Doing this is really a 'cheat' that doesn't fit our assumption-lean setting. But I did it anyway.

Thanks for both responses.

Is the definition of Cpu here (subscript p, superscript U) new?

John,

Colin certainly knows this, and knew it from the start. I don't remember a specific notation or specific comment about this in his paper. But see also the reference to Gilmour who made Colin's original form into something exactly unbiased.

In your asymptotic argument, you say on slide 16 that you are using the delta method. Where exactly is that used? I'm just wondering if you're aware of the paper by Kasy (2015, "Uniformity and the delta method"). Essentially, continuous differentiability on a compact domain gives the delta method uniform validity. But in many cases of practical interest, this does not hold. I'm just wondering if the claim that you can disregard the $O\left(n^{-1}\right)$ term in computing Cp is affected by non-uniformity of the delta method in some cases.

Thanks for the reference (and for the careful reading). I'll look at it. There's no such subtlety involved in the derivation on my p 16, and I could have left out the phrase "delta method". The actual derivation is on p 17.

Very interesting work!

Following up on the two questions I asked after the talk:

1) I said that an advantage of CV over Cp was that one didn't have to know the error variance sigma^2.

Larry answered that the new Cpu criterion doesn't require sigma^2. This is true, but it instead uses a sandwich estimator that is a function of the model residuals. Like estimation of sigma^2 in Cp, this will be problematic if p is close to n, and unavailable if p>n.
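A rough way to see the "p close to n" concern in point 1 is a small simulation. The sketch below does not implement the sandwich estimator from the talk; as a stand-in it uses the classical residual-based estimate RSS/(n - p) under a correctly specified Gaussian linear model. The function name `sigma2_hat_spread`, the all-ones coefficient vector, and the simulation sizes are assumptions made purely for illustration.

```python
import numpy as np

def sigma2_hat_spread(n, p, sigma=1.0, reps=2000, seed=0):
    """Monte Carlo mean and std of RSS/(n - p) (hypothetical helper).

    Assumes a Gaussian linear model with random design and all-ones
    coefficients; these choices are illustrative, not from the talk.
    """
    if p >= n:
        # With p >= n, least squares can fit the data exactly, so a
        # residual-based variance estimate is not available.
        raise ValueError("RSS/(n - p) is undefined when p >= n")
    rng = np.random.default_rng(seed)
    est = np.empty(reps)
    for r in range(reps):
        X = rng.standard_normal((n, p))
        y = X @ np.ones(p) + sigma * rng.standard_normal(n)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        est[r] = resid @ resid / (n - p)
    return est.mean(), est.std()

if __name__ == "__main__":
    for p in (5, 25, 45):
        mean, std = sigma2_hat_spread(n=50, p=p)
        print(f"n=50, p={p}: mean={mean:.3f}, std={std:.3f}")
```

The estimate stays centered near sigma^2, but its spread grows as p approaches n because only n - p residual degrees of freedom remain. Whether the sandwich estimator degrades at the same rate is not something this sketch shows.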

2) My other point was that Cp and Cpu will be unbiased for the PE for a fixed model of size p, but not for the best model of size p (e.g., from all subsets regression). The penalty factor needs to be modified to make it unbiased for the best model, and Ryan Tibshirani showed in a recent paper that the appropriate factor for Cp is unfortunately a function of the true underlying mean mu.

This again suggests another advantage of CV over Cp, and perhaps Cpu.
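The selection effect in point 2 can be illustrated with a toy simulation. The sketch below uses classical Cp with known sigma^2 (not Cpu), a global null mean (mu = 0), and models of size 1 chosen from m candidate predictors; all of these, along with the name `cp_gap`, are assumptions made for illustration. It compares the average of (in-sample prediction error minus Cp) for a model fixed in advance against the same quantity for the best single-predictor model.

```python
import numpy as np

def cp_gap(n=30, m=20, sigma2=1.0, reps=4000, seed=0):
    """Average of (in-sample PE - Cp) for a fixed vs. best size-1 model.

    Hypothetical helper: true mean is zero, sigma^2 is treated as known,
    and each candidate model is one column of a fixed design X, fit
    without an intercept.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, m))      # fixed design, drawn once
    col_ss = (X ** 2).sum(axis=0)        # squared norm of each column
    gap_fixed = np.empty(reps)
    gap_best = np.empty(reps)
    for r in range(reps):
        y = np.sqrt(sigma2) * rng.standard_normal(n)   # mu = 0
        fit_ss = (X.T @ y) ** 2 / col_ss   # ||y_hat||^2 for each column
        yy = y @ y
        for out, c in ((gap_fixed, fit_ss[0]), (gap_best, fit_ss.max())):
            cp = (yy - c) + 2.0 * sigma2 * 1   # RSS + 2 * sigma^2 * p
            pe = c + n * sigma2                # ||mu - y_hat||^2 + n * sigma^2
            out[r] = pe - cp
    return gap_fixed.mean(), gap_best.mean()

if __name__ == "__main__":
    fixed, best = cp_gap()
    print(f"fixed model: mean(PE - Cp) = {fixed:.2f}")
    print(f"best model:  mean(PE - Cp) = {best:.2f}")
```

For the fixed model the average gap should be near zero, while for the selected model Cp is optimistic because the same search that lowers RSS also raises the true error of the fitted values. The sketch only exhibits the bias; it does not attempt the corrected penalty factor mentioned above.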