Conference paper, 2023

Naive imputation implicitly regularizes high-dimensional linear models

Abstract

Two different approaches exist to handle missing values for prediction: either imputation, prior to fitting any predictive algorithm, or dedicated methods able to natively incorporate missing values. While imputation is widely (and easily) used, it is unfortunately biased when low-capacity predictors (such as linear models) are applied afterward. However, in practice, naive imputation exhibits good predictive performance. In this paper, we study the impact of imputation in a high-dimensional linear model with MCAR (missing completely at random) data. We prove that zero imputation performs an implicit regularization closely related to the ridge method, which is often used in high-dimensional problems. Leveraging this connection, we establish that the imputation bias is controlled by a ridge bias, which vanishes in high dimension. As a predictor, we argue in favor of the averaged stochastic gradient descent (SGD) strategy, applied to zero-imputed data. We establish an upper bound on its generalization error, highlighting that imputation is benign in the d ≫ √n regime. Experiments illustrate our findings.
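To make the pipeline concrete, here is a minimal NumPy sketch (not the authors' code): data from a high-dimensional linear model are masked completely at random with observation probability rho, zero-imputed, and fed to a constant-step averaged SGD predictor. For comparison, an explicit ridge estimator is fitted on the complete data, with a penalty level chosen from the heuristic imputation/ridge correspondence described in the abstract; all parameter values (n, d, rho, the learning rate, the ridge level lam) are illustrative assumptions rather than quantities taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# High-dimensional regime: d >> sqrt(n) (here sqrt(n) ~ 32).
n, d = 1000, 200
sigma = 0.5        # noise standard deviation (illustrative)
rho = 0.7          # MCAR probability that an entry is observed (illustrative)

# Linear model y = X beta + noise, columns scaled so ||x_j||^2 ~ n/d.
X = rng.normal(size=(n, d)) / np.sqrt(d)
beta = rng.normal(size=d)
y = X @ beta + sigma * rng.normal(size=n)

# MCAR mask: each entry is observed independently with probability rho,
# followed by naive zero imputation of the missing entries.
mask = rng.random((n, d)) < rho
X_imp = np.where(mask, X, 0.0)

def averaged_sgd(X, y, lr=0.5, n_passes=5):
    # Constant-step SGD on the squared loss; the returned predictor is
    # the running (Polyak-Ruppert) average of the iterates.
    n, d = X.shape
    w = np.zeros(d)
    w_avg = np.zeros(d)
    t = 0
    for _ in range(n_passes):
        for i in rng.permutation(n):
            w -= lr * (X[i] @ w - y[i]) * X[i]
            t += 1
            w_avg += (w - w_avg) / t
    return w_avg

w_sgd = averaged_sgd(X_imp, y)

# Explicit ridge on the COMPLETE data. The penalty level follows the
# heuristic imputation/ridge correspondence: zero imputation induces a
# per-coordinate penalty of roughly (1 - rho)/rho * ||x_j||^2, and
# ||x_j||^2 ~ n/d here. (Illustrative choice, not the paper's constant.)
lam = (1 - rho) / rho * n / d
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Test risk on fresh data; the SGD predictor is evaluated on zero-imputed
# inputs, since that is how it would be deployed.
X_test = rng.normal(size=(5000, d)) / np.sqrt(d)
y_test = X_test @ beta
Xt_imp = np.where(rng.random(X_test.shape) < rho, X_test, 0.0)
print("averaged SGD on zero-imputed data:", np.mean((Xt_imp @ w_sgd - y_test) ** 2))
print("ridge on complete data:           ", np.mean((X_test @ w_ridge - y_test) ** 2))
```

With n = 1000 and d = 200 (so d ≫ √n ≈ 32), the two test risks should come out comparable in magnitude, consistent with the claim that the bias induced by naive imputation behaves like a ridge bias that vanishes in high dimension.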
Main file: main.pdf (421.04 KB). Origin: files produced by the author(s).

Dates and versions

hal-03958825, version 1 (30-01-2023)


Cite

Alexis Ayme, Claire Boyer, Aymeric Dieuleveut, Erwan Scornet. Naive imputation implicitly regularizes high-dimensional linear models. International Conference on Machine Learning (ICML), Jul 2023, Hawaii, United States. ⟨hal-03958825⟩