A. Agarwal and J. C. Duchi, The generalization ability of online algorithms for dependent data, IEEE Transactions on Information Theory, vol.59, issue.1, pp.573-587, 2013.

P. Alquier and X. Li, Prediction of quantiles by statistical learning and application to GDP forecasting, 15th International Conference on Discovery Science, vol.2, pp.23-36, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00777482

P. Alquier and O. Wintenberger, Model selection for weakly dependent time series forecasting, Bernoulli, vol.18, issue.3, p.9, 2012.
URL : https://hal.archives-ouvertes.fr/inria-00386733

P. Alquier, X. Li, and O. Wintenberger, Prediction of time series by statistical learning: General losses and fast rates, Dependence Modeling, vol.1, pp.65-93, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00749729

P. Alquier, J. Ridgway, and N. Chopin, On the properties of variational approximations of Gibbs posteriors, Journal of Machine Learning Research, vol.17, issue.239, pp.1-41, 2016.

J. Audibert, Fast learning rates in statistical inference through aggregation, The Annals of Statistics, vol.37, issue.4, pp.1591-1646, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00139030

J. Audibert and O. Catoni, Robust linear least squares regression, The Annals of Statistics, pp.2766-2794, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00522534

L. Bégin, P. Germain, F. Laviolette, and J. Roy, PAC-Bayesian bounds based on the Rényi divergence, Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, p.14, 2004.

S. Boucheron, G. Lugosi, and P. Massart, Concentration inequalities: A nonasymptotic theory of independence, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00794821

O. Catoni, Statistical Learning Theory and Stochastic Optimization, Saint-Flour Summer School on Probability Theory, Lecture Notes in Mathematics, issue.1, 2001.
URL : https://hal.archives-ouvertes.fr/hal-00104952

O. Catoni, PAC-Bayesian supervised classification: the thermodynamics of statistical learning, Institute of Mathematical Statistics Lecture Notes-Monograph Series, vol.56, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00206119

O. Catoni, Challenging the empirical mean and empirical variance: a deviation study, Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, vol.48, pp.1148-1185, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00517206

O. Catoni, PAC-Bayesian bounds for the Gram matrix and least squares regression with a random design, 2016.

I. Csiszár and P. C. Shields, Information theory and statistics: A tutorial, 2004.

J. Dedecker, P. Doukhan, G. Lang, L. R. Rafael, S. Louhichi et al., Weak dependence, Weak Dependence: With Examples and Applications, pp.9-20, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00686031

L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition, 1996.

L. Devroye, M. Lerasle, G. Lugosi, and R. I. Oliveira, Sub-Gaussian mean estimators, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01204519

V. C. Dinh, L. S. Ho, B. Nguyen, and D. Nguyen, Fast learning rates with heavy-tailed losses, Advances in Neural Information Processing Systems, vol.29, pp.505-513, 2016.

P. Doukhan, Mixing: Properties and Examples, Lecture Notes in Statistics, vol.8, issue.2, p.9, 1994.

C. Giraud, F. Roueff, and A. Sanchez-Pérez, Aggregation of predictors for nonstationary sub-linear processes and online adaptive forecasting of time varying autoregressive processes, The Annals of Statistics, vol.43, issue.6, pp.2412-2450, 2015.

I. Giulini, PAC-Bayesian bounds for Principal Component Analysis in Hilbert spaces, 2015.

P. D. Grünwald and N. A. Mehta, Fast rates with unbounded losses, vol.2, p.14, 2016.

B. Guedj and P. Alquier, PAC-Bayesian estimation and prediction in sparse additive models, Electronic Journal of Statistics, vol.7, issue.1, pp.264-291, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00722969

G. Lecué and M. Lerasle, Learning from MOM's principles, 2017.

J. Honorio and T. Jaakkola, Tight bounds for the expected risk of linear classifiers and PAC-Bayes finite-sample guarantees, Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, pp.384-392, 2014.

D. Hsu and S. Sabato, Loss minimization and parameter estimation with heavy tails, Journal of Machine Learning Research, vol.17, issue.18, pp.1-40, 2016.

L. A. Kontorovich and K. Ramanan, Concentration inequalities for dependent random variables via the martingale method, The Annals of Probability, vol.36, pp.2126-2158, 2008.

V. Kuznetsov and M. Mohri, Generalization bounds for time series prediction with non-stationary processes, International Conference on Algorithmic Learning Theory, pp.260-274, 2014.

J. Langford and J. Shawe-Taylor, PAC-Bayes & margins, Proceedings of the 15th International Conference on Neural Information Processing Systems, pp.439-446, 2002.

G. Lecué and S. Mendelson, Regularization and the small-ball method I: sparse recovery, 2016.

B. London, B. Huang, and L. Getoor, Stability and generalization in structured prediction, Journal of Machine Learning Research, vol.17, issue.222, pp.1-52, 2016.

G. Lugosi and S. Mendelson, Risk minimization by median-of-means tournaments, 2016.

G. Lugosi and S. Mendelson, Regularization, sparse recovery, and median-of-means tournaments, 2017.

D. A. McAllester, Some PAC-Bayesian theorems, Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pp.230-234, 1998.

D. A. McAllester, PAC-Bayesian model averaging, Proceedings of the Twelfth Annual Conference on Computational Learning Theory, pp.164-170, 1999.

S. Mendelson, Learning without concentration, J. ACM, vol.62, issue.3, p.14, 2015.

S. Minsker, Geometric median and robust estimation in Banach spaces, Bernoulli, vol.21, issue.4, pp.2308-2335, 2015.

D. S. Modha and E. Masry, Memory-universal prediction of stationary random processes, IEEE Transactions on Information Theory, vol.44, issue.1, pp.117-133, 1998.

M. Mohri and A. Rostamizadeh, Stability bounds for stationary φ-mixing and β-mixing processes, Journal of Machine Learning Research, vol.11, issue.2, pp.789-814, 2010.

R. I. Oliveira, The lower tail of random quadratic forms, with applications to ordinary least squares and restricted eigenvalue properties, 2013.

L. Oneto, D. Anguita, and S. Ridella, PAC-Bayesian analysis of distribution dependent priors: Tighter risk bounds and stability analysis, Pattern Recognition Letters, vol.80, issue.2, pp.200-207, 2016.

L. Ralaivola, M. Szafranski, and G. Stempfel, Chromatic PAC-Bayes bounds for non-iid data: Applications to ranking and stationary β-mixing processes, Journal of Machine Learning Research, vol.11, issue.2, pp.1927-1956, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00867455

E. Rio, Théorie asymptotique des processus aléatoires faiblement dépendants, Springer-Verlag.

M. Seeger, PAC-Bayesian generalisation error bounds for Gaussian process classification, Journal of Machine Learning Research, vol.3, issue.1, pp.233-269, 2002.

Y. Seldin and N. Tishby, PAC-Bayesian analysis of co-clustering and beyond, Journal of Machine Learning Research, vol.11, issue.1, pp.3595-3646, 2010.

Y. Seldin, P. Auer, J. Shawe-Taylor, R. Ortner, and F. Laviolette, PAC-Bayesian analysis of contextual bandits, Advances in Neural Information Processing Systems, pp.1683-1691, 2011.

Y. Seldin, F. Laviolette, N. Cesa-Bianchi, J. Shawe-Taylor, and P. Auer, PAC-Bayesian inequalities for martingales, IEEE Transactions on Information Theory, vol.58, issue.12, pp.7086-7093, 2012.

J. Shawe-Taylor and R. Williamson, A PAC analysis of a Bayes estimator, Proceedings of the Tenth Annual Conference on Computational Learning Theory, pp.2-9, 1997.

I. Steinwart and A. Christmann, Fast learning from non-iid observations, Advances in Neural Information Processing Systems, vol.2, pp.1768-1776, 2009.

N. N. Taleb, The Black Swan: The Impact of the Highly Improbable, Random House, 2007.

L. G. Valiant, A theory of the learnable, Communications of the ACM, vol.27, issue.11, pp.1134-1142, 1984.

V. N. Vapnik, The Nature of Statistical Learning Theory, 2000.

B. Yu, Rates of convergence for empirical processes of stationary mixing sequences, The Annals of Probability, pp.94-116, 1994.

A. Zimin and C. H. Lampert, Conditional risk minimization for stochastic processes, vol.2, 2015.