Non–parametric function estimation aims to estimate or recover or denoise a function of interest, perhaps a signal, spectrum or image, that is observed in noise and possibly indirectly after some transformation, as in deconvolution. ‘Non–parametric’ signifies that no a priori limit is placed on the number of unknown parameters used to model the signal. Such theories of estimation are necessarily quite different from traditional statistical models with a small number of parameters specified in advance. Before wavelets, the theory was dominated by linear estimators, and the exploitation of assumed smoothness in the unknown function to describe optimal methods. Wavelets provide a set of tools that make it natural to assert, in plausible theoretical models, that the sparsity of representation is a more basic notion than smoothness, and that nonlinear thresholding can be a powerful competitor to traditional linear methods. We survey some of this story, showing how sparsity emerges from an optimality analysis via the game–theoretic notion of a least–favourable distribution.