Consider the classical problem of learning a signal when observed with noise. One way to do this is to expand the signal in terms of basis functions and then try to learn the coefficients. The collection of basis functions is called a dictionary and the approach is sometimes called “synthesis” because the signal is synthesised from the coefficients. Another learning approach, called “analysis”, is based on an l_1 regularization of a linear operator that describes the signal’s structure. As an example one may think of a signal that lives on a graph, and the linear operator describes the change when going from one node to the next in the graph. The sum of the absolute values of the changes is called the total variation of the signal over the graph. A simple special case is the path graph, and a more complicated one is the two-dimensional grid. We will consider the regularized least squares estimator for such examples and also regularization using total variation of higher order discrete derivatives and Hardy Krause total variation. We will introduce the concept “effective sparsity” which is related to the dimensionality of the unknown signal. The regularized least squares estimator will be shown to mimic an oracle that trades off approximation error and “estimation error”, where the latter depends on the effective sparsity.