Minimax Series Simulations

Motivation

This project discusses nonparametric density estimation under a special type of sparsity condition: approximate sparsity.

Suppose $X$ is an economic variable of interest and we want to estimate its probability density $f_X$. In particular, $f_X$ has a series representation $$ f_X(x) = \sum_{j=1}^\infty \theta_j\phi_j(x),\quad \theta_j = E[\phi_j(X)] $$ where $\{\phi_j\}$ is an orthonormal basis chosen by the researcher.

Given data $\{X_i\}_{i=1}^n$, we can estimate $$ \hat{f}_J(x) = \sum_{j=1}^J \hat{\theta}_j\phi_j(x),\quad \hat{\theta}_j = \frac{1}{n}\sum_{i=1}^n \phi_j(X_i) $$
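
For concreteness, here is a minimal sketch of this estimator in Python, using the cosine basis on $[0, 1]$ (the basis choice is mine for illustration; any orthonormal basis works):

```python
import numpy as np

def phi(j, x):
    """Orthonormal cosine basis on [0, 1]: phi_1(x) = 1, phi_j(x) = sqrt(2) cos((j - 1) pi x)."""
    x = np.asarray(x, dtype=float)
    return np.ones_like(x) if j == 1 else np.sqrt(2.0) * np.cos((j - 1) * np.pi * x)

def theta_hat(j, sample):
    """Empirical coefficient: the sample average of phi_j(X_i)."""
    return phi(j, sample).mean()

def f_hat(x, sample, J):
    """Truncated series estimator with cutoff J."""
    return sum(theta_hat(j, sample) * phi(j, x) for j in range(1, J + 1))
```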

The main question is how to choose the series cutoff $J$, which governs the bias-variance tradeoff.

Since the true coefficients $\theta_j = E[\phi_j(X)]$ are expectations, this framework amounts to estimating many expectations simultaneously. The recent high-dimensional econometrics literature provides one data-driven method for choosing the optimal cutoff $J$, which relies on the LASSO.
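
To see the mechanism, note that with an orthonormal basis the LASSO has a closed-form solution: soft-thresholding of the empirical coefficients. The sketch below (reusing `theta_hat` from above) selects the cutoff as the largest index surviving the threshold; the penalty level `lam` is left as an input here.

```python
import numpy as np

def soft_threshold(t, lam):
    """Soft-thresholding: the closed-form LASSO solution under an orthonormal design."""
    return np.sign(t) * np.maximum(np.abs(t) - lam, 0.0)

def select_cutoff(sample, J_max, lam):
    """Threshold the empirical coefficients and return the largest surviving index."""
    coeffs = np.array([theta_hat(j, sample) for j in range(1, J_max + 1)])
    kept = np.nonzero(soft_threshold(coeffs, lam))[0]
    return int(kept[-1]) + 1 if kept.size else 1  # 1-based cutoff; fall back to J = 1
```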

Approximate Sparsity

In particular, the LASSO has been shown to work well when the coefficients are sparse, and in fact it still works when sparsity holds only approximately. Take our density $f_X(x) = \sum_{j=1}^\infty\theta_j\phi_j(x)$ as an example: exact sparsity would require all but finitely many of the $\theta_j$ to be zero, whereas approximate sparsity only requires the coefficients to be small in a controlled way.
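
One common way to formalize this (stated here as an illustration; the paper's exact condition may differ) is polynomial decay of the coefficients, which immediately bounds the truncation bias via orthonormality (Parseval): $$ |\theta_j| \le C j^{-\gamma},\ \gamma > \tfrac{1}{2} \quad\Longrightarrow\quad \|f_X - f_J\|_2^2 = \sum_{j>J}\theta_j^2 \le C^2 \sum_{j>J} j^{-2\gamma} \le \frac{C^2}{2\gamma-1}\, J^{1-2\gamma}, $$ where $f_J = \sum_{j=1}^J \theta_j\phi_j$ is the truncated series. No coefficient needs to be exactly zero; they only need to die off fast enough.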

Contribution

In my paper, I establish the complexity of the function classes characterized by the approximate sparsity condition, and demonstrate that the LASSO, used as a selection mechanism, produces the optimal estimator for densities in these classes. This has implications for both density estimation and nonparametric regression problems.

Code

Below is the Python code corresponding to the theory in my paper. A few highlights:

Target Density
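
The actual target density is defined in the repository code; as a hypothetical stand-in for the sketches below, take an equal-weight mixture of two Beta densities on $[0, 1]$:

```python
import numpy as np
from scipy.stats import beta

def f_target(x):
    """Hypothetical target: equal-weight mixture of Beta(2, 5) and Beta(7, 3) on [0, 1]."""
    return 0.5 * beta.pdf(x, 2, 5) + 0.5 * beta.pdf(x, 7, 3)
```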

Random Sample Generator From Target Density
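
A minimal sampler for the stand-in target above: draw a fair coin for the mixture component, then draw from that component.

```python
import numpy as np

def draw_sample(n, seed=None):
    """Draw n i.i.d. observations from the Beta-mixture stand-in target."""
    rng = np.random.default_rng(seed)
    pick = rng.random(n) < 0.5  # component indicator, probability 1/2 each
    return np.where(pick, rng.beta(2, 5, size=n), rng.beta(7, 3, size=n))
```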

P-Algorithm
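
The P-Algorithm itself is specified in the paper; the placeholder below is only my reading of the LASSO-based selection described above, combining `select_cutoff` and `f_hat` from the earlier sketches. The default penalty follows the conventional $\sqrt{\log J_{\max}/n}$ scaling from the high-dimensional literature; the paper's tuning may differ.

```python
import numpy as np

def p_algorithm(sample, J_max=50, lam=None):
    """Sketch of a LASSO-selected series density estimator (not the paper's exact P-Algorithm)."""
    if lam is None:
        lam = np.sqrt(np.log(J_max) / sample.size)  # conventional penalty rate
    J = select_cutoff(sample, J_max, lam)
    return J, lambda x: f_hat(x, sample, J)
```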

Test Run for P-Alg
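
A minimal usage example tying the sketches together (output varies with the seed and tuning):

```python
sample = draw_sample(1000, seed=0)
J, f_est = p_algorithm(sample)
print("selected cutoff J =", J)

grid = np.linspace(0.0, 1.0, 201)
print("max abs error on grid:", np.max(np.abs(f_est(grid) - f_target(grid))))
```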

Simulate ISE/MISE
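
A sketch of the Monte Carlo exercise: for each replication, draw a fresh sample, run the selection, and approximate the integrated squared error on a uniform grid (a grid average suffices since the interval has length 1); the MISE estimate is the average across replications. The grid size and replication count here are my choices.

```python
import numpy as np

def ise(f_est, grid):
    """Integrated squared error vs. the target, approximated as a grid average on [0, 1]."""
    return np.mean((f_est(grid) - f_target(grid)) ** 2)

def simulate_mise(n=1000, reps=200, seed=0):
    """Monte Carlo MISE of the P-Algorithm sketch over `reps` replications."""
    grid = np.linspace(0.0, 1.0, 501)
    errors = [ise(p_algorithm(draw_sample(n, seed=seed + r))[1], grid) for r in range(reps)]
    return float(np.mean(errors))
```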