I found the answer to my question in the paper "Feedback stabilization using two-hidden-layer nets" by E.D. Sontag. From the introduction:
It is by now well-known that functions computable by nets with a single hidden layer can approximate continuous functions, uniformly on compacts, under only weak assumptions on $\theta$. Consider now the following inversion problem: Given a continuous function $f : \mathbb{R}^m \rightarrow \mathbb{R}^p$, a compact subset $C \subseteq \mathbb{R}^p$ included in the image of $f$, and an $\varepsilon > 0$, find a function $\phi : \mathbb{R}^p \rightarrow \mathbb{R}^m$ so that $\|f(\phi(x)) - x \| < \varepsilon$ for all $x \in C$. It is trivial to see that in general discontinuous functions $\phi$ are needed. We show later that nets with just one hidden layer are not enough to guarantee the solution of all such problems, but nets with two hidden layers are.
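To make the remark about discontinuous $\phi$ concrete, here is a small sketch of my own (it is not taken from the paper). I assume the scalar map $f(u) = u^3 - 3u$ and the compact set $C = [-3, 3]$; the names `f` and `phi` are purely illustrative. A continuous $\phi$ would need $\phi(-3) < -2$ and $\phi(3) > 2$, so it would have to pass through $u = -1$ and then $u = 1$, where $f$ takes the values $2$ and $-2$ in the wrong order, which forces an error of at least $2$ somewhere on $C$. A $\phi$ that jumps between the two outer branches of $f$ avoids this:

```python
import math

def f(u: float) -> float:
    """Illustrative map f(u) = u^3 - 3u (an assumption, not from the paper)."""
    return u**3 - 3.0 * u

def phi(x: float) -> float:
    """Discontinuous right inverse of f: returns the outermost real root of u^3 - 3u = x."""
    if x < 0:
        return -phi(-x)  # f is odd, so mirror the branch used for x >= 0
    if x <= 2.0:
        # Substituting u = 2*cos(t) gives f(u) = 2*cos(3t)
        return 2.0 * math.cos(math.acos(x / 2.0) / 3.0)
    # Substituting u = 2*cosh(t) gives f(u) = 2*cosh(3t)
    return 2.0 * math.cosh(math.acosh(x / 2.0) / 3.0)

# Check f(phi(x)) = x on a grid over C = [-3, 3]
xs = [-3.0 + 6.0 * k / 600 for k in range(601)]
max_error = max(abs(f(phi(x)) - x) for x in xs)

# phi jumps from about -sqrt(3) to +sqrt(3) as x crosses 0
jump_at_zero = phi(0.0) - phi(-1e-9)

print(max_error, jump_at_zero)
```

The inversion error stays at floating-point level over the grid, while `phi` jumps by roughly $2\sqrt{3}$ at $x = 0$. That jump is the kind of discontinuity the quoted passage says a net would have to reproduce.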