Another paper about a qualitative difference between 1- and 2-hidden-layer networks:
Neural networks for localized approximation (Chui, Li, and Mhaskar, 1994)
We prove that feedforward artificial neural networks with a single hidden layer and an ideal sigmoidal response function cannot provide localized approximation in a Euclidean space of dimension higher than one. We also show that networks with two hidden layers can be designed to provide localized approximation.
The objective of this paper is to investigate the possibility of constructing networks suitable for localized approximation, i.e., networks with the property that if the target function is modified only on a small subset of the Euclidean space, then only a few neurons, rather than the entire network, need to be retrained... We prove that if the dimension of the input space is greater than one, then such a network with one hidden layer and a Heaviside activation function cannot be constructed. In contrast, we also show that a network with two or more hidden layers can always be constructed to accomplish the task.
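To make the positive direction concrete, here is a minimal sketch, assuming Heaviside ("ideal sigmoidal") activations, of the standard two-hidden-layer construction that results like this rest on; it is an illustration in the spirit of the paper, not its exact proof, and the names `heaviside`, `box_indicator_net`, and the box bounds are mine. First-layer units compute half-space indicators, and a single second-layer unit thresholds their sum, acting as an AND gate that yields the indicator of a box. Intuitively this is why one hidden layer fails in dimension greater than one: a single Heaviside hidden unit is constant on each side of an entire hyperplane, so its influence can never be confined to a bounded region.

```python
import numpy as np

def heaviside(x):
    """Ideal (Heaviside) step activation: 1 where x >= 0, else 0."""
    return (x >= 0).astype(float)

def box_indicator_net(x, lo, hi):
    """Two-hidden-layer Heaviside network computing the indicator of
    the box [lo_1, hi_1] x ... x [lo_d, hi_d] in R^d.

    Hidden layer 1: 2d units, the half-space indicators
        heaviside(x_i - lo_i) and heaviside(hi_i - x_i).
    Hidden layer 2: one unit that fires only when all 2d half-space
        indicators are on, i.e. an AND gate via thresholding.
    """
    d = len(lo)
    h1 = np.concatenate([heaviside(x - lo), heaviside(hi - x)])  # 2d units
    # The sum equals 2d exactly when x lies in the box, so threshold
    # at 2d - 1/2 to recover the box indicator.
    return heaviside(h1.sum() - (2 * d - 0.5))

lo, hi = np.array([0.0, 0.0]), np.array([1.0, 1.0])
print(box_indicator_net(np.array([0.5, 0.5]), lo, hi))  # 1.0 (inside the box)
print(box_indicator_net(np.array([1.5, 0.5]), lo, hi))  # 0.0 (outside the box)
```

Summing weighted copies of such box indicators over a grid gives a piecewise-constant approximant in which changing the target on one cell only requires adjusting the units attached to that cell's box, which is exactly the localized-retraining property described above.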