Backpropagation can be seen as searching a space of network configurations, represented as sets of weights, for the configuration with the least error, measured in the above fashion. Because the network structure is complicated, the error surface being searched can have local minima. This is a problem for multi-layer networks, and we look at ways around it below.
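To make the local minima problem concrete, here is a small illustrative sketch (not from the lecture itself): gradient descent on a one-dimensional toy "error surface" E(w) = w^4 - 3w^2 + w, which has a global minimum near w = -1.30 and a shallower local minimum near w = 1.13. The function and learning rate are chosen purely for illustration.

```python
# Toy example: gradient descent on a 1-D error surface with two minima.
# E(w) = w^4 - 3w^2 + w has a global minimum near w = -1.30 and a
# local minimum near w = 1.13.

def error(w):
    return w**4 - 3 * w**2 + w

def gradient(w):
    return 4 * w**3 - 6 * w + 1

def gradient_descent(w, rate=0.01, steps=2000):
    # Repeatedly move the weight downhill along the error surface.
    for _ in range(steps):
        w -= rate * gradient(w)
    return w

# Starting on different sides of the surface lands in different minima.
w_bad = gradient_descent(2.0)    # gets stuck in the local minimum (~1.13)
w_good = gradient_descent(-2.0)  # reaches the global minimum (~-1.30)
print(w_bad, error(w_bad))
print(w_good, error(w_good))
```

The point of the sketch is that gradient descent only ever moves downhill, so where it ends up depends entirely on where it starts; a real multi-layer network has thousands of weights, but the same effect applies.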
Having said that, even if a learned network is stuck in a local minimum, it may still perform adequately, and multi-layer networks have been used to great effect in real-world situations (see Tom Mitchell's book for a description of an ANN that can drive a car!).
One answer to the problem of local minima is to use random restart, as described in the lecture on search techniques. Different initial random weightings for the network may mean that it converges to different local minima, and the best of these can be taken as the learned ANN.
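The random restart idea can be sketched as follows. This is an illustration, not code from the lecture: the "network" is collapsed to a single weight w on the same kind of toy error surface as above, trained from several random starting points, keeping whichever result has the least error.

```python
# Random-restart sketch: train from several random initial weights and
# keep the result with the least error.  The "network" here is a single
# weight w on the toy error surface E(w) = w^4 - 3w^2 + w.
import random

def error(w):
    return w**4 - 3 * w**2 + w

def gradient(w):
    return 4 * w**3 - 6 * w + 1

def train(w, rate=0.01, steps=2000):
    # Plain gradient descent from the given initial weight.
    for _ in range(steps):
        w -= rate * gradient(w)
    return w

random.seed(0)  # fixed seed so the sketch is reproducible
candidates = [train(random.uniform(-2.0, 2.0)) for _ in range(20)]
best_w = min(candidates, key=error)  # keep the restart with least error
print(best_w, error(best_w))
```

With enough restarts, at least one random initialisation is very likely to fall in the basin of the global minimum, so the best candidate approaches the best configuration even though any single run may get stuck.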
Alternatively, as described in Mitchell's book, a "committee" of networks could be learned, with the (possibly weighted) average of their decisions taken as the overall decision for a given test example.
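A minimal sketch of the committee idea, with hypothetical numbers (the weights and committee coefficients below are invented for illustration, not taken from Mitchell): three simple linear units with sigmoid outputs stand in for networks learned from different restarts, and their outputs on a test example are combined as a weighted average.

```python
# Committee sketch: combine the decisions of several "networks" (here,
# simple linear units with a sigmoid output) as a weighted average.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    # Output of one network on input x, in [0, 1].
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical learned networks (weights, bias) and committee coefficients,
# e.g. proportional to each network's validation accuracy.
committee = [
    ((1.0, -1.0),  0.0, 0.5),
    ((0.8, -1.2),  0.1, 0.3),
    ((1.2, -0.9), -0.1, 0.2),
]

def committee_decision(x):
    # Weighted average of the individual networks' decisions.
    score = sum(alpha * predict(w, b, x) for w, b, alpha in committee)
    total = sum(alpha for _, _, alpha in committee)
    return score / total

x = (2.0, 1.0)
overall = committee_decision(x)
print(overall, 1 if overall >= 0.5 else 0)
```

The averaging smooths out the idiosyncrasies of any one network: a member stuck in a poor local minimum is outvoted by the rest of the committee.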