Let’s once again rewrite the function as:

In this case we assume that the activation functions are all the same, namely ReLU, and so the network can also be called a deep ReLU network.

We can see that $\mathbb{y}$ is expressed as a big linear combination of ridge functions, i.e. functions of the form $h(\mathbf{w}^\top \mathbf{x} + b)$. With ReLU as $h$, $\mathbb{y}$ is a piecewise-linear function: the whole function is not linear, but each single ReLU output is linear on either side of its kink, so $\mathbb{y}$ is built out of linear pieces.
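As a concrete illustration, here is a minimal NumPy sketch (the layer widths and random weights are placeholders, not values from the text): a one-hidden-layer network is evaluated once as a nested composition and once as an explicit linear combination of ridge functions $\mathrm{ReLU}(\mathbf{a}_k^\top \mathbf{x} + b_k)$, and the two results coincide.

```python
import numpy as np

# A minimal sketch (layer widths and weights are made up, not taken from the text).
rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def deep_relu_net(x, weights, biases):
    """Nested composition: affine map + ReLU on every layer except the last."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)
    return weights[-1] @ h + biases[-1]

# With one hidden layer, the same output is an explicit linear combination of
# ridge functions ReLU(a_k . x + b_k), one per hidden unit.
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)
w2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)

x = rng.normal(size=2)
y_composed = deep_relu_net(x, [W1, w2], [b1, b2])
y_ridge_sum = sum(w2[0, k] * relu(W1[k] @ x + b1[k]) for k in range(8)) + b2
assert np.allclose(y_composed, y_ridge_sum)
```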

For example, let's imagine a deep network that has 2 layers and that, for each 2D point, returns a real scalar. The output of the network will look something like this:

Output of $\mathbb{y}$. On the right, the same function seen from above.

The blue edges are produced by the first layer, while the red ones by the second.
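A picture like the one above can be produced by evaluating such a network on a dense grid. Below is a sketch under assumed settings (two ReLU layers of arbitrary width, random weights, and a final linear readout, none of which come from the figure) that plots the surface together with its view from above.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical network: two ReLU layers plus a linear readout, random weights.
rng = np.random.default_rng(3)
relu = lambda z: np.maximum(z, 0.0)

W1, b1 = rng.normal(size=(6, 2)), rng.normal(size=6)   # first layer  -> straight kinks
W2, b2 = rng.normal(size=(4, 6)), rng.normal(size=4)   # second layer -> kinks that bend
w3, b3 = rng.normal(size=(1, 4)), rng.normal(size=1)   # linear readout to a scalar

# Evaluate the network on a dense 2D grid.
xs = np.linspace(-2.0, 2.0, 200)
X1, X2 = np.meshgrid(xs, xs)
P = np.stack([X1.ravel(), X2.ravel()], axis=1)
H1 = relu(P @ W1.T + b1)
H2 = relu(H1 @ W2.T + b2)
Y = (H2 @ w3.T + b3).reshape(X1.shape)

# Surface on the left, view from above on the right.
fig = plt.figure(figsize=(10, 4))
fig.add_subplot(1, 2, 1, projection="3d").plot_surface(X1, X2, Y, cmap="viridis")
fig.add_subplot(1, 2, 2).contourf(X1, X2, Y, levels=40)
plt.show()
```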

We can see that the function is linear within each activation region, and so it is a piecewise-linear function.
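This can also be checked numerically. In the sketch below (again with made-up weights and a one-hidden-layer network), fixing the pattern of active ReLU units at a point yields an explicit affine map, and the network agrees with that affine map at nearby points sharing the same pattern.

```python
import numpy as np

# Sketch with made-up weights: within one activation region (a fixed pattern of
# active ReLU units) a one-hidden-layer network is exactly an affine map.
rng = np.random.default_rng(1)
relu = lambda z: np.maximum(z, 0.0)

W1, b1 = rng.normal(size=(5, 2)), rng.normal(size=5)
w2, b2 = rng.normal(size=(1, 5)), rng.normal(size=1)

def net(x):
    return w2 @ relu(W1 @ x + b1) + b2

x0 = rng.normal(size=2)
pattern = W1 @ x0 + b1 > 0                # which hidden units are active at x0
D = np.diag(pattern.astype(float))        # freezes each ReLU to a linear map

A = w2 @ D @ W1                           # affine map valid on this whole region
c = w2 @ D @ b1 + b2

x1 = x0 + 1e-6 * rng.normal(size=2)       # nearby point, (almost surely) same region
assert np.array_equal(W1 @ x1 + b1 > 0, pattern)
assert np.allclose(net(x1), A @ x1 + c)
```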

Also note that the sharp edges of the output shape are due to the discontinuity of the gradient of the ReLU.
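The kink itself is easy to see numerically: the derivative of ReLU jumps from 0 to 1 at the origin, so one-sided finite differences taken just to the left and just to the right of zero disagree (a tiny sketch, not tied to any particular network).

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

eps = 1e-6
left_slope = (relu(0.0) - relu(-eps)) / eps    # slope approaching 0 from the left
right_slope = (relu(eps) - relu(0.0)) / eps    # slope approaching 0 from the right
print(left_slope, right_slope)                 # 0.0 vs 1.0: the gradient jumps at zero
```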