In the previous post, we discussed the basics of neural networks. In this post, we will talk about an interesting type of neural network popularly known as a CNN, ConvNet, or Convolutional Neural Network. Let us first understand the algorithm and the techniques it uses, then look at how to implement them in practice.
What Exactly Is a CNN?
You may have already heard about image recognition, facial recognition, or self-driving cars. These are real-world applications of Convolutional Neural Networks (CNNs).
Advances in computer vision with deep learning have been built up and refined over time, primarily around one particular algorithm: the Convolutional Neural Network.
CNNs, like other neural networks, are made up of neurons with learnable weights and biases. Each neuron receives several inputs, takes a weighted sum over them, passes it through an activation function, and responds with an output. The whole network has a loss function, and all the tips and tricks developed for ordinary neural networks still apply to CNNs.
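The weighted-sum-then-activation step described above can be sketched in a few lines of NumPy. This is only an illustration: the choice of ReLU as the activation function, the helper names `relu` and `neuron_output`, and the example numbers are assumptions, not something this post prescribes.

```python
import numpy as np

def relu(z):
    # ReLU activation: negative values become 0, positive values pass through.
    return np.maximum(z, 0.0)

def neuron_output(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias, followed by the activation.
    z = np.dot(weights, inputs) + bias
    return relu(z)

# Hypothetical example values, chosen only to make the arithmetic easy to follow.
x = np.array([1.0, 2.0, 3.0])      # inputs
w = np.array([0.5, -0.25, 0.5])    # learnable weights
b = 0.5                            # learnable bias

# 0.5*1 - 0.25*2 + 0.5*3 + 0.5 = 2.0, and ReLU leaves positive values unchanged.
out = neuron_output(x, w, b)
print(out)
```

During training, the values in `w` and `b` are what the network adjusts to reduce the loss.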
A Convolutional Neural Network (ConvNet/CNN) is a deep learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects or objects in the image, and differentiate one from another.
The pre-processing required by a ConvNet is much lower than for other classification algorithms. While in more primitive methods the filters are hand-engineered, with enough training a CNN can learn these filters/characteristics on its own.
Architecture of CNN
The architecture of a CNN is analogous to the connectivity pattern of neurons in the human brain and was inspired by the organization of the visual cortex. Individual neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. A collection of such fields overlap to cover the entire visual area.
The convolution layer is the core building block of a CNN.
The convolution layer comprises a set of independent filters. Each filter is initialized randomly, and its values are the parameters that the network learns during training.
The hidden layers in a CNN are mostly convolution and pooling (down-sampling) layers.
In every convolution layer, a small filter slides across the input image and performs a convolution operation at each position. A convolution operation is an element-wise multiplication between the filter values and the pixels currently under the filter, with the resulting products summed into a single value.
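The sliding multiply-and-sum described above can be written out directly. The sketch below assumes a single-channel image, a stride of 1, and no padding; the function name `conv2d` and the example values are hypothetical. As in most deep learning libraries, the filter is not flipped, so strictly speaking this is cross-correlation.

```python
import numpy as np

def conv2d(image, kernel):
    # Output size for stride 1 and no padding ("valid" convolution).
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            # Pixels currently under the filter.
            patch = image[r:r + kh, c:c + kw]
            # Element-wise multiplication, then sum the products.
            out[r, c] = np.sum(patch * kernel)
    return out

# Hypothetical 4x4 image with pixel values 0..15 and a 2x2 filter.
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])

feature_map = conv2d(image, kernel)
print(feature_map.shape)  # (3, 3)
print(feature_map)        # every entry is -5.0 for this image and filter
```

Each output entry here is the top-left pixel of the patch minus its bottom-right pixel, which is always -5 for this particular image. Real libraries use much faster implementations, but the arithmetic is the same.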
The filter values are tuned through the iterative training procedure, and after the network has trained for a number of epochs, these filters begin to detect various features in the given image.
Pooling layers are responsible for down-sampling their input. An image holds a lot of pixel values, and it is generally easier for the network to learn features if the size of the representation is gradually reduced. By shrinking the feature maps, pooling reduces the number of parameters needed in later layers and therefore the required computation. Pooling also helps reduce overfitting.
There are two common pooling operations:
- Max Pooling: Choose the maximum value in each window
- Average Pooling: Sum all of the values in each window and divide by the number of values (take the average of the values present)
Max pooling is the more common choice in most use cases; average pooling is applied less often in practice.
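Both pooling operations can be sketched with NumPy as follows. The example assumes non-overlapping square windows (window size equal to the stride); the function name `pool2d` and the sample feature map are hypothetical.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    h, w = feature_map.shape
    oh, ow = h // size, w // size
    # Drop any rows/columns that do not fill a complete window.
    cropped = feature_map[:oh * size, :ow * size]
    # Split into (oh, size, ow, size) so each window has its own pair of axes.
    windows = cropped.reshape(oh, size, ow, size)
    if mode == "max":
        return windows.max(axis=(1, 3))
    if mode == "average":
        return windows.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'average'")

# Hypothetical 4x4 feature map.
fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 3, 7, 8]], dtype=float)

max_pooled = pool2d(fm, size=2, mode="max")      # [[4, 2], [3, 8]]
avg_pooled = pool2d(fm, size=2, mode="average")  # [[2.5, 1.0], [1.0, 6.5]]
print(max_pooled)
print(avg_pooled)
```

In both cases the 4x4 input shrinks to 2x2, which is the down-sampling effect that pooling layers provide.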
What Does a CNN Do?
A CNN can effectively capture the spatial and temporal dependencies in an input image through the application of relevant filters. The architecture achieves a better fit to the image dataset due to the reduction in the number of parameters and the reusability of weights. In other words, the network can be trained to understand the complexity of the image better.
The role of the CNN is to reduce the images into a form that is easier to process, without losing features that are important for getting a good prediction. This matters when the goal is to design an architecture that is not only good at learning features but also scalable to huge datasets.
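To get a rough feel for how much weight reuse reduces the parameter count, compare a fully connected layer with a convolution layer on the same input. The layer sizes below (a 32x32 RGB image, 100 units, 100 filters of size 3x3) are illustrative assumptions, and the helper names are hypothetical.

```python
def dense_params(n_inputs, n_units):
    # Fully connected layer: one weight per (input, unit) pair, plus one bias per unit.
    return n_inputs * n_units + n_units

def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    # Convolution layer: each filter reuses the same weights at every image position,
    # so the count does not depend on the image size. One bias per filter.
    return (kernel_h * kernel_w * in_channels + 1) * n_filters

# Assumed example: 32x32 image with 3 color channels.
dense = dense_params(32 * 32 * 3, 100)  # 3072 * 100 + 100 = 307300
conv = conv_params(3, 3, 3, 100)        # (27 + 1) * 100 = 2800

print(dense, conv)
```

Under these assumptions the convolution layer needs roughly a hundred times fewer parameters, and the gap widens as the input image grows.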