What Is A Capsule Network?
A capsule network is a new type of neural network that is designed to overcome the shortcomings of Convolutional Neural Networks (CNNs). Capsule networks were conceptualized by Geoffrey Hinton, in this paper - https://arxiv.org/abs/1710.09829. Since then the world of AI research has been taken in a storm, avid researchers and scientists are meticulously scrutinizing every detail of the paper. While some proclaim that capsule networks bode the demise of our beloved CNNs, several researchers have pointed out areas of application where these networks might not perform so well.
The main application identified to be impacted is computer vision, as it were the limitations of CNNs that begot the whole idea of capsule networks. Traditional neural networks have their limitations in identifying images with lesser data or identifying the contextual relativeness between components within images. As such, images with same components with different placements or orientation will be labelled similarly in CNNs, causing reduced accuracy.
How Do Capsule Networks Work?
Using a top-down approach, try imagining an image as a collection of various components. A capsule network is made of a network of capsules - small groups of neurons each pertaining to a particular component. The activity of neurons in a group determine the properties of the component of the image. Each capsule is responsible for identifying a single component, and together all capsules will determine what a traditional CNN does, that is label an image correctly. Unlike traditional networks, capsule networks value localization, the orientation and precise location of each component within an image is an important data point in capsule networks.
Each layer in a multilayer capsule neural network consists of these capsules. Using the same top-down approach, taking a parse tree for representation, the nodes of the parse tree represent individual capsules, and the root of the parse tree corresponds to the complete image. Each capsule or node in the parse tree will have a parent capsule as well. The leaves of the parse tree are equivalent to individual independent entities within the image. Each capsule identifies properties of a particular entity by means of the activities of the neurons in that capsule. These properties may range from position, size, orientation to colours and texture. One of the interesting properties according to the author is the existential property, which describes whether the particular entity exists within an image or not (in terms of probability).
Let's deep dive a bit into the implementation of capsule networks. We have already described how a capsule network might look like, when visually represented as a parse tree. Now consider the output of the capsule as a vector. A vector has two properties attached to it that can be used to describe it completely- its length and its orientation. For a capsule vector, the length of the vector will be used to describe the existence of the entity and this length never exceeds 1, achievable by reducing dimensionality; while the orientation of the vector will help describe the properties of the entity. Now that we have the length and orientation of the vector, we can easily identify the parents of the vector in the parse tree. In the paper, Hinton uses a dynamic routing mechanism along with a routing-by-agreement approach to map the capsules with their parents.
Capsule networks have been tested on MNIST data in the paper and have shown much better results in terms of accuracy than traditional networks. This doesn't mean that CNNs can now be chucked out the window. There is still work to be done to improve the capsule network to perform at par or better than CNNs for major applications. There are several limitations of capsule networks as well. Since capsule networks use localization and have emphasis on all entities of the image, many a times the background noises can interfere with labelling of focus images. Apart from the limitations, capsule networks themselves use CNNs for their learning mechanism. In the network, all the layers except the last one are convolutional. At the moment, capsule networks are still in their stage of infancy and it is unfair to compare them to robust networks such as CNNs. In time with more research in the right direction, these capsule networks may be improved upon enough to accommodate its present limitations and address some of the existing challenges as well.
In summary, capsule networks are a network of groups of neurons - each representing an independently identifiable entity of an image. Capsule networks were created as a solution to the limitations of current convolutional neural networks. These capsule networks have shown promising results on the popular MNIST data, but more active research is needed to test the limits of capsules and fine-tune them for broader applications before we declare them as the ultimate replacement of traditional neural networks.