
Neural Networks, how do they work?

These days, the most widely used branch of AI is Deep Learning. We can consider Deep Learning a subset of Machine Learning in which the algorithms "learn" by themselves. Artificial Neural Networks (ANNs) and Convolutional Neural Networks (CNNs) are the state of the art in terms of development and business adoption. 
In Machine Learning (ML), the programming paradigm is a little different from the conventional one. A "usual" programmer analyzes the inputs and writes the program that produces the expected outputs. For instance, suppose we want to create a program that sums two numbers. The programmer evaluates the types of the inputs and outputs and then writes the code that transforms the input values into the desired outputs.
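To make the contrast concrete, the conventional paradigm looks like this: the programmer writes the rule explicitly (a trivial made-up example, just for illustration):

```python
# Conventional programming: the human writes the transformation rule.
def add(a, b):
    return a + b

result = add(2, 3)  # the rule, not the data, defines the behavior
```

In the ML paradigm, as described next, we would instead hand the system pairs of inputs and outputs and let it find the rule.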

When working in ML, we give the inputs and outputs to the system, and the algorithm "builds" the program itself. Thus, in the ML programming paradigm, it is the algorithm that creates the program for us. We represent the data as features, and it is the network that "builds" the connections between the data and understands how the data is related. However, we still need to develop the ML model/algorithm; only then can it "learn" and create a program adjusted to the given data. An ANN is a neural network in which all neurons are connected, as we can see in Figure 1. 
Figure 1 - ANN
Each connection between neurons has a weight. Each neuron also has a bias value: the neuron's output is the weighted sum of all connections into it, plus its bias, passed through an activation function. When the data is used to train the ANN model, the weights are adjusted according to the optimizer, the activation function used in each layer of the model, and the input features. 
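A single neuron can be sketched in a few lines. This is a minimal illustration, not a full framework: the input values, weights, and bias below are made up, and I use the sigmoid as an example activation function.

```python
import math

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes z into (0, 1)

# Made-up values, just to show the computation:
out = neuron_output([0.5, -1.0, 2.0], weights=[0.4, 0.3, 0.1], bias=0.2)
```

Training adjusts the weights and bias; the computation itself stays the same.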
We can have any number of hidden layers; by adding them, we increase the complexity of the model and give it more capacity to fit the training data. But be careful: too many hidden layers can lead to overfitting. Overfitting is when the model is so adjusted to the training data that it only achieves good results on that data. We need models able to handle new data with high accuracy, not overfitted ones. 
In each layer, we apply an activation function to the neurons. For each neuron, the weighted sum of all its incoming connections is computed and a bias value is added; the activation function then determines whether, and how strongly, the neuron is activated. Each time the whole training set goes through the entire network, a process called an epoch, we obtain outputs. A cost function then tells us how far the model's outputs are from the correct values. Remember that we provide both the inputs and the expected outputs (Supervised Learning). Therefore, at the end of each epoch, the model's outputs are compared with the real values given by us. Based on the cost function, all the model's weights are updated by backpropagation. Later I will post in more detail about adjusting the model, as well as how to choose the best features to train it.
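The whole loop (outputs, cost, weight updates, epochs) can be sketched with a single linear neuron trained by gradient descent. The data, learning rate, and epoch count below are made up for illustration; real frameworks do exactly this, just at much larger scale and with backpropagation through many layers.

```python
# Supervised pairs: inputs and expected outputs. Targets follow y = 2x + 1,
# so the "program" the neuron should learn is w ≈ 2, b ≈ 1.
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b, lr = 0.0, 0.0, 0.05  # initial weights and a made-up learning rate

def cost():
    """Mean squared error: how far the outputs are from the correct values."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

for epoch in range(500):           # one epoch = one pass over the training set
    for x, y in data:
        error = (w * x + b) - y    # output vs. expected output
        w -= lr * 2 * error * x    # gradient step on the weight
        b -= lr * 2 * error        # gradient step on the bias
```

After training, `w` and `b` land close to 2 and 1, and the cost is near zero: the algorithm "built" the summing rule from the data instead of us coding it.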

Another well-known type of neural network is the CNN, a special case of the ANN. A CNN has four distinct steps/phases in its model (Convolution, Pooling, Flattening, and Fully Connected layers), as we can see in Figure 2.
Figure 2 - CNN - image from freecodecamp
A convolution layer applies a filter to an input, each application resulting in one activation. Applying the same filter across the whole input results in the so-called feature map (a map of activations), indicating the detected features of, for instance, a photo (Figure 3). Thus, in the convolution layers, the CNN model applies filters to extract the best features to use in the fully connected layers of the model.

Figure 3 - Applying a filter to an input, resulting in the feature map - image from freecodecamp
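The sliding-filter idea can be sketched directly. The 3x3 filter below is a made-up vertical-edge detector and the "image" is a tiny grid of 0s and 1s; the point is only to show the filter sliding over the input and producing a feature map.

```python
# Sketch of a 2-D convolution: slide a k x k filter over the input,
# computing one activation per position; the grid of activations is the feature map.
def convolve2d(image, kernel):
    k = len(kernel)
    h, w = len(image) - k + 1, len(image[0]) - k + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(w)] for i in range(h)]

image = [[0, 0, 1, 1, 1],   # a made-up "picture" with a vertical edge
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1]]
edge_filter = [[1, 0, -1],  # responds where left and right columns differ
               [1, 0, -1],
               [1, 0, -1]]
feature_map = convolve2d(image, edge_filter)
```

Positions covering the edge give strong (non-zero) activations, while the flat region gives zero, which is exactly what "detected features" means.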
Pooling layers are used to reduce the spatial dimensions of the feature maps. We can apply different pooling types to our feature maps. In Figure 4, we can see Max Pooling being applied: we divide the feature map into 2x2 squares and keep the maximum value of each.
Figure 4 - Applying the Max Pooling technique to a feature map - image from computersciencewiki
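The Max Pooling step from Figure 4 is simple enough to sketch directly; the feature map values below are made up:

```python
# 2x2 max pooling: split the feature map into 2x2 squares and keep each square's maximum.
def max_pool_2x2(fmap):
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]), 2)]
            for i in range(0, len(fmap), 2)]

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 8, 2],
        [3, 6, 4, 7]]
pooled = max_pool_2x2(fmap)  # spatial size drops from 4x4 to 2x2
```

The strongest activation in each region survives, so the detected features are kept while the spatial size shrinks by a factor of four.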
After a convolutional layer, we typically add a pooling layer, and if the model has several convolutional layers (CL), each one is followed by a pooling layer (PL) (CL, PL, CL, PL, CL, PL, ...). Then we apply a flattening layer (FL) to turn the 2-D pooled maps into a 1-D vector, so we can feed the fully connected layers. The last part of the CNN model is an ANN, which trains on the data resulting from the CL, PL, and FL layers. Thanks to the convolutional layers, the CNN can "easily" categorize a picture or identify objects inside it.
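The flattening step is the glue between the two halves of the model, and it is easy to sketch (the pooled maps below are made-up values):

```python
# Flattening: the 2-D pooled feature maps become one 1-D vector,
# which is what the fully connected (ANN) part of the model consumes.
def flatten(feature_maps):
    return [value for fmap in feature_maps for row in fmap for value in row]

pooled_maps = [[[4, 5], [6, 8]],   # two made-up 2x2 pooled feature maps
               [[1, 0], [2, 3]]]
vector = flatten(pooled_maps)      # length 8: 2 maps x 2 x 2 values
```

Each entry of this vector then acts as one input feature to the fully connected layers, exactly like the features fed to a plain ANN.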
Deep Learning is widely used, and its use is still increasing drastically. We now have vast amounts of data to train our models, leading to neural networks able to classify, forecast, and predict quickly and with high accuracy. CNNs give us amazing results in image and video classification and object recognition; ANNs, in turn, give us excellent classification systems for numerical data.

Different CNN and ANN models are being applied, and we are seeing constant improvements in the Deep Learning field. Some systems can identify millions of different objects. What about the future? Do you think there will be systems that can identify any human-known object? How fascinating is that? And how scary could it be if we think of the evil side? 
