**Deep neural networks**, a type of artificial intelligence, began outperforming standard algorithms 10 years ago.

The majority of artificial intelligence (AI) is a numbers game. Deep neural networks, a type of AI that learns to recognize patterns in data, began outperforming standard algorithms about 10 years ago, once there was finally enough data and processing power to take full advantage of them.

Today’s neural nets are even more data- and power-hungry. Training them requires fine-tuning the values of millions, if not billions, of parameters that define these networks and represent the strengths of the connections between artificial neurons. The goal is to find near-ideal values for these parameters, a process known as optimization, but getting the networks there is difficult.

**Getting Hyper**

At the moment, the most effective approaches for training and improving neural networks are variations on a process known as stochastic gradient descent (SGD). Training entails reducing the network’s errors on a given task, such as image recognition. An SGD algorithm churns through a large amount of labeled data, adjusting the network’s parameters to reduce the errors, or loss. Gradient descent is the iterative process of descending from high values of the loss function toward some minimum that represents good-enough (or sometimes the best possible) parameter values.
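
The loop above can be sketched in a few lines. This is a minimal, self-contained illustration of SGD (the function names and the one-parameter "network" are invented for the example, not from the source): for each labeled example, nudge the parameters against the gradient of the loss.

```python
import random

def sgd(params, grad_fn, data, lr=0.05, epochs=50):
    """Minimal stochastic gradient descent: for each labeled example,
    step the parameters against the gradient of the loss."""
    for _ in range(epochs):
        random.shuffle(data)          # "stochastic": visit examples in random order
        for example in data:
            grads = grad_fn(params, example)
            params = [p - lr * g for p, g in zip(params, grads)]
    return params

# Toy "network" y = w*x fit to points on the line y = 2x.
# Loss per example is (w*x - y)^2, so d(loss)/dw = 2*(w*x - y)*x.
data = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0]]
grad_fn = lambda params, ex: [2.0 * (params[0] * ex[0] - ex[1]) * ex[0]]
w = sgd([0.0], grad_fn, data)       # w converges toward 2.0
```

Real training differs mainly in scale: millions of parameters, gradients computed by backpropagation, and mini-batches of data per step.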

However, this strategy only works if you have a network to optimize. To build the initial neural network, which is typically made up of multiple layers of artificial neurons feeding from an input to an output, engineers must rely on intuitions and rules of thumb. The number of layers of neurons, the number of neurons per layer, and other factors can all differ between these architectures.

Gradient descent guides a network through its “loss landscape,” where higher values represent greater errors, or loss. To reduce loss, the algorithm seeks the global minimum.

A graph hyper network begins with any architecture that needs optimizing (dubbed the candidate). It then attempts to predict the ideal parameters for the candidate. The team then sets the parameters of an actual neural network to the predicted values and tests it on a task-specific benchmark. Ren’s team demonstrated how to use this strategy to rank candidate architectures and select the best performer.
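
As a rough sketch of that ranking idea (everything here is invented for illustration: the “hypernet” is a stand-in function, and the score is a toy stand-in for a task-specific test), each candidate gets a one-shot parameter prediction and is ranked by how well those predicted parameters score:

```python
def toy_hypernet(arch):
    # Stand-in for a trained hyper network: maps an architecture
    # descriptor to a guessed parameter per layer (invented rule).
    return [0.5 * arch["width"] for _ in range(arch["layers"])]

def toy_task_score(params):
    # Stand-in for a task-specific test: higher is better,
    # with a mean parameter value near 2.0 scoring best.
    return -abs(sum(params) / len(params) - 2.0)

candidates = [
    {"name": "net-A", "layers": 2, "width": 4},
    {"name": "net-B", "layers": 3, "width": 1},
    {"name": "net-C", "layers": 1, "width": 3},
]

# Rank candidates by the score of their *predicted* parameters,
# without fully training any of them.
ranked = sorted(candidates,
                key=lambda a: toy_task_score(toy_hypernet(a)),
                reverse=True)
best = ranked[0]["name"]
```

The point of the technique is the shortcut: candidates are compared using predicted parameters, skipping the expensive step of training each one from scratch.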

**Training the Trainer**

Knyazev and his partners dubbed their hyper network GHN-2, and it improves on two key characteristics of Ren and partners’ graph hyper network.

First, they relied on Ren’s technique of representing a neural network’s architecture as a graph. Each node in the graph represents a subset of neurons that perform a specific type of computation. The graph’s edges show how information flows from node to node, from input to output.
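
One simple way to encode such a graph (the node types and dictionary layout here are an assumption for illustration, not the paper’s actual encoding) is a table of nodes tagged with their computation type plus a list of directed edges for the data flow:

```python
# Toy architecture graph: nodes are computation types,
# directed edges show how information flows input -> output.
nodes = {
    0: {"op": "input"},
    1: {"op": "conv"},
    2: {"op": "relu"},
    3: {"op": "output"},
}
edges = [(0, 1), (1, 2), (2, 3)]

def successors(node_id):
    """Nodes that receive this node's output."""
    return [dst for src, dst in edges if src == node_id]
```

This directed-graph view is what lets a hyper network treat wildly different architectures uniformly: any network, however deep or wide, becomes nodes plus edges.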

Second, they drew on the process of training the hyper network to make predictions for new candidate architectures. This requires two additional neural networks. The first performs computations on the original candidate graph, producing updates to the information associated with each node, while the second takes the updated nodes as input and predicts the parameters for the candidate neural network’s corresponding computational units. These two networks each have their own parameters that must be optimized before the hyper network can accurately predict parameter values.
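
The two-stage idea can be sketched as follows (the mixing rule, the 0.5 and 0.1 constants, and the scalar node features are all invented simplifications; real graph networks use learned weights and vector features): stage one passes information along the graph’s edges to update each node’s features, and stage two decodes each updated feature into a parameter guess for that node.

```python
def message_pass(features, edges, rounds=2):
    """Stage one: each destination node mixes in a fraction of its
    source node's feature, repeated for a few rounds."""
    for _ in range(rounds):
        updated = dict(features)
        for src, dst in edges:
            updated[dst] = updated[dst] + 0.5 * features[src]
        features = updated
    return features

def decode(features):
    """Stage two: turn each node's updated feature into a
    predicted parameter for that node's computation."""
    return {node: 0.1 * value for node, value in features.items()}

# A three-node chain: 0 -> 1 -> 2, with only node 0 active at first.
features = {0: 1.0, 1: 0.0, 2: 0.0}
edges = [(0, 1), (1, 2)]
params = decode(message_pass(features, edges))
```

After two rounds, information from node 0 has propagated through node 1 to node 2, so even distant nodes influence each other’s predicted parameters.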

You’ll need training data to do this, which in this case is a random sample of artificial neural network (ANN) architectures. For each architecture in the sample, you start with its graph, use the graph hyper network to predict parameters, and initialize the candidate ANN with those predicted values. The ANN is then set to work on a specific task, such as image recognition. You compute the ANN’s loss and then, rather than updating the ANN’s parameters to make a better prediction, you update the parameters of the hyper network that made the prediction in the first place. This lets the hyper network do better the next time around. Iterate over every image in a labeled training data set and every ANN in the random sample of architectures, reducing the loss at each step, until it can do no better. Eventually, you end up with a trained hyper network.
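
The crucial twist in that loop, the gradient flowing into the hyper network rather than the candidate, can be sketched as follows (a minimal toy: the “hyper network” is a single scalar broadcast to every parameter slot, and the task loss is an invented stand-in):

```python
def predict_params(hyper_w, arch_size):
    # Toy hyper network: one scalar, broadcast to every
    # parameter slot of the candidate architecture.
    return [hyper_w] * arch_size

def candidate_loss(params, target=1.0):
    # Stand-in task loss: predicted parameters should be near target.
    return sum((p - target) ** 2 for p in params) / len(params)

def train_hypernet(archs, hyper_w=0.0, lr=0.1, steps=200):
    """Outer loop: the candidate's loss updates the HYPER network's
    parameter, not the candidate's own parameters."""
    for _ in range(steps):
        for arch_size in archs:
            params = predict_params(hyper_w, arch_size)
            # Since every param equals hyper_w, d(loss)/d(hyper_w)
            # is the mean of 2*(p - target) over the params.
            grad = sum(2 * (p - 1.0) for p in params) / len(params)
            hyper_w -= lr * grad
    return hyper_w

# Train across a "random sample" of architecture sizes.
hyper_w = train_hypernet(archs=[2, 3, 5])   # converges toward 1.0
```

After training, a single forward pass of the hyper network yields a decent parameter guess for an architecture it has never seen, which is what makes the approach worthwhile.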

Because Ren’s team did not provide their source code, Knyazev’s team adopted these concepts and created their own application from scratch. Knyazev and his colleagues then improved on it. To begin, they identified 15 different types of nodes that may be combined and matched to build practically any modern deep neural network. They also made a number of advancements to boost prediction accuracy.