Recently, we talked to Pavlo Tkachenko, our expert in data science development services and one of the co-developers of the successive geometrical transformation machine (SGTM), to find out what's so special about high-speed neural network technology, how it works, and how businesses across various domains can benefit from it.
The theoretical background of SGTM technology dates back to the early 2000s. SGTM follows the principles of artificial intelligence: it embodies an original neural network paradigm based on a geometric interpretation of the network training process.
SGTM has been used to optimize business operations in many projects, from agricultural enterprises to the fashion industry. Keep reading for an inside look at SGTM high-speed neural networks and how applying them can help your business.
Q: What are high-speed neural networks? What other types of neural networks are there?
Pavlo: The development of neural networks has been inspired by information on how biological neurons and our brain work. There's an approach in cognitive science called connectionism that deals with questions of how mental phenomena can be described by interconnected networks of simple and often uniform units.
The most useful feature of neural networks is the ability to learn from examples in order to generalize the information processed. This means we can show a neural network some objects, for example, an apple and a pear, and train this network to differentiate these objects.
The majority of neural networks solve one of five types of tasks:
- Forecasting
- Prediction
- Classification
- Clustering
- Association
Almost every existing neural network works well for one type of task and is poorly suited to the others. The reason is simple: the architecture of each network is designed for solving a particular task. For example, networks that process visual information (photos or video) can't make forecasts for time series.
The high-speed neural network technology we're going to discuss is called SGTM, or Successive Geometrical Transformation Machine. This technology is a geometrical interpretation of the process of training a network. The process of training can be described as a sequence of orthogonal transformations of the hyperbody of the modeling object in its multidimensional feature space.
The goal of training any type of neural network is to determine a set of neuron weights which will ensure the production of desired network output values, according to the samples presented in the training dataset.
To describe the processes occurring inside a neural network, you can imagine a chain of neurons united into layers; each neuron in a layer is connected to the neurons of the previous layer. When determining the neurons' weights, you have to ensure the transformation of the incoming data into the expected output.
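To make this concrete, here's a minimal Python sketch of a signal propagating through weighted layers; the layer sizes and the tanh activation are arbitrary choices for illustration, not part of SGTM:

```python
import numpy as np

def forward(x, weights):
    """Propagate an input vector through a chain of layers; each weight
    matrix connects the neurons of one layer to those of the next."""
    a = x
    for W in weights:
        a = np.tanh(W @ a)  # weighted sum of inputs, then an activation
    return a

# Toy topology: 3 inputs -> 4 hidden neurons -> 1 output
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(1, 4))]
print(forward(np.array([0.5, -1.0, 2.0]), weights))
```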
The ways to determine those weights vary for different types of neural networks.
The first important distinction is whether the training process is iterative or non-iterative. When training an iterative neural network, the signal is propagated across all the layers many times; that is, you have to perform several iterations that carry the signal from the input layer to the output layer.
Let's take a closer look at the training procedure of an iterative neural network.
During the first iteration, the signal is spread across the neurons’ layers, providing us with some output value. Next, you need to compare the value you received with the expected output value.
Now you can determine how large the error of the produced output value is. Taking this error into account, you carry out the next iteration, adjusting the neuron weights to push the error below an acceptable limit and bring the network's output closer to the desired value. Each iteration of signal propagation over the network is called an epoch.
In simple words, an iterative neural network requires many repeated passes to reach a minimal error. This type of training is time-consuming, as you have to run a multitude of epochs (hundreds or thousands) to train a network successfully.
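For illustration, here's what such an iterative loop looks like in Python with NumPy; the toy linear model, learning rate, and epoch count are made-up assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))           # training inputs
y = X @ np.array([2.0, -1.0, 0.5])      # expected outputs

w = rng.normal(size=3)                  # random initial weights
for epoch in range(1000):               # hundreds or thousands of epochs
    error = X @ w - y                   # compare output with expectation
    w -= 0.01 * (X.T @ error) / len(X)  # adjust weights to shrink the error
print(w)                                # approaches [2.0, -1.0, 0.5]
```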
Our SGTM network isn't iterative: the neuron weights are identified in a predetermined number of steps (a single epoch). We always know in advance how many steps it will take to train a network. That's why this type of neural network is characterized as high-speed: being non-iterative, it can be trained quickly.
Another drawback of classic neural networks is random weight initialization in the first epoch of training. At the start of training, the input neuron weights get random values. During the following epochs, according to the training algorithm, we correct these weight values, trying to bring the outputs as close as possible to the output samples in the training dataset.
However, having a random initialization means the defined neuron weights will be slightly different at the end of each training attempt. Though the dataset as well as the neural network topology are the same, each successive experiment will bring a slightly different result on your test dataset. Thus, to understand the real "power" of a built network, you should conduct a series of experiments under the same conditions and at the end receive an average result as the answer.
Let's take a look at a simple example: we classify apples and pears. After the first training run, we get 95.1% accuracy. Training again on the same dataset, we get 94.9%. The third time, we get 95.8%. Based on this series of experiments, we can conclude that the neural network provides an average of 95.3% accuracy.
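Here's a quick sketch of that averaging procedure; the synthetic dataset and the scikit-learn classifier are stand-ins for the apples-and-pears example:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in for the apples-vs-pears data
X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Same data, same topology; only the random weight initialization differs
scores = []
for seed in range(3):
    clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=500,
                        random_state=seed).fit(X_tr, y_tr)
    scores.append(clf.score(X_te, y_te))
print(scores, "average:", np.mean(scores))
```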
Q: So how long would the iterative process of training a classical neural network take?
Pavlo: Training a neural network in iterations is inconvenient. Depending on the size of the dataset, it can take from half a day to several days to train a neural network.
Q: And what would the training process look like for an SGTM network?
Pavlo: You won't face the described issue with our SGTM network. Using the same dataset and network topology, you will receive the same results, as the neuron weights aren't initialized randomly. The results of the experiment are fully reproducible, which means zero uncertainty.
SGTM is a geometrical interpretation of the training process. Any object, or more precisely its realization (the object in a particular state), can be displayed as a point in the multidimensional space of the object's features.
Q: Could you describe the main idea of how the SGTM network works?
Pavlo: Sure. The idea of SGTM (like many other ML methods) is based on the hypothesis of compactness, formulated by Emmanuel Braverman in 1961: realizations of one and the same pattern are commonly represented in the space of features as closely spaced points, forming "compact" clusters. Let's imagine the modeling object is the totality of 100,000 random people. We can describe each person with three characteristics — age, weight, and height. We build a three-dimensional space in which each of the axes corresponds to a separate characteristic. In this case, each person is represented as a dot in this three-dimensional space.
These dots won't be dispersed chaotically. You can imagine them as a cloud with a clearly visible center marked by the greatest density of dots. This cloud has its own size and shape and can also be called the hyperbody of the examined object (the totality of all those people).
With each following step of training an SGTM neural network, the space is reduced by one dimension. Let's imagine our hyperbody has the shape of a "zucchini." We find the center of the "zucchini" and search for the dots most remote from that center. Then we draw a plane through the centroid. The projection distances should be as large as possible at each step of building the network.
Starting from a three-dimensional space, we project all dots of the "zucchini" onto the plane drawn through its center, thereby decreasing the space by one dimension. It's also important to remember the projection distance of each dot.
Now that we are in a two-dimensional space, we still see a cloud of dots on the plane. Our next step is to decrease the space by one more dimension. We find the centroid of the cloud, locate the dots most distant from it, draw a line through the center, and project all dots onto this line. As in the previous step, we remember the projection distances.
Finally, when all dots lie on one line, we determine their center of mass and record these last projection distances. At this final step, the "zucchini" loses its shape and collapses into a single dot.
All the patterns and correlations available in the processed dataset are now represented in the trained neuron weights defined by calculated projection distances. Now we can, for example, predict the age, weight, and height of a person.
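As a rough sketch of this geometry in Python, under our own reading of the description above (each step takes the direction toward the dot most remote from the centroid, remembers every dot's projection distance, and flattens the cloud onto the hyperplane through the centroid), not the exact SGTM algorithm:

```python
import numpy as np

def successive_projections(X):
    """One step per dimension: find the centroid, take the direction of
    the most remote dot, record every dot's projection distance along it,
    then project the cloud onto the hyperplane through the centroid,
    losing one dimension per step."""
    X = X.astype(float).copy()
    steps = []
    for _ in range(X.shape[1]):
        c = X.mean(axis=0)                           # centroid of the cloud
        R = X - c
        d = R[np.argmax(np.linalg.norm(R, axis=1))]  # most remote dot
        d = d / np.linalg.norm(d)                    # unit direction
        t = R @ d                                    # distances we "remember"
        X = c + R - np.outer(t, d)                   # flatten the cloud along d
        steps.append((c, d, t))
    return steps  # this recorded geometry plays the role of trained weights

# 100,000 people described by age, weight (kg), and height (m)
people = np.random.default_rng(2).normal(
    loc=[40, 70, 1.7], scale=[12, 15, 0.1], size=(100_000, 3))
steps = successive_projections(people)
```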
Q: In what ways are high-speed neural networks better than other types of neural networks? What are their benefits?
Pavlo: There are six key benefits of the high-speed SGTM neural network:
- Quick performance
- Repeatability of training and testing results, which also makes it possible to discover anomalies in data
- Good results from a minimal amount of data
- High-speed training that makes it easy to work with a large number of input variables and to build complex nonlinear network topologies
- "Gray boxes" instead of "black boxes," enabling principal (orthogonal) component extraction for a deeper understanding of the data and input weights
- Supervised and unsupervised training modes
We have already covered the first two points in the previous answers, so let's now focus on the remaining benefits.
A minimal amount of data provides maximum results
The majority of classic neural networks are black boxes: the observer doesn't know what occurs inside the network and has no way to find out. You can only see how changing the input values affects the output, so the structure the network builds during training remains hidden.
In contrast to classic neural networks, the SGTM neural network is a gray box. During training, we have the opportunity to extract the principal components (PCs). Principal components are the directions in the data that capture the greatest variance and, with it, the largest amount of information.
For real physical objects, qualities such as age, weight, and height will correlate with each other. We can always trace a correlation between these parameters: the lesser the height, the lesser the weight, as a rule.
Principal components are the transformed input values, decorrelated from one another. At each step of training, we make an orthogonal projection, which means the components extracted at each step are independent of those extracted at the following steps.
Having these principal components opens up a multitude of opportunities for further development. For example, we can represent the same amount of information using a smaller number of components.
Let's imagine an object described by 20 features. These features correlate with one another, and depending on the strength of these correlations in the initial dataset, we can reproduce the same information with, say, the first 10 components. With a smaller amount of data, we provide the same output.
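Here's a sketch of that kind of compression, using ordinary PCA as a stand-in for SGTM's component extraction, on a synthetic 20-feature dataset built from 10 underlying factors:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
latent = rng.normal(size=(500, 10))      # 10 underlying independent factors
X = latent @ rng.normal(size=(10, 20))   # 20 observed, correlated features

pca = PCA(n_components=10).fit(X)
print(pca.explained_variance_ratio_.sum())  # ~1.0: 10 components carry it all
X_compact = pca.transform(X)                # same information, half the size
```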
Ability to exclude noise from input data
Any physical process we can describe is observed with the help of sensors or measuring devices. Because no sensor technology humans have developed is perfect, it's crucial to keep in mind that whatever data you use as input will always contain a share of noise. All information is collected with human-made measuring devices, which sometimes fail and produce erroneous data.
As I've already said, this means any data describing real physical processes is inaccurate to some extent. And if we train a neural network on inaccurate data, these inaccuracies affect the results of the training process.
A neural network can't tell accurate data from inaccurate data. By default, it perceives all information as useful and adjusts its weights to reproduce the data as faithfully as possible, including the inaccuracies that are part of the initial data.
With SGTM, things are different. We can present the initial data in the form of principal components. Having done this transformation, we find that one of the PCs reflects the inaccuracies present in the data. We can then exclude this noisy component from the output signal formation procedure and make our network more efficient.
This means that we can use contradictory datasets, and after eliminating the noisy PC, we will receive better results.
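As a rough illustration, again with ordinary PCA standing in for SGTM's decomposition, and with the "noisy" component simply assumed to be the lowest-variance one:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
clean = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 5))  # true signal
X = clean + 0.05 * rng.normal(size=clean.shape)              # sensor noise

pca = PCA().fit(X)
Z = pca.transform(X)
Z[:, -1] = 0                             # exclude the noisiest component
X_denoised = pca.inverse_transform(Z)

# Reconstruction from the remaining components sits closer to the signal
print(np.abs(X_denoised - clean).mean(), "<", np.abs(X - clean).mean())
```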
Another point is that training an SGTM high-speed neural network is quick, so we can painlessly grow the number of input variables used to solve the problem.
With an iterative training process, adding 10 extra input values makes every epoch longer and every training step more complex. In that case, extending the number of inputs isn't viable, as it prolongs training considerably. The only efficient solution for classic neural networks is to limit the number of input features and ignore the doubtful ones.
Our core goal should be to receive results utilizing an acceptable amount of resources. While iterative networks are bad at using large datasets, non-iterative networks like SGTM provide the opportunity to expand the amount of input data and take into account all potentially viable information.
Ability to use complex nonlinear neural network architecture
I would also like to discuss another benefit of high-speed neural networks: SGTM lets us use complex nonlinear network architectures. Any process or object can be approximated using linear or nonlinear methods.
Let's come back to the cloud of dots shaped like a "zucchini." We can search for linear correlations between input values as well as nonlinear correlations between input and output qualities. A linear correlation is always something simple; for example, we can find a linear correlation between height and weight.
The ideal weight index, for instance, is 22 × (height in meters − 0.1)². We can also find a nonlinear correlation between height and weight, which will be described by a more complex formula.
While a linear approximation gives something averaged, a nonlinear approximation can clearly reproduce more complex dependencies. A network designed to search for linear correlations between input and output features will work faster than one that searches for nonlinear correlations.
Correspondingly, linear correlations are found quickly but less precisely, while nonlinear ones take longer but are more precise. However, a non-iterative training process gives us repeatable results every time, which makes approximating nonlinear correlations between input and output information much easier.
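To see that trade-off in numbers, here's a tiny comparison of a linear and a quadratic fit of weight against height, on synthetic data generated from the ideal-weight formula above plus noise:

```python
import numpy as np

rng = np.random.default_rng(5)
height = rng.uniform(1.5, 2.0, size=200)                   # meters
weight = 22 * (height - 0.1) ** 2 + rng.normal(0, 2, 200)  # kg, with noise

lin = np.polyfit(height, weight, 1)    # fast, averaged linear fit
quad = np.polyfit(height, weight, 2)   # costlier, more precise fit

for name, coeffs in [("linear", lin), ("quadratic", quad)]:
    resid = weight - np.polyval(coeffs, height)
    print(name, "RMS error:", np.sqrt((resid ** 2).mean()))
```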
To sum up, SGTM is a comprehensive methodological basis for solving all types of tasks quickly and efficiently.
Q: When should high-speed neural networks be implemented? Which businesses and projects can get the most benefits from them?
Pavlo: The use of high-speed neural networks is limitless! Several LS clients have already benefited from applying high-speed neural networks in their businesses. No matter whether you're in the fashion or agriculture domain, you can take advantage of this technology and boost your business.
For example, we helped one of our customers develop a contactless pig-weighing technology based on a stereo camera. To build this solution, we used the SGTM network to define correlations between the physical dimensions of a pig's body and its weight.
For this, we collected a large dataset with photos and weights of pigs and designed a neural network to produce a formula that would determine the weight of a pig by assessing its physical dimensions.
In parallel, another expert built a mathematical model to detect the same dependencies using classic regression. The SGTM neural network turned out to be even more precise.
The next project where we used SGTM was a recommendation engine, an information system that offers content to fashion portal visitors. By analyzing readers' behavior and various demographic features, we determined the most relevant, individualized content for each reader. In this case, we used principal components to describe the unique characteristics of each user.
The takeaway
High-speed neural networks can optimize your business performance regardless of your domain. This technology will make your product smarter, automate your business operations, speed up performance, and provide you with repeatable outcomes.
Want to take advantage of a high-speed neural network for your business? Lemberg Solutions has a proven track record of using this technology for projects with versatile specializations. Get in touch with Lou Dutko, our CTO, through the contact form, and he'll get back to you shortly to discuss your requirements and plan our further steps.