How we perceive visual complexity and how its analysis helps solve business tasks

Humans consume and process tons of complex visual information every second.
How do we perceive complex information? How complex is it to analyze what we see?

By assessing the complexity of perception, we can solve many business tasks. In some cases, when there are many objects in a scene, we need only to estimate how complex it is by viewing the picture as a whole. There is no need for extra resources to recognize each object separately.

Visual complexity perception is an important question in psychology, cognitive science, and computer vision. It is critical for understanding both the mechanisms of human perception and the objects being perceived.

How complex is this scene, and what is complexity? Keep reading to find the answers in the following chapters of this article.

How we perceive a scene

Visual perception is an active process that includes a series of eye fixations and saccades — fast eye movements made to switch visual fixation from one part of the visual field to another.

There is a big difference between how we perceive an individual object and how we perceive a scene. Scene perception requires much more effort: we have to process multiple objects with their properties, the way these objects relate to each other, and the plot of the scene in general.

Humans have an amazing ability to process and analyze complex real-world scenes in the blink of an eye. It takes only about 100–120 ms to get the gist. This time is sufficient for the recognition of a couple of key objects, which is enough to understand the main idea of the scene.

It takes only 100–120 ms to recognize the key objects in the scene and understand the gist.

First, we get a rough map-like representation of the scene. Then we process the structure of the most noticeable objects, the layout, and the gist, which is the most abstract property of the scene. The gist draws on common knowledge of the surrounding world and of how objects can relate to each other.

There are two basic types of processing the information we receive. They are called top-down and bottom-up processing.

In the previous article about human attention, we mostly discussed the perception of visual information from the point of view of bottom-up processing. Now, let’s focus more on top-down processing, though visual complexity perception definitely includes both.

Top-down processing allows us to process the incoming information and systemize it faster based on our knowledge about the world around us. We group similar objects in the scene together and perceive them as a single class.

It also lets us reconstruct pieces of information that are hidden or unavailable by drawing on prior experience.

Prior knowledge shapes common patterns and forms certain expectations about them. Thus, our visual system can catch the gist of the scene almost instantly, before we have had time to pay attention to it.

But at the same time, our perception has limits. Our sight is limited, as is the amount of data our brain can process at a time. The picture we’re looking at can be ambiguous or incomplete, the scenes may change too fast, and so on.

Perception can be tricky as well. Because it is based on personal experience, human attention can be objective only to a certain extent. It’s usually biased.

Even the same person's interpretation of the image may vary depending on the time, light, emotional state, etc.

One day you see a duck; another day you see a rabbit, as in the ambiguous image shown below.

A classic example of an ambiguous image: the rabbit-duck illusion.

When the information is not complete or consistent, our brain tends to compensate for it and creates illusions. And we also tend to see things the way we want to see them.

But the above-mentioned examples are about the anomalies that we don’t face too often. In real life and when watching videos about real life, people mostly deal with ordinary situations where our common knowledge helps us successfully perceive the information.

Evaluating the complexity of the surrounding scene is natural for humans. We do it every day while completing our habitual visual tasks, without even noticing it.

What makes a scene complex

Before getting to the properties of a complex scene, let’s try to define what visual complexity actually is.

Complexity is... well... a complex term. There is no single, clear, and consistent definition of it.

Roughly summarizing it all, it’s about how hard it is to perceive the scene due to the number and organization of objects, the intricacy of their forms, level of detail, and variety of colors.

Visual complexity perception embraces the objective features of objects, as well as people’s subjective knowledge.

So what are the properties of a scene that influence its perceived complexity?

The main properties that define a scene as high-complexity or low-complexity are the number of objects, the spatial layout, symmetry, clutter, and changes in color.

Scenes with fewer objects and a more regular or symmetric organization are easier to analyze.

One of the key characteristics that make a scene complex for human perception is the presence of multiple objects in the scene.

The more objects there are in the scene, the more time and effort it takes for our visual system to process it.

But if the objects are of the same kind, our brain successfully does the grouping job — and we perceive a group of similar objects as a whole. For example, an image of a crowd includes many objects, but they all can be classified as a single group of people, which makes the scene not so complex for our perception.

The crowd scene is not very complex for human perception because it includes multiple objects that are similar and can be easily grouped.

But when the objects are varied and unique and cannot be grouped, the scene becomes much more complex to perceive and requires more resources to process.

Take a look at the image of the living room below. You will probably agree that the presence of multiple objects of different kinds demands more concentration to perceive the image compared to the scene with the crowd above.

The room represents a high-complexity scene with multiple unique objects.
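
To make the grouping effect concrete, here is a minimal sketch that scores a scene by the variety of object categories rather than by the raw number of objects. It uses Shannon entropy over category labels. The labels, the `label_entropy` helper, and the object counts are all hypothetical and chosen for illustration; this is not the method behind the experiments described in this article, and it assumes the objects have already been labeled.

```python
import math
from collections import Counter

def label_entropy(labels):
    """Shannon entropy (in bits) of the object-category distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    # p * log2(1 / p) summed over categories; a single category gives 0 bits.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

# Hypothetical scenes: many similar objects vs. a few unique ones.
crowd = ["person"] * 50
living_room = ["sofa", "lamp", "table", "plant",
               "rug", "shelf", "painting", "clock"]

print(label_entropy(crowd))        # one category, so nothing to tell apart
print(label_entropy(living_room))  # eight distinct categories
```

Under this toy measure the crowd of fifty people scores lower than the living room with eight objects, which matches the intuition that similar objects collapse into a single group.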

Research based on texture images has shown that understandability plays the main role in complexity evaluation, which suggests that prior knowledge is important in human perception.

And here are some examples of our experiments with scene complexity analysis.
Processing various scenes with the help of computer vision algorithms gave the following results.
A scene with white closing credits on a black background has the lowest complexity rate among the images reviewed below.

Here the scene with closing credits displays relatively low complexity. But in some cases, scenes in the movie may have even lower complexity than this.

The following ballet dance scene turned out to be only 1.6 times more complex than the previous image with end credits. Here and further in this review, the numbers are only given as a rough approximation to convey the main idea in an easy and vivid way.

The ballet dance scene has a relatively low complexity rate because it’s easy to perceive: few objects, plain background, limited color spectrum, etc.

The office scene below is 2.8 times more complex than the ballet dance scene, which is not surprising due to the many more details presented in the scene.

The scenes with many objects are perceived as more complex.

The next sunset scene has a complexity rate 1.5 times higher than the one in the office and four times higher than the image with ballet dancers.

The sunset scene, in addition to objects, is characterized by a greater variety of colors, shades, and luminance.

A scene with a flamenco dancer turned out to be 7.8 times more complex than that with the ballet dancers and almost twice as complex as the sunset.

More refined details and textures increase the scene complexity as in the flamenco scene.

The next scene, a forest lake view, turned out to be the most complex of all the scenes mentioned above. Its complexity rate is 1.8 times higher than that of the previous image.

The forest lake scene was estimated as the most complex one among the images reviewed above.
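
The relative figures quoted for the six scenes can be chained onto a single scale. The sketch below takes the closing credits as the baseline of 1.0 (an assumption made only for this illustration) and multiplies through the rounded ratios given in the text, so the results are as approximate as the ratios themselves.

```python
# Ratios quoted in the text; each scene is expressed relative to an earlier one.
credits = 1.0                   # baseline chosen for this illustration
ballet = 1.6 * credits          # "1.6 times more complex" than the credits
office = 2.8 * ballet           # "2.8 times more complex" than the ballet scene
sunset = 1.5 * office           # "1.5 times higher" than the office scene
flamenco = 7.8 * ballet         # "7.8 times more complex" than the ballet scene
forest_lake = 1.8 * flamenco    # 1.8 times the flamenco scene

scores = {
    "credits": credits, "ballet": ballet, "office": office,
    "sunset": sunset, "flamenco": flamenco, "forest lake": forest_lake,
}
for name, score in scores.items():
    print(f"{name:12s} {score:6.2f}")

# Cross-checks against the other comparisons made in the text.
print(f"sunset / ballet   = {sunset / ballet:.2f}")    # text: "four times"
print(f"flamenco / sunset = {flamenco / sunset:.2f}")  # text: "almost twice"
```

On this scale the quoted ratios are consistent with each other, and the forest lake ends up roughly 22 times as complex as the closing credits.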

Why does the last image win the perception complexity contest?
The variety of objects, their number, layout, colors with shades, luminance, and textures all make a scene more complex to perceive, not only for a human but also for a machine.
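
As a rough illustration of how a machine might turn such properties into a number, the sketch below combines two simple proxies on a grayscale image: the entropy of pixel intensities (variety of shades) and the share of neighboring pixels that differ sharply (detail and texture). The function names, the threshold, and the equal weighting are assumptions made for this example. The article does not describe the algorithms actually used, so this should not be read as our implementation.

```python
import math
from collections import Counter

def intensity_entropy(gray):
    """Shannon entropy (bits) of the pixel-intensity histogram."""
    pixels = [p for row in gray for p in row]
    total = len(pixels)
    return sum((n / total) * math.log2(total / n)
               for n in Counter(pixels).values())

def edge_density(gray, threshold=32):
    """Fraction of adjacent pixel pairs whose intensities differ by more than threshold."""
    if not gray or not gray[0]:
        return 0.0
    height, width = len(gray), len(gray[0])
    pairs = edges = 0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:   # right-hand neighbor
                pairs += 1
                edges += int(abs(gray[y][x] - gray[y][x + 1]) > threshold)
            if y + 1 < height:  # neighbor below
                pairs += 1
                edges += int(abs(gray[y][x] - gray[y + 1][x]) > threshold)
    return edges / pairs if pairs else 0.0

def complexity_score(gray):
    """Toy score in [0, 1] for 8-bit images; the equal weights are arbitrary."""
    return 0.5 * intensity_entropy(gray) / 8 + 0.5 * edge_density(gray)

# Two synthetic 8x8 images: a flat black frame and a checkerboard.
flat = [[0] * 8 for _ in range(8)]
checker = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]

print(complexity_score(flat))     # uniform frame: lowest possible score
print(complexity_score(checker))  # every neighbor differs: much higher
```

A flat frame gets the minimum score, while a busy pattern scores higher because every neighboring pixel pair counts as an edge. Real footage would need color handling and resizing, which this sketch leaves out.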

As mentioned in the introduction, evaluating how complex a visual scene is can help manage resources more wisely, make web interfaces more attractive for users, or estimate whether an industrial process is going smoothly.

Take, for example, a case that relies on CCTV data. You need to estimate whether an exhibition hall is overcrowded or nearly empty. It makes no sense to recognize each visitor’s face and then count the number of unique faces; it’s enough to estimate how complex the scene is.

We use scene complexity evaluation in our Cognitive Mill™ product to differentiate meaningful content related to the plot from the plain scenes with closing credits that can be safely skipped without the risk of missing something important.
As the text itself and the way it is organized can be complex for perception, we don’t consider it during the scene evaluation; we estimate the complexity of the background only.
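
The idea of skipping plain closing-credit scenes can be sketched as a search for a final run of low-complexity frames. The example assumes that a background complexity score per frame is already available; the scores, the threshold, the minimum run length, and the function name are all hypothetical, and the sketch is not a description of how Cognitive Mill™ works internally.

```python
def trailing_low_complexity_start(scores, threshold, min_run=3):
    """Return the index where a final run of low-complexity frames begins.

    Returns None if the video does not end with at least `min_run`
    consecutive frames scoring below `threshold`.
    """
    start = len(scores)
    # Walk backwards from the last frame while frames stay below the threshold.
    while start > 0 and scores[start - 1] < threshold:
        start -= 1
    run_length = len(scores) - start
    return start if run_length >= min_run else None

# Hypothetical per-frame background complexity scores in [0, 1].
frame_scores = [0.62, 0.71, 0.58, 0.66, 0.09, 0.07, 0.08, 0.06]

print(trailing_low_complexity_start(frame_scores, threshold=0.2))  # 4
```

Requiring a minimum run length keeps a single dark or plain frame inside the movie from being mistaken for the start of the credits.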

Below is an example of how Cognitive Mill™ works. You can check it out at run.cognitivemill.com.

In the Cognitive Mill visualizer, you can view how Cognitive Mill™ works.

This is one of the cases showing that knowing how to emulate human complexity perception mechanisms provides an efficient solution for high-level business problems.

Let’s sum it up

Perception complexity is about how hard it is to perceive the scene.
Visual complexity perception considers objects' features and people’s subjective knowledge.

The presence of multiple unique objects, random organization of objects within the scene, and variety of colors, luminance, and textures — all make the scene perception more complex for humans and machines.

But for humans, evaluating the complexity of the scene is a default feature. To enable machines to do the same, scientists have developed various approaches.
We at AIHunters use computer vision algorithms to estimate scene complexity.

Evaluation of scene complexity provides solutions for various business tasks when we need the big picture without the details. It helps businesses avoid extra costs and manage resources more efficiently.