The future of AI for media and entertainment: the decision-making factor

November 11, 2022 · 9 min read
Pavel Saskovec, Technical Writer
Proofread by the expert Oleg Gubin

In the early days of Artificial Intelligence, could anybody have imagined the scope at which we would come to implement it? Almost every major sphere of our lives now features some sort of AI automation.

But the situation is a bit different in the media and entertainment industry. Here, AI implementation is limited to its simplest forms and severely lags behind other industries.

Today we try to see what the future of Artificial Intelligence in the media holds for us.

Where is Artificial Intelligence currently?

Let’s get a few things straight before going into the discussion.

AI, though it might be perceived as such, is not a single piece of tech that does all the things you imagine it to do. On the contrary, it is an umbrella term that houses a huge number of technologies, each serving a separate function.

Natural language processing can recognize speech, object recognition can see stuff, deep learning processes the incoming data to recover insights, and so on.

We just call it AI for convenience.

But all of those technologies have found their place in our everyday lives on a variety of levels. They help to make our jobs more effective and effortless.

We have robots dealing with tech support issues, smart cameras unlocking our phones and computers, sensors keeping track of equipment performance, and so on.

AI is quite popular for businesses.

Actually, let’s look closer at the things AI is doing for us now, and why it matters to the future of AI in the media business.


Transportation

The seed is already planted: driverless cars are roaming the streets of certain cities. The tech still requires a ton of polish, but the engineers are cracking those nuts pretty well, figuring out the quirks and overcoming obstacles.

Imagine getting an Uber and seeing this on the screen: 78GH:X# is 5 minutes away.

It’s only a matter of time before you won’t have to chat with your driver anymore. And you will control the music. Hurray!


Manufacturing

Here, automation stretches over several workflows, helping humans increase production efficiency. I bet the first thing that comes to mind is automated assembly lines, which use AI tech to produce goods with little to no involvement from humans.

For many assembly lines, humans are only required to observe and maintain the machines. They are out of danger.

That is a very effective application of the technology, so why should it stop there?

You also have smart warehouse management, where the people in charge can easily locate the package they need with the help of software and smart sensors placed around the warehouse.


Healthcare

AI algorithms enable fast and accurate diagnoses by drawing on a wider pool of data. There is no data they fail to account for, no human-ish mistakes, and all that jazz.

AI tech does not miss data. It uses all of it for the most accurate health evaluation.

You get the most effective treatments, the most thorough evaluations and check-ups.


Education

We can use AI to digitize textbooks for students and teachers, making it easier for them to access and exchange the information they need.

Given the adoption of remote learning, it is certainly a step towards more accessible learning.

Then there are digital education platforms, which can benefit a great deal from AI technology. On platforms hosting courses, textbooks, and other educational materials, the hosts can use AI to analyze which content a given user interacts with the most, and offer other materials accordingly.

On top of that, automated AI systems are more than capable of evaluating the students’ knowledge. The AI may offer some additional content the student can get into to improve their performance.
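As a toy sketch of that recommendation idea (the data model, field names, and function here are all invented for illustration, not any real platform’s API), a content-based suggestion could be as simple as:

```python
from collections import Counter

def recommend(interactions, catalog, k=3):
    """Suggest unseen materials from the topics a student
    interacts with the most (hypothetical data model)."""
    # Rank topics by how often the student engaged with them
    top_topics = [t for t, _ in Counter(i["topic"] for i in interactions).most_common()]
    seen = {i["item"] for i in interactions}
    # Prefer unseen materials from the student's favorite topics
    ranked = sorted(
        (m for m in catalog if m["item"] not in seen),
        key=lambda m: top_topics.index(m["topic"]) if m["topic"] in top_topics else len(top_topics),
    )
    return [m["item"] for m in ranked[:k]]

interactions = [
    {"item": "algebra-1", "topic": "math"},
    {"item": "algebra-2", "topic": "math"},
    {"item": "cells", "topic": "biology"},
]
catalog = [
    {"item": "poetry", "topic": "literature"},
    {"item": "genetics", "topic": "biology"},
    {"item": "geometry", "topic": "math"},
]
print(recommend(interactions, catalog, k=2))  # ['geometry', 'genetics']
```

A production system would use far richer signals (watch time, quiz scores, and so on), but the shape of the problem is the same.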

Customer service

I guess we are all accustomed to AI bots dealing with our issues in support, processing our orders and whatnot.

But as the tech gets more complex, we get more cool things out of it.

For instance, the Google Assistant on Pixels (Google’s own flavor of smartphone) can do more than just set reminders and tell you the weather.

On Pixel, Google Assistant can screen calls for you: if you get a call and don’t feel like answering, you can have Google do it. The assistant will notify the caller that it is screening the call, and you get a transcription of the convo in real time.

Get a Google Pixel and you won’t have to deal with spammers or telemarketers anymore.

And then it does even more.

Say you call up a support line and get put on hold. You can have the assistant stay on the line for you and let you know when a person on the other end picks up.

And appointments. No more calling: the robot can handle the whole thing for you by placing a call, like a real human.

Now, what else is there?

We work in media and entertainment. So why haven’t we talked about media and entertainment?

Well, let’s talk about media and entertainment

What is the major drawback for the AI future in media?

The thing is, compared to the media industry, the Artificial Intelligence used in many other spheres has more nuance to it.

You see, the car can drive all by itself not just because neural networks look for turns, stops, crossings, other vehicles, and people, and then “if-then-else” it all.

It analyzes the data and makes an almost conscious decision to stop, turn, or move forward.

And that decision factor is what AI in the media misses. The factor is facilitated by more complex technology than deep learning and is crucial to the future of AI development.

Bounding box for you, bounding box for them, bounding box for everybody

Let’s see how media automation works in most cases at this moment.

Say you need to find and mark end credits so that the viewers can skip past them.

Given the current state of AI in media, you apply a neural network that has been trained to identify rolling credits.

What you get back is a JSON file filled with probability scores for each frame of the video. One frame may be 32% credits, while another is 91%.

But what about the point where it is safe to skip the credits? You would still have to do that job yourself, finding the frames and marking them as the beginning and end of a skippable credit sequence.
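To make that concrete, here’s a minimal sketch (the score format, thresholds, and function are all assumptions, not a real tool) of the decision step the current tools leave to you: turning per-frame credit probabilities into a single skippable range.

```python
def find_credit_range(scores, threshold=0.8, min_run=24):
    """Return (start_frame, end_frame) of the longest run of frames
    whose credit probability stays above `threshold`, or None if no
    run lasts at least `min_run` frames. End index is exclusive."""
    best, run_start = None, None
    for i, p in enumerate(scores + [0.0]):  # sentinel closes any open run
        if p >= threshold and run_start is None:
            run_start = i
        elif p < threshold and run_start is not None:
            if i - run_start >= min_run and (best is None or i - run_start > best[1] - best[0]):
                best = (run_start, i)
            run_start = None
    return best

# Hypothetical per-frame scores pulled from the network's JSON output
scores = [0.1, 0.2, 0.32, 0.91, 0.95, 0.97, 0.93, 0.9, 0.4]
print(find_credit_range(scores, threshold=0.8, min_run=3))  # (3, 8)
```

Even this naive heuristic is still just post-processing; it has no idea whether the story has actually ended, which is exactly the decision factor being discussed.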

Here’s another popular example: you may need to cut the football game footage, choosing the best moments.

In that situation, you can also apply a neural network that is able to, let’s say, see the ball going into the goal area. Once it finishes analyzing the video, you get a file that, again, evaluates individual frames with probability scores. That one is a 17% goal, and that one over there is 87%.

Is that a penalty? Idk, here’s a bounding box.

But that is just software putting bounding boxes on the objects it sees. At the end of the day, it’s you who goes through the frames and cuts the highlights.
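For illustration only (the fps, threshold, padding, and score format are all assumptions), the manual cutting step could be automated naively by grouping high-scoring frames into padded clips:

```python
def detect_highlights(goal_scores, fps=25, threshold=0.85, pad_sec=5.0):
    """Group frames whose goal probability exceeds `threshold` into
    (start_sec, end_sec) clips, padded so the build-up and the
    celebration make it into the cut. Overlapping clips are merged."""
    clips = []
    for i, p in enumerate(goal_scores):
        if p < threshold:
            continue
        t = i / fps
        if clips and t - pad_sec <= clips[-1][1]:
            clips[-1][1] = t + pad_sec          # extend the previous clip
        else:
            clips.append([max(0.0, t - pad_sec), t + pad_sec])
    return [tuple(c) for c in clips]

# Toy scores: one probability per frame, as the network might emit
scores = [0.1] * 5 + [0.9] + [0.1] * 10 + [0.87, 0.9]
print(detect_highlights(scores, fps=1, threshold=0.85, pad_sec=2))  # [(3.0, 7.0), (14.0, 19.0)]
```

Note what this can’t do: it still can’t tell a goal from a near miss or a penalty from a regular tackle. That context is the missing decision factor.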

Artificial Intelligence, you say? More like Artificial Eyesight.

AI in its current form, if we’re talking strictly about media and entertainment, just transcribes data from one form to another. You had a video; now you get a text file with timestamps.

We would argue that it’s not much faster than having a human sit through the video, find the credits, and mark them.

If we strive for true automation of the video analysis, we will have to automate the decision-making factor as well. That will make the media production process more seamless, effortless, and cost-effective — the way it already is for many other industries.

How do we extract the business value with technology?

We gotta keep in mind the business value those media companies are so concerned about. It appears that nobody wants to spend money on implementing tech that does not reduce the involvement of humans.

Huh. Who would’ve thought.

We have already established that the primitive AI implementation only reformats the data it receives, the outcome of which is of limited benefit to business.

It can see the credits rolling on the screen, but it can’t say where you can skip them without missing the end of the story or a post-credit scene. That’s the feature that improves the viewing experience, so that’s where the business value is.

The decision factor provides that value, being the major milestone for the future of AI.

What do you think of the future of AI?

Let’s break down the whole process.

We can turn back to any of our previous examples — let’s say, cutting the football game into a highlight compilation.

If we were to implement a more advanced system to analyze the footage, we would have to divide the process into two major stages. Each stage would employ its own set of technologies to perform its function.

Let’s see what those stages are.

Representations stage

The first thing the AI system has to do is to “look” at the video content and identify the objects of interest in accordance with the chosen task. In our case, the software looks for any clues that would mean the particular moment is highlight-worthy: it may be players brawling, arguing with the coach, mascots doing their mascot things, touchdowns, home runs, and all of that.

Tools like deep learning, digital image processing, and cognitive computer vision facilitate the process.

As you can see, deep learning does not carry the whole operation on its own; it’s only one of the tools, doing a very specific job.

That is our “Artificial Eyesight” thingy.

But unlike the more primitive approaches to AI in the media, there is more to the process.

The precise recreation of the process.

Decisions stage

This part is vital to the Artificial Intelligence future. Having collected the data, the AI system now can work its magic on it. Leveraging math modeling, probabilistic AI, and cognitive science, it is able to make sense of the context of the footage.

That means the software does not just see the ball going into the goal area. It can tell apart penalties, own goals, goalkeeper mistakes, three-pointers, and other things.

That’s the Intelligence part. The system can analyze the footage and understand the players’ movements and the layout of the field to correctly identify the in-game moments, and cut them into the appropriate compilations. Automatically, of course.
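As a toy sketch of how the two stages fit together (every class, label, and rule here is invented for illustration; a real decisions stage would lean on probabilistic models rather than one hard-coded rule):

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One perception result from the representations stage."""
    frame: int
    label: str    # e.g. "ball", "goal_area"
    score: float

def decisions_stage(detections):
    """Interpret raw detections in context: here, flag a 'goal'
    whenever 'ball' and 'goal_area' co-occur confidently in a frame."""
    by_frame = {}
    for d in detections:
        by_frame.setdefault(d.frame, {})[d.label] = d.score
    return [
        (frame, "goal")
        for frame, labels in sorted(by_frame.items())
        if labels.get("ball", 0.0) > 0.8 and labels.get("goal_area", 0.0) > 0.8
    ]

# Pretend output of the representations stage
detections = [
    Detection(100, "ball", 0.95), Detection(100, "goal_area", 0.91),
    Detection(200, "ball", 0.40), Detection(200, "goal_area", 0.88),
]
print(decisions_stage(detections))  # [(100, 'goal')]
```

The point of the split is that stage one only produces evidence, while stage two turns that evidence into an event the editor actually cares about.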

Is AI's future positive for the media industry?

So, Artificial Intelligence implementation in the media and entertainment industry is like 14 steps behind its counterparts in other spheres like transportation or healthcare.

The simple deep learning implementation that the AI boils down to here does not solve the problems of media companies, nor does it bring value to business.

Editors still end up spending time overseeing the AI work and editing the content manually.

That is why it is important to bring AI in the media up to speed. We believe that AI in the future will be used to its fullest potential, and the major component to this realization is the decision factor.

With the help of advanced technology, we can make software analyze the visuals and make sense of the context, which helps it make conscious decisions autonomously.

At the end of the day, such a system can work up to 50x faster than a human, with the same insight into the context and far fewer mistakes.

Now, that is a good future for AI.
