
How Audio Description Is Made: What It Takes to Produce AD for Broadcast and Streaming
If you are buying audio description for the first time, or evaluating whether your current AD vendor is doing a good job, it helps to understand how AD is actually produced. The process has more steps than most people expect, and each one affects quality, cost, and turnaround.
This is how it works.
What audio description is (quickly)
AD is a separate audio track that narrates visual information for blind and partially sighted viewers. It describes what is happening on screen (actions, expressions, scene changes, on-screen text) in the gaps between dialogue. The narration is mixed into the existing soundtrack so viewers hear it alongside the original audio.
AD is not a transcript. It is not subtitles read aloud. It is written narration, timed to fit specific windows in the dialogue, voiced by a narrator, and mixed by a sound engineer. Each of those steps requires a different skill.
For a breakdown of where AD is legally required, see our guide: Audio Description and Subtitling Requirements in Europe (/blog-post/audio-description-requirements-europe).
Step 1: Watching and spotting
The describer watches the full programme and identifies where AD can be inserted. These are the gaps between dialogue: pauses, scene transitions, moments where nobody is speaking.
This is not straightforward. A 60-minute drama might have only 8 to 12 minutes of available gaps. An action sequence with constant dialogue may have almost none. The describer has to decide what visual information is most important and what can be left out, because there is never enough time to describe everything.
Spotting determines the timecodes for each AD insertion point. These timecodes become the framework for the script.
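If dialogue timecodes already exist (for example, from a subtitle file), a first pass at spotting can be sketched in code. The Python below is a minimal illustration, not a production tool: the find_gaps function, the 2-second minimum gap, and the sample timings are all assumptions made for this example. It only finds silences between dialogue. A describer still decides which gaps are worth using.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def find_gaps(dialogue: List[Interval], programme_end: float,
              min_gap: float = 2.0) -> List[Interval]:
    """Return windows with no dialogue that last at least min_gap seconds.

    min_gap is a hypothetical threshold chosen for this sketch.
    """
    gaps = []
    cursor = 0.0  # end of the latest dialogue seen so far
    for start, end in sorted(dialogue):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        # max() handles dialogue cues that overlap each other
        cursor = max(cursor, end)
    if programme_end - cursor >= min_gap:
        gaps.append((cursor, programme_end))
    return gaps


# Made-up dialogue cues, in seconds
dialogue = [(0.0, 4.0), (5.0, 9.5), (14.0, 20.0), (19.0, 22.0)]
gaps = find_gaps(dialogue, programme_end=30.0)
print(gaps)
```

With these sample timings, the candidate windows are 9.5 to 14.0 seconds and 22.0 to 30.0 seconds. The 1-second pause between the first two cues is dropped because it is shorter than the assumed minimum.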
Step 2: Writing the script
The describer writes narration for each insertion point. Every line has to fit within the available gap. Sometimes just 3 to 5 seconds. The writing needs to be precise. No wasted words.
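A rough way to sanity-check whether a line fits its gap is to compare its word count against a reading rate. The sketch below assumes 160 words per minute, which is an illustrative figure for this example and not a platform requirement. The function names are hypothetical too. A real check is done by ear in the recording session.

```python
def max_words(gap_seconds: float, words_per_minute: float = 160.0) -> int:
    """Rough upper bound on words that fit in a gap.

    words_per_minute is an assumed reading rate, not an industry rule.
    """
    return int(gap_seconds * words_per_minute / 60.0)


def fits(line: str, gap_seconds: float, words_per_minute: float = 160.0) -> bool:
    """True if the line's word count is within the estimated budget."""
    return len(line.split()) <= max_words(gap_seconds, words_per_minute)


short_line = "Tears stream down her cheeks."
long_line = ("She strides across the crowded station, "
             "glancing back over her shoulder twice.")

print(max_words(3.0), max_words(5.0))
print(fits(short_line, 3.0), fits(long_line, 3.0))
```

At the assumed rate, a 3-second gap holds about 8 words and a 5-second gap about 13. The 5-word line fits a 3-second gap. The 12-word line does not.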
What gets described:
- Character actions and movements that are not obvious from the dialogue.
- Facial expressions that affect meaning (a character says "I'm fine" while visibly shaking).
- Scene changes, location shifts, and time jumps.
- On-screen text: titles, signs, phone screens, news tickers.
- Visual details relevant to the plot that a sighted viewer would notice.
What does not get described:
- Anything already conveyed by the dialogue or soundtrack.
- Subjective interpretation. Do not assign emotions to characters. Instead of writing "she looks sad," describe what you see: "Tears stream down her cheeks." The viewer draws their own conclusion from the context.
- Information that would spoil what comes next. If a character appears behind a door and the audience is meant to be surprised, the AD should not reveal them early.
- Camera movements or film jargon, unless a specific visual technique is relevant to the story.
Major streaming platforms require AD to be written in present tense, third person. Always. Even during flashbacks. Characters should not be named until the programme names them, unless they are well-known public figures or the scene would be confusing without it. If gender is unknown, use "they/them."
The tone of the script matches the genre. A horror scene needs pacing that preserves suspense. Description should account for intentional pauses and dramatic silences. A children's programme gets simpler vocabulary and a more intimate tone. Vocabulary should fit the audience, genre, and tone, and the describer should use precise verbs rather than bland ones with adverbs. "She strides" is better than "she walks quickly."
A scriptwriter typically completes the script for about 60 minutes of programme content per day, according to Voquent. For complex or dialogue-heavy content, it takes longer.
Step 3: Voice recording
A narrator records the script. This can be done in a studio or remotely, depending on the vendor and the project.
The narrator needs to:
- Match the tone of the programme. The voice quality should suit the dominant mood of the content: a mellifluous voice for a love story, a grittier voice for a western. The accent should reflect the predominant accent in the programme.
- Read each line within the exact timecoded window, down to the fraction of a second.
- Maintain consistent volume and pacing across the entire programme.
- Not draw attention to themselves. The narrator should be distinguishable from character voices but should not be distracting or over-animated.
Most major platforms require the same voice talent across all episodes and seasons of a series. Changing the narrator mid-series breaks continuity for regular viewers. If you are commissioning AD for a multi-season show, lock in the narrator early.
Some vendors use synthetic (AI-generated) voices for AD. This is cheaper and faster, but the quality is noticeably different. Synthetic voices handle factual, descriptive content reasonably well. They struggle with pacing, emotional nuance, and timing. For broadcast and premium streaming content, most regulators and broadcasters still expect human narration.
Step 4: Mixing and QA
The recorded AD track is mixed with the original programme audio. The sound engineer adjusts levels so the description is audible without drowning out dialogue or music. In some cases, the original audio is slightly ducked (lowered in volume) during AD passages.
For a 5.1 mix, standard practice is to dip the centre channel only during AD. For very loud sections, you can dip the left and right channels too, but generally by no more than 6 dB, and by up to 12 dB only when absolutely necessary. The mix should transition to and from dips in no more than 5 seconds, with no abrupt level changes. AD should be mixed to blend conversationally with the programme. Automatic ducking or unmixed deliverables are not accepted by major platforms.
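To get a feel for what those dip limits mean, it helps to convert decibels into a linear amplitude multiplier using the standard formula gain = 10^(dB / 20). The short Python sketch below applies it to the two limits mentioned above. The function name is made up for this example, and the snippet is not a mixing tool.

```python
def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


for dip in (-6, -12):
    print(f"{dip} dB -> amplitude x{db_to_gain(dip):.3f}")
```

A 6 dB dip roughly halves the signal amplitude (about 0.501 of the original). A 12 dB dip brings it down to roughly a quarter (about 0.251), which is why the deeper dip is reserved for exceptional cases.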
QA checks include:
- Does the AD fit within the timecoded gaps without overlapping dialogue?
- Is the volume balanced against the original soundtrack?
- Does the narration match what is actually on screen? (Scripts can go out of sync if the edit changes after the AD was written.)
- Are there any technical issues (clicks, pops, background noise in the recording)?
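The first check in that list can be partly automated when both the AD cues and the dialogue cues are available as timecodes. The sketch below is an assumption about how a vendor might do it, not a description of any specific QA tool. The overlaps function and the sample timings are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def overlaps(ad_cues: List[Interval],
             dialogue: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return every (ad_cue, dialogue_cue) pair whose time ranges intersect."""
    clashes = []
    for ad_start, ad_end in ad_cues:
        for dlg_start, dlg_end in dialogue:
            # Two ranges intersect when each one starts before the other ends
            if ad_start < dlg_end and dlg_start < ad_end:
                clashes.append(((ad_start, ad_end), (dlg_start, dlg_end)))
    return clashes


# Made-up cues, in seconds
dialogue = [(0.0, 4.0), (5.0, 9.5), (14.0, 20.0)]
ad_cues = [(9.8, 13.6), (20.2, 23.0), (13.0, 14.5)]
print(overlaps(ad_cues, dialogue))
```

Here only the third AD cue is flagged, because it runs half a second into the dialogue that starts at 14.0 seconds. A check like this catches timing collisions. It cannot tell you whether the narration still matches the picture after a re-edit.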
The finished AD is delivered as a separate audio track (usually WAV or MP3) or mixed into an alternate version of the programme file, depending on how the platform handles AD playback.
How long the whole process takes
For a standard 60-minute programme with human narration:
- Spotting and scripting: 1 to 2 days
- Recording: half a day to 1 day
- Mixing and QA: half a day to 1 day
- Total: roughly 3 to 5 working days per hour of finished content
Industry pricing for traditional human-narrated AD ranges from $15 to $75 per finished minute, depending on the language, complexity, and vendor. AI-narrated AD starts around $16 per finished minute for scriptwriting and delivery combined.
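For budgeting, the per-minute range translates into a simple multiplication. The sketch below uses the $15 to $75 range quoted above for human-narrated AD. The cost_range function and the 10-episode series are illustrative assumptions, and real quotes will depend on the vendor.

```python
from typing import Tuple


def cost_range(finished_minutes: float, low_rate: float = 15.0,
               high_rate: float = 75.0) -> Tuple[float, float]:
    """Estimate (low, high) cost in USD from per-finished-minute rates."""
    return finished_minutes * low_rate, finished_minutes * high_rate


episode = cost_range(60)       # one 60-minute programme
series = cost_range(10 * 60)   # hypothetical 10-episode series

print(f"Episode: ${episode[0]:,.0f} to ${episode[1]:,.0f}")
print(f"Series:  ${series[0]:,.0f} to ${series[1]:,.0f}")
```

At those rates, a single 60-minute programme comes to between $900 and $4,500, and a 10-episode series to between $9,000 and $45,000.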
For large catalogues (hundreds or thousands of hours), the main bottleneck is not the technology. It is the availability of trained describers who can work in the right language. Qualified AD writers in major European languages (English, French, German, Spanish, Italian) are in demand. For smaller languages or regional dialects, lead times can stretch to weeks.
What major streaming platforms require
If you deliver content to major streaming platforms, your AD must meet their specific style guides. Non-compliant AD gets rejected.
Requirements that commonly catch vendors out:
Characters and identity. Most platforms require physical descriptions of characters (skin colour, hair, build, age range, visible disabilities), applied consistently to all main characters, not selectively. Person-first language is standard ("a person who uses a wheelchair," never "wheelchair-bound"). Describe visual attributes factually. Guessing racial or gender identity when it is not established in the plot is not allowed.
Foreign language and subtitles. When on-screen dialogue is in a foreign language and subtitled, the AD must read the subtitles aloud. The word "subtitle" should be stated the first time subtitles appear. For heavily subtitled content, an additional voice may be needed to differentiate speakers. The original dialogue audio is typically dipped so viewers can hear both the AD reading and the original language underneath.
Credits. Opening and closing credits must be described. If there is not enough time during the opening sequence (because of simultaneous action), defer them and read "remaining opening credits" after the first black. Prioritise creator, writer, director, main cast, producer, and director of photography. The AD post-house, scriptwriter, and narrator should be credited within the AD track.
On-screen text. Phone screens, signs, news tickers, narrative titles ("Three months later," "Berlin, 1942"). All of these must be described if relevant to the plot. The narrator can either read the text verbatim or weave it into the description ("Downtown Los Angeles, 1929. John drives down a narrow street.").
Each platform publishes its own style guide with specific requirements. If you are commissioning AD for streaming delivery, make sure your vendor works to the right one.
What affects quality
Three things separate good AD from bad AD:
The describer. AD writing is a specialist skill. A good describer knows what to leave out as much as what to include. They write for the ear, not the page. They understand pacing, narrative structure, and how their words interact with the existing soundtrack. Experience matters more than tools.
The timing. AD that is even slightly out of sync (starting too early, running too long, overlapping with the next line of dialogue) breaks the viewing experience. Precision at the timecode level is non-negotiable.
The narrator. A voice that is too flat sounds robotic. A voice that is too dramatic competes with the programme. The narrator needs to match the tone without imposing their own performance. This is harder than it sounds.
One vendor or many?
If you produce AD across multiple languages and territories, you will either use a single vendor with a multilingual network or manage multiple vendors per market.
Single vendor: consistent quality standards, one point of contact, simpler QA. You get the same process and the same quality benchmarks regardless of language. The trade-off is that you depend on one partner's capacity and describer network.
Multiple vendors: flexibility and access to local specialists. The trade-off is coordination. You need shared style guides, shared timecode formats, and someone checking quality across all of them. Without that, you get inconsistent AD across territories.
What Includio does
We deliver audio description and SDH for national broadcasters, distributors, production houses, and enterprises across Europe. 400+ professional describers across 42 languages. 325,000+ minutes of audio description delivered. One workflow for everything: AD, SDH, sign language, translations.
If you need AD at scale across territories, get in touch.




