Captions, Transcripts, and Text Alternatives
This article is designed for professional organizations and large-scale content creation companies, such as gaming tournament organizers, video game news websites, esports teams, and gaming publishers. While the ideas are important for the entire gaming community, implementing these policies may be out of reach for individual streamers.
Article Overview:
Introduction: Why your organization needs to start prioritizing text-based alternatives
Definitions: Captions, subtitles, transcripts, and text-based alternatives
Who benefits from text-based alternatives?: A list of important uses for text-based alternatives in gaming spaces
Creating and using captions: General information on how to create and use captions
Creating and using transcripts: General information on how to create and use transcripts
Example cases: Specific strategies for a fictional podcast and a made-up tournament
Personal favorites: Some real-world examples of great text-based alternatives
Conclusion: A short summary
Personal note: An explanation of the article structure and links to information on D/deaf and HoH needs and culture
Introduction
In most countries, captions are required for traditional media broadcasts on television. The same laws don't apply to video content that is streamed, livestreamed, uploaded, or otherwise made available only through the internet. Captioning is often seen as an unnecessary luxury, when it's considered at all.
This article aims to change that perspective. While text-based alternatives started as accessibility tools for D/deaf and Hard of Hearing (HoH) people, these tools are crucial to reaching huge parts of your potential audience and maximizing your impact on the most popular social media sites. As I've mentioned in other articles, increasing accessibility for one group usually increases overall enjoyment for a much larger part of your audience.
For example, on Facebook, videos play without sound by default. To compensate, content creation companies put high-quality captions on their videos. In a sea of static words, the moving words naturally catch the viewer's eye. This gives the content creation company another few seconds to turn a chance glance into an engaged viewer. In this article, I'll cover the various types of text-based alternatives and how you can use similar strategies to increase engagement across a range of platforms and viewer types.
Definitions
Captions
Captions involve transforming all important sounds into text that appears on the video. Captions incorporate spoken language, sound effects, music, and other noises relevant to the video. For example, if a ringing doorbell makes an actor look up, the caption would read [Doorbell rings]. Without comprehensive captions, videos would not make much sense because people would be acting and reacting with what appears to be nothing.
Subtitles
Subtitles are a less comprehensive version of captions. Subtitles only transform spoken language into text. You can think of subtitles as a subcategory of captions. Subtitles are frequently used when the speakers in a video are hard to understand (for example, speaking a different language, speaking in a strong accent, speaking while far away from the microphone, speaking in a noisy environment). Think of a news report that uses 911 call audio, or a reality TV show where the participants start fighting and disconnect their microphones. Even those with perfect hearing have trouble understanding spoken words in these situations.
Transcript
Captions and subtitles are attached to a video. In contrast, a transcript is a text record that is not attached to a video. You can read a transcript in the same way you’d read an article. Transcripts may be used for live events, such as a transcript of a panel at a convention. Journalists sometimes publish lightly edited transcripts of recorded conversations with subjects as articles.
Text-Based Alternative
Text-based alternative refers to any text-based record of audio. TBA can refer to any combination of the above, plus things that don't fit into traditional definitions. For example: You tweet a link to a Twitch Clip. In the text of the tweet, you type out the most important words spoken in part of the clip. It's not a caption, and not a traditional transcript, but it's a text-based alternative.
Who benefits from text-based alternatives?
- People who are D/deaf or hard of hearing (HoH)
- People with auditory processing disorders who can physically hear sound, but struggle to translate that sound into meaningful speech
- People who are not fluent speakers of the language used in the content. They may miss information if it’s spoken too quickly or pronounced differently than they learned. This is especially important for an international gaming audience.
- People with limited or unreliable Internet access who can view a transcript much more easily than a video
- People who are busy at school, work, or in public places where they can’t listen to audio or wear headphones (for example, getting the latest gaming news while on a bus or checking in on a tournament during class)
- People who want the information at their own pace instead of the video's pace
- People who prefer extensive customization options for their content (for example, someone might prefer large-font text in dark mode instead of watching a video)
- People who are watching the content in a loud environment (for example, in a sports bar or at a convention)
- People who simply prefer to read instead of listen
Creating and using captions
Closed captions are captions that are encoded alongside the video. The viewer can turn the captions on and off. Closed captions are key on YouTube videos and traditional TV, where only part of your audience wants to see the captions but another part needs them.
There are two popular ways to create closed captions: video editing software and YouTube tools. Professional-level editing programs such as Adobe Premiere and Final Cut Pro have specialized captioning tools that let you write, import, or otherwise create captions, then encode them to professional broadcasting standards. YouTube has a captioning feature that is a great starting point for people without access to top-tier software.
With YouTube, you can upload a video, enable captions, then download a copy of YouTube's automated captions as a .txt file. Do not rely on the automated captions. These captions are often unreadable and nonsensical. However, the .txt file is an incredibly valuable resource for beginning captioners. The .txt may show incorrect words, but it will have usable timestamps. The average person can't type timestamps from scratch, but they can certainly fix the grammar, punctuation, and speaker tags of the actual captions. Sometimes, you can fix the grammar without even listening to the audio. And, as you get more experienced, you'll get faster and better. Consider the YouTube automated captions as your starting point for closed captions, not the finished product.
Open captions are captions that are burned into the video itself. Since they are part of the video, they cannot be removed. To create open captions, you can use any program that puts text on a video. This includes both professional-tier software and cheap or free video editing programs. Open captions have lots of great uses for gaming-related content.
As mentioned earlier, videos on Facebook and Twitter play without sound by default. By including open captions on these forms of content, you can entice the viewer to play the full video with sound, or you can gain a new viewer even if they can't or won't use the audio track.
Additionally, open captions are an essential part of sharing gifs. While gifs of facial expressions or cute animals need no explanation, gifs with memorable lines from popular media need open captions. In fact, you've likely used gifs with open captions without even realizing what they were. When users upload and caption gifs themselves, the quality can be low, the text small, and the gif unimpressive. By creating your own high-quality gifs with open captions, you ensure that viewers see your content the way you want it to be seen. More on that in the 'Example cases' section.
Creating and using transcripts
A transcript is a text-only form of audio content. Here are some common uses and benefits of transcripts:
- Provides the entire text of a speech (especially in political reporting where context is important)
- Distributes critical information (especially information from government agencies, such as a weather advisory)
- Allows readers to consume the content at their own speed instead of waiting for the subtitles to play out
- Allows viewers to search for a specific quote or comment from a huge audio file
- Lets the reader save the information more easily for future reference
- Consumes fewer resources for people who have limited or unstable Internet access
- Gives more customization options (for example, you can print out, zoom in, or increase the font of a saved transcript)
- Increases portability and sharing potential (for example, if you wanted to share a quote from a video, you could copy and paste the relevant section from the transcript for immediate reading)
- Works with traditional reading apps and tools when the transcript is its own web page that looks like an article, which helps people who have auditory processing issues or simply prefer information in that format
If you've created closed captions for your video, you have already done most of the work needed to create a transcript. Download a separate version of the caption .txt file. Delete all timestamps and add speaker tags every time a new person starts talking. Now you have a full text transcript you can share on your website or in the description of a YouTube video.
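If you have many caption files to convert, the "delete all timestamps" step can be scripted. Here is a minimal sketch, not a definitive tool. It assumes the timestamped file follows a SubRip-style layout (a cue number, a timing line such as `00:00:01,000 --> 00:00:03,500`, then the caption text, with a blank line between cues); your download may look different. The function name `captions_to_transcript` and the sample dialogue are made up for illustration.

```python
import re

# Assumed timing-line shape: "00:00:01,000 --> 00:00:03,500" (SubRip-style).
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}")


def captions_to_transcript(caption_text: str) -> str:
    """Drop cue numbers and timing lines; return one transcript line per cue."""
    blocks = re.split(r"\n\s*\n", caption_text.strip())
    pieces = []
    for block in blocks:
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        # Drop the cue number, but only when a timing line follows it,
        # so caption text that happens to be a number is not lost.
        if len(lines) >= 2 and lines[0].isdigit() and TIMING.match(lines[1]):
            lines = lines[1:]
        # Drop the timing line itself.
        if lines and TIMING.match(lines[0]):
            lines = lines[1:]
        if lines:
            pieces.append(" ".join(lines))
    return "\n".join(pieces)


# Hypothetical two-cue sample, with speaker tags already typed in by hand.
SAMPLE = """1
00:00:01,000 --> 00:00:03,500
HOST: Welcome back to the show.

2
00:00:03,600 --> 00:00:06,000
GUEST: Thanks for
having me.
"""

print(captions_to_transcript(SAMPLE))
```

The script only removes timing information. Speaker tags, punctuation, and word corrections still need a human pass.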
Keep your transcripts in one folder on your computer, and you now have a way to search past video content. Just use the Find tool to find the name or topic you want. It's also an extremely portable solution, as team members can search, share, and discuss relevant parts of videos without needing the video itself. I recommend saving one version of the transcript with timestamps and one version with just the text. The text-only version is reader-friendly, but the timestamped version will help you pinpoint exactly when the content occurs on the video.
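Once the folder grows, a small script can stand in for the Find tool and search every file at once. The sketch below is one possible approach under the same assumption as before: the timestamped files are `.txt` files with SubRip-style timing lines. The function name `search_captions` and the folder name `caption_archive` are hypothetical. For each match it reports the file name, the start time of the cue the line belongs to, and the line itself. Text-only transcripts also work, but their start time comes back as `None`.

```python
import re
from pathlib import Path

# Captures the start time (hh:mm:ss) of an assumed SubRip-style timing line.
TIMING = re.compile(r"^(\d{2}:\d{2}:\d{2})[,.]\d{3}\s*-->")


def search_captions(folder, term):
    """Return (file name, cue start time, line) for every line mentioning term."""
    needle = term.casefold()  # case-insensitive matching
    hits = []
    for path in sorted(Path(folder).glob("*.txt")):
        start = None  # start time of the cue currently being read
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            timing = TIMING.match(line.strip())
            if timing:
                start = timing.group(1)
            elif needle in line.casefold():
                hits.append((path.name, start, line.strip()))
    return hits


# Example call; "caption_archive" is a placeholder folder name.
# for name, start, line in search_captions("caption_archive", "roshan"):
#     print(name, start, line)
```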
Example cases
Let's say that you have a podcast. How can you increase accessibility, incorporate text-based alternatives, and capture a much larger audience? Your new workflow would look something like this:
1. Record the podcast as normal
2. Convert it into a simple video. You do not necessarily need to make the video public; you just need something that YouTube can work with.
3. Upload the video to YouTube
4. Enable YouTube automated captions to get the first draft of captions
5. Save the .txt file of automated captions
6. Manually review the file for errors and clean up any mistakes
7. Upload the fixed version to YouTube as the final captions (if you have a usable video version of the podcast)
8. Modify the text file to serve as a transcript
9. Post the transcript to your website
10. Save a separate copy of the .txt caption file with timestamps
11. Keep all timestamped files in a new folder so you can easily search your archives for mentions of a specific topic
Now let's look at some potential uses for a tournament organizer. I come from a Dota background, so these ideas are intended for that style of tournament.
The ideal here would be a live captioner who types all spoken words as captions. That may be cost-prohibitive at the moment, but if you ever want to show your tournament on live TV, you're going to need captions. Some companies may choose to omit or ignore captions, but they will eventually be eclipsed by other businesses that plan for the almost-inevitable future of traditionally broadcast gaming tournaments.
Some steps you can take in the meantime: Create a standard format for all clips, highlights, and key moments from your broadcast. Let's say that your logo is blue and white. Create an open caption format that uses white text against a blue background. Put the tournament's logo on one side where the caption box meets the video. Insert the high-quality footage either above or below your caption box, depending on the type of content. Now, you have high-quality, highly shareable, accessible, and branded content. With the appropriate framework, viewers will never forget which company or which tournament the clip came from, because it says so right on the clip! Sure, some viewers will still make their own clips and gifs, but relying on third parties to produce clips of questionable quality is not a sustainable strategy.
Personal favorites
Here are a few examples of people doing captions right:
Wronchi Animation: Move the Payload 2: An Overwatch Cartoon
Not all of Wronchi Animation's videos have captions, but when they do, they are surprisingly good. It's clear that someone put a lot of effort into making the captions both usable and enjoyable. While there are a few slip-ups (some missed timing, referring to a silence as 'deafening'), these captions put some major gaming brands to shame. One other caveat: Wronchi Animation sometimes puts small Easter eggs into the captions. At one point, a running Zenyatta is captioned with '(How do you translate a robotic humanoid being out of breath?)' This works for Wronchi's style here. If you're just starting out with captions, focus on covering the basics before adding secrets.
Refresher Parodies: Wraith King's Back
Disclaimer: I love Refresher Parodies and supported them on Patreon when they were active. This is not my favorite song by the team (that would be Epicenter), but it's the best example of using captions alongside your content. A typical Dota screen is very cluttered, with the minimap, abilities, and items along the bottom, then the time and hero portraits across the top. You don't need any of that for a parody song. Refresher filmed a video in Dota itself, cut off the top part with the unnecessary information, then covered up the bottom with a black bar--and then put captions over it. It's an extremely elegant solution that prioritizes enjoyment and relevant information over the typical screen layout.
I'll add to this section over time, so check back for more! I am also considering a separate Hall of Fame page to recognize content creators who go above and beyond with regard to accessibility.
Conclusion
Right now, text-based alternatives are a great way to make your company stand out as a professional, inclusive organization. In the near future, text-based alternatives will become commonplace or even required for big broadcasters. Some companies have already started experimenting with ways to increase engagement through some of the strategies outlined here. Talk to your content team about ways to incorporate TBA into your overall media strategy. Looking to move even faster? Contact me to discuss hiring me to develop your caption strategy or caption your existing content.
It's far better to be the leader of the pack than the straggler trying to catch up.
The race has already started.
Personal note: As I was writing this guide, I felt that in some places, the structure seemed to marginalize the needs of D/deaf and HoH people. My intention is not to disrespect or devalue the culture or needs of D/deaf and HoH people. This article is about motivating big organizations to change their media strategies. To do that, I needed to show them ways that text-based alternatives would give them a competitive (and potentially profitable) advantage and increase engagement across a huge range of viewers with a diverse set of needs.
I would also like to specifically acknowledge that D/deaf and HoH people often struggle with discrimination in many facets of life, including housing, employment, and education. In many cases, it's not an issue of (dis)ability, but of a world that is slow to adapt, even when the technology is available. To read more about these issues, check out Captioning Activism and Community for caption activism, National Association of the Deaf for a comprehensive resource database, and Gallaudet University for information on D/deaf culture.