Audio Translator

Concept and design for a Mac application that lets users consume content beyond their linguistic abilities, with a proof of concept that uses Apple's Speech framework and Google's Translate API.

Type

Conceptual

Timeline

3 Days

My Role

Concept, UX, UI

Team

Collaboration with 1 Engineer

Tools

Figma, Notion, Xcode

Problem

People can only consume and understand content that is available in their native language or another language they speak.

In 2022, only around 19% of the global population spoke English

Goal

Allow users to listen to or watch any content in their own language. This eliminates language barriers and opens up access to news, audiobooks, and video calls that were previously unavailable.

Context

I had the idea for an app that translates audio into any language because my mom doesn't speak English. She is unable to watch certain TV shows that are only available with English audio or subtitles. My mom isn't the only one in my family who faces language barriers. We lived in Kazakhstan (formerly part of the USSR) until 1998, when we moved to Germany as ethnic repatriates. After almost 25 years in Germany, many of my older relatives don't speak German well enough to understand local news on TV. Instead, they rely on Russian television and miss important information from the local government. When it comes to English, only a few younger relatives can communicate properly. Research shows that we are just one of many families that face these kinds of issues.

Research

Statista estimates that only around 1.5 billion people, or 19% of the global population, speak English either natively or as a second language. In the U.S. alone, the translation services market is valued at $9 billion.

Findings About Potential Users

Potential users may have different needs and preferences based on their demographics. For example, according to UK Government data, women are more likely than men to be unable to speak English, and this likelihood increases with age. Use cases range from watching TV shows to getting online consultations to understanding international news in order to avoid misinformation. Designs should be simple to accommodate the different demographics.

Competitive Analysis

During my research, I found that both Google and Apple offer similar technologies. Google Translate lets two people converse in different languages by speaking into a phone. If you're not in the same place, you can use translated captions in Google Meet. But what if you want to communicate via Zoom or WhatsApp? Apple provides Live Captions as an accessibility feature, but it does not offer translation into another language. All of these features are only available on certain sites or devices. In contrast, our product can be used on all Mac computers.

Proof of Concept

A quick implementation of a macOS app to prove feasibility, built without final designs. The app uses Apple's Speech framework to transcribe audio and Google's Translate API to translate the transcript.
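The two-stage flow described above (transcribe first, then translate) can be sketched in a few lines. This is a minimal, language-agnostic illustration in Python and not the code of the actual macOS app. The names `translate_audio`, `Caption`, `fake_transcribe`, and `fake_translate` are hypothetical, and the two stub functions merely stand in for Apple's Speech framework and Google's Translate API so the sketch runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

# Hypothetical stand-ins: in the proof of concept these roles are played by
# Apple's Speech framework (transcription) and Google's Translate API (translation).
Transcriber = Callable[[bytes], str]
Translator = Callable[[str, str], str]


@dataclass
class Caption:
    original: str    # transcribed text in the source language
    translated: str  # the same text in the target language


def translate_audio(chunks: Iterable[bytes], transcribe: Transcriber,
                    translate: Translator, target_language: str) -> Iterator[Caption]:
    """Transcribe each audio chunk, then translate the resulting transcript."""
    for chunk in chunks:
        text = transcribe(chunk).strip()
        if not text:
            continue  # skip chunks in which no speech was recognized
        yield Caption(original=text, translated=translate(text, target_language))


# Toy stubs (assumptions for illustration only): the "audio" is just UTF-8 text.
def fake_transcribe(chunk: bytes) -> str:
    return chunk.decode("utf-8")


def fake_translate(text: str, target_language: str) -> str:
    return f"[{target_language}] {text}"


captions = list(translate_audio([b"Hello", b"  ", b"Good evening"],
                                fake_transcribe, fake_translate, "ru"))
for caption in captions:
    print(caption.original, "->", caption.translated)
```

Passing the transcriber and translator in as functions keeps the two stages independent, so either service could be swapped out without touching the pipeline itself.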

Ideation

Finding The Right Device

We chose to build a macOS app, as it was the platform we were familiar with and enabled us to quickly create a proof of concept. We didn't opt for mobile, as we felt a desktop application would better suit most use cases, whereas mobile is more for on-the-go scenarios.

Information Architecture

I experimented with different user flows to define the information architecture, which led to the decision to name the transcript functionality “Recording”. I assumed users might confuse “Transcript” and “Translation” when the two appear side by side. I also think “Recording” is easier to understand. To validate these decisions, I would need to conduct user testing.

Accuracy vs. Speed

Our proof of concept revealed technical limitations for live audio translation. Weighing these limitations against the various use cases led me to design an accuracy setting: users can choose between faster translations and higher accuracy. Transcripts can be stored locally or automatically uploaded to a connected Notion account. OpenAI's Whisper can be used to enhance the transcribed text.
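One way such an accuracy setting could work is sketched below. The mechanism is an assumption for illustration and is not taken from the proof of concept: in the fast mode every transcribed fragment is sent to translation immediately, while in the accurate mode fragments are buffered until a sentence ends, which gives the translator more context at the cost of a delay. The names `Mode` and `segments_for_translation` are hypothetical.

```python
from enum import Enum
from typing import Iterable, Iterator, List


class Mode(Enum):
    FAST = "fast"          # translate every fragment as soon as it arrives
    ACCURATE = "accurate"  # wait for a full sentence before translating


SENTENCE_ENDINGS = (".", "!", "?")


def segments_for_translation(fragments: Iterable[str], mode: Mode) -> Iterator[str]:
    """Group transcribed fragments into the units that are sent to translation."""
    buffer: List[str] = []
    for fragment in fragments:
        fragment = fragment.strip()
        if not fragment:
            continue  # ignore empty fragments
        if mode is Mode.FAST:
            yield fragment
            continue
        buffer.append(fragment)
        if fragment.endswith(SENTENCE_ENDINGS):
            yield " ".join(buffer)  # sentence complete: release it as one unit
            buffer = []
    if buffer:
        yield " ".join(buffer)  # flush an unfinished sentence at the end


fragments = ["The weather", "is nice today.", "See you", "tomorrow!"]
fast = list(segments_for_translation(fragments, Mode.FAST))
accurate = list(segments_for_translation(fragments, Mode.ACCURATE))
print(fast)
print(accurate)
```

In this sketch the fast mode yields four short units and the accurate mode yields two full sentences, which is the trade-off the setting exposes to the user.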

Design Direction

Research revealed how diverse the potential user demographics are. To make the app feel familiar, I chose to stay close to macOS design. Minimal navigation ensures there are no distractions. After some sketching, I moved on to higher-fidelity wireframes, as we already knew the desired look and had limited time to finalize the designs. After finding a solution, I created components to speed up the prototype-building process.

Design System

Research confirmed our assumption that demographics would vary a lot. That's why we decided to go with straightforward navigation and a UI that is familiar to macOS users, without any distractions. After some sketching, I decided to go directly to higher-fidelity wireframes, because we already knew what kind of look we wanted and had little time to finalize the designs. After finding the solution I wanted to test, I created components to help me build the prototype more quickly.

Prototype

Reflection

The biggest lesson from this project came from working with an engineer and gaining insight from his perspective. With more time, I would have conducted a survey and interviews to validate my assumptions and answer any remaining questions. The next step would be user testing to ensure the interaction is as easy as I intended.

Sources

English language skills, GOV.UK (https://www.ethnicity-facts-figures.service.gov.uk/uk-population-by-ethnicity/demographics/english-language-skills/latest)
English Language Statistics of 2022 in the UK & Worldwide, Preply (https://preply.com/en/blog/english-language-statistics/)
Language services industry in the U.S., Statista (https://www.statista.com/topics/2152/language-services-industry-in-the-us/#topicOverview)
The most spoken languages worldwide 2022, Statista (https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/)

© Kris Loewen 2023