December 18th, 2018
Our goal with Gboard is to help you communicate in a way that’s comfortable and natural, regardless of the language you speak. While the ten most common languages cover about half of the world’s population, the other half speaks thousands more. As the Next Billion Users come online, technology needs to support their languages so they can get the most out of using the internet. Today, Gboard offers more than 500 language varieties on Android, bringing a smart, AI-driven typing experience to many more people around the world. This means that more than 90% of the world can now type in their first language with Gboard, with keyboard layouts tailored to each language and typing smarts like autocorrect and predictive text.
In December 2016, Gboard first launched on Android with about 100 language varieties. Over the last few months, more than one hundred new languages have been added to Gboard, such as Nigerian Pidgin (~30 million speakers), Rangpuri (~15 million speakers), Balinese (~3 million speakers), Pontic Greek (~800,000 speakers) and many more.
A quick look at the layouts below shows the sheer diversity of input methods used across the world every day:
Gboard currently supports more than 40 writing systems across the world, ranging from alphabets used across many languages, like Roman and Cyrillic, to scripts that are used for only one language, like Ol Chiki (used for Santali).
Building technology that works across languages is important: without a keyboard tailored to your language, simple things like messaging friends or family can be a challenge. Often, keyboard apps don’t support the characters and scripts used for languages with fewer speakers. As an example, the name of the Nigerian language “Ásụ̀sụ̀ Ị̀gbò” (Igbo) is impossible to type on an English keyboard. Plus, wouldn’t it be frustrating to see nearly every word you type incorrectly autocorrected into another language?
Many of Gboard’s newly added languages have traditionally not been widely written in places like newspapers or books, so they’re rarely found online. But as people spend more time on their phones in messaging apps and on social media, they are typing in these languages more than ever. Being able to type easily in these languages lets people write to each other in the same language they would normally speak face-to-face.
How we add new languages to Gboard
In addition to designing a new keyboard layout, every time a new language is added to Gboard we create a new machine learning language model. This model is what lets Gboard know when and how to autocorrect your typing, or to predict your next word. For a language like English, which has only about 30 characters and large amounts of written material widely available, this is easy. For many of the world’s languages, though, this process is much harder.
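To make the idea of next-word prediction concrete, here is a minimal sketch of a bigram model that counts which word tends to follow which. It is a toy illustration only and not how Gboard’s models are built; the function names `train_bigram_model` and `predict_next` and the sample sentences are made up for this example.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count, for every word, which words follow it in the training text."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        # Pair each word with the word that comes right after it.
        for prev_word, next_word in zip(words, words[1:]):
            model[prev_word][next_word] += 1
    return model


def predict_next(model, word, k=3):
    """Return up to k of the most frequent followers of `word`."""
    followers = model.get(word.lower(), Counter())
    return [candidate for candidate, _ in followers.most_common(k)]


# Hypothetical miniature corpus, standing in for real text data.
corpus = [
    "how are you",
    "how are things",
    "how is it going",
    "are you there",
]
model = train_bigram_model(corpus)
print(predict_next(model, "how"))
```

Even this tiny example shows why the amount of available text matters: a word that never appears in the corpus produces no suggestions at all.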
To train our machine learning language models, we need a text corpus: a large collection of texts written in a particular language. For many of these languages, finding text data can be challenging. When we can’t find data online, we share a list of writing prompts with native speakers so we can create new text corpora from scratch. (You can read more about our crawling efforts for these languages in one of our recent research papers.)
Next, we focus on the layout design. Layout design for a new language on Gboard requires careful investigation and research to fit in all the characters in a way that makes sense to native speakers. If there isn’t a lot of information for the language available online, we’ll analyze text corpora to figure out which characters to include and to determine how frequently they’re used.
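The corpus analysis described above can be sketched roughly as follows. This is an assumption about what such an analysis might look like, not the actual tooling: the function name `character_frequencies` and the sample lines are hypothetical, and the choice to fold case and keep only letters and combining marks is ours.

```python
import unicodedata
from collections import Counter


def character_frequencies(corpus_lines):
    """Count how often each letter or combining mark appears in the corpus."""
    counts = Counter()
    for line in corpus_lines:
        # Normalize so that precomposed and decomposed forms are counted alike,
        # and fold case so upper- and lowercase forms share a count
        # (an assumption; not every script has case).
        text = unicodedata.normalize("NFC", line).lower()
        for ch in text:
            # Unicode general categories starting with "L" are letters,
            # those starting with "M" are combining marks.
            if unicodedata.category(ch)[0] in ("L", "M"):
                counts[ch] += 1
    return counts


# Hypothetical sample lines, standing in for a real corpus.
counts = character_frequencies(["Aab", "b c!"])
for ch, n in counts.most_common():
    print(ch, n)
```

Characters at the top of such a list are natural candidates for the main layout, while rare ones might be placed behind a long-press or on a secondary page.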
Depending on the language, we may tailor aspects of the layout, like the set of digits. For example, while English uses 0123456789, Hindi and other Indian languages written in Devanagari use ०१२३४५६७८९. Once we’ve built support for a language, we always invite a group of native speakers to test it and fill out a survey so we can understand their typing experience.
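As a small illustration of the digit tailoring mentioned above, the sketch below maps the digits 0 to 9 onto their Devanagari counterparts using Python’s built-in `str.maketrans` and `str.translate`. The function name `to_devanagari_digits` is made up for this example, and the mapping is a simple character substitution rather than anything specific to Gboard.

```python
# Translation table from the digits 0-9 to the Devanagari digits ०-९.
DEVANAGARI_DIGITS = str.maketrans("0123456789", "०१२३४५६७८९")


def to_devanagari_digits(text):
    """Replace each digit 0-9 in `text` with its Devanagari form."""
    return text.translate(DEVANAGARI_DIGITS)


print(to_devanagari_digits("2018"))
```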