The Intricacies of Chinese Text Input and the QWERTY Keyboard

Category Technology

tldr #
31 seconds

Input method editors, or IMEs, are essential for typing Chinese characters on digital devices, and more than 100 different input methods are currently available. The most popular ones are phonetic: the user spells out the sound of a character in Latin letters, and the IME proposes matching characters. At the 2013 National Chinese Characters Typing Competition, the winner, Huang Zhenyu, clocked one of the fastest typing speeds ever recorded. The keys pressed on a QWERTY keyboard differ from the characters that appear on the screen, which shows how Chinese must be "virtually manufactured" in the digital world.


content #
4 minutes, 30 seconds

A young Chinese man sat down at his QWERTY keyboard and rattled off an enigmatic string of letters and numbers.

Was it code? Child’s play? Confusion? It was Chinese.

The beginning of Chinese, at least. These 44 keystrokes marked the first steps in a process known as "input" or shuru: the act of getting Chinese characters to appear on a computer monitor or other digital device using a QWERTY keyboard or trackpad.

Image: Stills taken from a 2013 Chinese input competition screencast. Courtesy of MIT Press.

Chinese input methods for computers date back at least to the 1970s and spread widely in the 1980s.

Across all computational and digital media, Chinese text entry relies on software programs known as "input method editors"—better known as "IMEs" or simply "input methods" (shurufa). IMEs are a form of "middleware," so named because they operate in between the hardware of the user’s device and the software of its program or application. Whether a person is composing a Chinese document in Microsoft Word, searching the web, sending text messages, or otherwise, an IME is always at work, intercepting all of the user’s keystrokes and trying to figure out which Chinese characters the user wants to produce. Input, simply put, is the way ymiw2klt4pwyy… becomes a string of Chinese characters.

There are currently over 100 different input methods for Chinese.

IMEs are restless creatures. From the moment a key is depressed or a stroke swiped, they set off on a dynamic, iterative process, snatching up user-inputted data and searching computer memory for potential Chinese character matches. The most popular IMEs these days are based on Chinese phonetics: that is, they use the letters of the Latin alphabet to describe the sound of Chinese characters, with mainland Chinese operators using the country’s official Romanization system, Hanyu pinyin.

Image: Example of a Chinese input method editor pop-up menu (抄袭 / "plagiarism"). Courtesy of MIT Press.
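The keystroke-by-keystroke search described above can be sketched in a few lines of Python. This is only a toy illustration, not how any real IME is implemented: the `LEXICON` table, the `candidates` function, and the `type_keys` helper are all hypothetical names, and the lookup is a plain prefix match with no ranking or learning.

```python
# Toy pinyin lexicon: hypothetical sample data, not taken from any real IME.
LEXICON = {
    "chao": ["超", "朝", "抄"],
    "chaoxi": ["抄袭", "潮汐"],
    "hanzi": ["汉字"],
    "zhongguo": ["中国"],
}


def candidates(buffer: str) -> list[str]:
    """Return every entry whose pinyin spelling starts with the typed buffer."""
    if not buffer:
        return []
    matches = []
    for spelling in sorted(LEXICON):
        if spelling.startswith(buffer):
            matches.extend(LEXICON[spelling])
    return matches


def type_keys(keys: str) -> list[tuple[str, list[str]]]:
    """Re-run the lookup after each keystroke, as a pop-up menu would."""
    history = []
    buffer = ""
    for key in keys:
        buffer += key
        history.append((buffer, candidates(buffer)))
    return history


if __name__ == "__main__":
    # Watch the candidate menu narrow as "chaoxi" is typed letter by letter.
    for buffer, menu in type_keys("chaoxi"):
        print(buffer, menu)
```

Once the full spelling "chaoxi" has been typed, the menu in this sketch is down to two words, 抄袭 ("plagiarism") and 潮汐 ("tide"), and the user still has to pick one. A production IME would also rank candidates by frequency and context, which this sketch leaves out.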

IMEs require constant updates to keep up with new words and phrases.

This young man was Huang Zhenyu (also known by his nom de guerre, Yu Shi). He was one of around 60 contestants that day, each wearing a bright red shoulder sash—as in a ticker-tape parade of old, or a beauty pageant. "Love Chinese Characters" (Ai Hanzi) was emblazoned in vivid golden yellow on a poster at the front of the hall. The contestants’ task was to transcribe a speech by outgoing Chinese president Hu Jintao, as quickly and as accurately as they could. "Hold High the Great Banner of Socialism with Chinese Characteristics," it began, or in the original: 高举中国特色社会主义伟大旗帜为夺取全面建设小康社会新胜利而奋斗. Huang’s QWERTY keyboard did not permit him to enter these characters directly, however, and so he entered the quasi-gibberish string of letters and numbers instead: ymiw2klt4pwyy1wdy6… .

Some Chinese input methods use a combination of phonetics and strokes to input characters.

With these 44 keystrokes, Huang was well on his way, not only to winning the 2013 National Chinese Characters Typing Competition, but also to clocking one of the fastest typing speeds ever recorded, anywhere in the world.

ymiw2klt4pwyy1wdy6… is not the same as 高举中国特色社会主义… The keys that Huang actually depressed on his QWERTY keyboard (his "primary transcript," as we could call it) were completely different from the symbols that ultimately appeared on his computer screen, namely the "secondary transcript" of Hu Jintao’s speech.

Nor did those keys spell out the sound of 高举中国特色社会主义… in Latin letters. Huang and the other contestants were not writing out the characters phonetically for the benefit of English speakers; the string ymiw2klt4pwyy1wdy6… was a set of instructions to the input method, not a transliteration.

Xiao Qisheng, a Chinese computer programmer of the pre-Internet era around 1990, is quoted as summing it up this way: "Chinese just does not exist." For Xiao and everyone else, Chinese must be "virtually manufactured" in the digital world, whether in a browser, an email client, a search engine, or any other piece of software.
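The gap between the two transcripts can be made concrete with a small sketch. The Python below is a hypothetical, pinyin-based toy and not the input method Huang used, whose codes are not explained here. The `ToyIME` class, the `SESSION_LEXICON` table, and the `transcribe` helper are all assumed names. Letters accumulate in a buffer, and a number key picks a candidate from the menu and commits it.

```python
# Hypothetical sample lexicon: pinyin spelling -> candidate menu, in order.
SESSION_LEXICON = {
    "ai": ["爱", "矮"],
    "hanzi": ["汉字", "汉子"],
}


class ToyIME:
    """Sits between keyboard and document, in the spirit of middleware."""

    def __init__(self, lexicon):
        self.lexicon = lexicon
        self.buffer = ""      # letters typed but not yet converted
        self.primary = []     # every key the user actually pressed
        self.secondary = []   # characters committed to the screen

    def press(self, key: str) -> None:
        self.primary.append(key)
        if key.isdigit():
            # A number key selects the Nth candidate for the current buffer.
            options = self.lexicon.get(self.buffer, [])
            index = int(key) - 1
            if 0 <= index < len(options):
                self.secondary.append(options[index])
            self.buffer = ""
        else:
            self.buffer += key


def transcribe(keys: str, lexicon) -> tuple[str, str]:
    """Feed keys to a fresh ToyIME and return (primary, secondary) transcripts."""
    ime = ToyIME(lexicon)
    for key in keys:
        ime.press(key)
    return "".join(ime.primary), "".join(ime.secondary)


if __name__ == "__main__":
    primary, secondary = transcribe("ai1hanzi1", SESSION_LEXICON)
    print("primary:  ", primary)
    print("secondary:", secondary)
```

In this sketch the nine keystrokes "ai1hanzi1" yield the three characters 爱汉字 ("love Chinese characters"). Changing only the digits to "ai2hanzi2" produces a different text from the same letters, which is why the primary transcript cannot be read as a spelling of the secondary one.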

Chinese input methods have become so advanced that they can even predict the next character a user wants to input.
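One simple way to picture next-character prediction is a bigram count: remember which character most often followed each character in previously seen text, then suggest the most frequent followers. The sketch below assumes this technique purely for illustration. The source does not say how any particular IME predicts, and the `CORPUS` sample, `train_bigrams`, and `predict_next` are hypothetical names.

```python
from collections import Counter, defaultdict

# Hypothetical training text; a real system would learn from far more data.
CORPUS = ["中国特色", "中国汉字", "中文输入", "汉字输入"]


def train_bigrams(corpus):
    """Count, for each character, which characters follow it and how often."""
    following = defaultdict(Counter)
    for sentence in corpus:
        for current, nxt in zip(sentence, sentence[1:]):
            following[current][nxt] += 1
    return following


def predict_next(following, char, k=3):
    """Return up to k most frequent followers of char, most frequent first."""
    counts = following.get(char, Counter())
    return [c for c, _ in counts.most_common(k)]


if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    print(predict_next(model, "中"))
    print(predict_next(model, "输"))
```

With this sample text, 国 follows 中 twice and 文 follows it once, so 国 is suggested first. Real predictive input presumably draws on longer context and on the user's own history, which a bigram table does not capture.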
