WuBi 五笔 Input Method
Write Chinese characters with a computer keyboard
Robert SIEMER
http://backsla.sh/talks
[beware: your fonts may render the characters quite differently]
2010-06-21
Overview
- know the basics
- first exercise together
- understand what Wubi is about
- further introduction (25min)
- step-by-step guide to master Wubi (rest of the time)
- simple exercises follow each step
The Wubi basics
- it is not pinyin
- for standard keyboards
- different character components spread over 25 letter keys
- for each character, pick the components one by one, in handwriting order
- at the end: commit/confirm
- as with other Input Methods
- usually “space” key commits
Exercise I (2min)
Use the Wubi keyboard layout sheet and guess the Wubi codes to write these characters:
众 林 昌 品 因 比 双
Part I — Theory
Input Methods
- pronunciation based
- Hanyu Pīnyīn (nǐ hǎo)
- Standard Cantonese Pīnyīn (nei5 hou2)
- Zhùyīn/Bopomofo (ㄚㄛㄜㄝㄞㄟ)
- character/structure/stroke/writing based
- 4-corner (method used for name lists)
- Cāngjié
- Wǔbǐ
- Dàyì
- combination of both
Structure based Input Methods
- many different methods exist, differing in
- popularity
- in Mainland China very clear: Wubi very dominant
- e.g. Taiwan: not so clear (keyboard labels ready for two)
- official status
- numbers for name characters on forms/registration purposes
- keys used
- only numbers (0-9 or even fewer)
- “letters” (that is, 26 keys or slightly fewer)
- “everything” (e.g. 46 keys)
- adapted language/script variant
- simplified/traditional
- languages (Korean, Japanese, ...)
- decomposition basis
- none/indexed/numbered
- dictionary radicals
- writing related (stroke, components)
- shape related (at predefined positions or writing order)
- other characteristics
- code length (speed, auto commit)
- ambiguity (official use, blind typing, speed)
- wildcards (ease of use, searching)
Wubi characteristics
- invented by 王永民 WANG Yongmin
- developed for simplified Chinese
- “normal” is Wubi86
- a revamped version, Wubi98, had no success
- a little more suitable for simplified and traditional characters
- same rules, but small changes to the components
- depending on the implementation:
- Wubi86 may yield traditional characters as well
- yields over 20,000 characters
- character combinations available as well
- nice wildcard key Z
- limited structure based character search capability
- high typing speeds
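The wildcard lookup can be pictured as a pattern match over a code table. This is only a minimal sketch, assuming that Z stands for exactly one unknown key; the table `CODES` and the function `wildcard_lookup` are made up for illustration and hold just four codes quoted later in this talk, not a real Wubi dictionary.

```python
import re

# Tiny illustrative code table (assumption: not a real Wubi dictionary).
CODES = {"LLLL": "田", "LHNG": "四", "LTN": "力", "LGNH": "车"}

def wildcard_lookup(pattern: str) -> list[str]:
    """Return the characters whose code matches `pattern`; Z means any one key."""
    regex = re.compile(
        "".join("[A-Y]" if key == "Z" else re.escape(key) for key in pattern.upper())
    )
    return [char for code, char in CODES.items() if regex.fullmatch(code)]

print(wildcard_lookup("LZNG"))  # one unknown key in second position
print(wildcard_lookup("LZNZ"))  # two unknown keys
```

A real implementation would search the full dictionary and present the candidates for selection; the sketch only shows the matching idea.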
Advantages
pro | contra
no need to know the pronunciation | need to know the writing
fast typing | needs more time to learn
nice character search possible | limited, but alternatives limited as well
helps brain with a nice structural breakdown | does not correspond to dictionary radicals
helps memorizing writing (Pinyin does not) | easy to forget distinction of same key shapes
impress your Chinese friends | there are better hobbies to spend time on
- in any case: learn it in addition to Pinyin, not instead of it
- some say: “too complicated for foreigners”, I say: “especially useful for Chinese learners”
Part II — Practice
Bihua Input Method
- Wubi is not the same as the Bihua Input Method, but shares some concepts with it
- most mobile phones have Bihua (apart from Pinyin)
- characters are entered stroke by stroke
- each stroke has a number from 1 to 5
- number depends on stroke type
Bihua and Wubi stroke types
type | 1 | 2 | 3 | 4 | 5
primary | 一 | 丨 | 丿 | ㇏ | 乙
secondary | ㇀ | 亅 | ㇒ | 丶 | many
notes | from left to right | the only “has a hook” exception | from (top) right to (bottom) left | some dots point down left 灬 | everything with a corner or hook
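The stroke table can be read as a lookup from stroke shape to number. Here is a minimal sketch, assuming the stroke sequence of a character is supplied by hand; `STROKE_TYPE` and `bihua_code` are illustrative names, and type 5 lists only 乙 although it has many more shapes.

```python
# Stroke shapes from the table, mapped to their Bihua number.
# Type 5 has many more shapes; only the primary one is listed here.
STROKE_TYPE = {
    "一": 1, "㇀": 1,
    "丨": 2, "亅": 2,
    "丿": 3, "㇒": 3,
    "㇏": 4, "丶": 4,
    "乙": 5,
}

def bihua_code(strokes: str) -> str:
    """Turn a hand-written stroke sequence into its Bihua number code."""
    return "".join(str(STROKE_TYPE[stroke]) for stroke in strokes)

print(bihua_code("一丨"))    # 十
print(bihua_code("丿㇏"))    # 人
print(bihua_code("一丿㇏"))  # 大
```

The stroke sequences for 十, 人 and 大 are typed in by hand here; the sketch does not decompose characters on its own.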
Exercise II (3min)
Determine the Bihua number code:
三 中 木 手 丝 须 川 我 没
Wubi partition: zones
- Wubi keyboard divided into five zones
- according to Bihua stroke types
- to ease lookup
- first stroke of component decides which zone it belongs to
- some exceptions exist, though...
- zone map
3 4
1 2
5
Exercise III (3min)
In which zone should these components be? Are they actually there?
大 人 三 车 刀 白 山 川 米 攵 马 巴
Wubi partition: keys
- every zone has five keys
- counting starts from the “middle”
- for example key “24” is: zone 2, key 4 (English keyboard: L)
- different rules exist to determine the key number of a component
- second stroke type
- repetition of first stroke (type)
- similarities to other components on the key
- otherwise try
- key number mnemonics (e.g. total stroke count)
- key letter mnemonics (shape?)
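The zone/key numbering can be turned into a small lookup. This is a sketch only: the talk itself confirms that key 24 is L, while the rest of `ZONE_KEYS` is assumed from the common Wubi86 assignment on a QWERTY keyboard; `key_letter` is an illustrative name.

```python
# Letters of each zone, listed from key 1 (middle of the keyboard) outward.
# Assumption: the usual Wubi86 assignment on a QWERTY keyboard.
ZONE_KEYS = {1: "GFDSA", 2: "HJKLM", 3: "TREWQ", 4: "YUIOP", 5: "NBVCX"}

def key_letter(code: int) -> str:
    """Translate a two-digit zone/key number such as 24 into its letter."""
    zone, key = divmod(code, 10)
    return ZONE_KEYS[zone][key - 1]

print(key_letter(24))  # zone 2, key 4
print(key_letter(11))  # zone 1, key 1
```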
Exercise IV (2min)
These components are keyed strictly by first and second stroke type. On which key should they be?
纟厶 七 石 八 夕 王 己 了 刀 士 竹 冖 门 文
Exercise V (2min)
These components are keyed by first stroke type and then by the number of repetitions of that stroke. Look them up:
彡 三 二 一 丶 冫 氵 灬 丨 刂 川
[Note: single strokes of every type are available on the x1 keys: 一 丨 丿丶 乚]
Exercise VI (4min)
Why are these components where they are?
人 扌 水 火 冖 凵 山 几 阝 木 大 车
Wubi difficulties
- unfortunately, it is hard to tell which components are available in Wubi
- look through the keyboard layout and see what is there
- create your own mnemonics
- at first, remember the frequent ones
- sometimes a geometrically connected part of a character needs to be decomposed in Wubi
- the easiest rule is: do it intuitively
- follow the stroke order
- minimize the number of components
Exercise VI (5min)
Determine the Wubi code for the following characters (2, 3 or 4 components).
Easy: 没 语 主 达 钱 他 冰 字 汉
Medium: 会 到 稳
Hard: 狗 面 新
Code shortening
- all Wubi codes are limited to four keys or fewer
- characters with more components get shortened
- keep the 1st, 2nd, 3rd and last component
警 = 艹 勹 口 + 言
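The shortening rule can be sketched in a few lines. The key assignments in `COMPONENT_KEY` and the five-part decomposition of 警 (with 攵 as the dropped fourth component) are assumptions taken from the common Wubi86 layout, not from this talk; `wubi_code` is an illustrative name.

```python
# Illustrative component-to-key table (assumed Wubi86 keys), only what the
# example needs.
COMPONENT_KEY = {"艹": "A", "勹": "Q", "口": "K", "攵": "T", "言": "Y"}

def wubi_code(components: list[str]) -> str:
    """Build a code from components, keeping the 1st, 2nd, 3rd and last one."""
    if len(components) > 4:
        components = components[:3] + components[-1:]
    return "".join(COMPONENT_KEY[c] for c in components)

# 警 decomposed into five components; the fourth one is skipped.
print(wubi_code(["艹", "勹", "口", "攵", "言"]))
```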
Code prolongation
- all characters have an official code of 3 or 4 keys
- 1 and 2 key codes are always abbreviations for frequent characters
- some “natural” codes are too short and collide with abbreviations
- typing a component on its own often raises this problem, e.g.
言 方 广 文 would all be just “y”, so do instead
- type that key 4 times: “yyyy” or
- type the key once and enter the first, second and last stroke as well
 | 田 | 四 | 力 | 车 | 国
4 times | LLLL | | | |
key+1. 2. last stroke | | LHNG | LTN | LGNH |
actual abbreviation | LLL | LH | LT | LG | L (of LGY)
note | LHNG is 四 | | wrong stroke order | wrong zone | not a component!
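The “key + first, second and last stroke” rule can be checked against the table with a short sketch. It assumes the single strokes sit on the x1 keys G, H, T, Y, N (the usual Wubi86 letters) and that the stroke types are supplied by hand, in the order Wubi uses; `spelled_out_code` is an illustrative name.

```python
# Single strokes sit on the first key of each zone (stroke type -> letter).
# Assumption: the usual Wubi86 letters for the x1 keys.
STROKE_KEY = {1: "G", 2: "H", 3: "T", 4: "Y", 5: "N"}

def spelled_out_code(key: str, stroke_types: list[int]) -> str:
    """Component key followed by the keys of its first, second and last stroke."""
    if len(stroke_types) >= 3:
        picked = stroke_types[:2] + stroke_types[-1:]
    else:
        picked = stroke_types  # one or two strokes: nothing to skip
    return key + "".join(STROKE_KEY[t] for t in picked)

print(spelled_out_code("L", [2, 5, 3, 5, 1]))  # 四
print(spelled_out_code("L", [1, 5, 1, 2]))     # 车
print(spelled_out_code("L", [3, 5]))           # 力, in Wubi's stroke order
```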
Exercise VII (4min)
The following characters are Wubi components. Which code do they have? (Use the main component code (4×), the stroke decomposition or the abbreviation.)
十 九 八 七 六 五 四 三 二 一
Code prolongation: distinguishing key
- all “natural” 2 and 3 key codes can also be extended with one extra key
- determine the stroke type a of the last stroke of the last component (1, 2, 3, 4, 5)
- determine the geometric shape b: ⿰ is 1, ⿱ is 2, all others are 3
- use the key “ab” as distinguishing key
- extension only necessary in case of collision
- mostly optional
- forbidden only in case of collision...
去 = 土 厶 + U (丶⿱ is 42)
Code “FC” alone is 支.
1 | 2 | 3
⿰ ⿲ | ⿱ ⿳ | ⿴ ⿴ ⿵ ⿶ ⿷ ⿸ ⿹ ⿺ ⿻
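Finding the distinguishing key is a two-step lookup: last stroke type gives the zone, shape gives the key number. A minimal sketch, assuming the usual Wubi86 letters per zone (the talk itself confirms only that 42 is U); `ZONE_KEYS`, `SHAPE_NUMBER` and `distinguishing_key` are illustrative names.

```python
# Letters of each zone, from key 1 outward (assumed Wubi86 assignment).
ZONE_KEYS = {1: "GFDSA", 2: "HJKLM", 3: "TREWQ", 4: "YUIOP", 5: "NBVCX"}

# Left-right shapes count as 1, top-bottom shapes as 2, everything else as 3.
SHAPE_NUMBER = {"⿰": 1, "⿲": 1, "⿱": 2, "⿳": 2}

def distinguishing_key(last_stroke_type: int, shape: str) -> str:
    """Key 'ab': a = type of the last stroke, b = number of the shape."""
    shape_number = SHAPE_NUMBER.get(shape, 3)
    return ZONE_KEYS[last_stroke_type][shape_number - 1]

print(distinguishing_key(4, "⿱"))  # 去: last stroke 丶, shape ⿱
```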
Exercise VII (3min)
At least, determine the distinguishing key.
Optional: 主 天 不 业 国
Must: 她 票 壮 状 美 里 庄 今 待 推 血 市 申 亦
Must, if not written abbreviated: 年
Nice to know
- Wubi is not (dictionary) radical based
- many components are not radicals
- some radicals clearly look as if composed of two or three others (and Wubi treats them that way)
- 殳青足音是黾香高
- some radicals are broken up in handwriting (and in Wubi)
- 亘 has radical 二
- sometimes Wubi does not follow the Mainland stroke order, or any valid stroke order at all
- Wubi: 力 = 3 5 戈 = 1 5 4 3
- you can actually enter multiple characters with one code
- 你好 对不起 不好意思 很多
References
http://www.wenlin.com/cdl/cdl_strokes_2004_05_23.pdf
- 227 radicals for simplified characters
http://de.wikipedia.org/wiki/227_Radikale
- Wikipedia
- Input Methods (en)
- (the English Wubi article is really bad)