
Even in 1989, it should have been clear that 16 bits were not enough to encode all of the Chinese characters, let alone all human scripts. Unicode today encodes 92,865 Chinese characters (https://en.wikipedia.org/wiki/CJK_Unified_Ideographs).

The only reason anybody would have thought UCS-2 was a good idea is that they did not consult a single Chinese or Japanese scholar on Chinese characters.
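
To put numbers on the 16-bit claim, here is a quick sketch in Python. The variable names are mine, and the 92,865 figure is simply the count quoted above, not something the code verifies. It compares the number of values a fixed 16-bit code unit can hold against that count, then checks how many 16-bit units UTF-16 needs for one ideograph outside the Basic Multilingual Plane.

```python
# Number of distinct values a fixed 16-bit code unit (UCS-2) can represent.
UCS2_CAPACITY = 2 ** 16

# CJK ideograph count quoted in the comment above (taken as given).
CJK_IDEOGRAPHS = 92_865

# True if the ideographs alone cannot fit in 16 bits.
overflows = CJK_IDEOGRAPHS > UCS2_CAPACITY
print(UCS2_CAPACITY, overflows)

# U+20000 is the first code point of CJK Unified Ideographs Extension B,
# which lies outside the Basic Multilingual Plane.
ch = "\U00020000"

# Encode as big-endian UTF-16 (no BOM) and count 16-bit code units.
utf16_units = len(ch.encode("utf-16-be")) // 2
print(utf16_units)
```

The second part shows the workaround UTF-16 ended up with: characters beyond the first 65,536 code points take a surrogate pair, i.e. two 16-bit units instead of one.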



Nobody in 1989 expected to encode 92k Chinese characters into Unicode, because none of the existing encodings covered 92k characters either. The most common encoding for Chinese, GB2312, has only about 7k characters.

I recommend reading your own link, specifically the list of sources for the first CJK block to see how many characters were included and where they were sourced from.



