LYCOS RETRIEVER Beta
Unicode: UTF-8
built 627 days ago
In order to take full advantage of Unicode on your Linux or other UNIX system, you will need to set your locale to a UTF-8 locale. Some recent Linux distributions now use a UTF-8 locale by default. However, unless you are using a very recent Linux distribution, you are still very likely using a legacy locale based on ISO-8859 or another national encoding. If you are using a UNIX-based OS other than Linux, it is even less likely that you are already using a UTF-8 locale. To determine your current locale settings, run the locale command; the LANG, LC_CTYPE, and LC_ALL values it prints show which locale is in effect, and a UTF-8 locale usually has a name ending in ".UTF-8" (for example, en_US.UTF-8).
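The same check can also be done from a program. The sketch below assumes Python 3 on Linux or another UNIX system, where locale.nl_langinfo() is available; the helper names current_codeset and is_utf8_codeset are hypothetical, chosen only for this illustration. It adopts the locale from the environment and asks for that locale's character set, which is what decides whether you are in a UTF-8 locale.

```python
import locale


def current_codeset() -> str:
    """Return the character set of the locale selected by the environment.

    setlocale(LC_ALL, "") adopts the settings from LC_ALL, LC_CTYPE and LANG,
    as a C program would. If those name a locale that is not installed, fall
    back to the portable "C" locale instead of failing.
    """
    try:
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        locale.setlocale(locale.LC_ALL, "C")
    # nl_langinfo() exists only on UNIX-like systems.
    return locale.nl_langinfo(locale.CODESET)


def is_utf8_codeset(name: str) -> bool:
    """Treat spellings such as "UTF-8", "utf8" and "UTF_8" as the same codeset."""
    return name.replace("-", "").replace("_", "").upper() == "UTF8"


if __name__ == "__main__":
    codeset = current_codeset()
    if is_utf8_codeset(codeset):
        print(f"Codeset {codeset!r}: a UTF-8 locale")
    else:
        print(f"Codeset {codeset!r}: a legacy (non-UTF-8) locale")
```

Run under a UTF-8 locale, the codeset should be reported as UTF-8; under a legacy locale you would expect a name along the lines of an ISO-8859 variant instead.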
Despite several technical limitations, problems, and criticisms, Unicode has emerged as the dominant encoding scheme in the internationalization of software and in multilingual environments. Microsoft Windows NT and its descendants, Windows 2000 and Windows XP, make extensive use of Unicode, more specifically UTF-16, as an internal representation of text. UNIX-like operating systems such as Linux, BSD, and Mac OS X have adopted Unicode, more specifically UTF-8, as the basis for representing multilingual text.
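When encoded as UTF-8, a Unicode string is turned into a string of bytes that contains no embedded zero bytes, except where the string itself contains the character U+0000 (NUL). Because UTF-8 is a byte-oriented encoding, it also avoids byte-ordering issues, and UTF-8 strings can be processed by C functions such as strcpy() and sent through protocols that can't handle zero bytes.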
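A quick way to see this property is to encode the same text as UTF-8 and as UTF-16. The sketch below uses only Python 3's built-in str.encode(); the function names are hypothetical. It also checks the related property behind the strcpy() claim: every byte of a multi-byte UTF-8 sequence is 0x80 or above, so no ASCII byte, including 0x00, can appear in the middle of a character.

```python
def utf8_has_embedded_nul(text: str) -> bool:
    """Return True if the UTF-8 encoding of text contains a 0x00 byte."""
    return b"\x00" in text.encode("utf-8")


def multibyte_bytes_are_non_ascii(text: str) -> bool:
    """Return True if every byte of every multi-byte character is >= 0x80."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1 and any(byte < 0x80 for byte in encoded):
            return False
    return True


sample = "naïve café 日本語"
print(utf8_has_embedded_nul(sample))            # False
print(multibyte_bytes_are_non_ascii(sample))    # True
# UTF-16 differs: each ASCII letter gains a zero byte, e.g. "n" -> b"n\x00".
print(b"\x00" in sample.encode("utf-16-le"))    # True
# The one exception in UTF-8: U+0000 itself is encoded as a 0x00 byte.
print(utf8_has_embedded_nul("a\x00b"))          # True
```

The UTF-16 line is why C string functions that stop at the first zero byte cannot safely handle UTF-16 text, while they can handle UTF-8.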
In the Vim text editor, Unicode characters can be entered in Insert mode by pressing CTRL-V followed by a character code, for example u and four hexadecimal digits (CTRL-V u 00e9 enters é). For more information, type ":help i_CTRL-V_digit" in Vim. (Note that the entered text will be Unicode only if the current encoding is set to Unicode or a related format like UTF-8; type ":help encoding" in Vim for details.) Many Unicode characters can ... be entered using digraphs; a table of such characters and their corresponding digraphs can be obtained using the ":digraphs" command (again provided the current encoding is set to Unicode).
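To find the keys for a particular character, you only need its code point in hexadecimal. The small Python 3 helper below is hypothetical and not part of Vim; it assumes the CTRL-V forms documented under ":help i_CTRL-V_digit", where u takes up to four hex digits and U takes up to eight. It always prints the full-width form, so Vim can complete the code without waiting for another key.

```python
def vim_ctrl_v_keys(ch: str) -> str:
    """Hypothetical helper: Insert-mode keys that enter ch via CTRL-V in Vim.

    Code points up to U+FFFF use "u" plus four hex digits; larger ones use
    "U" plus eight hex digits.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    code_point = ord(ch)
    if code_point <= 0xFFFF:
        return f"CTRL-V u {code_point:04x}"
    return f"CTRL-V U {code_point:08x}"


for char in ("é", "€", "\U0001F600"):
    print(f"U+{ord(char):04X}: {vim_ctrl_v_keys(char)}")
```

For é (U+00E9) this prints CTRL-V u 00e9, matching the example above; characters outside the Basic Multilingual Plane, such as U+1F600, need the eight-digit U form.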
The Unicode Standard, Version 5.0: UTF-8 encodes each Unicode character as a variable number of 1 to 4 octets, where the number of octets depends on the integer value assigned to the Unicode character. It is an efficient encoding for Unicode documents that use mostly US-ASCII characters, because it represents each character in the range U+0000 through U+007F as a single octet. UTF-8 is the default encoding for XML.
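The 1-to-4-octet rule can be written out directly. The following Python 3 function is a hand-written sketch for illustration only (real programs should simply call str.encode("utf-8")), and the name utf8_encode_code_point is hypothetical. Each branch covers one code-point range: U+0000 to U+007F in one octet, up to U+07FF in two, up to U+FFFF in three, and up to U+10FFFF in four, with every continuation octet of the form 10xxxxxx.

```python
def utf8_encode_code_point(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (illustrative sketch)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0x7F:      # 0xxxxxxx: the US-ASCII range is copied unchanged
        return bytes([cp])
    if cp <= 0x7FF:     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for ch in ("A", "é", "€", "\U0001F600"):
    encoded = utf8_encode_code_point(ord(ch))
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')} ({len(encoded)} octet(s))")
```

Because ASCII text passes through the first branch unchanged, a pure US-ASCII document is byte-for-byte identical in UTF-8, which is the efficiency point made above. (The hex(' ') separator argument needs Python 3.8 or later.)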
James Su's Smart Common Input Method (SCIM) is a Unicode-based input method (IM) platform written in C++. For users, SCIM is an excellent choice because it is simple to set up and use in either a UTF-8 or a legacy locale. For software developers, it is ... nice because it abstracts input method interfaces into a set of simple, independent classes, so developers can write their own input methods easily in a few lines of code.