LYCOS RETRIEVER
Unicode: Unicode Support
Unicode is the new foundation for internationalization. The older code page concepts were complicated to use for most complex (Asian) languages. Code pages were never designed for such use, and their ad hoc extensions have led to inconsistent definitions for many characters. Internationalizing your code while keeping a single code base is complex, since you would have to support different character sets with different architectures for different markets. But modern business requirements are even stricter: programs have to handle characters from a wide variety of languages at the same time, and the EU alone requires several different older character sets to cover all its languages. Mixing older character sets together is a nightmare, since all data has to be tagged, and mixing data from different sources is nearly impossible to do reliably.
Source:
Unicode is the native code set of Windows NT, but the Win32 subsystem provides both ANSI and Unicode support. Character strings in the system, including object names, path names, and file and directory names, are represented with 16-bit Unicode characters. The Win32 subsystem converts any ANSI characters it receives into Unicode strings before manipulating them. It then converts them back to ANSI, if necessary, upon exit from the system.
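The ANSI-to-Unicode round trip described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Win32 implementation: the helper names `ansi_to_unicode` and `unicode_to_ansi` are hypothetical, and code page 1252 (Western European) is assumed as the ANSI code page.

```python
# Minimal sketch of the conversion described above, assuming code page 1252
# as the "ANSI" code page. The function names are hypothetical helpers,
# not Win32 API calls.

def ansi_to_unicode(raw: bytes, code_page: str = "cp1252") -> bytes:
    """Decode code-page bytes and re-encode them as 16-bit (UTF-16LE) units."""
    return raw.decode(code_page).encode("utf-16-le")


def unicode_to_ansi(wide: bytes, code_page: str = "cp1252") -> bytes:
    """Convert 16-bit Unicode data back to the code-page encoding."""
    return wide.decode("utf-16-le").encode(code_page)


# A file name as a non-Unicode application would pass it in: one byte per character.
ansi_name = "r\u00e9sum\u00e9.txt".encode("cp1252")

# What the system would work with internally: two bytes per character here.
wide_name = ansi_to_unicode(ansi_name)

# What is handed back to the application on exit from the system.
round_trip = unicode_to_ansi(wide_name)
```

Every character in this file name exists in code page 1252, so the round trip is lossless. A character outside the code page would make the final `encode` call raise an error unless a replacement policy is chosen.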
Source:
You've now learned more about the benefits and capabilities that Unicode offers, in addition to looking more closely at its functionality. You might ... be wondering about the extent to which Windows supports Unicode's features. Microsoft Windows NT 3.1 was the first major operating system to support Unicode, and since then Microsoft Windows NT 4, Microsoft Windows 2000, and Microsoft Windows XP have extended this support, with Unicode being their native encoding. In fact, when you run a non-Unicode application on them, the operating system converts the application's text internally to Unicode before any processing is done. The operating system then converts the text back to the expected code-page encoding before passing the information back to the application.
Source:
The other problem with Unicode is how to enter non-ASCII characters. Often, the only way to specify Unicode characters is by using Unicode escape sequences, as shown in the table above. The Unicode specification, though, requires that composite characters be specified by a sequence of Unicode characters led by the base one. Many French characters, for example, are built on top of the Latin character set with additional hyphens, carets, apostrophes, etc. The Unicode specification requires that such characters be specified by the Latin character followed by the Unicode value of the apostrophe (for example). The JavaScript implementation, like other ones, does not support this option.
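To make the base-character-then-mark ordering concrete, here is a small sketch using escape sequences. It is written in Python rather than JavaScript, and it relies on the standard `unicodedata` module to convert between the combining sequence and the single precomposed code point that Unicode also defines for this letter. The variable names are illustrative only.

```python
import unicodedata

# Base Latin letter followed by its combining mark (two code points).
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT

# The same letter written as a single precomposed code point.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

# The two spellings are different strings until they are normalized.
same_before = decomposed == precomposed

# NFC composes base + mark into one code point; NFD splits it again.
composed = unicodedata.normalize("NFC", decomposed)
split = unicodedata.normalize("NFD", precomposed)
```

Software that compares or searches text typically normalizes both sides to one form first, since the two spellings render identically but do not compare equal.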
Source:
Unicode has been around since the early 1990s, but its integration into Microsoft Windows, Microsoft Word, and other popular software packages has been a gradual process. Only with the combination of Microsoft Windows XP and Microsoft Word XP has Unicode support come to maturity and begun to deliver on its promised increases in productivity.
Source:
For compatibility with 8-bit and 7-bit environments, Unicode can ... be encoded as UTF-8 and UTF-7, respectively. While Unicode-enabled functions in Windows use UTF-16, it is also possible to work with data encoded in UTF-8 or UTF-7, which are supported in Windows as multi-byte character set code pages. See Code Pages.
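As a rough illustration of how the three encodings differ, the sketch below encodes one short string each way using Python's built-in codecs. It does not call any Windows API, and the sample text and variable names are assumptions chosen for the example.

```python
# One string, three encodings. Uses Python's built-in codecs only.
text = "caf\u00e9"  # "café": three ASCII letters plus one non-ASCII letter

utf16 = text.encode("utf-16-le")  # fixed 16-bit units for these characters
utf8 = text.encode("utf-8")       # 8-bit safe: ASCII stays one byte each
utf7 = text.encode("utf-7")       # 7-bit safe: every byte is below 0x80

# UTF-7 output never sets the high bit, which is what makes it usable
# in 7-bit environments.
utf7_is_seven_bit = all(byte < 0x80 for byte in utf7)

# All three decode back to the same text.
decoded = (
    utf16.decode("utf-16-le"),
    utf8.decode("utf-8"),
    utf7.decode("utf-7"),
)
```

In UTF-8 the accented letter takes two bytes while the ASCII letters keep their single-byte values, which is why plain ASCII data is already valid UTF-8.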
Source: