LYCOS RETRIEVER
Unicode
Unicode is a hot topic these days among computer users who work with multilingual text. They know it is important, and they hear it will solve problems, especially when dealing with text that involves multiple scripts. They may not know where to go to learn about it, though. Or they may have read a few things about it and perhaps seen some code charts, but have reached the point where they need a firmer understanding so that they can start to develop implementations or create content. This introduction is intended to give such people the basic grounding they need.
The Internationalization & Unicode Conference is the premier technical conference for both software and Web internationalization, as well as a great opportunity for networking with other practitioners. The program committee has issued a Call for Participation. For information about sponsoring, exhibiting at, or attending the 32nd Internationalization & Unicode Conference, please send an email to info@unicodeconference.org.
Now that Unicode has more than 65,536 characters, not every character can be represented in two bytes. This means that a .NET char value can't store all possible values. The solution UTF-16 uses is surrogate pairs: pairs of 16-bit values where each value is between 0xD800 and 0xDFFF (the first, or high, surrogate lies in 0xD800 to 0xDBFF and the second, or low, surrogate in 0xDC00 to 0xDFFF). In other words, two "sort of" characters make one "real" character. (UCS-4 and UTF-32 get round this problem entirely by using wider values to start with: when everything is four bytes, every possible character fits.) This is basically a headache, because it means that a string of 10 chars can represent anywhere between 5 and 10 "real" Unicode characters. Fortunately, most applications that don't involve scientific/mathematical notation or Han characters outside the Basic Multilingual Plane are unlikely to need to worry much about surrogates.
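As a sketch of the arithmetic behind surrogate pairs, here is a minimal C version (the function names are hypothetical, not taken from any library). A supplementary code point has 0x10000 subtracted, and the remaining 20 bits are split into a high surrogate (0xD800 plus the top 10 bits) and a low surrogate (0xDC00 plus the bottom 10 bits). The counting function shows why a 10-unit string may hold only 5 code points; treating an unpaired surrogate as one character is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point (U+0000 to U+10FFFF, excluding the surrogate
   range itself) as UTF-16. Writes 1 or 2 code units to out and
   returns how many were written. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                     /* BMP: a single unit */
        return 1;
    }
    cp -= 0x10000;                                 /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));      /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));    /* low surrogate: bottom 10 bits */
    return 2;
}

/* Recombine a high/low surrogate pair into the code point it encodes. */
static uint32_t utf16_decode_pair(uint16_t hi, uint16_t lo)
{
    assert(hi >= 0xD800 && hi <= 0xDBFF && lo >= 0xDC00 && lo <= 0xDFFF);
    return 0x10000 + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
}

/* Count code points in n UTF-16 units: a valid high+low pair counts
   once; any other unit (including an unpaired surrogate) counts once. */
static size_t utf16_count_code_points(const uint16_t *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF && i + 1 < n &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            i++;                                   /* skip the low surrogate */
        count++;
    }
    return count;
}
```

For example, U+1D11E MUSICAL SYMBOL G CLEF encodes as the pair 0xD834 0xDD1E, so a buffer holding "A", that pair and "B" is four units long but contains three code points.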
Converts all Unicode characters in the string that have a case to uppercase. The exact manner in which this is done depends on the current locale, and the number of characters in the string may increase. (For instance, the German eszett, ß, is changed to "SS".)
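The growth in length can be illustrated with standard C alone. towupper() from <wctype.h> maps one wide character to exactly one, so it cannot turn ß into "SS"; the hypothetical toy_upper below special-cases U+00DF to show why an uppercasing routine must allow for output longer than its input. It is only a sketch: beyond whatever towupper does under the current LC_CTYPE locale, it ignores the other special cases that full Unicode case mapping covers, which libraries such as ICU (u_strToUpper) handle properly.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Toy uppercase for wide strings. Writes at most cap wide characters
   (including the terminator) to out. Returns the number of characters
   written, excluding the terminator, or (size_t)-1 if out is too small. */
static size_t toy_upper(const wchar_t *in, wchar_t *out, size_t cap)
{
    size_t n = 0;
    if (cap == 0)
        return (size_t)-1;
    for (; *in != L'\0'; in++) {
        if (*in == 0x00DF) {                 /* LATIN SMALL LETTER SHARP S */
            if (n + 2 >= cap)                /* need two slots plus terminator */
                return (size_t)-1;
            out[n++] = L'S';                 /* one input char becomes two */
            out[n++] = L'S';
        } else {
            if (n + 1 >= cap)
                return (size_t)-1;
            out[n++] = (wchar_t)towupper((wint_t)*in);  /* 1:1 mapping */
        }
    }
    assert(n < cap);                         /* room left for the terminator */
    out[n] = L'\0';
    return n;
}
```

With this sketch, the six-character input L"stra\u00dfe" produces the seven-character result L"STRASSE", which is why the caller has to size the output buffer with some slack.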
The Unicode data type is compatible with the wide-character data type wchar_t in ANSI C... allowing access to the wide-character string functions. Most of the C run-time (CRT) libraries contain wide-character versions of the strxxx string functions. The wide-character versions of the functions all start with wcs.
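A minimal sketch of the str*/wcs* correspondence (the helper wide_join is hypothetical): wide string literals take an L prefix, and wcslen, wcscpy, wcscat and wcscmp mirror strlen, strcpy, strcat and strcmp. Note that wchar_t is 16 bits on Windows, where it holds UTF-16 code units (so wcslen counts a surrogate pair as two), and is typically 32 bits on Linux and other glibc-based systems.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Join two wide strings into dst, whose capacity cap counts wide
   characters including the terminator. Uses the wcs* counterparts of
   strlen, strcpy and strcat. Returns the resulting length, or
   (size_t)-1 if the result would not fit. */
static size_t wide_join(wchar_t *dst, size_t cap,
                        const wchar_t *a, const wchar_t *b)
{
    size_t la = wcslen(a);            /* like strlen, but counts wchar_t units */
    size_t lb = wcslen(b);
    if (la + lb + 1 > cap)
        return (size_t)-1;
    wcscpy(dst, a);                   /* like strcpy */
    wcscat(dst, b);                   /* like strcat */
    assert(wcslen(dst) == la + lb);
    return la + lb;
}
```

Calling wide_join(buf, 32, L"Uni", L"code") leaves L"Unicode" in buf and returns 7.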
Reason for inclusion: U+FEFF was originally included in Unicode solely to indicate byte order or to serve as a file signature; it acquired its ZWNBSP (zero width no-break space) semantics as part of the merger between ISO/IEC 10646 and Unicode. When used as a byte order mark, the character is placed at the beginning of a file. If the recipient reads it as FEFF, the byte orders of sender and receiver match. If the recipient reads it as FFFE (a noncharacter code point), the sender used the opposite byte order, and the recipient needs to swap the byte order or refuse to read the file. When used as a ZWNBSP, the character is intended to prevent breaks between adjacent characters. This function is now provided by U+2060 WORD JOINER.
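The byte-order check described above can be sketched in C on an array of 16-bit units that has already been read in the recipient's native order (the function and enum names are hypothetical). If the first unit is 0xFEFF the orders match; if it is 0xFFFE, every unit is byte-swapped. This sketch handles only UTF-16 and leaves the mark in place for the caller to skip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum bom_result { BOM_ABSENT, BOM_SAME_ORDER, BOM_SWAPPED };

/* Exchange the two bytes of a 16-bit unit. */
static uint16_t swap16(uint16_t u)
{
    return (uint16_t)((u << 8) | (u >> 8));
}

/* Inspect the first unit for U+FEFF used as a byte order mark and,
   if it appears reversed (0xFFFE), fix the order of all n units. */
static enum bom_result fix_byte_order(uint16_t *units, size_t n)
{
    if (n == 0)
        return BOM_ABSENT;
    if (units[0] == 0xFEFF)           /* sender and receiver agree */
        return BOM_SAME_ORDER;
    if (units[0] == 0xFFFE) {         /* noncharacter: order is reversed */
        for (size_t i = 0; i < n; i++)
            units[i] = swap16(units[i]);
        assert(units[0] == 0xFEFF);
        return BOM_SWAPPED;
    }
    return BOM_ABSENT;                /* no mark: order must be known otherwise */
}
```

For instance, the units {0xFFFE, 0x4100} come back as {0xFEFF, 0x0041}, that is, a mark followed by "A".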