LYCOS RETRIEVER
Unicode: Unicode Consortium
The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard, which specifies the representation of text in modern software products and standards. Unicode is the accepted international standard that includes support for all major scripts of the world and is adopted by all current major computer operating systems. It was originally designed as a 16-bit standard with room for 65,536 characters; since version 2.0 its code space has extended to 1,114,112 code points (U+0000 to U+10FFFF). It has support for major Indic (Indian) scripts that include Devanagari (Hindi, Marathi, Sanskrit), Bengali (Bengali, Assamese), Gurmukhi (Punjabi), Gujarati, Oriya, Tamil, Telugu, Kannada and Malayalam. Microsoft Windows XP supports Indic scripts, including Gurmukhi. All future development regarding scripts will be based on Unicode.
Source:
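As a rough illustration of the Indic script coverage described above, the following Python sketch looks up a few sample letters with the standard `unicodedata` module. The choice of sample characters and the helper name `describe` are assumptions made for this example, not part of any source quoted here.

```python
import sys
import unicodedata

# One sample letter per script; the selection is illustrative only.
samples = {
    "Devanagari": "\u0905",
    "Bengali": "\u0985",
    "Gurmukhi": "\u0A05",
    "Tamil": "\u0B85",
}

def describe(ch):
    """Return the code point and official character name, e.g. 'U+0905 ...'."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))

for script, ch in samples.items():
    print(script, "->", describe(ch))

# The highest code point Python 3 accepts is U+10FFFF,
# which is well beyond what 16 bits can address.
print("highest code point: U+%X" % sys.maxunicode)
```

Each script occupies its own block of code points, so text in several scripts can be mixed in one string without switching encodings.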
Unicode is an especially good fit for the age of the Internet, since the worldwide nature of the Internet demands solutions that work in any language. The Internet Engineering Task Force (IETF), which publishes RFCs, has recognized this fact and now expects new protocols to be able to use Unicode for text; the World Wide Web Consortium (W3C) likewise builds its specifications on Unicode. Many other products and standards now require or allow use of Unicode; for example, XML, HTML, Microsoft JScript, Java, Perl, Microsoft C#, and Microsoft Visual Basic 7 (VB.NET). Today, Unicode is the de facto character encoding standard accepted by all major computer companies, while ISO 10646 is the corresponding worldwide de jure standard approved by all ISO member countries. The two standards include identical character repertoires and binary representations.
Source:
The Unicode Consortium site at <http://www.unicode.org> has character charts, a glossary, and PDF versions of the Unicode specification. Be prepared for some difficult reading. <http://www.unicode.org/history/> is a chronology of the origin and development of Unicode.
Source:
The concept of the Unicode character set began in 1987, thanks to Joe Becker from Xerox and Mark Davis from Apple. The following year, Becker, Davis, and Lee Collins (currently of Xerox; formerly of Apple) began investigating the design and soon made the case for Han unification to ANSI and ISO. Han unification, in turn, rests on the historic evolution of the Chinese character set (Han). Several people from various high tech companies began holding bimonthly meetings in 1989. By the end of 1990, an initial, full-review draft was created. In 1991, the group became the Unicode Consortium, a non-profit organization incorporated as Unicode, Inc. Version 1.0 became available to the public for the first time in 1992.
Source:
The California-based Unicode Consortium first published "The Unicode Standard" in 1991, and continues to develop standards based on that original work. Unicode was developed in conjunction with the International Organization for Standardization, and it shares its character repertoire with ISO/IEC 10646. Unicode and ISO/IEC 10646 are equivalent as character encodings, but The Unicode Standard contains much more information for implementers, covering, in depth, topics such as bitwise encoding, collation, and rendering, and enumerating a multitude of character properties, including those needed for BiDi support. The two standards ... have slightly different terminology.
Source:
The Unicode Consortium is a body trying to standardise the handling of character data, including its transformation to and from binary form (otherwise known as encoding and decoding). There is ... a set of ISO standards (10646 in various versions) which do similar things; Unicode and ISO 10646 can largely be regarded as "the same thing" in that they are compatible in almost all respects. (In theory ISO 10646 defines a larger potential set of characters, but this is never likely to become an issue.) Most modern computer languages and environments, such as .NET and Java, use Unicode for character representations. Unicode defines, amongst other things, an abstract character repertoire (the set of characters it covers), a coded character set (a mapping from each character in the repertoire to a non-negative integer), some character encoding forms (mappings from the non-negative integers in the coded character set to sequences of "code units", e.g. bytes), and some character encoding schemes (mappings from sequences of code units into serialized byte sequences). The difference between a character encoding form and a character encoding scheme is slightly subtle: the scheme additionally fixes details such as endianness, that is, the order in which the bytes of a multi-byte code unit are written.
Source:
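The layers described in the last snippet can be traced for a single character with a short Python sketch. It uses only built-in string methods; the helper name `utf16_code_units` and the sample characters are assumptions chosen for illustration, not something the quoted sources prescribe.

```python
def utf16_code_units(text):
    """Split text into its 16-bit UTF-16 code units (as integers)."""
    raw = text.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]

# Abstract character repertoire: the character itself, DEVANAGARI LETTER A.
ch = "\u0905"

# Coded character set: character -> non-negative integer (code point).
code_point = ord(ch)

# Character encoding forms: code point -> sequence of code units.
utf8_units = list(ch.encode("utf-8"))   # 8-bit code units
utf16_units = utf16_code_units(ch)      # 16-bit code units

# Character encoding schemes: code units -> bytes, with a fixed byte order.
utf16_big_endian = ch.encode("utf-16-be")
utf16_little_endian = ch.encode("utf-16-le")

print(hex(code_point))                       # 0x905
print([hex(u) for u in utf8_units])          # ['0xe0', '0xa4', '0x85']
print([hex(u) for u in utf16_units])         # ['0x905']
print(utf16_big_endian.hex(), utf16_little_endian.hex())  # 0905 0509

# A code point above U+FFFF needs two UTF-16 code units (a surrogate pair).
print([hex(u) for u in utf16_code_units("\U0001F600")])   # ['0xd83d', '0xde00']
```

The same single UTF-16 code unit serializes to different bytes in the big-endian and little-endian schemes, which is the endianness detail the paragraph mentions.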