What is Unicode?
In order to understand the concept of Unicode we require to move a little back and understand the ANSI code. The ASCII (ask key) stands for American Standard Code for Information Interchange. In the ASCII format every character is represented by one byte (i.e. 8 bits). So in brief we can have 256 characters (2^8). Before UNICODE came into picture the programmers used code page to represent characters in different languages. The Code page is a different interpretation of ASCII set. The Code pages keep 128 characters for English and the rest 128 characters are tailored for a specific language.
The pictorial representation of the same is shown below:
Figure:- Code page in action
There are following demerits of the CODE page approach:-
(i)Some languages such as Chinese have more than 5000 characters that is difficult to represent only 128 character set.
(ii)Only 2 languages can be supported at one time. As said you can use only 128 for English and the rest 128 for the other language.
(iii)The end client must have the code page.
(iv)The Code Representation modify according to the Operating system and Language used.
(v)That means a character can be represented in different numbers depending on the operating system.
For all the above problems the UNICODE was introduced. The UNICODE represents characters with 2 bytes. Therefore if its two bytes that means 18 bits. You can now have 2^16 characters that is 65536 characters. That's a huge number you can include any language in the world. In further if you use surrogates you can have additional 1 million characters...Hmm that can involve type of language including historian characters.
The ASCII representation varied according to the operating system and language. But in the UNICODE it assigns a unique letter for every character irrespective of the Language or the operating system which makes programmers life much easier while developing international compatible applications.