Unit 2 - Information representation and Strings

CBSE Revision Notes

Class-11 Computer Science (New Syllabus)
Unit 2: Computer Systems and Organisation (CSO)

Information representation and Strings

In computer science, information is represented in binary (base 2). For convenience, binary values are often written in octal (base 8) or hexadecimal (base 16). Let’s discuss these one by one.

Binary Number System: The binary number system has base 2, so each digit (bit) is either 0 or 1. The rightmost digit has place value $2^0$, the next $2^1$, and so on; in a number with n digits, the leftmost digit has place value $2^{n-1}$. A binary number is written as $(110001)_2$. Let’s see how to convert binary into decimal.

Step	Binary	Decimal
Step 1	$(110001)_2$	$((1 \times 2^5) + (1 \times 2^4) + (0 \times 2^3) + (0 \times 2^2) + (0 \times 2^1) + (1 \times 2^0))_{10}$
Step 2	$(110001)_2$	$(32 + 16 + 0 + 0 + 0 + 1)_{10}$
Step 3	$(110001)_2$	$(49)_{10}$
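The steps in the table can be sketched in Python. This is a minimal illustration, not a standard routine; the function name `binary_to_decimal` is made up for this example. It multiplies each digit by its power of 2 and adds the results.

```python
def binary_to_decimal(bits: str) -> int:
    """Convert a string of 0s and 1s to its decimal value using place values."""
    total = 0
    # The rightmost digit has position 0, so walk the string in reverse.
    for position, digit in enumerate(reversed(bits)):
        if digit not in "01":
            raise ValueError(f"not a binary digit: {digit!r}")
        total += int(digit) * 2 ** position
    return total


print(binary_to_decimal("110001"))  # 49
```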

Octal Number System: The octal number system has base 8 and uses the eight digits 0, 1, 2, 3, 4, 5, 6, 7. Just as in binary, the rightmost digit has place value $8^0$, and each position to the left raises the power by one, so the leftmost of n digits has place value $8^{n-1}$. An octal number is written as $(20761)_8$. Let’s see how to convert octal to decimal.

Step	Octal	Decimal
Step 1	$(20761)_8$	$((2 \times 8^4) + (0 \times 8^3) + (7 \times 8^2) + (6 \times 8^1) + (1 \times 8^0))_{10}$
Step 2	$(20761)_8$	$(8192 + 0 + 448 + 48 + 1)_{10}$
Step 3	$(20761)_8$	$(8689)_{10}$
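A hand conversion like this can be cross-checked with Python's built-in `int()`, which accepts the base as a second argument, and `oct()`, which converts back. The variable name `value` is only illustrative.

```python
value = int("20761", 8)  # parse the text as a base-8 number
print(value)             # 8689
print(oct(value))        # 0o20761 (Python marks octal with the 0o prefix)
```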

Hexadecimal Number System: The hexadecimal number system has base 16 and uses sixteen digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F. The letters A, B, C, D, E, F stand for the values 10, 11, 12, 13, 14, 15, so that each value is written as a single symbol. The rightmost digit has place value $16^0$, and the leftmost of n digits has place value $16^{n-1}$. A hexadecimal number is written as $(61DA7)_{16}$. Let’s convert hexadecimal to decimal.

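Step	Hexadecimal	Decimal
Step 1	$(61DA7)_{16}$	$((6 \times 16^4) + (1 \times 16^3) + (D \times 16^2) + (A \times 16^1) + (7 \times 16^0))_{10}$
Step 2	$(61DA7)_{16}$	$((6 \times 16^4) + (1 \times 16^3) + (13 \times 16^2) + (10 \times 16^1) + (7 \times 16^0))_{10}$
Step 3	$(61DA7)_{16}$	$(393216 + 4096 + 3328 + 160 + 7)_{10}$
Step 4	$(61DA7)_{16}$	$(400807)_{10}$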
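The same place-value method works for any base once the letters are mapped to their values. The sketch below is one possible way to write it; `DIGITS` and `to_decimal` are names chosen for this example, not library functions.

```python
DIGITS = "0123456789ABCDEF"  # the index of each symbol is its value, so A=10 ... F=15


def to_decimal(number: str, base: int) -> int:
    """Convert a digit string in the given base (2 to 16) to a decimal integer."""
    if not 2 <= base <= 16:
        raise ValueError("base must be between 2 and 16")
    total = 0
    # Position 0 is the rightmost digit, so walk the string in reverse.
    for position, symbol in enumerate(reversed(number.upper())):
        value = DIGITS.find(symbol)
        if value == -1 or value >= base:
            raise ValueError(f"{symbol!r} is not a valid base-{base} digit")
        total += value * base ** position
    return total


print(to_decimal("61DA7", 16))  # 400807
```

Calling it with base 2 or base 8 applies the same rule to binary and octal numbers.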

Unsigned integer: An unsigned integer can hold only non-negative numbers, starting from 0. The maximum value it can hold depends on the register size (word size): a 16-bit register can store values up to $2^{16}-1 = 65535$, and a 32-bit register up to $2^{32}-1 = 4294967295$. It cannot hold negative numbers because every bit, including the one a signed integer would reserve for the sign, is used to store the magnitude.
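The limits quoted above follow from the formula $2^n - 1$ for an n-bit register. A small sketch, with `max_unsigned` as a hypothetical helper name: Python integers have no fixed width, so the last line only imitates a 16-bit register by masking.

```python
def max_unsigned(bits: int) -> int:
    """Largest value an unsigned integer of the given width can hold."""
    return 2 ** bits - 1


print(max_unsigned(16))  # 65535
print(max_unsigned(32))  # 4294967295

# Python integers never overflow, so a 16-bit register is imitated here
# by masking: only the lowest 16 bits of the result are kept.
print((max_unsigned(16) + 1) & max_unsigned(16))  # 0
```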

Binary addition: In this section we’ll see how to add binary numbers.

The rules are: 0+0=0, 0+1=1, 1+0=1, and 1+1=0 with a carry of 1. When a carry comes in as well, 1+1+1=1 with a carry of 1. Let’s do some examples.

$\begin{array}{rl} 11\phantom{0000} & \text{(carry)} \\ 11001 & (25)_{10} \\ +\ 11100 & (28)_{10} \\ \hline 110101 & (53)_{10} \end{array}$

$\begin{array}{rl} 1111\phantom{0} & \text{(carry)} \\ 1000111 & (71)_{10} \\ +\ 1011 & (11)_{10} \\ \hline 1010010 & (82)_{10} \end{array}$
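The column-by-column method used in these examples can be sketched as follows. The function name `add_binary` is made up for this illustration, and it assumes both inputs contain only the characters 0 and 1.

```python
def add_binary(a: str, b: str) -> str:
    """Add two binary strings column by column, carrying as on paper."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)  # pad the shorter number with leading zeros
    carry = 0
    result = []
    # Work from the rightmost column to the leftmost.
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        column = int(bit_a) + int(bit_b) + carry
        result.append(str(column % 2))  # digit written under the column
        carry = column // 2             # 1 if the column sum was 2 or 3
    if carry:
        result.append("1")              # a final carry becomes a new leading digit
    return "".join(reversed(result))


print(add_binary("11001", "11100"))   # 110101
print(add_binary("1000111", "1011"))  # 1010010
```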

Strings

In computers, strings are stored using character encodings such as ASCII, UTF-8, UTF-32 and ISCII.

ASCII: The American Standard Code for Information Interchange is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. ASCII uses 7 bits per character, so it can represent 128 characters, with codes 0 to 127. These cover the letters, digits and punctuation found on a standard English keyboard, along with some control codes; ASCII cannot represent accented letters or characters from other scripts. Every character has its own ASCII code. For example, the ASCII code of “A” is 65, written in binary as 100 0001, and the code of “B” is 66, written as 100 0010. The first ASCII character is NULL, with code 0, and the last is “DEL”, with code 127 (111 1111).
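These codes can be inspected in Python with the built-in functions `ord()` and `chr()`. The snippet below is only a quick illustration; the variable names are arbitrary.

```python
for character in "AB":
    code = ord(character)  # character -> numeric code
    print(character, code, format(code, "07b"))  # code shown as 7 binary digits
# A 65 1000001
# B 66 1000010

print(chr(65))        # A (numeric code -> character)
print("A".isascii())  # True
print("é".isascii())  # False, because é lies outside the 128 ASCII codes
```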

UTF-8: UTF-8 is the most popular Unicode encoding. It uses one byte for standard English letters and symbols, two bytes for additional Latin and Middle Eastern characters, and three bytes for Asian characters. Additional characters can be represented using four bytes. UTF-8 is backwards compatible with ASCII, since the first 128 characters are mapped to the same values. UTF-8 has been the dominant character encoding for the World Wide Web since 2009.
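The variable length of UTF-8 can be seen by encoding a few characters with Python's `str.encode()`. The sample characters are chosen here purely for illustration.

```python
samples = ["A", "é", "अ", "😀"]  # English, accented Latin, Devanagari, emoji
for character in samples:
    encoded = character.encode("utf-8")
    print(character, len(encoded), encoded.hex())
# A 1 41
# é 2 c3a9
# अ 3 e0a485
# 😀 4 f09f9880

# The first 128 characters give the same bytes in ASCII and UTF-8.
print("A".encode("utf-8") == "A".encode("ascii"))  # True
```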

UTF-32: UTF-32 stands for Unicode Transformation Format in 32 bits. It is a protocol to encode Unicode code points that uses exactly 32 bits per Unicode code point (but a number of leading bits must be zero as there are fewer than $2^{21}$ Unicode code points). UTF-32 is a fixed-length encoding, in contrast to all other Unicode transformation formats, which are variable-length encodings. Each 32-bit value in UTF-32 represents one Unicode code point and is exactly equal to that code point's numerical value.

The main advantage of UTF-32 is that the Unicode code points are directly indexed. Finding the Nth code point in a sequence of code points is a constant time operation. In contrast, a variable-length code requires sequential access to find the Nth code point in a sequence. This makes UTF-32 a simple replacement in code that uses integers that are incremented by one to examine each location in a string, as was commonly done for ASCII.

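The main disadvantage of UTF-32 is that it is space-inefficient, using four bytes per code point. Characters beyond the Basic Multilingual Plane (BMP), the first 65,536 code points, are relatively rare in most texts and can typically be ignored for sizing estimates, so UTF-32 text is usually close to twice the size of the same text in UTF-16, and four times the size of UTF-8 for plain English text.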
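Both properties, direct indexing and the four-byte cost, can be checked with a short sketch. It assumes the little-endian variant `utf-32-le`, which adds no byte-order mark, and `code_point_at` is a hypothetical helper written for this example.

```python
text = "Aअ😀"
data = text.encode("utf-32-le")  # "-le" fixes the byte order and adds no byte-order mark

print(len(data))  # 12, four bytes for each of the three code points


def code_point_at(encoded: bytes, n: int) -> int:
    """Read the n-th code point (counting from 0) straight from UTF-32-LE bytes."""
    return int.from_bytes(encoded[4 * n : 4 * n + 4], "little")


print(hex(code_point_at(data, 1)))  # 0x905, the code point of अ
print(len(text.encode("utf-8")))    # 8 bytes (1 + 3 + 4) for the same text
```

The helper jumps straight to byte offset 4n, which is why the lookup takes the same time for any position.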

ISCII: Indian Script Code for Information Interchange (ISCII) is a coding scheme for representing various writing systems of India. It encodes the main Indic scripts and a Roman transliteration. The supported scripts are: Assamese, Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil and Telugu. ISCII does not encode the writing systems of India based on Arabic, but its writing system switching codes nonetheless provide for Kashmiri, Sindhi, Urdu, Persian, Pashto and Arabic. The Arabic-based writing systems were subsequently encoded in the PASCII encoding. ISCII has not been widely used outside certain government institutions and has now been rendered largely obsolete by Unicode. Unicode uses a separate block for each Indic writing system, and largely preserves the ISCII layout within each block.