Characters and ASCII

Every character you see on screen is secretly a number. Understanding this idea is key to working with text in C - and to understanding security problems that can happen when handling text.

The char Type

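A char holds one character - one letter, digit, or symbol. It takes up one byte of memory (8 bits on virtually every modern system), so it can represent 256 different values. Whether that range is 0 to 255 or -128 to 127 depends on the compiler, because plain char may be unsigned or signed. Every ASCII character (0-127) fits either way.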

char letter = 'A';
char digit = '7';
char symbol = '@';

Notice the single quotes. This is important:

  • 'A' is a single character (the letter A, stored as the number 65)
  • "A" is a string (we will cover this in Part 5 - it is different!)

Characters Are Numbers

Here is the key idea: char is just a small number. When you store 'A', you are actually storing the number 65.

char letter = 'A';
printf("Character: %c\n", letter);   // Prints: A
printf("Number: %d\n", letter);      // Prints: 65

This means you can do math with characters:

char letter = 'A';
letter = letter + 1;    // Now letter is 'B' (66)
letter = letter + 24;   // Now letter is 'Z' (66 + 24 = 90)
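
As a small sketch of the same arithmetic, the helper below returns the n-th uppercase letter by adding an offset to 'A'. The name nth_upper is made up for this illustration, and the code assumes an ASCII-compatible character set where 'A' to 'Z' are consecutive.

```c
/* Hypothetical helper: returns the n-th uppercase letter
   (0 -> 'A', 1 -> 'B', ..., 25 -> 'Z').
   Assumes ASCII, where 'A'..'Z' are the consecutive codes 65..90.
   Returns '?' when n is outside 0..25. */
char nth_upper(int n) {
    if (n < 0 || n > 25) {
        return '?';
    }
    return (char)('A' + n);
}
```

With this, nth_upper(1) gives 'B' and nth_upper(25) gives 'Z'. The range check is there because adding a larger offset would run past 'Z' into the punctuation that follows it in the table.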

ASCII: The Character-to-Number Mapping

ASCII (American Standard Code for Information Interchange) is the standard that defines which number represents which character. It covers values 0-127.

Quick Reference: The Important Parts

You do not need to memorize the full ASCII table. Just remember these key ranges - you can always look up the rest:

Range    Characters    What to Remember
0        '\0' (null)   Marks the end of strings - very important!
48-57    '0' to '9'    Digits. Note: '0' is 48, not 0!
65-90    'A' to 'Z'    Uppercase letters
97-122   'a' to 'z'    Lowercase letters (32 more than uppercase)

That is it! These four ranges cover most of what you will need day-to-day.

Key Ranges to Know

Here is a more detailed breakdown:

Range    Characters           Notes
0-31     Control characters   Non-printable. Includes \0 (0), \t (9), \n (10), \r (13)
32       Space                The space character
48-57    '0' to '9'           Digits. '0' is 48, NOT 0!
65-90    'A' to 'Z'           Uppercase letters
97-122   'a' to 'z'           Lowercase. Exactly 32 more than uppercase.

The Critical Difference: '\0' vs '0'

This confuses everyone at first, and getting it wrong causes serious bugs:

Character   Decimal   What It Is
'\0'        0         The null character - invisible, marks end of strings
'0'         48        The digit zero - a printable character

char null_char = '\0';   // ASCII 0 - the string terminator
char zero_digit = '0';   // ASCII 48 - the character "0"

// These are completely different!
if (null_char == zero_digit) {
    printf("Same\n");     // This will NEVER print
} else {
    printf("Different\n"); // This WILL print
}

The null character '\0' marks where strings end in C. We will cover this in detail in Part 5.
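
As a preview of why the distinction matters, here is a minimal sketch that uses both values in one loop. The function name count_zero_digits is hypothetical, and strings themselves are covered properly in Part 5.

```c
#include <stddef.h>

/* Hypothetical helper: counts how many '0' digit characters (value 48)
   appear in a string. The loop stops at '\0' (value 0), the terminator. */
size_t count_zero_digits(const char *s) {
    size_t count = 0;
    while (*s != '\0') {      /* value 0 ends the string */
        if (*s == '0') {      /* value 48 is ordinary text */
            count++;
        }
        s++;
    }
    return count;
}
```

For the string "1000" this returns 3. If the loop condition compared against '0' instead of '\0', it would stop at the first digit zero and could run past the end of a string that contains none.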

The Full ASCII Table

The tables below list every ASCII character. These are reference tables - you do not need to memorize them. Bookmark this page and come back when you need to look something up.

Control Characters (0-31)

These are invisible characters that tell the computer to do things like start a new line or make a beep sound. You cannot see them on screen, but they still exist in your text.

Dec  Hex   Oct  Char     Description
0    0x00  000  NUL \0   Null character (string terminator)
1    0x01  001  SOH      Start of heading
2    0x02  002  STX      Start of text
3    0x03  003  ETX      End of text
4    0x04  004  EOT      End of transmission
5    0x05  005  ENQ      Enquiry
6    0x06  006  ACK      Acknowledge
7    0x07  007  BEL \a   Bell (makes a beep)
8    0x08  010  BS \b    Backspace
9    0x09  011  HT \t    Horizontal tab
10   0x0A  012  LF \n    Line feed (newline)
11   0x0B  013  VT \v    Vertical tab
12   0x0C  014  FF \f    Form feed
13   0x0D  015  CR \r    Carriage return
14   0x0E  016  SO       Shift out
15   0x0F  017  SI       Shift in
16   0x10  020  DLE      Data link escape
17   0x11  021  DC1      Device control 1
18   0x12  022  DC2      Device control 2
19   0x13  023  DC3      Device control 3
20   0x14  024  DC4      Device control 4
21   0x15  025  NAK      Negative acknowledge
22   0x16  026  SYN      Synchronous idle
23   0x17  027  ETB      End of transmission block
24   0x18  030  CAN      Cancel
25   0x19  031  EM       End of medium
26   0x1A  032  SUB      Substitute
27   0x1B  033  ESC \e   Escape (\e is a GCC/Clang extension, not standard C)
28   0x1C  034  FS       File separator
29   0x1D  035  GS       Group separator
30   0x1E  036  RS       Record separator
31   0x1F  037  US       Unit separator
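
Because control characters are invisible, it can help to turn them into something readable when debugging. The sketch below maps the control characters that have standard C escapes to a printable label; control_label is a hypothetical name, not a library function.

```c
#include <stddef.h>

/* Hypothetical helper: returns a printable label for the control
   characters that have a standard C escape sequence, or NULL if c
   is not one of them. */
const char *control_label(int c) {
    switch (c) {
        case '\0': return "\\0";
        case '\a': return "\\a";
        case '\b': return "\\b";
        case '\t': return "\\t";
        case '\n': return "\\n";
        case '\v': return "\\v";
        case '\f': return "\\f";
        case '\r': return "\\r";
        default:   return NULL;
    }
}
```

A caller could print the label when one is returned and print the character itself otherwise, which makes stray tabs or carriage returns in input easy to spot.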

Printable Characters (32-126)

These are the characters you can actually see - letters, numbers, and symbols.

Dec  Hex   Oct  Char      Dec  Hex   Oct  Char   Dec  Hex   Oct  Char
32   0x20  040  (space)   64   0x40  100  @      96   0x60  140  `
33   0x21  041  !         65   0x41  101  A      97   0x61  141  a
34   0x22  042  "         66   0x42  102  B      98   0x62  142  b
35   0x23  043  #         67   0x43  103  C      99   0x63  143  c
36   0x24  044  $         68   0x44  104  D      100  0x64  144  d
37   0x25  045  %         69   0x45  105  E      101  0x65  145  e
38   0x26  046  &         70   0x46  106  F      102  0x66  146  f
39   0x27  047  '         71   0x47  107  G      103  0x67  147  g
40   0x28  050  (         72   0x48  110  H      104  0x68  150  h
41   0x29  051  )         73   0x49  111  I      105  0x69  151  i
42   0x2A  052  *         74   0x4A  112  J      106  0x6A  152  j
43   0x2B  053  +         75   0x4B  113  K      107  0x6B  153  k
44   0x2C  054  ,         76   0x4C  114  L      108  0x6C  154  l
45   0x2D  055  -         77   0x4D  115  M      109  0x6D  155  m
46   0x2E  056  .         78   0x4E  116  N      110  0x6E  156  n
47   0x2F  057  /         79   0x4F  117  O      111  0x6F  157  o
48   0x30  060  0         80   0x50  120  P      112  0x70  160  p
49   0x31  061  1         81   0x51  121  Q      113  0x71  161  q
50   0x32  062  2         82   0x52  122  R      114  0x72  162  r
51   0x33  063  3         83   0x53  123  S      115  0x73  163  s
52   0x34  064  4         84   0x54  124  T      116  0x74  164  t
53   0x35  065  5         85   0x55  125  U      117  0x75  165  u
54   0x36  066  6         86   0x56  126  V      118  0x76  166  v
55   0x37  067  7         87   0x57  127  W      119  0x77  167  w
56   0x38  070  8         88   0x58  130  X      120  0x78  170  x
57   0x39  071  9         89   0x59  131  Y      121  0x79  171  y
58   0x3A  072  :         90   0x5A  132  Z      122  0x7A  172  z
59   0x3B  073  ;         91   0x5B  133  [      123  0x7B  173  {
60   0x3C  074  <         92   0x5C  134  \      124  0x7C  174  |
61   0x3D  075  =         93   0x5D  135  ]      125  0x7D  175  }
62   0x3E  076  >         94   0x5E  136  ^      126  0x7E  176  ~
63   0x3F  077  ?         95   0x5F  137  _      127  0x7F  177  DEL

Delete Character (127)

Dec  Hex   Oct  Char  Description
127  0x7F  177  DEL   Delete character

Useful Patterns

Converting Between Cases

Uppercase and lowercase letters differ by exactly 32:

char upper = 'A';           // 65
char lower = upper + 32;    // 97 = 'a'

char c = 'g';               // 103
char C = c - 32;            // 71 = 'G'
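
Adding or subtracting 32 only makes sense when the character really is a letter of the expected case. For instance, '1' + 32 is 81, which is 'Q'. A minimal sketch of a guarded conversion, assuming ASCII and using the hypothetical name to_lower_ascii:

```c
/* Hypothetical helper: converts an uppercase ASCII letter to lowercase.
   Any other character is returned unchanged. */
char to_lower_ascii(char c) {
    if (c >= 'A' && c <= 'Z') {
        return (char)(c + 32);   /* 'A' (65) -> 'a' (97), and so on */
    }
    return c;
}
```

Here to_lower_ascii('G') gives 'g', while to_lower_ascii('1') leaves '1' alone instead of turning it into 'Q'.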

You can also use bitwise operations (these flip individual bits in the number). Bit 5 (counting from bit 0, so the bit worth 32) is the one that distinguishes uppercase from lowercase. These tricks are only valid when c is already a letter:

char c = 'G';
c = c | 32;    // Force lowercase: 'g' (set bit 5)
c = c & ~32;   // Force uppercase: 'G' (clear bit 5)
c = c ^ 32;    // Toggle case (flip bit 5)
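
The bit trick changes non-letters too: '@' | 32 is 64 | 32 = 96, which is the backtick character. The sketch below wraps the toggle in a letter check. The name toggle_case_ascii is hypothetical and the code assumes ASCII.

```c
/* Hypothetical helper: flips the case of an ASCII letter by toggling
   bit 5 (value 32). Non-letters are returned unchanged. */
char toggle_case_ascii(char c) {
    int is_upper = (c >= 'A' && c <= 'Z');
    int is_lower = (c >= 'a' && c <= 'z');
    if (is_upper || is_lower) {
        return (char)(c ^ 32);
    }
    return c;
}
```

So toggle_case_ascii('G') is 'g', toggle_case_ascii('g') is 'G', and toggle_case_ascii('@') stays '@'.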

Checking Character Types

char c = '7';

// Is it a digit?
if (c >= '0' && c <= '9') {
    int value = c - '0';  // Convert char '7' to int 7
    printf("Digit with value: %d\n", value);
}

// Is it uppercase?
if (c >= 'A' && c <= 'Z') {
    printf("Uppercase letter\n");
}

// Is it lowercase?
if (c >= 'a' && c <= 'z') {
    printf("Lowercase letter\n");
}

// Is it a letter?
if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
    printf("Letter\n");
}

The standard library (built-in C code you can use) provides helpful functions in <ctype.h>:

#include <ctype.h>

char c = 'A';
isdigit(c);   // 0 (false) - not a digit
isalpha(c);   // non-zero (true) - is a letter
isupper(c);   // non-zero (true) - is uppercase
islower(c);   // 0 (false) - not lowercase
isspace(c);   // 0 (false) - not whitespace
toupper(c);   // 'A' (already upper)
tolower(c);   // 'a'
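
One detail worth knowing when calling these functions on arbitrary text: they take an int whose value must be representable as unsigned char (or be EOF). On platforms where plain char is signed, a byte above 127 becomes negative, and passing it directly is undefined behavior. A common idiom is to cast first, as in this sketch (count_digits is a hypothetical name):

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the digit characters in a string.
   The cast to unsigned char keeps the argument to isdigit in the
   range the <ctype.h> functions are defined for. */
size_t count_digits(const char *s) {
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (isdigit((unsigned char)*s)) {
            count++;
        }
    }
    return count;
}
```

For the string "a1b22" this returns 3.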

Converting Digit Characters to Numbers

The character '5' is not the number 5 - it is the number 53 (its ASCII value). To get the actual number:

char digit = '7';
int value = digit - '0';  // 55 - 48 = 7

// Going the other way:
int num = 3;
char c = num + '0';  // 3 + 48 = 51 = '3'
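
The same subtraction extends to multi-digit text: multiply the running total by 10 and add the next digit. This is a minimal sketch with a hypothetical name, parse_digits. It ignores signs and overflow, which the standard strtol function handles in real programs.

```c
/* Hypothetical helper: converts a run of digit characters to an int.
   Stops at the first non-digit. Does not handle signs or overflow. */
int parse_digits(const char *s) {
    int value = 0;
    while (*s >= '0' && *s <= '9') {
        value = value * 10 + (*s - '0');   /* shift left one decimal place, add digit */
        s++;
    }
    return value;
}
```

For "123" the total goes 1, then 12, then 123. For "42abc" the loop stops at 'a' and returns 42.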

Escape Sequences

Some characters cannot be typed directly on your keyboard (like “newline” or “tab”). C uses backslash escape sequences to represent them:

Escape   Dec   Description
\0       0     Null character (string terminator)
\a       7     Bell/alert (makes a beep)
\b       8     Backspace
\t       9     Horizontal tab
\n       10    Newline (line feed)
\v       11    Vertical tab
\f       12    Form feed
\r       13    Carriage return
\\       92    Backslash
\'       39    Single quote
\"       34    Double quote

You can also specify any character by its octal or hex value:

char newline = '\n';     // Using escape sequence
char newline2 = '\012';  // Using octal (012 = 10)
char newline3 = '\x0A';  // Using hex (0A = 10)
// All three are identical

Extended ASCII (128-255)

Standard ASCII only defines characters 0-127 (7 bits). But a char is 8 bits, so it has room for 256 distinct values. The extra 128 byte values (128-255 when read as unsigned) are often called “Extended ASCII.”

Here’s the problem: There is no single agreed-upon standard for Extended ASCII. Different computers and operating systems used different mappings for these extra characters:

Code Page              Used By          Characters 128-255
ISO-8859-1 (Latin-1)   Western Europe   French, German, Spanish characters
Windows-1252           Windows          Similar to Latin-1, with extras
CP437                  DOS              Box-drawing characters, some accents
KOI8-R                 Russian          Cyrillic alphabet

This caused chaos. A file written on one system would display garbage on another. The byte value 200, for example, is È in Latin-1 but the box-drawing character ╚ in CP437.
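
There is a C-level wrinkle with these high values as well. Whether plain char is signed is up to the compiler, so on many platforms a byte such as 200 stored in a char shows up as -56 when printed with %d. A small sketch of the usual workaround, with the hypothetical name byte_value:

```c
/* Hypothetical helper: returns the value of a byte as a number in
   0..255, regardless of whether plain char is signed here. */
int byte_value(char c) {
    return (unsigned char)c;
}
```

With this, byte_value('\xC8') is 200 on any platform with 8-bit bytes, and ASCII characters are unaffected: byte_value('A') is still 65.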

Extended ASCII Table (ISO-8859-1 / Latin-1)

This table shows ISO-8859-1, also called “Latin-1” - the most common extended ASCII for Western languages. Again, this is just a reference - no need to memorize it.

Dec  Hex   Char     Dec  Hex   Char     Dec  Hex   Char   Dec  Hex   Char
128  0x80  (ctrl)   160  0xA0  (nbsp)   192  0xC0  À      224  0xE0  à
129  0x81  (ctrl)   161  0xA1  ¡        193  0xC1  Á      225  0xE1  á
130  0x82  (ctrl)   162  0xA2  ¢        194  0xC2  Â      226  0xE2  â
131  0x83  (ctrl)   163  0xA3  £        195  0xC3  Ã      227  0xE3  ã
132  0x84  (ctrl)   164  0xA4  ¤        196  0xC4  Ä      228  0xE4  ä
133  0x85  (ctrl)   165  0xA5  ¥        197  0xC5  Å      229  0xE5  å
134  0x86  (ctrl)   166  0xA6  ¦        198  0xC6  Æ      230  0xE6  æ
135  0x87  (ctrl)   167  0xA7  §        199  0xC7  Ç      231  0xE7  ç
136  0x88  (ctrl)   168  0xA8  ¨        200  0xC8  È      232  0xE8  è
137  0x89  (ctrl)   169  0xA9  ©        201  0xC9  É      233  0xE9  é
138  0x8A  (ctrl)   170  0xAA  ª        202  0xCA  Ê      234  0xEA  ê
139  0x8B  (ctrl)   171  0xAB  «        203  0xCB  Ë      235  0xEB  ë
140  0x8C  (ctrl)   172  0xAC  ¬        204  0xCC  Ì      236  0xEC  ì
141  0x8D  (ctrl)   173  0xAD  (shy)    205  0xCD  Í      237  0xED  í
142  0x8E  (ctrl)   174  0xAE  ®        206  0xCE  Î      238  0xEE  î
143  0x8F  (ctrl)   175  0xAF  ¯        207  0xCF  Ï      239  0xEF  ï
144  0x90  (ctrl)   176  0xB0  °        208  0xD0  Ð      240  0xF0  ð
145  0x91  (ctrl)   177  0xB1  ±        209  0xD1  Ñ      241  0xF1  ñ
146  0x92  (ctrl)   178  0xB2  ²        210  0xD2  Ò      242  0xF2  ò
147  0x93  (ctrl)   179  0xB3  ³        211  0xD3  Ó      243  0xF3  ó
148  0x94  (ctrl)   180  0xB4  ´        212  0xD4  Ô      244  0xF4  ô
149  0x95  (ctrl)   181  0xB5  µ        213  0xD5  Õ      245  0xF5  õ
150  0x96  (ctrl)   182  0xB6  ¶        214  0xD6  Ö      246  0xF6  ö
151  0x97  (ctrl)   183  0xB7  ·        215  0xD7  ×      247  0xF7  ÷
152  0x98  (ctrl)   184  0xB8  ¸        216  0xD8  Ø      248  0xF8  ø
153  0x99  (ctrl)   185  0xB9  ¹        217  0xD9  Ù      249  0xF9  ù
154  0x9A  (ctrl)   186  0xBA  º        218  0xDA  Ú      250  0xFA  ú
155  0x9B  (ctrl)   187  0xBB  »        219  0xDB  Û      251  0xFB  û
156  0x9C  (ctrl)   188  0xBC  ¼        220  0xDC  Ü      252  0xFC  ü
157  0x9D  (ctrl)   189  0xBD  ½        221  0xDD  Ý      253  0xFD  ý
158  0x9E  (ctrl)   190  0xBE  ¾        222  0xDE  Þ      254  0xFE  þ
159  0x9F  (ctrl)   191  0xBF  ¿        223  0xDF  ß      255  0xFF  ÿ

Unicode: The Modern Solution

The Extended ASCII mess was a real problem. Imagine writing an email in French on your computer, then your friend in Japan opens it and sees garbage characters. That happened all the time!

Unicode fixed this by creating one big list that gives every character in every language its own unique number - over 150,000 characters and counting. It includes letters from every alphabet, plus things like Chinese characters and even emojis.

What you need to know for now:

  1. UTF-8 is how most computers store Unicode text today. The good news: characters 0-127 work exactly like ASCII, so all your ASCII code still works.

  2. Text handling gets tricky. For example, Turkish has four different “I” letters (I, ı, İ, i), not just two like English. Code that assumes uppercase of 'i' is always 'I' will mess up Turkish text.

  3. C was not designed for Unicode. For now, stick with basic ASCII characters (the ones in this lesson). When you need to handle text from different languages, you will need to learn about special libraries.
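
To make the UTF-8 point concrete: a non-ASCII character takes more than one byte, so counting bytes no longer counts characters. In UTF-8, every byte after the first byte of a character has the bit pattern 10xxxxxx, so skipping those bytes counts code points. The sketch below assumes valid UTF-8 input, and utf8_length is a hypothetical name rather than a library function.

```c
#include <stddef.h>

/* Hypothetical helper: counts Unicode code points in a UTF-8 string
   by skipping continuation bytes, which always look like 10xxxxxx.
   Assumes the input is valid UTF-8. */
size_t utf8_length(const char *s) {
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {   /* not a continuation byte */
            count++;
        }
    }
    return count;
}
```

The letter é is stored in UTF-8 as the two bytes 0xC3 0xA9, so strlen("\xC3\xA9") is 2 while utf8_length("\xC3\xA9") is 1. For pure ASCII text the two agree.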

If you want to learn more later, see unicode.org.

Try It Yourself

  1. Write a program that prints the ASCII table from 32-126
  2. Write a function that converts a lowercase letter to uppercase (without using toupper)
  3. Write a function that takes a digit character ('0'-'9') and returns its numeric value
  4. What does 'A' + 'a' equal? Calculate it, then verify with code.

Common Mistakes

  • Confusing '\0' and '0': The null character (value 0) vs the digit zero (value 48) - this trips up everyone at first!
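  • Confusing char and int: Both hold numbers, but a char is only one byte, so it covers just 256 values (0 to 255 or -128 to 127, depending on the platform), while an int is larger (typically four bytes)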
  • Forgetting single quotes: A is a variable name, 'A' is the character A
  • Assuming all text is ASCII: Characters outside the 0-127 range need an encoding such as UTF-8, where one character can take several bytes

Next Up

In Part 4, we will learn about bitwise operations - how to manipulate individual bits for low-level programming tasks like flags, permissions, and efficient calculations.

