Character Encoding
An Introduction


Prof. David Bernstein
James Madison University

Computer Science Department
bernstdh@jmu.edu


Background
Some History of Character Encodings
The Modern Era
The Modern Approach
8-Bit Unicode Transformation Format (UTF-8)

Theoretical Byte Sequences
(Note: Some of the following sequences are not considered well-formed in the specification.)

[Figure: theoretical UTF-8 byte sequences (images/utf-8.gif)]
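
Since the figure is not reproduced here, the following is a minimal sketch of the well-formed subset of those byte sequences, assuming the 1- to 4-byte layouts of current UTF-8 (code points U+0000 through U+10FFFF, surrogates excluded). The helper name utf8_encode is hypothetical and used only for illustration; in practice, Python's built-in str.encode("utf-8") does this work.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative helper, not a library API)."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not well-formed in UTF-8")

    if code_point < 0x80:
        # 1 byte:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def is_well_formed(data: bytes) -> bool:
    """Report whether Python's strict UTF-8 decoder accepts the byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

For example, utf8_encode(0x20AC) yields the same three bytes as "€".encode("utf-8"). By contrast, is_well_formed(b"\xc0\x80") is False: those two bytes fit the 2-byte pattern but are an overlong encoding of U+0000, which is one kind of sequence that is possible in theory yet not well-formed.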
The Unicode Standard

Nerd Humor

https://imgs.xkcd.com/comics/the_history_of_unicode.png
(Courtesy of xkcd)