Character Encoding
An Introduction


Prof. David Bernstein
James Madison University

Computer Science Department
bernstdh@jmu.edu


Background
Some History of Character Encodings
The Modern Era
The Modern Approach
8-Bit Unicode Transformation Format (UTF-8)
UTF-8 (cont.)

Theoretical Byte Sequences
(Note: Some of the following sequences are not considered well-formed in the specification.)

[Figure: UTF-8 byte sequences]
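
The byte layout can be illustrated with a minimal sketch in Python. It assumes the standard 1- to 4-byte UTF-8 patterns for code points U+0000 through U+10FFFF. The names utf8_encode and is_well_formed are hypothetical helpers written for this illustration; they are not part of these slides or of any library.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative helper)."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if 0xD800 <= cp <= 0xDFFF:
        # Surrogate code points have no well-formed UTF-8 encoding.
        raise ValueError("surrogate code points cannot be encoded")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def is_well_formed(data: bytes) -> bool:
    """Report whether Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
        print(f"U+{ord(ch):04X}", utf8_encode(ord(ch)).hex(" "))
    # Sequences that fit a bit pattern but are not well-formed:
    print(is_well_formed(b"\xc0\x80"))              # overlong encoding of U+0000
    print(is_well_formed(b"\xed\xa0\x80"))          # encoded surrogate U+D800
    print(is_well_formed(b"\xf8\x88\x80\x80\x80"))  # 5-byte pattern
```

Overlong forms, encoded surrogates, and the 5- and 6-byte patterns all follow the same bit layout, but a conforming decoder rejects them.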
The Unicode Standard
The Unicode Standard (cont.)

Nerd Humor

[Figure: xkcd comic]
(Courtesy of xkcd)