Latin 1 and Unicode characters in &ampersand; entities

Latin 1 Characters

This chart shows the effects of numeric ampersand entities on your browser. To use these characters in your own HTML files, put the appropriate number into &#__; (e.g. "&#163;" for the British pound (currency) sign), or, for the 8-bit alphabetic characters, use the alternative standard HTML 2.0 entity in parentheses on the right. (These are the only non-numeric character entities defined in HTML 2.0, except for "&amp;", "&lt;", and "&gt;", which should be used to escape the characters & < > in an HTML file, and "&quot;" to escape a double-quote character in an attribute value.)
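
Here is a minimal sketch of the two operations described above, written in Python only for illustration. It builds a decimal numeric reference from a character's position and escapes the reserved characters. The helper `to_numeric_ref` is a hypothetical name. `html.escape` and `html.unescape` are Python standard-library functions.

```python
import html

def to_numeric_ref(ch: str) -> str:
    """Return the decimal numeric character reference (&#NNN;) for one character."""
    return f"&#{ord(ch)};"

# The British pound sign is position 163 in ISO 8859-1 (and in Unicode).
pound_ref = to_numeric_ref("£")
print(pound_ref)                 # &#163;
print(html.unescape(pound_ref))  # £

# Escape & < > and the double quote so they are not taken as markup.
# (With the default quote=True, Python also turns ' into &#x27;.)
print(html.escape('Fish & "chips" <cheap>'))
# Fish &amp; &quot;chips&quot; &lt;cheap&gt;
```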

If the right column looks the same as the left column, you're losing the eighth bit somewhere. If the characters in the right column don't match their descriptions, then your browser is translating incorrectly between ISO 8859-1 Latin 1 and your platform's native character set. (You can refer to a .gif image of the character glyphs that should be displayed in 8859-1 Latin 1.)
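
A lost eighth bit makes the two columns look alike for a simple reason. Clearing bit 7 of any position from 160 to 254 subtracts 128, which lands on the 7-bit character in the left column of the same row. The following small Python simulation shows this. `strip_eighth_bit` is a hypothetical helper and is not part of any library.

```python
def strip_eighth_bit(ch: str) -> str:
    """Simulate a 7-bit channel by keeping only the low seven bits of a Latin 1 character."""
    code = ch.encode("latin-1")[0]  # a single byte, 0-255
    return chr(code & 0x7F)         # clears bit 7, i.e. subtracts 128 when it was set

for ch in "£éü":
    print(ord(ch), ch, "->", ord(strip_eighth_bit(ch)), strip_eighth_bit(ch))
# 163 £ -> 35 #
# 233 é -> 105 i
# 252 ü -> 124 |
```

Each output pair matches a row of the chart below (35/163, 105/233, 124/252).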

Finally, note that positions 127-159 are not displayable characters in ISO 8859-1 Latin 1 and are not part of any HTML standard. HTML code such as "&#153;" is therefore incorrect, and browsers on different platforms will display it differently (often in ways that you did not intend). See the next chart below for the (future) correct way of displaying the characters that occupy positions 130-159 in Microsoft Windows. These include such typographical niceties as "curly" quotes, dashes, ellipses, and the trademark symbol.

The following chart only tests the ISO 8859-1 compliance of your browser's non-proportional font; to test the proportional font see the alternative Latin 1 chart using HTML Tables.

Code  Char      Code  Char  Description
----  ----      ----  ----  -----------
 32             160         Non-breaking space
 33    !        161    ¡    Inverted exclamation
 34    "        162    ¢    Cent sign
 35    #        163    £    Pound sterling
 36    $        164    ¤    General currency sign
 37    %        165    ¥    Yen sign
 38    &        166    ¦    Broken vertical bar
 39    '        167    §    Section sign
 40    (        168    ¨    Umlaut (dieresis)
 41    )        169    ©    Copyright
 42    *        170    ª    Feminine ordinal
 43    +        171    «    Left angle quote, guillemotleft
 44    ,        172    ¬    Not sign
 45    -        173    ­    Soft hyphen
 46    .        174    ®    Registered trademark
 47    /        175    ¯    Macron accent
 48    0        176    °    Degree sign
 49    1        177    ±    Plus or minus
 50    2        178    ²    Superscript two
 51    3        179    ³    Superscript three
 52    4        180    ´    Acute accent
 53    5        181    µ    Micro sign
 54    6        182    ¶    Paragraph sign
 55    7        183    ·    Middle dot
 56    8        184    ¸    Cedilla
 57    9        185    ¹    Superscript one
 58    :        186    º    Masculine ordinal
 59    ;        187    »    Right angle quote, guillemotright
 60    <        188    ¼    Fraction one-fourth
 61    =        189    ½    Fraction one-half
 62    >        190    ¾    Fraction three-fourths
 63    ?        191    ¿    Inverted question mark
 64    @        192    À    Capital A, grave accent ("&Agrave;")
 65    A        193    Á    Capital A, acute accent ("&Aacute;")
 66    B        194    Â    Capital A, circumflex accent ("&Acirc;")
 67    C        195    Ã    Capital A, tilde ("&Atilde;")
 68    D        196    Ä    Capital A, dieresis or umlaut mark ("&Auml;")
 69    E        197    Å    Capital A, ring ("&Aring;")
 70    F        198    Æ    Capital AE diphthong (ligature) ("&AElig;")
 71    G        199    Ç    Capital C, cedilla ("&Ccedil;")
 72    H        200    È    Capital E, grave accent ("&Egrave;")
 73    I        201    É    Capital E, acute accent ("&Eacute;")
 74    J        202    Ê    Capital E, circumflex accent ("&Ecirc;")
 75    K        203    Ë    Capital E, dieresis or umlaut mark ("&Euml;")
 76    L        204    Ì    Capital I, grave accent ("&Igrave;")
 77    M        205    Í    Capital I, acute accent ("&Iacute;")
 78    N        206    Î    Capital I, circumflex accent ("&Icirc;")
 79    O        207    Ï    Capital I, dieresis or umlaut mark ("&Iuml;")
 80    P        208    Ð    Capital Eth, Icelandic ("&ETH;")
 81    Q        209    Ñ    Capital N, tilde ("&Ntilde;")
 82    R        210    Ò    Capital O, grave accent ("&Ograve;")
 83    S        211    Ó    Capital O, acute accent ("&Oacute;")
 84    T        212    Ô    Capital O, circumflex accent ("&Ocirc;")
 85    U        213    Õ    Capital O, tilde ("&Otilde;")
 86    V        214    Ö    Capital O, dieresis or umlaut mark ("&Ouml;")
 87    W        215    ×    Multiply sign
 88    X        216    Ø    Capital O, slash ("&Oslash;")
 89    Y        217    Ù    Capital U, grave accent ("&Ugrave;")
 90    Z        218    Ú    Capital U, acute accent ("&Uacute;")
 91    [        219    Û    Capital U, circumflex accent ("&Ucirc;")
 92    \        220    Ü    Capital U, dieresis or umlaut mark ("&Uuml;")
 93    ]        221    Ý    Capital Y, acute accent ("&Yacute;")
 94    ^        222    Þ    Capital THORN, Icelandic ("&THORN;")
 95    _        223    ß    Small sharp s, German (sz ligature) ("&szlig;")
 96    `        224    à    Small a, grave accent ("&agrave;")
 97    a        225    á    Small a, acute accent ("&aacute;")
 98    b        226    â    Small a, circumflex accent ("&acirc;")
 99    c        227    ã    Small a, tilde ("&atilde;")
100    d        228    ä    Small a, dieresis or umlaut mark ("&auml;")
101    e        229    å    Small a, ring ("&aring;")
102    f        230    æ    Small ae diphthong (ligature) ("&aelig;")
103    g        231    ç    Small c, cedilla ("&ccedil;")
104    h        232    è    Small e, grave accent ("&egrave;")
105    i        233    é    Small e, acute accent ("&eacute;")
106    j        234    ê    Small e, circumflex accent ("&ecirc;")
107    k        235    ë    Small e, dieresis or umlaut mark ("&euml;")
108    l        236    ì    Small i, grave accent ("&igrave;")
109    m        237    í    Small i, acute accent ("&iacute;")
110    n        238    î    Small i, circumflex accent ("&icirc;")
111    o        239    ï    Small i, dieresis or umlaut mark ("&iuml;")
112    p        240    ð    Small eth, Icelandic ("&eth;")
113    q        241    ñ    Small n, tilde ("&ntilde;")
114    r        242    ò    Small o, grave accent ("&ograve;")
115    s        243    ó    Small o, acute accent ("&oacute;")
116    t        244    ô    Small o, circumflex accent ("&ocirc;")
117    u        245    õ    Small o, tilde ("&otilde;")
118    v        246    ö    Small o, dieresis or umlaut mark ("&ouml;")
119    w        247    ÷    Division sign
120    x        248    ø    Small o, slash ("&oslash;")
121    y        249    ù    Small u, grave accent ("&ugrave;")
122    z        250    ú    Small u, acute accent ("&uacute;")
123    {        251    û    Small u, circumflex accent ("&ucirc;")
124    |        252    ü    Small u, dieresis or umlaut mark ("&uuml;")
125    }        253    ý    Small y, acute accent ("&yacute;")
126    ~        254    þ    Small thorn, Icelandic ("&thorn;")
                255    ÿ    Small y, dieresis or umlaut mark ("&yuml;")

The correct way to display "smart quotes", the trademark symbol, etc.

Some commonly-desired characters, such as the trademark symbol, as well as such typographical niceties as "curly" quotes, dashes, and ellipses, are not part of the ISO 8859-1 character set, and so cannot be displayed properly in HTML 2.0. If you put a raw 8-bit character in your file and intend it to be understood with a non-ISO8859-1 meaning, or put a numeric entity reference between 128 and 159 there (such as "&#153;"), then this is incorrect HTML, which will not display as you intended on browsers on other platforms, and maybe not even on other browsers on the same platform -- even when it "looks right" in your own browser.
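
The following Python sketch shows how such references could be caught and repaired automatically. It assumes that the author meant the Microsoft Windows CP1252 character at that position, which is the usual case described in the Usenet posting below. It relies on Python's built-in "cp1252" codec. The function name `fix_c1_refs` is hypothetical.

```python
import re

DECIMAL_REF = re.compile(r"&#([0-9]+);")

def fix_c1_refs(source: str) -> str:
    """Rewrite &#128;-&#159; references as the Unicode numbers of their CP1252 characters."""
    def repl(m):
        n = int(m.group(1))
        if not 128 <= n <= 159:
            return m.group(0)                 # outside the problem range: keep as-is
        try:
            ch = bytes([n]).decode("cp1252")  # what Windows would have displayed
        except UnicodeDecodeError:
            return m.group(0)                 # unassigned in CP1252 (e.g. 129): leave for a human
        return f"&#{ord(ch)};"
    return DECIMAL_REF.sub(repl, source)

print(fix_c1_refs("Brand&#153; and it&#146;s"))
# Brand&#8482; and it&#8217;s
```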

One correct way to specify such characters in more recent versions of HTML is to use numeric entities greater than 255, which refer to positions in the Unicode character set, as outlined in the Usenet posting below. This applies starting with the "Cougar" proposal (now superseded by the proposed HTML 4.0 standard) and/or "internationalized HTML" as specified in RFC 2070. Unfortunately, these entities are only beginning to be implemented in some newer browser versions at the moment, but they will become more widely implemented in the future. (You can see whether your own browser understands these entities by looking at the third column of the table below.)

(See also http://www.w3.org/pub/WWW/TR/WD-entities (from the "Cougar" draft) or http://www.w3.org/TR/WD-html40-970708/sgml/HTMLmisc.ent (HTML 4.0) for relevant entity lists in the proposed HTML standards.)
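
Later versions of these entity lists give names to many of the same characters. As an illustration, Python's standard `html.entities` module includes the HTML 4 entity names, so a few of the Unicode numbers used on this page can be looked up as shown below. Numeric references remain valid either way.

```python
from html.entities import codepoint2name, name2codepoint

# Unicode numbers of some characters discussed on this page.
for code in (8217, 8220, 8482, 8364):
    print(code, codepoint2name.get(code))
# 8217 rsquo
# 8220 ldquo
# 8482 trade
# 8364 euro

print(name2codepoint["rsquo"])  # 8217
```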


From: Markus Kuhn <kuhn@cs.purdue.edu>
Newsgroups: comp.text.sgml, comp.std.internat, comp.infosystems.www.authoring.html
Date: Thu, 24 Apr 1997 23:57:52 -0500
Message-ID: <336039D0.FD4@cs.purdue.edu>

[Question: &#146; valid HTML or no?]

The characters 128-159 are not used in ISO 8859-1 and Unicode, the character sets of HTML. MS-Windows uses a superset of ANSI/ISO 8859-1, known to experts as "Code Page 1252 (CP1252)", a Microsoft-specific character set with additional characters in the 128-159 range (also known as the "C1" range).

All the CP1252 characters are also available in Unicode. For example the CP1252 character 146 that you mentioned (RIGHT SINGLE QUOTATION MARK) has the Unicode number 8217, therefore you should use this number in order to conform to the HTML standard. Modern HTML browsers like Netscape 4.0 understand Unicode, and will automatically convert the Unicode character &#8217; back into the character 146 on MS-Windows machines, and into the appropriate character on other systems.

The official CP1252<->Unicode conversion table is printed in the Unicode 2.0 standard for instance, and is available on <ftp://ftp.informatik.uni-erlangen.de/pub/doc/ISO/charsets/> in the file ucs-map-cp1252. [See also the file ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT at the official Unicode site.]

MS-Windows HTML-authoring software definitely should implement the conversion table below! Please forward this mail to the developers of your HTML authoring tool if this is currently done wrong.

The CP1252 characters that are not part of ANSI/ISO 8859-1, and that should therefore always be encoded as Unicode characters greater than 255, are the following:

 Windows   Unicode    Char.
  char.   HTML code   test         Description of Character
  -----     -----     ---          ------------------------
ALT-0130   &#8218;   ‚    Single Low-9 Quotation Mark
ALT-0131   &#402;    ƒ    Latin Small Letter F With Hook
ALT-0132   &#8222;   „    Double Low-9 Quotation Mark
ALT-0133   &#8230;   …    Horizontal Ellipsis
ALT-0134   &#8224;   †    Dagger
ALT-0135   &#8225;   ‡    Double Dagger
ALT-0136   &#710;    ˆ    Modifier Letter Circumflex Accent
ALT-0137   &#8240;   ‰    Per Mille Sign
ALT-0138   &#352;    Š    Latin Capital Letter S With Caron
ALT-0139   &#8249;   ‹    Single Left-Pointing Angle Quotation Mark
ALT-0140   &#338;    Œ    Latin Capital Ligature OE
ALT-0145   &#8216;   ‘    Left Single Quotation Mark
ALT-0146   &#8217;   ’    Right Single Quotation Mark
ALT-0147   &#8220;   “    Left Double Quotation Mark
ALT-0148   &#8221;   ”    Right Double Quotation Mark
ALT-0149   &#8226;   •    Bullet
ALT-0150   &#8211;   –    En Dash
ALT-0151   &#8212;   —    Em Dash
ALT-0152   &#732;    ˜    Small Tilde
ALT-0153   &#8482;   ™    Trade Mark Sign
ALT-0154   &#353;    š    Latin Small Letter S With Caron
ALT-0155   &#8250;   ›    Single Right-Pointing Angle Quotation Mark
ALT-0156   &#339;    œ    Latin Small Ligature OE
ALT-0159   &#376;    Ÿ    Latin Capital Letter Y With Diaeresis

-- 
Markus Kuhn, Computer Science grad student, Purdue
University, Indiana, US, email: kuhn@cs.purdue.edu


Lynx versions 2.7.1ac-0.87 and later now support all of the numeric entities in the table above. They also support all of the entities in the first non-Western European table below, translating as best they can within the limits of the character set in use during a particular terminal session. If your browser doesn't do as good a job, ask the software company to catch up with Lynx!

1997 update -- In future versions of Windows "ANSI", or Code Page 1252, the character ALT-0128 will represent the Euro currency symbol (Unicode character &#8364;).
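
The conversion that the posting asks HTML-authoring tools to perform can be sketched in Python. The sketch uses the built-in "cp1252" codec and the standard "xmlcharrefreplace" error handler, which writes every non-ASCII character as a decimal numeric reference. It rests on two assumptions:

* The input is HTML source whose ASCII markup should pass through untouched.
* Python's codec is an acceptable stand-in for the official mapping file. The codec follows Microsoft's current table, which includes the Euro at 128 and a few positions, such as 142 and 158, that are not listed above.

The function name is hypothetical.

```python
def cp1252_to_ascii_html(raw: bytes) -> str:
    """Decode CP1252 ("ANSI") HTML source and re-encode it as pure ASCII,
    turning every non-ASCII character into a decimal numeric reference."""
    text = raw.decode("cp1252")  # raises UnicodeDecodeError on unassigned bytes such as 0x81
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(cp1252_to_ascii_html(b"<p>It\x92s \x80 5 \x96 not \xa35</p>"))
# <p>It&#8217;s &#8364; 5 &#8211; not &#163;5</p>
```

Because the output is plain ASCII, its meaning does not depend on which ASCII-compatible character set a browser assumes. Older browsers may still fail to display references above 255, however.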


What about Greek and Math characters?

See http://www.physics.gla.ac.uk/r2h-extras/rtfunicode.html, or look at the entity lists included as part of various proposed HTML standards (http://www.w3.org/pub/WWW/TR/WD-entities and http://www.w3.org/TR/WD-html40-970708/sgml/HTMLsym.ent). Some math and Greek characters are also listed in the table of selected characters from the "Minimum European Subset" of Unicode below. Be aware, however, that even if you can get mathematical symbols to display using only HTML (which is far from certain at the moment), mainstream HTML still doesn't offer a full solution to the layout and formatting of mathematical formulas.


*There are some caveats about <FONT FACE=> available here.

Displaying Non-Western European languages in HTML

Languages that use characters that are not included in the ISO-8859-1 Latin-1 character set cannot be dealt with properly using plain HTML 2.0 (which was described in RFC-1866). However, there are several methods of handling such languages that are more or less "standards-compliant", in that they follow the "Internationalized" HTML 2.0 specification in RFC-2070 and the forthcoming HTML 4.0 draft standard.

As an example, suppose that one wished to present material written in an Eastern European language that needs characters from the ISO-8859-2 Latin 2 character set. Here are the two main approaches:

  1. Send the following line among the HTTP headers of the document sent by the HTTP server (but not as part of the document itself):
    Content-Type: text/html; charset=ISO-8859-2

    You can then use the raw 8-bit Latin 2 characters in the range 160-255 in your document (but because ISO-8859-2 is technically not the SGML "Document Character Set", the apparently corresponding numeric entities "&#160;" - "&#255;" will not be properly interpreted as Latin 2 characters, but rather with their normal Latin 1 meanings, as shown in the chart at the top of this file).


    *A reference on the ISO-8859 character sets is available here and another (partly in French) here.
  2. A more "multilingual" approach is to use the Unicode character set, in which non-ISO-8859-1 characters occur in positions higher than 255. The most radical method is to use high Unicode characters directly, in a multi-byte transfer encoding (such as UTF-8 or raw double-byte), accompanied by the appropriate HTTP headers. But it should also be possible to use numeric entities greater than 255, even within a plain ASCII or ISO-8859-1 file, to refer to Unicode characters that are not included in ISO-8859-1 Latin-1. Browsers are only beginning to implement this method, however. Early versions of Netscape 4 would handle such entities only if the document's MIME character set had been declared as Unicode in the HTTP headers, or if the user had manually configured Netscape to view the document as Unicode. Netscape v4.04 has apparently fixed this, at least to the extent of correctly interpreting entity references >255 that happen to correspond to glyphs in the current font. However, it will not generally switch fonts merely because an HTML document contains an entity for a character that the current font cannot display. So it may still be necessary to declare a document that contains only plain ASCII characters to be in the UTF-8 encoding of Unicode, in order to fool Netscape into displaying such entities correctly. This could cause problems for non-Unicode-aware Web browsers.
    The way to get an HTML page which is being delivered using the HTTP protocol to be accompanied with the appropriate MIME header that declares the page to be in UTF-8 depends on the web-server being used; in the case of NCSA httpd or Apache, you can include the following line in an .htaccess file in the same directory:
    AddType text/html;charset=utf-8 xxx

    (where xxx is the extension of HTML files to be delivered with UTF-8 character-set headers; at this site I have defined the extension as ".utf8", and you can view a version of this page delivered with a "Content-Type: text/html;Charset=utf-8" header).
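
The following rough Python sketch shows what each approach actually puts on the wire. It prepares one Polish word three ways using standard codecs. Sending the matching Content-Type header is left to the server, as described above. The sketch also demonstrates the caveat in approach 1: a raw byte and the numeric entity with the same number can mean different characters.

```python
import html

text = "Łódź"  # every letter exists in ISO-8859-2; Ł and ź are not in ISO-8859-1

# Approach 1: raw 8-bit Latin 2 (header: Content-Type: text/html; charset=ISO-8859-2)
latin2_body = text.encode("iso-8859-2")

# Approach 2, UTF-8 transfer encoding (header: Content-Type: text/html; charset=utf-8)
utf8_body = text.encode("utf-8")

# Approach 2, plain ASCII with numeric entities (above 255 for non-Latin-1 letters)
ascii_body = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_body)  # b'&#321;&#243;d&#378;'

# Byte 0xB1 is "ą" in an ISO-8859-2 document, but &#177; still means Latin 1 "±".
print(bytes([0xB1]).decode("iso-8859-2"))  # ą
print(html.unescape("&#177;"))             # ±
```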

Of course, non-Latin-1 characters can't be displayed in any case unless you have an appropriate font on your system, and your browser knows about the font.


*For further information on including non-Latin-1 characters in HTML, go to Alan Flavell's "HTML Internationalisation (i18n) Quickstart".

Characters in the Unicode "Latin Extended-A" block

The following chart gives a list of the characters in the Unicode "Latin Extended-A" block (which contains almost all of the non-ISO-8859-1 characters included in the ISO-8859-2, ISO-8859-3, ISO-8859-4, and ISO-8859-9 character sets), along with the corresponding HTML numeric entity codes as they could be used in recent and future HTML browsers:


*See also the official Unicode documentation on these characters (including a .gif image of a glyph chart that shows how these characters should be displayed).
 HTML
 Char.              Description
 -----              -----------
&#256; Ā  Latin Capital Letter A With Macron      (Capital A Macron)
&#257; ā  Latin Small Letter A With Macron        (Small A Macron)
&#258; Ă  Latin Capital Letter A With Breve       (Capital A Breve)
&#259; ă  Latin Small Letter A With Breve         (Small A Breve)
&#260; Ą  Latin Capital Letter A With Ogonek      (Capital A Ogonek)
&#261; ą  Latin Small Letter A With Ogonek        (Small A Ogonek)
&#262; Ć  Latin Capital Letter C With Acute       (Capital C Acute)
&#263; ć  Latin Small Letter C With Acute         (Small C Acute)
&#264; Ĉ  Latin Capital Letter C With Circumflex  (Capital C Circumflex)
&#265; ĉ  Latin Small Letter C With Circumflex    (Small C Circumflex)
&#266; Ċ  Latin Capital Letter C With Dot Above   (Capital C Dot)
&#267; ċ  Latin Small Letter C With Dot Above     (Small C Dot)
&#268; Č  Latin Capital Letter C With Caron       (Capital C Hacek)
&#269; č  Latin Small Letter C With Caron         (Small C Hacek)
&#270; Ď  Latin Capital Letter D With Caron       (Capital D Hacek)
&#271; ď  Latin Small Letter D With Caron         (Small D Hacek)
&#272; Đ  Latin Capital Letter D With Stroke      (Capital D Bar)
&#273; đ  Latin Small Letter D With Stroke        (Small D Bar)
&#274; Ē  Latin Capital Letter E With Macron      (Capital E Macron)
&#275; ē  Latin Small Letter E With Macron        (Small E Macron)
&#276; Ĕ  Latin Capital Letter E With Breve       (Capital E Breve)
&#277; ĕ  Latin Small Letter E With Breve         (Small E Breve)
&#278; Ė  Latin Capital Letter E With Dot Above   (Capital E Dot)
&#279; ė  Latin Small Letter E With Dot Above     (Small E Dot)
&#280; Ę  Latin Capital Letter E With Ogonek      (Capital E Ogonek)
&#281; ę  Latin Small Letter E With Ogonek        (Small E Ogonek)
&#282; Ě  Latin Capital Letter E With Caron       (Capital E Hacek)
&#283; ě  Latin Small Letter E With Caron         (Small E Hacek)
&#284; Ĝ  Latin Capital Letter G With Circumflex  (Capital G Circumflex)
&#285; ĝ  Latin Small Letter G With Circumflex    (Small G Circumflex)
&#286; Ğ  Latin Capital Letter G With Breve       (Capital G Breve)
&#287; ğ  Latin Small Letter G With Breve         (Small G Breve)
&#288; Ġ  Latin Capital Letter G With Dot Above   (Capital G Dot)
&#289; ġ  Latin Small Letter G With Dot Above     (Small G Dot)
&#290; Ģ  Latin Capital Letter G With Cedilla     (Capital G Cedilla)
&#291; ģ  Latin Small Letter G With Cedilla       (Small G Cedilla)
&#292; Ĥ  Latin Capital Letter H With Circumflex  (Capital H Circumflex)
&#293; ĥ  Latin Small Letter H With Circumflex    (Small H Circumflex)
&#294; Ħ  Latin Capital Letter H With Stroke      (Capital H Bar)
&#295; ħ  Latin Small Letter H With Stroke        (Small H Bar)
&#296; Ĩ  Latin Capital Letter I With Tilde       (Capital I Tilde)
&#297; ĩ  Latin Small Letter I With Tilde         (Small I Tilde)
&#298; Ī  Latin Capital Letter I With Macron      (Capital I Macron)
&#299; ī  Latin Small Letter I With Macron        (Small I Macron)
&#300; Ĭ  Latin Capital Letter I With Breve       (Capital I Breve)
&#301; ĭ  Latin Small Letter I With Breve         (Small I Breve)
&#302; Į  Latin Capital Letter I With Ogonek      (Capital I Ogonek)
&#303; į  Latin Small Letter I With Ogonek        (Small I Ogonek)
&#304; İ  Latin Capital Letter I With Dot Above   (Capital I Dot)
&#305; ı  Latin Small Letter Dotless I
&#306; Ĳ  Latin Capital Ligature IJ               (Capital I J)
&#307; ĳ  Latin Small Ligature IJ                 (Small I J)
&#308; Ĵ  Latin Capital Letter J With Circumflex  (Capital J Circumflex)
&#309; ĵ  Latin Small Letter J With Circumflex    (Small J Circumflex)
&#310; Ķ  Latin Capital Letter K With Cedilla     (Capital K Cedilla)
&#311; ķ  Latin Small Letter K With Cedilla       (Small K Cedilla)
&#312; ĸ  Latin Small Letter Kra
&#313; Ĺ  Latin Capital Letter L With Acute       (Capital L Acute)
&#314; ĺ  Latin Small Letter L With Acute         (Small L Acute)
&#315; Ļ  Latin Capital Letter L With Cedilla     (Capital L Cedilla)
&#316; ļ  Latin Small Letter L With Cedilla       (Small L Cedilla)
&#317; Ľ  Latin Capital Letter L With Caron       (Capital L Hacek)
&#318; ľ  Latin Small Letter L With Caron         (Small L Hacek)
&#319; Ŀ  Latin Capital Letter L With Middle Dot
&#320; ŀ  Latin Small Letter L With Middle Dot
&#321; Ł  Latin Capital Letter L With Stroke      (Capital L Slash)
&#322; ł  Latin Small Letter L With Stroke        (Small L Slash)
&#323; Ń  Latin Capital Letter N With Acute       (Capital N Acute)
&#324; ń  Latin Small Letter N With Acute         (Small N Acute)
&#325; Ņ  Latin Capital Letter N With Cedilla     (Capital N Cedilla)
&#326; ņ  Latin Small Letter N With Cedilla       (Small N Cedilla)
&#327; Ň  Latin Capital Letter N With Caron       (Capital N Hacek)
&#328; ň  Latin Small Letter N With Caron         (Small N Hacek)
&#329; ŉ  Latin Small Letter N Preceded By Apostrophe (Small Apostrophe N)
&#330; Ŋ  Latin Capital Letter Eng
&#331; ŋ  Latin Small Letter Eng
&#332; Ō  Latin Capital Letter O With Macron      (Capital O Macron)
&#333; ō  Latin Small Letter O With Macron        (Small O Macron)
&#334; Ŏ  Latin Capital Letter O With Breve       (Capital O Breve)
&#335; ŏ  Latin Small Letter O With Breve         (Small O Breve)
&#336; Ő  Latin Capital Letter O With Double Acute (Capital O Double Acute)
&#337; ő  Latin Small Letter O With Double Acute  (Small O Double Acute)
&#338; Œ  Latin Capital Ligature OE               (Capital O E)
&#339; œ  Latin Small Ligature OE                 (Small O E)
&#340; Ŕ  Latin Capital Letter R With Acute       (Capital R Acute)
&#341; ŕ  Latin Small Letter R With Acute         (Small R Acute)
&#342; Ŗ  Latin Capital Letter R With Cedilla     (Capital R Cedilla)
&#343; ŗ  Latin Small Letter R With Cedilla       (Small R Cedilla)
&#344; Ř  Latin Capital Letter R With Caron       (Capital R Hacek)
&#345; ř  Latin Small Letter R With Caron         (Small R Hacek)
&#346; Ś  Latin Capital Letter S With Acute       (Capital S Acute)
&#347; ś  Latin Small Letter S With Acute         (Small S Acute)
&#348; Ŝ  Latin Capital Letter S With Circumflex  (Capital S Circumflex)
&#349; ŝ  Latin Small Letter S With Circumflex    (Small S Circumflex)
&#350; Ş  Latin Capital Letter S With Cedilla     (Capital S Cedilla)
&#351; ş  Latin Small Letter S With Cedilla       (Small S Cedilla)
&#352; Š  Latin Capital Letter S With Caron       (Capital S Hacek)
&#353; š  Latin Small Letter S With Caron         (Small S Hacek)
&#354; Ţ  Latin Capital Letter T With Cedilla     (Capital T Cedilla)
&#355; ţ  Latin Small Letter T With Cedilla       (Small T Cedilla)
&#356; Ť  Latin Capital Letter T With Caron       (Capital T Hacek)
&#357; ť  Latin Small Letter T With Caron         (Small T Hacek)
&#358; Ŧ  Latin Capital Letter T With Stroke      (Capital T Bar)
&#359; ŧ  Latin Small Letter T With Stroke        (Small T Bar)
&#360; Ũ  Latin Capital Letter U With Tilde       (Capital U Tilde)
&#361; ũ  Latin Small Letter U With Tilde         (Small U Tilde)
&#362; Ū  Latin Capital Letter U With Macron      (Capital U Macron)
&#363; ū  Latin Small Letter U With Macron        (Small U Macron)
&#364; Ŭ  Latin Capital Letter U With Breve       (Capital U Breve)
&#365; ŭ  Latin Small Letter U With Breve         (Small U Breve)
&#366; Ů  Latin Capital Letter U With Ring Above  (Capital U Ring)
&#367; ů  Latin Small Letter U With Ring Above    (Small U Ring)
&#368; Ű  Latin Capital Letter U With Double Acute (Capital U Double Acute)
&#369; ű  Latin Small Letter U With Double Acute  (Small U Double Acute)
&#370; Ų  Latin Capital Letter U With Ogonek      (Capital U Ogonek)
&#371; ų  Latin Small Letter U With Ogonek        (Small U Ogonek)
&#372; Ŵ  Latin Capital Letter W With Circumflex  (Capital W Circumflex)
&#373; ŵ  Latin Small Letter W With Circumflex    (Small W Circumflex)
&#374; Ŷ  Latin Capital Letter Y With Circumflex  (Capital Y Circumflex)
&#375; ŷ  Latin Small Letter Y With Circumflex    (Small Y Circumflex)
&#376; Ÿ  Latin Capital Letter Y With Diaeresis   (Capital Y Diaeresis)
&#377; Ź  Latin Capital Letter Z With Acute       (Capital Z Acute)
&#378; ź  Latin Small Letter Z With Acute         (Small Z Acute)
&#379; Ż  Latin Capital Letter Z With Dot Above   (Capital Z Dot)
&#380; ż  Latin Small Letter Z With Dot Above     (Small Z Dot)
&#381; Ž  Latin Capital Letter Z With Caron       (Capital Z Hacek)
&#382; ž  Latin Small Letter Z With Caron         (Small Z Hacek)
&#383; ſ  Latin Small Letter Long S
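
The chart above follows a simple rule: each entity number is the character's Unicode position, and the description is its official Unicode name. The following Python sketch regenerates such a listing from the standard `unicodedata` module. The function name `entity_chart` is hypothetical. Names are converted to title case, so a few differ slightly from the chart (for example "Ij" for "IJ").

```python
import unicodedata

def entity_chart(first: int, last: int):
    """Yield (numeric entity, character, Unicode name) for each assigned code point."""
    for code in range(first, last + 1):
        name = unicodedata.name(chr(code), None)  # None if the code point is unassigned
        if name is not None:
            yield f"&#{code};", chr(code), name.title()

# Latin Extended-A runs from U+0100 to U+017F, i.e. &#256; through &#383;.
for ref, ch, name in entity_chart(0x100, 0x17F):
    print(f"{ref} {ch}  {name}")
```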

Other selected Unicode characters from the "Minimum European Subset"

This table includes selected characters from the "Minimum European Subset" of Unicode that meet all of the following conditions:

* They are not also part of ISO 8859-1 or the Unicode "Latin Extended-A" block.
* They were not included solely for writing non-Latin-alphabet languages.
* They were not included solely to preserve compatibility with the old IBM PC / MS-DOS "Code Page 437" character set.


*See also the full list of characters included in the "Minimum European Subset of ISO/IEC 10646-1"
&#402;   ƒ Latin Small Letter F With Hook      (Small Script F)
&#439;   Ʒ Latin Capital Letter Ezh            (Capital Yogh)
&#452;   Ǆ Latin Capital Letter DZ With Caron  (Capital D Z Hacek)
&#454;   ǆ Latin Small Letter DZ With Caron    (Small D Z Hacek)
&#455;   Ǉ Latin Capital Letter LJ             (Capital L J)
&#457;   ǉ Latin Small Letter LJ               (Small L J)
&#458;   Ǌ Latin Capital Letter NJ             (Capital N J)
&#460;   ǌ Latin Small Letter NJ               (Small N J)
&#478;   Ǟ Latin Capital Letter A With Diaeresis And Macron (Capital A Diaeresis Macron)
&#479;   ǟ Latin Small Letter A With Diaeresis And Macron   (Small A Diaeresis Macron)
&#484;   Ǥ Latin Capital Letter G With Stroke  (Capital G Bar)
&#485;   ǥ Latin Small Letter G With Stroke    (Small G Bar)
&#486;   Ǧ Latin Capital Letter G With Caron   (Capital G Hacek)
&#487;   ǧ Latin Small Letter G With Caron     (Small G Hacek)
&#488;   Ǩ Latin Capital Letter K With Caron   (Capital K Hacek)
&#489;   ǩ Latin Small Letter K With Caron     (Small K Hacek)
&#494;   Ǯ Latin Capital Letter Ezh With Caron (Capital Yogh Hacek)
&#495;   ǯ Latin Small Letter Ezh With Caron   (Small Yogh Hacek)
&#497;   Ǳ Latin Capital Letter DZ
&#499;   ǳ Latin Small Letter DZ
&#500;   Ǵ Latin Capital Letter G With Acute
&#501;   ǵ Latin Small Letter G With Acute
&#506;   Ǻ Latin Capital Letter A With Ring Above And Acute
&#507;   ǻ Latin Small Letter A With Ring Above And Acute
&#508;   Ǽ Latin Capital Letter AE With Acute
&#509;   ǽ Latin Small Letter AE With Acute
&#510;   Ǿ Latin Capital Letter O With Stroke And Acute
&#511;   ǿ Latin Small Letter O With Stroke And Acute
&#636;   ɼ Latin Small Letter R With Long Leg
&#658;   ʒ Latin Small Letter Ezh              (Small Yogh)
&#728;   ˘ Breve                               (Spacing Breve)
&#729;   ˙ Dot Above                           (Spacing Dot Above)
&#730;   ˚ Ring Above                          (Spacing Ring Above)
&#731;   ˛ Ogonek                              (Spacing Ogonek)
&#732;   ˜ Small Tilde                         (Spacing Tilde)
&#733;   ˝ Double Acute Accent                 (Spacing Double Acute)
&#913;   Α Greek Capital Letter Alpha
&#914;   Β Greek Capital Letter Beta
&#915;   Γ Greek Capital Letter Gamma
&#916;   Δ Greek Capital Letter Delta
&#917;   Ε Greek Capital Letter Epsilon
&#918;   Ζ Greek Capital Letter Zeta
&#919;   Η Greek Capital Letter Eta
&#920;   Θ Greek Capital Letter Theta
&#921;   Ι Greek Capital Letter Iota
&#922;   Κ Greek Capital Letter Kappa
&#923;   Λ Greek Capital Letter Lamda          (Capital Lambda)
&#924;   Μ Greek Capital Letter Mu
&#925;   Ν Greek Capital Letter Nu
&#926;   Ξ Greek Capital Letter Xi
&#927;   Ο Greek Capital Letter Omicron
&#928;   Π Greek Capital Letter Pi
&#929;   Ρ Greek Capital Letter Rho
&#931;   Σ Greek Capital Letter Sigma
&#932;   Τ Greek Capital Letter Tau
&#933;   Υ Greek Capital Letter Upsilon
&#934;   Φ Greek Capital Letter Phi
&#935;   Χ Greek Capital Letter Chi
&#936;   Ψ Greek Capital Letter Psi
&#937;   Ω Greek Capital Letter Omega
&#945;   α Greek Small Letter Alpha
&#946;   β Greek Small Letter Beta
&#947;   γ Greek Small Letter Gamma
&#948;   δ Greek Small Letter Delta
&#949;   ε Greek Small Letter Epsilon
&#950;   ζ Greek Small Letter Zeta
&#951;   η Greek Small Letter Eta
&#952;   θ Greek Small Letter Theta
&#953;   ι Greek Small Letter Iota
&#954;   κ Greek Small Letter Kappa
&#955;   λ Greek Small Letter Lamda            (Small Lambda)
&#956;   μ Greek Small Letter Mu
&#957;   ν Greek Small Letter Nu
&#958;   ξ Greek Small Letter Xi
&#959;   ο Greek Small Letter Omicron
&#960;   π Greek Small Letter Pi
&#961;   ρ Greek Small Letter Rho
&#962;   ς Greek Small Letter Final Sigma
&#963;   σ Greek Small Letter Sigma
&#964;   τ Greek Small Letter Tau
&#965;   υ Greek Small Letter Upsilon
&#966;   φ Greek Small Letter Phi
&#967;   χ Greek Small Letter Chi
&#968;   ψ Greek Small Letter Psi
&#969;   ω Greek Small Letter Omega
&#7682;  Ḃ Latin Capital Letter B With Dot Above
&#7683;  ḃ Latin Small Letter B With Dot Above
&#7690;  Ḋ Latin Capital Letter D With Dot Above
&#7691;  ḋ Latin Small Letter D With Dot Above
&#7696;  Ḑ Latin Capital Letter D With Cedilla
&#7697;  ḑ Latin Small Letter D With Cedilla
&#7710;  Ḟ Latin Capital Letter F With Dot Above
&#7711;  ḟ Latin Small Letter F With Dot Above
&#7728;  Ḱ Latin Capital Letter K With Acute
&#7729;  ḱ Latin Small Letter K With Acute
&#7744;  Ṁ Latin Capital Letter M With Dot Above
&#7745;  ṁ Latin Small Letter M With Dot Above
&#7766;  Ṗ Latin Capital Letter P With Dot Above
&#7767;  ṗ Latin Small Letter P With Dot Above
&#7776;  Ṡ Latin Capital Letter S With Dot Above
&#7777;  ṡ Latin Small Letter S With Dot Above
&#7786;  Ṫ Latin Capital Letter T With Dot Above
&#7787;  ṫ Latin Small Letter T With Dot Above
&#7808;  Ẁ Latin Capital Letter W With Grave
&#7809;  ẁ Latin Small Letter W With Grave
&#7810;  Ẃ Latin Capital Letter W With Acute
&#7811;  ẃ Latin Small Letter W With Acute
&#7812;  Ẅ Latin Capital Letter W With Diaeresis
&#7813;  ẅ Latin Small Letter W With Diaeresis
&#7922;  Ỳ Latin Capital Letter Y With Grave
&#7923;  ỳ Latin Small Letter Y With Grave
&#8208;  ‐ Hyphen
&#8211;  – En Dash
&#8212;  — Em Dash
&#8213;  ― Horizontal Bar              (Quotation Dash)
&#8215;  ‗ Double Low Line             (Spacing Double Underscore)
&#8216;  ‘ Left Single Quotation Mark  (Single Turned Comma Quotation Mark)
&#8217;  ’ Right Single Quotation Mark (Single Comma Quotation Mark)
&#8218;  ‚ Single Low-9 Quotation Mark (Low Single Comma Quotation Mark)
&#8219;  ‛ Single High-Reversed-9 Quotation Mark (Single Reversed Comma Quotation Mark)
&#8220;  “ Left Double Quotation Mark  (Double Turned Comma Quotation Mark)
&#8221;  ” Right Double Quotation Mark (Double Comma Quotation Mark)
&#8222;  „ Double Low-9 Quotation Mark (Low Double Comma Quotation Mark)
&#8224;  † Dagger
&#8225;  ‡ Double Dagger
&#8226;  • Bullet
&#8227;  ‣ Triangular Bullet
&#8230;  … Horizontal Ellipsis
&#8232;    Line Separator
&#8233;    Paragraph Separator
&#8240;  ‰ Per Mille Sign
&#8242;  ′ Prime
&#8243;  ″ Double Prime
&#8249;  ‹ Single Left-Pointing Angle Quotation Mark  (Left Pointing Single Guillemet)
&#8250;  › Single Right-Pointing Angle Quotation Mark (Right Pointing Single Guillemet)
&#8252;  ‼ Double Exclamation Mark
&#8254;  ‾ Overline                   (Spacing Overscore)
&#8259;  ⁃ Hyphen Bullet
&#8260;  ⁄ Fraction Slash
&#8319;  ⁿ Superscript Latin Small Letter N
&#8355;  ₣ French Franc Sign
&#8356;  ₤ Lira Sign
&#8359;  ₧ Peseta Sign
&#8453;  ℅ Care Of
&#8470;  № Numero Sign                (Numero)
&#8482;  ™ Trade Mark Sign            (Trademark)
&#8486;  Ω Ohm Sign                   (Ohm)
&#8494;  ℮ Estimated Symbol
&#8539;  ⅛ Vulgar Fraction One Eighth
&#8540;  ⅜ Vulgar Fraction Three Eighths
&#8541;  ⅝ Vulgar Fraction Five Eighths
&#8542;  ⅞ Vulgar Fraction Seven Eighths
&#8706;  ∂ Partial Differential
&#8710;  ∆ Increment
&#8719;  ∏ N-Ary Product
&#8721;  ∑ N-Ary Summation
&#8722;  − Minus Sign
&#8729;  ∙ Bullet Operator
&#8730;  √ Square Root
&#8734;  ∞ Infinity
&#8735;  ∟ Right Angle
&#8745;  ∩ Intersection
&#8747;  ∫ Integral
&#8776;  ≈ Almost Equal To
&#8800;  ≠ Not Equal To
&#8801;  ≡ Identical To
&#8804;  ≤ Less-Than Or Equal To      (Less Than Or Equal To)
&#8805;  ≥ Greater-Than Or Equal To   (Greater Than Or Equal To)
&#64257; ﬁ Latin Small Ligature FI
&#64258; ﬂ Latin Small Ligature FL

Thumbnail sketch of UTF-8

In UTF-8, each 16-bit Unicode character is encoded as a sequence of one, two, or three 8-bit bytes, depending on the value of the character. The following table shows the format of such UTF-8 byte sequences (where the "free bits" shown by x's in the table are combined in the order shown, and interpreted from most significant to least significant):

 Binary format of bytes in sequence:
                                        Number of    Maximum expressible
 1st byte     2nd byte    3rd byte      free bits:      Unicode value:

 0xxxxxxx                                  7           007F hex   (127)
 110xxxxx     10xxxxxx                  (5+6)=11       07FF hex  (2047)
 1110xxxx     10xxxxxx    10xxxxxx     (4+6+6)=16      FFFF hex (65535)
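
The bit layout in the table translates directly into code. The sketch below, in Python and covering only the 16-bit range the table describes, packs the free bits of a Unicode value into the one-, two-, or three-byte pattern. The function name utf8_encode_16bit is hypothetical, and the sketch does not special-case the surrogate range D800 to DFFF hex:

```python
def utf8_encode_16bit(cp: int) -> bytes:
    """Encode a 16-bit Unicode value (0000 to FFFF hex) as UTF-8."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("this sketch only handles 16-bit values")
    if cp <= 0x7F:
        # 0xxxxxxx: all 7 free bits in a single byte
        return bytes([cp])
    if cp <= 0x7FF:
        # 110xxxxx 10xxxxxx: high 5 bits, then low 6 bits
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    # 1110xxxx 10xxxxxx 10xxxxxx: high 4 bits, middle 6 bits, low 6 bits
    return bytes([0xE0 | (cp >> 12),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

For example, utf8_encode_16bit(945) (Greek small alpha, &#945;) gives the bytes CE B1, and utf8_encode_16bit(8230) (horizontal ellipsis, &#8230;) gives E2 80 A6.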

The value of each individual byte indicates its UTF-8 function, as follows:

 00 to 7F hex   (0 to 127):  first and only byte of a sequence.
 80 to BF hex (128 to 191):  continuing byte in a multi-byte sequence.
 C2 to DF hex (194 to 223):  first byte of a two-byte sequence.
 E0 to EF hex (224 to 239):  first byte of a three-byte sequence.
[Other byte values are either not used when encoding 16-bit Unicode characters (i.e. F0 to FD hex), or are not part of any well-formed UTF-8 sequence (i.e. C0, C1, FE, and FF hex); see the links to UTF-8 standards documents below for further details.]
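
Because each byte value announces its own role, a decoder can tell where a character starts just by looking at a single byte. A minimal Python sketch of the classification above (the function name byte_role and its return labels are hypothetical):

```python
def byte_role(b: int) -> str:
    """Classify one byte value by its UTF-8 function."""
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte value")
    if b <= 0x7F:
        return "single"        # 00-7F: first and only byte of a sequence
    if b <= 0xBF:
        return "continuation"  # 80-BF: continuing byte
    if 0xC2 <= b <= 0xDF:
        return "lead2"         # first byte of a two-byte sequence
    if 0xE0 <= b <= 0xEF:
        return "lead3"         # first byte of a three-byte sequence
    if 0xF0 <= b <= 0xFD:
        return "unused"        # longer sequences, not needed for 16-bit values
    return "invalid"           # C0, C1, FE, FF
```

Applied to the two bytes of Greek small alpha, [byte_role(b) for b in b"\xce\xb1"] gives ["lead2", "continuation"].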

Note that UTF-8 remains a simple single-byte ASCII-compatible encoding, as long as no characters above 127 are directly present. (This means that an HTML document technically declared to be encoded as UTF-8 can remain a normal single-byte ASCII/ISO-8859-1 file, even though it may contain Unicode characters above 255, as long as all characters above 127 are referred to indirectly by ampersand entities.)
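
The entity-only approach just described can be automated. As a sketch, assuming Python 3: the standard "xmlcharrefreplace" error handler rewrites every character that does not fit in ASCII as a decimal &#...; entity, so the output contains only byte values 0 to 127 and is simultaneously valid ASCII, ISO-8859-1, and UTF-8. The helper name to_ascii_entities is hypothetical, and it assumes &, <, and > have already been escaped, since those are ASCII characters and pass through unchanged:

```python
import html

def to_ascii_entities(text: str) -> str:
    """Replace every character above 127 with a decimal &#...; entity."""
    # The "xmlcharrefreplace" handler substitutes &#NNN; for each
    # character that the ASCII codec cannot encode.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# Greek pi, "r", superscript two, almost-equal sign, "3.14", ellipsis
source = "\u03c0r\u00b2 \u2248 3.14\u2026"
escaped = to_ascii_entities(source)
print(escaped)  # &#960;r&#178; &#8776; 3.14&#8230;

# html.unescape() turns the entities back into the original characters.
assert html.unescape(escaped) == source
```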

Some examples of encoded Unicode characters (in hexadecimal notation):

   16-bit
   Unicode:       UTF-8 sequences:

    0001            01
    007F            7F
    0080            C2  80
    07FF            DF  BF
    0800            E0  A0  80
    FFFF            EF  BF  BF
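
Decoding reverses the process: strip the marker bits from each byte and concatenate the remaining free bits. The Python sketch below (the name utf8_decode_16bit is hypothetical) handles one complete one-, two-, or three-byte sequence, rejects malformed and overlong forms, and checks itself against the example table above:

```python
def utf8_decode_16bit(seq: bytes) -> int:
    """Decode one complete UTF-8 sequence of one to three bytes."""
    if not seq:
        raise ValueError("empty sequence")
    lead, n = seq[0], len(seq)
    if n == 1 and lead <= 0x7F:
        return lead                                   # 0xxxxxxx
    # Every byte after the first must match 10xxxxxx.
    if any((b & 0xC0) != 0x80 for b in seq[1:]):
        raise ValueError("bad continuation byte")
    if n == 2 and 0xC2 <= lead <= 0xDF:
        # 110xxxxx 10xxxxxx -> 5 + 6 free bits
        return ((lead & 0x1F) << 6) | (seq[1] & 0x3F)
    if n == 3 and 0xE0 <= lead <= 0xEF:
        # 1110xxxx 10xxxxxx 10xxxxxx -> 4 + 6 + 6 free bits
        cp = ((lead & 0x0F) << 12) | ((seq[1] & 0x3F) << 6) | (seq[2] & 0x3F)
        if cp < 0x800:
            raise ValueError("overlong three-byte sequence")
        return cp
    raise ValueError("not a well-formed one- to three-byte sequence")

# The examples from the table above: Unicode value -> UTF-8 bytes (hex).
EXAMPLES = {
    0x0001: "01",
    0x007F: "7F",
    0x0080: "C2 80",
    0x07FF: "DF BF",
    0x0800: "E0 A0 80",
    0xFFFF: "EF BF BF",
}

for cp, hex_bytes in EXAMPLES.items():
    seq = bytes.fromhex(hex_bytes)
    assert utf8_decode_16bit(seq) == cp        # the sketch agrees
    assert chr(cp).encode("utf-8") == seq      # so does Python's own encoder
print("all examples match")
```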

For more information, see ftp://ftp.informatik.uni-erlangen.de/pub/doc/ISO/charsets/ISO-10646-UTF-8.html (official ISO document) or ftp://ftp.isi.edu/in-notes/rfc2279.txt -- RFC 2279: UTF-8, A Transformation Format of ISO 10646 (a more user-friendly IETF document -- though it does mostly refer to bytes by the official ISO jargon term "octets").



- Republic of Pemberley -