For those who, like me, write in vanilla HTML, here is a collection of language tools for inspecting non-ASCII characters and representing them as numeric XML entities. These entities can be used in any HTML or XML context, so the underlying file encoding is never an issue. The result is robust data for languages other than English that does not depend on the vagaries of third-party software.
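As a rough illustration of the conversion these tools perform, here is a minimal Python sketch that replaces each non-ASCII character in a string with a decimal numeric entity. The function name to_numeric_entities is hypothetical and is not taken from the code behind this page.

```python
def to_numeric_entities(text: str) -> str:
    """Replace each non-ASCII character with a decimal numeric entity."""
    pieces = []
    for ch in text:
        code = ord(ch)  # Unicode code point of the character
        if code < 128:
            pieces.append(ch)  # ASCII passes through unchanged
        else:
            pieces.append(f"&#{code};")
    return "".join(pieces)


sample = "p\u012Bny\u012Bn \u4E2D\u6587"  # "pīnyīn 中文"
print(to_numeric_entities(sample))
# The standard library reaches the same result through an encoding error handler:
print(sample.encode("ascii", "xmlcharrefreplace").decode("ascii"))
```

ASCII markup characters such as & and < are left alone by this sketch, so they still need to be escaped separately wherever they are meant literally.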

Contents

Chinese Character Entry
Pinyin Vowel Entry
Japanese Syllabary Entry
Greek Alphabet Entry
Russian Alphabet Entry
General Encodings


Chinese Character Entry

By far the fastest method for locating Chinese characters is by phonetic groups rather than traditional radicals. The table from that presentation is modified for use here. Click on a character to transfer it to the clipboard:

  



 

Pinyin Vowel Entry

While these vowels with diacritics have named HTML entities, using those names leads to errors in XML, which predefines only five named entities (amp, lt, gt, apos and quot). Decimal values are thus used for accented vowels in pinyin, for conciseness. Click on a vowel to transfer it to the clipboard:


 

Most of these vowels with diacritics are also used in major European languages.
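The decimal values for accented pinyin vowels can also be derived rather than looked up. The sketch below, in Python, attaches a combining tone mark to a base vowel and lets Unicode NFC normalization produce the single precomposed character. The names TONE_MARKS and pinyin_entity are hypothetical, and this is not necessarily how the table on this page was built.

```python
import unicodedata

# Combining marks for the four pinyin tones (tone number -> mark).
TONE_MARKS = {
    1: "\u0304",  # macron
    2: "\u0301",  # acute accent
    3: "\u030C",  # caron
    4: "\u0300",  # grave accent
}


def pinyin_entity(vowel: str, tone: int) -> str:
    """Return the decimal entity for a pinyin vowel carrying the given tone."""
    composed = unicodedata.normalize("NFC", vowel + TONE_MARKS[tone])
    if len(composed) != 1:
        # Normalization found no single precomposed character.
        raise ValueError(f"no precomposed character for {vowel!r} with tone {tone}")
    return f"&#{ord(composed)};"


for tone in (1, 2, 3, 4):
    print(tone, pinyin_entity("a", tone))
```

The same call works for the umlauted vowel by passing "\u00FC" as the base letter.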


Japanese Syllabary Entry

Decimal values in the underlying code are converted to hex output for conciseness. Click on a character to transfer it to the clipboard:
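The decimal-to-hex step can be sketched in a few lines of Python. The helper name hex_entity and the list HIRAGANA_VOWELS are hypothetical; the five code points are the hiragana vowels a, i, u, e, o, and html.unescape from the standard library is used only to check that each entity decodes back to a character.

```python
import html


def hex_entity(code: int) -> str:
    """Format a code point held as a decimal integer as a hex entity."""
    return f"&#x{code:X};"


# Decimal code points for the hiragana vowels a, i, u, e, o.
HIRAGANA_VOWELS = [12354, 12356, 12358, 12360, 12362]

for code in HIRAGANA_VOWELS:
    entity = hex_entity(code)
    # html.unescape decodes the entity back to the character as a check.
    print(code, entity, html.unescape(entity))
```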

 

Greek Alphabet Entry

While Greek letters have named HTML entities, using those names leads to errors in XML. Decimal values are again used for conciseness. Click on a letter to transfer it to the clipboard, or begin typing:

     

 

For typing input the following nonobvious translations are used: c for ς, j/J for ψ/Ψ, q/Q for θ/Θ and w/W for ω/Ω. The following English letters are not used: C and v/V.
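The typing translations above amount to a lookup table from Latin keys to Greek letters. Here is a minimal Python sketch of that idea. DOCUMENTED_KEYS holds only the translations stated above; ASSUMED_KEYS is a small guess at the "obvious" ones, included purely for illustration, and all names here are hypothetical.

```python
# Translations stated in the text above.
DOCUMENTED_KEYS = {
    "c": "ς", "j": "ψ", "J": "Ψ", "q": "θ", "Q": "Θ", "w": "ω", "W": "Ω",
}

# ASSUMPTION: a few "obvious" keys, guessed for illustration only.
ASSUMED_KEYS = {"a": "α", "b": "β", "g": "γ", "d": "δ"}

KEY_MAP = {**ASSUMED_KEYS, **DOCUMENTED_KEYS}


def greek_entities(typed: str) -> str:
    """Convert typed Latin keys to decimal entities; unmapped keys pass through."""
    out = []
    for key in typed:
        letter = KEY_MAP.get(key)
        out.append(f"&#{ord(letter)};" if letter else key)
    return "".join(out)


print(greek_entities("abJqw"))
```

Keys with no translation in this sketch, such as C and v/V, are returned unchanged. That is a design choice here and not necessarily what the page does.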


Russian Alphabet Entry

Decimal values are once more used for conciseness. Click on a letter to transfer to clipboard or begin typing:


 

For typing input the following nonobvious translations are used: c/C for ц/Ц, h/H for э/Э, j/J for ж/Ж, q/Q for щ/Щ, w/W for ш/Ш and y/Y for й/Й. The following pairs are not currently encoded for typing input: ч/Ч, ъ/Ъ, ы/Ы, ь/Ь, э/Э, ю/Ю and я/Я.
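Because each translation comes as a lowercase/uppercase pair, only the lowercase half needs to be stored. The Python sketch below keeps a subset of the translations listed above in lowercase and derives the capital letter from the case of the typed key. The names LOWER_KEYS and russian_entity are hypothetical.

```python
# Lowercase translations stated in the text above (a subset).
LOWER_KEYS = {"c": "ц", "j": "ж", "q": "щ", "w": "ш", "y": "й"}


def russian_entity(key: str) -> str:
    """Return the decimal entity for a typed key, or the key itself if unmapped."""
    letter = LOWER_KEYS.get(key.lower())
    if letter is None:
        return key  # no translation stored for this key
    if key.isupper():
        letter = letter.upper()  # C -> Ц, J -> Ж, and so on
    return f"&#{ord(letter)};"


for key in "cCjJ":
    print(key, russian_entity(key))
```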


General Encodings

For general ideogram input, hex values are used by default. Click on a character to transfer it to the clipboard:

First in decimal or hex


 

Interesting ideogram ranges:

Alchemical symbols
Braille notation
Egyptian hieroglyphs
Face emoji
Mahjong tiles
Musical notation
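Listing the characters of a range comes down to counting up from a first code point. The Python sketch below accepts that starting value in decimal or hex and returns hex entities. It assumes hex input carries a 0x prefix, which may differ from what this page accepts, and the names parse_code and range_entities are hypothetical. The two starting values are the first code points of the Unicode Braille Patterns and Mahjong Tiles blocks.

```python
def parse_code(text: str) -> int:
    """Parse a code point given in decimal ("10240") or 0x-prefixed hex ("0x2800")."""
    return int(text.strip(), 0)  # base 0 infers the base from the prefix


def range_entities(first: str, count: int) -> list:
    """Return hex entities for `count` consecutive code points starting at `first`."""
    start = parse_code(first)
    return [f"&#x{code:X};" for code in range(start, start + count)]


# Start of the Unicode Braille Patterns block (U+2800), given in hex.
print(range_entities("0x2800", 4))
# Start of the Unicode Mahjong Tiles block (U+1F000), given in decimal.
print(range_entities("126976", 2))
```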

Uploaded 2026.01.25 analyticphysics.com