Character Encoding


 * Table

convmv

 * Converts the encoding of file names (not file contents)
 * man page

iconv

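 * Command-line tool and library for converting text between character encodings
 * library
 * libiconv for Windows
 * Japanese man page
 * man page
 * The type of iconv's 2nd argument differs between implementations: POSIX declares it char**, but some headers declare it const char**. Passing one where the other is expected is a compile error in C++, so portable C++ code needs a template technique to accept either.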
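Here is a minimal sketch of that template technique. It assumes a POSIX-style `<iconv.h>`. `call_iconv` and `convert_encoding` are hypothetical helper names and are not part of iconv. The template deduces the real type of the 2nd parameter from whatever declaration the header provides, then casts the `char**` to that type. The same source therefore compiles whether the parameter is `char**` or `const char**`.

```cpp
#include <iconv.h>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical adapter: InBuf is deduced from iconv's own declaration
// (char** on POSIX/glibc, const char** on some other headers), and our
// char** is const_cast to it, so either declaration compiles.
template <typename InBuf>
std::size_t call_iconv(std::size_t (*fn)(iconv_t, InBuf, std::size_t*, char**, std::size_t*),
                       iconv_t cd, char** in, std::size_t* in_left,
                       char** out, std::size_t* out_left) {
    return fn(cd, const_cast<InBuf>(in), in_left, out, out_left);
}

// Hypothetical helper: convert a whole string in one pass; throws on failure.
std::string convert_encoding(const std::string& input, const char* to, const char* from) {
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)(-1)) throw std::runtime_error("iconv_open: unsupported conversion");

    std::string in_buf(input);                          // iconv advances the input pointer
    std::string out_buf(input.size() * 4 + 16, '\0');   // generous fixed-size guess
    char* in_ptr = &in_buf[0];
    std::size_t in_left = in_buf.size();
    char* out_ptr = &out_buf[0];
    std::size_t out_left = out_buf.size();

    std::size_t rc = call_iconv(iconv, cd, &in_ptr, &in_left, &out_ptr, &out_left);
    if (rc != static_cast<std::size_t>(-1)) {
        // Flush pending shift state (matters for stateful encodings such as ISO-2022-JP).
        rc = call_iconv(iconv, cd, nullptr, nullptr, &out_ptr, &out_left);
    }
    const int err = errno;
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1))
        throw std::runtime_error(err == EILSEQ ? "invalid input sequence" : "iconv failed");

    out_buf.resize(out_buf.size() - out_left);          // keep only the bytes written
    return out_buf;
}
```

For simplicity, the output buffer is sized only once. Real code would grow the buffer and retry when iconv fails with E2BIG.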

Locale

 * UTF-8 setting
 * Linux locale
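The following small sketch checks from C++ whether the environment's locale uses UTF-8. It assumes a POSIX system, because `nl_langinfo` is not available on Windows. `environment_codeset` and `environment_is_utf8` are hypothetical names. A program stays in the "C" locale until it calls `setlocale`, which is why that call comes first.

```cpp
#include <clocale>
#include <langinfo.h>
#include <string>

// Adopt the locale configured in the environment (LC_ALL, LC_CTYPE, LANG) and
// report the character encoding it selects. glibc reports "UTF-8" for a locale
// such as en_US.UTF-8 and "ANSI_X3.4-1968" (ASCII) for the "C" locale.
std::string environment_codeset() {
    // Without this call a C or C++ program keeps running in the "C" locale.
    std::setlocale(LC_ALL, "");
    const char* codeset = nl_langinfo(CODESET);
    return codeset ? codeset : "";
}

bool environment_is_utf8() {
    return environment_codeset() == "UTF-8";
}
```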

Literal

 * GCC
 * MSVC UTF-8 literals
 * Most of the time, use only ASCII in source code.
 * Otherwise, use UTF-8 wherever possible.
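This sketch keeps the source file ASCII-only while still producing non-ASCII strings, by writing escapes instead of raw characters. GCC encodes narrow literals as UTF-8 by default. MSVC encodes them in the system code page unless you compile with `/utf-8`. For that reason, the explicit byte escapes and the `u8` prefix below are the portable choices. The variable names are illustrative.

```cpp
// The source stays pure ASCII; "é" (U+00E9) is spelled with escapes.

// 1) Explicit UTF-8 bytes: the encoding is fixed regardless of compiler flags.
const char kCafeBytes[] = "caf\xC3\xA9";

// 2) u8 literal with a universal character name: the compiler emits UTF-8.
//    The element type is char before C++20 and char8_t from C++20 on.
const auto& kCafeU8 = u8"caf\u00E9";

// 3) UTF-32 literal: one element per code point.
const char32_t kCafeU32[] = U"caf\u00E9";
```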

ASCII

 * The most basic character encoding.
 * Most modern character encodings are based on it.
 * 7-bit: 0x00 to 0x7F
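ASCII stops at 0x7F, so a byte with the high bit set is never ASCII. Here is a minimal check; `is_ascii` is an illustrative name.

```cpp
#include <string>

// ASCII only defines 0x00..0x7F, so any byte with bit 0x80 set is not ASCII.
// The same test tells whether a UTF-8 string is plain ASCII.
bool is_ascii(const std::string& s) {
    for (unsigned char c : s) {
        if (c & 0x80) return false;
    }
    return true;
}
```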

JIS

 * Shift_JIS table
 * Japanese character encoding table
 * Shift_JIS and ku/ten
 * English Wikipedia article
 * Has the formula for converting JIS X 0208 to Shift_JIS
 * Calculating ku and ten
 * overview
 * JIS X 0208
 * Each character is the byte pair [0x20 + ku, 0x20 + ten], with ku and ten in 1 to 94
 * kanji encoding
 * JIS X 0201
 * 8-bit ASCII extension with half-width *kana* and Japanese symbols
 * Shift_JIS
 * Encoding that combines JIS X 0208 (two bytes per character) and JIS X 0201 (one byte) in one byte stream
 * Was widely used until UTF-8 became dominant
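The ku/ten arithmetic can be sketched as follows. This is the usual mapping for plain JIS X 0208 and Shift_JIS, without vendor extensions such as code page 932. The function names are hypothetical, and the input is assumed to be valid (ku and ten in 1 to 94), so there are no range checks.

```cpp
#include <cstdint>

struct KuTen { int ku; int ten; };

// JIS X 0208: row (ku) and cell (ten) are each stored with 0x20 added.
std::uint16_t kuten_to_jis(KuTen kt) {
    return static_cast<std::uint16_t>(((0x20 + kt.ku) << 8) | (0x20 + kt.ten));
}

std::uint16_t kuten_to_sjis(KuTen kt) {
    // Two ku rows share one lead byte. Rows 63..94 jump to 0xE0..0xEF because
    // the single-byte range 0xA1..0xDF holds JIS X 0201 half-width katakana.
    int s1 = kt.ku <= 62 ? (kt.ku + 0x101) / 2 : (kt.ku + 0x181) / 2;
    int s2;
    if (kt.ku % 2 == 1) {
        s2 = kt.ten + (kt.ten <= 63 ? 0x3F : 0x40);  // odd row: 0x40..0x9E, skipping 0x7F
    } else {
        s2 = kt.ten + 0x9E;                          // even row: 0x9F..0xFC
    }
    return static_cast<std::uint16_t>((s1 << 8) | s2);
}

KuTen sjis_to_kuten(std::uint16_t code) {
    int s1 = code >> 8;
    int s2 = code & 0xFF;
    int ku = (s1 <= 0x9F ? s1 - 0x81 : s1 - 0xC1) * 2 + 1;  // odd row of the pair
    int ten;
    if (s2 >= 0x9F) {
        ++ku;                                  // trail byte of an even row
        ten = s2 - 0x9E;
    } else {
        ten = s2 - (s2 <= 0x7E ? 0x3F : 0x40);
    }
    return {ku, ten};
}
```

For example, the kana "あ" is ku 4, ten 2. That gives JIS 0x2422 and Shift_JIS 0x82A0.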

Latin

 * ISO/IEC 8859-1 (usually called "Latin-1")
 * Code page 1252
 * The Shinonome font supports it.
 * Accent in ASCII
 * Unix FAQ
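ISO-8859-1 assigns byte 0xNN to code point U+00NN, so converting it to UTF-8 needs no lookup table. Below is a minimal sketch; `latin1_to_utf8` is an illustrative name. It does not handle code page 1252, which puts other characters, such as the euro sign, in 0x80 to 0x9F.

```cpp
#include <string>

// Latin-1 byte 0xNN is code point U+00NN: bytes below 0x80 are copied as-is,
// and 0x80..0xFF become a two-byte UTF-8 sequence.
std::string latin1_to_utf8(const std::string& latin1) {
    std::string out;
    out.reserve(latin1.size() * 2);
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // 110000xx
            out += static_cast<char>(0x80 | (c & 0x3F));  // 10xxxxxx
        }
    }
    return out;
}
```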

Windows

 * code page table
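 * If "UNICODE" and "_UNICODE" aren't defined, the generic API names map to the "A" functions, and strings are passed in the local code page.
 * Unicode (UTF-16) APIs have a "W" suffix, e.g. CreateFileW.
 * Code page APIs have an "A" suffix, e.g. CreateFileA.
 * To support either API from the same source, use 
 * MinGW main function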
 * Windows macro
 * wchar_t in Windows is only 16 bits (one UTF-16 code unit), so a single wchar_t cannot hold every Unicode code point.

Unicode

 * concept
 * Normalization
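 * Needed to compare Unicode strings, because canonically equivalent text can be stored as different code point sequences.
 * Has 4 forms: NFC, NFD, NFKC, and NFKD.
 * utf8proc
 * C library for UTF-8 normalization.
 * Supports all 4 normalization forms.
 * The returned string is allocated with malloc, so it must be released with free.
 * Has Ruby and PostgreSQL bindings.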
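This example shows why normalization matters for comparison. "é" can be stored as one code point (U+00E9) or as "e" followed by U+0301 COMBINING ACUTE ACCENT. The two compare unequal even though they are canonically equivalent. The toy below composes only a handful of vowel + U+0301 pairs to show what NFC does for them. It is not a real normalizer: there is no canonical reordering, no other marks, and no Hangul. `toy_compose` is a made-up name, and real code should use a library such as utf8proc.

```cpp
#include <string>

// Toy table: precomposed form of a vowel + U+0301, or 0 if not in the table.
char32_t precomposed_acute(char32_t base) {
    switch (base) {
        case U'a': return U'\u00E1'; case U'e': return U'\u00E9';
        case U'i': return U'\u00ED'; case U'o': return U'\u00F3';
        case U'u': return U'\u00FA'; case U'A': return U'\u00C1';
        case U'E': return U'\u00C9'; case U'I': return U'\u00CD';
        case U'O': return U'\u00D3'; case U'U': return U'\u00DA';
        default:   return 0;
    }
}

// Toy "composition": merge base + U+0301 into one code point when the table
// knows the pair; everything else is copied unchanged.
std::u32string toy_compose(const std::u32string& s) {
    std::u32string out;
    for (char32_t c : s) {
        if (c == U'\u0301' && !out.empty()) {
            if (char32_t p = precomposed_acute(out.back())) {
                out.back() = p;
                continue;
            }
        }
        out += c;
    }
    return out;
}
```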

wchar_t

 * http://www.firstobject.com/wchar_t-string-on-linux-osx-windows.htm

UTF-8

 * Encoding that stores Unicode in plain "const char*" or std::string.
 * Relatively new, but widely used.
 * Decoding is a little more involved (1 to 4 bytes per code point), but it fits conventional byte-string APIs.
 * ASCII-compatible: bytes 0x00 to 0x7F mean the same as in ASCII and never occur inside a multi-byte sequence.
 * Widely used for exchanging text.
 * screen UTF-8 settings
 * Using UTF-8 on Windows
 * Unix UTF-8 file names
 * Python UTF-8 file names
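Here is a minimal UTF-8 encoder sketch that shows why the encoding is ASCII-compatible. Code points below 0x80 stay single bytes, and every byte of a longer sequence has the high bit set. The function names are illustrative. Surrogates and values above U+10FFFF are rejected because they are not valid Unicode scalar values.

```cpp
#include <stdexcept>
#include <string>

// Append one code point as 1 to 4 UTF-8 bytes.
void append_utf8(std::string& out, char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value");
    if (cp < 0x80) {                       // 0xxxxxxx (same as ASCII)
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 11110xxx + three continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

std::string utf32_to_utf8(const std::u32string& s) {
    std::string out;
    for (char32_t cp : s) append_utf8(out, cp);
    return out;
}
```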

UTF-16

 * Used mostly on Windows.
 * Code points above U+FFFF take two 16-bit units (a surrogate pair), so UTF-16 still needs decoding to get code points; use UTF-32 where possible.
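This sketch shows the surrogate-pair arithmetic that makes UTF-16 need decoding: code points above U+FFFF are split into two 16-bit units. Converting to UTF-32 for internal use, as suggested above, undoes the split. The function names are illustrative.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

std::u16string utf32_to_utf16(const std::u32string& in) {
    std::u16string out;
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);                     // one unit
        } else {
            cp -= 0x10000;                                        // 20 bits remain
            out += static_cast<char16_t>(0xD800 + (cp >> 10));    // high surrogate
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));  // low surrogate
        }
    }
    return out;
}

std::u32string utf16_to_utf32(const std::u16string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t u = in[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            out += static_cast<char32_t>(0x10000 + ((u - 0xD800) << 10) + (in[i + 1] - 0xDC00));
            ++i;                                                  // consumed two units
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        } else {
            out += u;
        }
    }
    return out;
}
```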

UTF-32

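 * An array of uint32_t (or char32_t), one element per Unicode code point.
 * Good for internal use (fixed width), but not good for exchanging text.
 * Byte order matters once it is written out (UTF-32BE vs UTF-32LE).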
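Endianness only matters once UTF-32 is serialized. This sketch writes and reads big-endian bytes (UTF-32BE) using shifts, so the result does not depend on the host CPU's byte order. The function names are illustrative, and no BOM (U+FEFF) is written.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Most significant byte first, built with shifts instead of copying memory.
std::vector<std::uint8_t> to_utf32be(const std::u32string& s) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(s.size() * 4);
    for (char32_t cp : s) {
        bytes.push_back(static_cast<std::uint8_t>(cp >> 24));
        bytes.push_back(static_cast<std::uint8_t>(cp >> 16));
        bytes.push_back(static_cast<std::uint8_t>(cp >> 8));
        bytes.push_back(static_cast<std::uint8_t>(cp));
    }
    return bytes;
}

// Reads complete 4-byte groups; any trailing partial group is ignored.
std::u32string from_utf32be(const std::vector<std::uint8_t>& bytes) {
    std::u32string s;
    for (std::size_t i = 0; i + 3 < bytes.size(); i += 4) {
        s += static_cast<char32_t>((std::uint32_t(bytes[i]) << 24) |
                                   (std::uint32_t(bytes[i + 1]) << 16) |
                                   (std::uint32_t(bytes[i + 2]) << 8) |
                                   std::uint32_t(bytes[i + 3]));
    }
    return s;
}
```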

Unicode Iterator

 * Used to convert between Unicode encodings.
 * Most of the time the input has to be converted to UTF-32 (code points) first.
 * Older versions of the API (before 1.48.0) don't range-check the input, so use the header that does.
 * test code
 * source code
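Below is a minimal sketch of a range-checked UTF-8 to UTF-32 iterator. It is written from scratch for illustration and is not the API of the iterator library above; `Utf8ToUtf32Iterator` and `utf8_to_utf32` are hypothetical names. Each step refuses to read past the end of the input and rejects invalid sequences, which is the kind of checking the older API lacks.

```cpp
#include <cstddef>
#include <iterator>
#include <stdexcept>
#include <string>

class Utf8ToUtf32Iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;

    Utf8ToUtf32Iterator(const char* pos, const char* end) : pos_(pos), end_(end) {}

    char32_t operator*() const { std::size_t len = 0; return decode(len); }
    Utf8ToUtf32Iterator& operator++() {
        std::size_t len = 0;
        decode(len);                 // decodes again to learn the length (simple, not fast)
        pos_ += len;
        return *this;
    }
    Utf8ToUtf32Iterator operator++(int) { Utf8ToUtf32Iterator old = *this; ++*this; return old; }
    bool operator==(const Utf8ToUtf32Iterator& o) const { return pos_ == o.pos_; }
    bool operator!=(const Utf8ToUtf32Iterator& o) const { return pos_ != o.pos_; }

private:
    // Decode the sequence at pos_, checking bounds, continuation bytes,
    // overlong forms, surrogates and the U+10FFFF limit.
    char32_t decode(std::size_t& len) const {
        if (pos_ >= end_) throw std::out_of_range("UTF-8 iterator at end");
        const unsigned char b0 = static_cast<unsigned char>(*pos_);
        char32_t cp = 0, min_cp = 0;
        if (b0 < 0x80)                { len = 1; return b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (static_cast<std::size_t>(end_ - pos_) < len)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t i = 1; i < len; ++i) {
            const unsigned char b = static_cast<unsigned char>(pos_[i]);
            if ((b & 0xC0) != 0x80) throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("overlong or out-of-range UTF-8 sequence");
        return cp;
    }

    const char* pos_;
    const char* end_;
};

// Walk the whole string with the iterator, collecting code points.
std::u32string utf8_to_utf32(const std::string& s) {
    const char* first = s.data();
    const char* last = first + s.size();
    std::u32string out;
    for (Utf8ToUtf32Iterator it(first, last), stop(last, last); it != stop; ++it)
        out += *it;
    return out;
}
```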