author Fumitoshi UKAI <ukai@debian.or.jp> 2003-09-22 21:02:15 +0000
committer Fumitoshi UKAI <ukai@debian.or.jp> 2003-09-22 21:02:15 +0000
commit 604c11affe988bab23c87598c02248fff1d73f43 (patch)
tree 6252cbbfd3cf703691a8ddbf1fdee5c1246b5faa /doc/README.m17n
parent * version.c.in: cvs version (diff)
download w3m-604c11affe988bab23c87598c02248fff1d73f43.tar.gz
w3m-604c11affe988bab23c87598c02248fff1d73f43.zip
merge m17n patch
add libwc
Diffstat (limited to 'doc/README.m17n')
-rw-r--r-- doc/README.m17n 451
1 files changed, 451 insertions, 0 deletions
diff --git a/doc/README.m17n b/doc/README.m17n
new file mode 100644
index 0000000..0dd1b78
--- /dev/null
+++ b/doc/README.m17n
@@ -0,0 +1,451 @@
+
+Multilingualization of w3m
+ 2003/03/08
+ H. Sakamoto
+
+Introduction
+
+ I have tried the multilingualization of w3m (w3m-m17n).
+ The patch for w3m-0.4.1 is available on the following site.
+
+ http://www2u.biglobe.ne.jp/~hsaka/w3m/index.html#m17n
+ patch/w3m-0.4.1-m17n-20030308.tar.gz
+ patch/README.m17n
+
+ This is a development version. It has not been tested enough, because
+ I understand only Japanese. Please use it, test it, and report bugs.
+
+ w3m-m17n currently has the following features.
+
+Supported encoding schemes (character sets)
+
+ * Japanese
+ EUC-JP - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0212
+ (EUC-JISX0213) (JIS X 0213)
+ ISO-2022-JP - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0212, etc.
+ ISO-2022-JP-2 - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0212,
+ GB 2312, KS X 1001, ISO 8859-1, ISO 8859-7, etc.
+ ISO-2022-JP-3 - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0213, etc.
+ Shift_JIS(CP932) - US_ASCII, JIS X 0208, JIS X 0201, CP932 extension
+ Shift_JISX0213 - US_ASCII, JIS X 0208, JIS X 0201, JIS X 0213
+ * Chinese (simplified)
+ EUC-CN(GB2312) - US_ASCII, GB 2312
+ ISO-2022-CN - US_ASCII, GB 2312, CNS-11643-1,..7, etc.
+ GBK(CP936) - US_ASCII, GB 2312, GBK
+ GB18030 - US_ASCII, GB 2312, GBK, GB18030, Unicode
+ HZ-GB-2312 - US_ASCII, GB 2312
+ * Chinese (Taiwan, traditional)
+ EUC-TW - US_ASCII, CNS 11643-1,..16
+ ISO-2022-CN - US_ASCII, CNS-11643-1,..7, GB 2312, etc.
+ Big5 - Big5
+ HKSCS - Big5, HKSCS
+ * Korean
+ EUC-KR - US_ASCII, KS X 1001 Wansung
+ ISO-2022-KR - US_ASCII, KS X 1001 Wansung, etc.
+ Johab - US_ASCII, KS X 1001 Johab
+ UHC(CP949) - US_ASCII, KS X 1001 Wansung, UHC
+ * Vietnamese
+ TCVN-5712 VN-1, VISCII 1.1, VPS, CP1258
+ * Thai
+ TIS-620 (ISO-8859-11), CP874
+ * Other
+ US_ASCII, ISO-8859-1 to 10, 13 to 15,
+ KOI8-R, KOI8-U, NeXT, CP437, CP737, CP775, CP850, CP852, CP855, CP856,
+ CP857, CP860, CP861, CP862, CP863, CP864, CP865, CP866, CP869, CP1006,
+ CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256, CP1257
+ * Unicode (UCS-4)
+ UTF-8, UTF-7
+
+ NOTE:
+ * The left part of JIS X 0201 and GB 1988 (Chinese ASCII) is
+ treated as US_ASCII, because these characters are used in the tags
+ of HTML documents. Other variants of US_ASCII are left unchanged.
+ * JIS C 6226(old JIS) is treated as JIS X 0208.
+ * The sequence '~\n' of HZ is not supported.
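
For readers unfamiliar with HZ, the sketch below shows the `~{` and `~}` escapes that switch between ASCII and GB 2312 in a 7-bit stream. It uses Python's built-in `hz` codec purely as a stand-in (an assumption for illustration; w3m uses its own conversion code). The `~\n` sequence mentioned in the note is HZ's line-continuation marker and is not exercised here.

```python
# Illustration only: Python's "hz" codec stands in for w3m's own converter.
text = "中文 abc"

# GB 2312 characters are wrapped in ~{ ... ~}; ASCII passes through as-is.
encoded = text.encode("hz")
decoded = encoded.decode("hz")

print(encoded)
print(decoded == text)
```

Every byte of the encoded form is below 0x80, which is why HZ could be used on 7-bit channels.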
+
+Display
+
+ There are three methods for multilingual display.
+
+ (1) kterm + ISO-2022-JP/CN/KR
+
+ * kterm can handle JIS X 0213 and CNS 11643 if the following patch
+ is applied.
+ http://www.st.rim.or.jp/~hanataka/kterm-6.2.0.ext02.patch.gz
+
+ * Specify the fontList for kterm with -fl option or in ~/.Xdefaults.
+
+ -fl "*--16-*-jisx0213.2000-*,\
+ *--16-*-jisx0212.1990-0,\
+ *--16-*-ksc5601.1987-0,\
+ *--16-*-gb2312.1980-0,\
+ *--16-*-cns11643.1992-*,\
+ *--16-*-iso8859-*"
+
+ JIS X 0213 fonts are available at
+ http://www.mars.sphere.ne.jp/imamura/jisx0213.html
+
+ * Set "display_charset" to ISO-2022-JP (or ISO-2022-JP-2, KR, CN)
+ and "strict_iso2022" to OFF on the option panel (see below).
+
+ (2) xterm + UTF-8
+
+ * Use the XFree86 xterm (xterm-140 or later).
+ http://www.clark.net/pub/dickey/xterm/xterm.html
+
+ * Unicode fonts are available at
+ http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html
+ http://openlab.ring.gr.jp/efont/index.html.en
+
+ * Run xterm with the -u8 option.
+ Specify the fonts as follows, for example:
+ -fn "*-medium-*--13-*-iso10646-1" \
+ -fb "*-bold-*--13-*-iso10646-1" \
+ -fw "*-medium-*-ja-13-*-iso10646-1"
+
+ * Set "display_charset" to UTF-8.
+ Setting "pre_conv" to ON is also recommended.
+
+ (3) mlterm + ISO-2022-JP/KR/CN
+
+ * Homepage
+ http://mlterm.sourceforge.net/
+
+ * Set the encoding of mlterm to ISO-2022-JP/KR/CN or UTF-8.
+
+ * Set "display_charset" to ISO-2022-JP/KR/CN or UTF-8.
+
+Command line options
+
+ -I <document charset>
+ -O <display/output charset>
+
+ j(p): ISO-2022-JP
+ j(p)2: ISO-2022-JP-2
+ j(p)3: ISO-2022-JP-3
+ cn: ISO-2022-CN
+ kr: ISO-2022-KR
+ e(j): EUC-JP
+ ec,g(b): EUC-CN(GB2312)
+ et: EUC-TW
+ ek: EUC-KR
+ s(jis): Shift_JIS
+ sjisx0213: Shift_JISX0213
+ gbk: GBK
+ gb18030: GB18030
+ h(z): HZ-GB-2312
+ b(ig5): Big5
+ hk(scs): HKSCS
+ jo(hab): Johab
+ uhc: UHC
+ l?: ISO-8859-?
+ t(is): TIS-620(ISO-8859-11)
+ tc(vn): TCVN-5712 VN-1
+ v(iscii): VISCII 1.1
+ vp(s): VPS
+ ko(i8r): KOI8-R
+ koi8u: KOI8-U
+ n(ext): NeXT
+ cp???: CP???
+ w12??: CP12??
+ u(tf8): UTF-8
+ u(tf)7: UTF-7
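
As a usage illustration, the sketch below builds a w3m command line that reads an EUC-JP document and writes UTF-8, using the short names `e` and `u` from the table above. The helper name `w3m_dump_command` and the file name `page.html` are hypothetical, and the `-dump` option is taken from the change log further down rather than from this section. The code only constructs the argument list; it does not run w3m.

```python
import shlex


def w3m_dump_command(path, document_charset, output_charset):
    """Build an argv list for w3m (hypothetical helper, not part of w3m)."""
    return ["w3m", "-I", document_charset, "-O", output_charset, "-dump", path]


# "e" selects EUC-JP for the document, "u" selects UTF-8 for the output.
cmd = w3m_dump_command("page.html", "e", "u")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` on a system where a w3m build with m17n support is installed.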
+
+Option panel
+
+ display_charset
+ Display charset.
+ document_charset
+ Default document charset.
+ auto_detect
+ Detect the charset automatically when loading. (Default: ON)
+ system_charset
+ System charset. It is used for configuration files and file names.
+ follow_locale
+ System charset follows the locale ($LANG). (Default: ON)
+ ext_halfdump
+ Output in the display charset when -halfdump is given.
+ search_conv
+ Adjust the search string to the document charset. (Default: ON)
+ use_wide
+ Use multi column characters. (Default: ON)
+ use_combining
+ Use combining characters. (Default: ON)
+ use_language_tag
+ Use Unicode language tags. (Default: ON)
+ ucs_conv
+ Charset conversion using Unicode map. (Default: ON)
+ pre_conv
+ Charset conversion when loading. (Default: OFF)
+ fix_width
+ Fix the character width during conversion. (Default: ON)
+ If it is OFF, the rendering may break.
+ use_gb12345_map
+ Use GB 12345 Unicode map instead of GB 2312's. (Default: OFF)
+ If it is ON, GB2312 can be converted to Big5, EUC-TW, or EUC-JP.
+ use_jisx0201
+ Use JIS X 0201 Roman for ISO-2022-JP. (Default: OFF)
+ use_jisc6226
+ Use JIS C 6226:1978 for ISO-2022-JP. (Default: OFF)
+ use_jisx0201k
+ Use JIS X 0201 Katakana. (Default: OFF)
+ use_jisx0212
+ Use JIS X 0212:1990 (Supplemental Kanji). (Default: OFF)
+ use_jisx0213
+ Use JIS X 0213:2000 (2000JIS). (Default: OFF)
+ strict_iso2022
+ Strict ISO-2022-JP/KR/CN. (Default: ON)
+ If it is OFF, all ISO 2022 based character sets can be displayed
+ with ISO-2022-JP/KR/CN.
+
+ alt_entity
+ Use an alternate ASCII expression for entities. (Default: ON)
+ If it is OFF, entities are treated as ISO 8859-1.
+ graphic_char
+ Use graphic characters for the borders of tables and menus.
+ If it is OFF, ruled lines are used with CJK charsets or UTF-8.
+
+Code conversion
+
+ The following special code conversions are supported.
+ * EUC-JP <-> ISO-2022-JP <-> Shift_JIS
+ * EUC-CN <-> ISO-2022-CN <-> HZ-GB-2312
+ * EUC-TW <-> ISO-2022-CN
+ * EUC-KR <-> ISO-2022-KR <-> Johab (only Symbol and Hanja)
+
+ Other conversions are based on Unicode.
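
To make the idea of a Unicode-based conversion concrete, here is a minimal sketch that decodes bytes to Unicode and re-encodes them in the target charset. It relies on Python's standard codecs as an assumption for illustration; w3m itself uses its own tables, and Python routes every conversion through Unicode, including the same-family ones that the list above describes as special conversions. The `convert` helper is hypothetical.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Convert between charsets by decoding to Unicode and re-encoding."""
    return data.decode(source).encode(target)


text = "日本語"
euc_jp = text.encode("euc_jp")

# Within the Japanese family: EUC-JP -> ISO-2022-JP -> Shift_JIS -> EUC-JP.
iso2022_jp = convert(euc_jp, "euc_jp", "iso2022_jp")
shift_jis = convert(iso2022_jp, "iso2022_jp", "shift_jis")
round_trip = convert(shift_jis, "shift_jis", "euc_jp")

# Across families (EUC-JP -> Big5), Unicode serves as the pivot.
big5 = convert(euc_jp, "euc_jp", "big5")

print(round_trip == euc_jp)
print(big5.decode("big5") == text)
```

A pivot through Unicode only works for characters that exist in both charsets; anything else raises `UnicodeEncodeError` in this sketch.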
+
+Change document charset
+
+ Press '=' (show document information) and select the document charset.
+
+ If you specify the following keymaps,
+ keymap C CHARSET
+ keymap M-c DEFAULT_CHARSET
+ you can press `C' to change the current document charset,
+ and `M-c' to change the default document charset.
+
+Line Editing
+
+ The input coding system follows the display coding system.
+
+ NOTE:
+ * HZ cannot be used as an input coding system.
+ * Input with ISO-2022-CN or ISO-2022-KR will probably fail, because
+ SI(\017) and SO(\016) are already bound to other commands
+ (SO is bound to `next-history'). If you want to use SI and SO,
+ press C-@(^@). After that, SI, SO, SS2, SS3, LS2, and LS3 of
+ 7-bit ISO-2022 are recognized. Pressing C-@ again restores the
+ default bindings.
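
The conflict arises because SO and SI are ordinary control characters (Ctrl-N and Ctrl-O) that appear inside the encoded byte stream. The sketch below makes them visible using Python's `iso2022_kr` codec, which is an assumption for illustration and not the code w3m uses.

```python
SO = b"\x0e"  # Shift Out, octal \016 (Ctrl-N)
SI = b"\x0f"  # Shift In, octal \017 (Ctrl-O)

# Hangul text is bracketed by SO ... SI in the 7-bit ISO-2022-KR stream.
encoded = "한글".encode("iso2022_kr")

print(encoded)
print(SO in encoded, SI in encoded)
```

A line editor that interprets Ctrl-N and Ctrl-O as commands would consume these bytes before the decoder sees them, which is why the key bindings have to be switched first.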
+
+Regular expression
+
+ Multilingual regular expressions are supported.
+
+-----------------------------------
+Change log
+
+2003/03/08 w3m-0.4.1-m17n-20030308
+ * Based on w3m-0.4.1
+
+2003/02/24 w3m-0.4-m17n-20030224
+ * Based on w3m-0.4
+
+2003/02/11 w3m-0.4rc1-m17n-20030211
+ * Based on w3m-0.4rc1
+
+2003/02/07 w3m-0.3.2.2-m17n-20030207
+ * Based on w3m-0.3.2.2+cvs-1.742
+
+2003/02/01 w3m-0.3.2.2-m17n-20030201
+ * Based on w3m-0.3.2.2+cvs-1.734
+
+2003/01/31 w3m-0.3.2.2-m17n-20030131
+ * Based on w3m-0.3.2.2+cvs-1.732
+
+2003/01/23 w3m-0.3.2.2-m17n-20030123
+ * Based on w3m-0.3.2.2+cvs-1.705
+
+2003/01/22 w3m-0.3.2.2-m17n-20030122
+ * Based on w3m-0.3.2.2+cvs-1.699
+
+2003/01/01 w3m-0.3.2.2-m17n-20030101
+ * Based on w3m-0.3.2.2+cvs-1.655
+
+2002/12/22 w3m-0.3.2.2-m17n-20021222
+ * Based on w3m-0.3.2.2+cvs-1.640
+
+2002/12/19 w3m-0.3.2.2-m17n-20021219
+ * Based on w3m-0.3.2.2+cvs-1.635
+
+2002/12/07 w3m-0.3.2.2-m17n-20021207
+ * Based on w3m-0.3.2.2+cvs-1.599
+ * Fixed a problem on systems where int != long
+
+2002/11/27 w3m-0.3.2.1-m17n-20021127
+ * Based on w3m-0.3.2.1+cvs-1.562
+
+2002/11/20 w3m-0.3.2-m17n-20021120
+ * Based on w3m-0.3.2+cvs-1.538
+
+2002/11/18
+ * Added UTF-7 to auto detection of charset.
+
+2002/11/16 w3m-0.3.2-m17n-20021116
+ * Based on w3m-0.3.2+cvs-1.526
+
+2002/11/13 w3m-0.3.2-m17n-20021113
+ * Based on w3m-0.3.2+cvs-1.506
+
+2002/11/12 w3m-0.3.2-m17n-20021112
+ * Based on w3m-0.3.2+cvs-1.498
+
+2002/11/09 w3m-0.3.2-m17n-20021109
+ * Based on w3m-0.3.2+cvs-1.490
+
+2002/11/07 w3m-0.3.2-m17n-20021107
+ * Based on w3m-0.3.2
+ * Applied [w3m-dev 03371]
+
+2002/10/22 w3m-0.3.1-m17n-20021022
+ * Based on w3m-0.3.1+cvs-1.444
+
+2002/07/17 w3m-0.3.1-m17n-20020717
+ * Based on w3m-0.3.1
+
+2002/05/29 w3m-0.3-m17n-20020529
+ * Based on w3m-0.3+cvs-1.379.
+
+2002/03/16 w3m-0.3-m17n-20020316
+ * Based on w3m-0.3+cvs-1.353.
+
+2002/03/11 w3m-0.3-m17n-20020311
+ * Based on w3m-0.3+cvs-1.342.
+ * Some bug fixes.
+
+2002/02/16 w3m-0.2.5-m17n-20020216
+ * Based on w3m-0.2.5+cvs-1.319.
+ * Added an option "use_wide"
+
+2002/02/05 w3m-0.2.5-m17n-20020205
+ * Based on w3m-0.2.5+cvs-1.302.
+
+2002/02/02 w3m-0.2.5-m17n-20020202
+ * Based on w3m-0.2.5+cvs-1.291.
+
+2002/01/31 w3m-0.2.4-m17n-20020131
+ * Based on w3m-0.2.4+cvs-1.278.
+
+2002/01/29 w3m-0.2.4-m17n-20020129
+ * Based on w3m-0.2.4+cvs-1.268.
+ * Some bug fixes.
+
+2002/01/28 w3m-0.2.4-m17n-20020128
+ * Based on w3m-0.2.4+cvs-1.265.
+
+2002/01/08 w3m-0.2.4-m17n-20020108
+ * Based on w3m-0.2.4.
+
+2002/01/07
+ * Replaced some wc_conv,wc_Str_conv with wc_conv_strict,wc_Str_conv_strict.
+
+2001/12/31
+ * Added the conversion between HKSCS and Unicode.
+ * Changed the conversion table between Big5 and Unicode.
+ * Deleted the special conversion between Big5 and CNS11643.
+ * Fixed HKSCS.
+
+2001/12/30 w3m-0.2.3.2-m17n-20011230
+ * Based on w3m-0.2.3.2+cvs-1.196.
+
+2001/12/22 w3m-0.2.3.2-m17n-20011222
+ * Based on w3m-0.2.3.2.
+ * [w3m-dev-en 00660] can't compile if INET6 is defined
+ * [w3m-dev-en 00663] double meanings for WC_N_???
+
+2001/12/21 w3m-0.2.3.1-m17n-20011221
+ * Based on w3m-0.2.3.1.
+ * Support of HKSCS, KOI8-U, UTF-7.
+ The conversion table between HKSCS and Unicode is not yet available.
+ * Add the conversion between ISO 8859-16 and Unicode.
+ * Add option 'ext_halfdump'.
+
+2001/04/14 w3m-(0.2.1)-m17n-0.20
+ * Support of UTF-7.
+ * [w3m-dev 01913] ([w3m-dev-en 00452])
+
+2001/04/12 w3m-(0.2.1)-m17n-0.19
+ * TILDE of JISX0212, JISX0213 -> FULLWIDTH TILDE of Unicode.
+ * MICRO SIGN of Unicode -> GREEK SMALL MU of JISX0208.
+ * [w3m-dev 01892], [w3m-dev 01894], [w3m-dev 01898], [w3m-dev 01902]
+
+2001/03/31
+ * Changed the implementation of <_SYMBOL> again.
+ * With the -dump option, "pre_conv" is false by default.
+
+2001/03/29
+ * Support combining characters of TCVN 5712.
+ * [w3m-dev 01873], [w3m-dev-en 00411].
+
+2001/03/28
+ * Setting -suffix="" now works in configure. (thanks to naddy!)
+ * Bugfix: when #define USE_SSL and #undef USE_SSL_VERIFY, rc.c
+ doesn't compile. (thanks to naddy!)
+ * [w3m-dev 01859].
+ * Bugfix: 0xA0 is error in Shift-JIS.
+ * Changed the implementation of <_SYMBOL> ([w3m-dev 01852]).
+
+2001/03/24 w3m-(0.2.1)-m17n-0.18
+ * Based on w3m-0.2.1.
+ * [w3m-dev 01703], [w3m-dev 01814], [w3m-dev 01823]
+ * Separated ISO-2022-JP-3 from ISO-2022-JP.
+ * Improved auto detection.
+
+2001/03/23
+ * Based on w3m-0.2.0.
+
+2001/03/21
+ * Added functions (CHARSET and DEFAULT_CHARSET).
+ * Improved document charset detection of frame HTML.
+
+2001/03/20
+ * Convert FULL WIDTH variants, except ASCII, to normal characters.
+
+2001/03/18 w3m-(0.1.11-pre-hsaka24)-m17n-0.17
+ * Based on "[w3m-dev 01779] w3m-0.1.11-pre-hsaka24".
+ * Prefer JIS X 0213 to JIS X 0212.
+
+2001/03/14 w3m-(0.1.11-pre-kokb23)-m17n-0.16
+ * Add the conversion between JIS X 0213 and Unicode Extension B.
+ * Bugfix: conversion between JIS X 0213 and Unicode.
+ * Bugfix: treat UHC as Hangul.
+ * Ignore "search_conv" if "pre_conv" is ON.
+
+2001/03/09 w3m-(0.1.11-pre-kokb23)-m17n-0.15
+ * Improvement of wc_wchar_t (mainly for Unicode).
+ * Some bugfixes for Unicode.
+ * Ignore "use_gb12345_map" option when output with GBK or GB18030.
+ * With the -dump option, "pre_conv" is always true.
+ * With the -dump or -halfdump option, some processing is skipped.
+ * Get system charset from the environment variable LC_CTYPE -> LANG -> LC_ALL.
+ * Bugfixes: [w3m-dev 01724], [w3m-dev 01726], [w3m-dev 01752],
+ [w3m-dev 01753], [w3m-dev 01754]
+
+2001/03/06 w3m-(0.1.11-pre-kokb23)-m17n-0.14
+ * Support of Language tag (UTR#7).
+ * Bugfix: conversion between GB18030, Johab and Unicode.
+
+2001/03/04 w3m-(0.1.11-pre-kokb23)-m17n-0.13
+ * Support of GBK(CP936), GB18030, UHC(CP949) !
+ * Unicode mapping table of GB2312 and GB12345 became compatible with
+ CP936, GB18030. (Code point: 0xA1A4, 0xA1AA)
+ * Allow 0xFFFE and 0xFFFF in Unicode (for compatibility with GB18030).
+ * Bugfix: code point of NBSP in Unicode.
+
+2001/03/03 w3m-(0.1.11-pre-kokb23)-m17n-0.12
+ * Wrote the English README.m17n.
+
+-------------------------------------------
+Hironori Sakamoto <hsaka@mth.biglobe.ne.jp>
+ http://www2u.biglobe.ne.jp/~hsaka/
+