
Internet Character Encoding?



Nathan Scott
25th September 2003, 22:05
Hi all,

I've got a question about what the most "universal" type of Japanese character encoding is for web pages. There are a number of browser options, which include:

Shift-JIS
EUC-JP
ISO-2022-JP
Unicode (UTF-8)
Unicode (UTF-7)
Japanese (Auto-Select)

You can also set (in Netscape) an auto-detect for Japanese, which seems to help.

I assume that Unicode refers to a "universal coding" that your browser will detect automatically and correctly (if you choose the wrong encoding, you may get the wrong kanji). This seems like it may be ideal, but how do you write in "Unicode"?

I use KanjiKit 97 to enter kanji, and can only figure out how to enter it in Shift-JIS coding, which seems to be a pretty safe setting for browsing Japanese in general.

To clarify, I'm not having too much trouble viewing Japanese pages, but I'm trying to enter kanji on some of my own web pages that I would like to pop up correctly automatically. As an example, on the following page I'm re-working, I simply stated at the top how to best view the page:

http://www.tsuki-kage.com/shugyo.html

Any suggestions on what the best way is to view and input kanji to internet applications (aside from .jpg)?

Any suggestions appreciated,

hyaku
25th September 2003, 23:32
Usually Shift-JIS

At first I had problems writing pages whose encoding browsers could detect.
Shift-JIS did the trick.

Hyakutake Colin

renfield_kuroda
26th September 2003, 00:41
Without getting too horribly detailed, suffice to say there is no one correct way to display Japanese on the web. The de facto standard is Shift-JIS, which is unfortunate because technically it's really crappy, but that's what Microsoft adopted and uses so it's everywhere. My personal favorite from a technical point of view is JIS (ISO-2022-JP) -- nicely defined standard, very technically elegant, easiest to auto-detect and correct...
Anyway the most important thing is to declare the character encoding (the charset) correctly in the page, which tells the browser exactly what encoding the page contents are in. Otherwise, the browser has to guess, and sometimes guesses badly.
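To make the guessing problem concrete, here is a small sketch in Python (chosen only for illustration; the sample word and variable names are made up). It encodes the same text three ways, shows that reading Shift-JIS bytes as EUC-JP gives something other than the original text, and shows why ISO-2022-JP is comparatively easy to recognize: it stays within 7-bit bytes and switches into kanji with an escape sequence.

```python
# Sample text; any Japanese string would do (hypothetical example data).
text = "日本語"

sjis = text.encode("shift_jis")
eucjp = text.encode("euc_jp")
jis = text.encode("iso2022_jp")

# The same characters become different byte sequences in each encoding.
print(sjis, eucjp, jis, sep="\n")

# A wrong guess: treat Shift-JIS bytes as EUC-JP. Invalid sequences are
# replaced with U+FFFD so the decode does not raise; the result is garbage.
wrong_guess = sjis.decode("euc_jp", errors="replace")
print(wrong_guess == text)  # prints False

# ISO-2022-JP uses only 7-bit bytes and announces kanji with ESC $ B,
# which is what makes it comparatively easy to recognize.
is_seven_bit = all(b < 0x80 for b in jis)
has_kanji_escape = b"\x1b$B" in jis
print(is_seven_bit, has_kanji_escape)
```

Shift-JIS and EUC-JP both use bytes above 0x7F with overlapping ranges, so a browser without a declared charset can only make a statistical guess between them.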
To declare Shift-JIS as the page's charset:


<head>
<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">
<title>I am in Shift JIS!</title>
</head>
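The declared charset only helps if the file's bytes really are in that encoding. A minimal Python sketch, assuming you generate or re-save the page from a script; the function name `build_page`, the sample text, and the output filename are made up for illustration:

```python
from pathlib import Path
import tempfile


def build_page(body_text: str, charset: str = "Shift_JIS") -> bytes:
    """Return an HTML page as bytes encoded to match its declared charset."""
    html = (
        "<html>\n<head>\n"
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">\n'
        "<title>I am in Shift JIS!</title>\n"
        "</head>\n"
        f"<body><p>{body_text}</p></body>\n</html>\n"
    )
    # Encode with the same charset the meta tag declares, so the two agree.
    return html.encode(charset)


page_bytes = build_page("修行")  # hypothetical sample text

# Write raw bytes so no other encoding is applied on the way to disk.
out_path = Path(tempfile.gettempdir()) / "shugyo_sample.html"
out_path.write_bytes(page_bytes)
```

If the editor saves in one encoding while the meta tag names another, the browser will trust the tag and show the wrong kanji, so the two have to be changed together.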


Hope this helps.
Regards,

r e n

Nathan Scott
26th September 2003, 07:40
Thanks very much guys. That's pretty much the information I was looking for.

Yoroshiku,

Nathan Scott
26th September 2003, 07:45
Actually, one last question.

Does changing the doc type from English:

!doctype html public "-//w3c//dtd html 4.0 transitional//en"

To Japanese:

!doctype html public "-//w3c//dtd html 4.01 transitional//ja"

[< and > removed since the code kept disappearing]

Make much difference with regard to defaulting a visitor's browser correctly?

Regards,

renfield_kuroda
28th September 2003, 23:58
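No. The doctype declares which HTML DTD the page follows, not its character encoding, and the language code at the end of that string describes the DTD itself rather than your page, so leave it as EN. To make sure the browser groks the page encoding correctly, set the meta tag (and if you have control, make sure the server sends the charset in the HTTP Content-Type response header as well -- but in most cases you have no control over the server.)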
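For the case where you do control the server, the charset goes in the HTTP Content-Type header. A rough sketch using Python's standard `http.server` module, purely for illustration; the handler name, page contents, and port are assumptions, and a real site would normally set this in the web server's own configuration:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

CHARSET = "Shift_JIS"
# Hypothetical page body; the bytes must really be in the declared charset.
PAGE = "<html><body><p>修行</p></body></html>".encode(CHARSET)


class SjisHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        # This header is what tells the browser how to decode the bytes.
        self.send_header("Content-Type", f"text/html; charset={CHARSET}")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)


def serve(port: int = 8000) -> None:
    """Serve the page on localhost until interrupted (port is arbitrary)."""
    HTTPServer(("127.0.0.1", port), SjisHandler).serve_forever()
```

Calling `serve()` and opening http://127.0.0.1:8000/ should show the page without the browser having to guess the encoding.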

Regards,

r e n