kitchin
07-05-2005, 03:36 PM
I'm working with a Perl script that replaces all quotes, single and double, with character #146 before putting the data into a database (CSV or MySQL). Is this weird or normal?
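For reference, here is roughly what that substitution does. This is a Python sketch rather than Perl, since the script itself isn't shown; in Perl it would be something like `tr/'"/\x92/`. It works on raw bytes and assumes a single-byte encoding. The name `replace_quotes` is made up for illustration. Note that both quote types end up as the same byte, so there is no way to tell them apart later.

```python
# Hypothetical Python equivalent of the script's quote replacement.
# Both ' and " map to the single byte 0x92 (#146).
QUOTES_TO_146 = bytes.maketrans(b"'\"", b"\x92\x92")


def replace_quotes(raw: bytes) -> bytes:
    """Replace every ' and " byte with 0x92 before the data goes into the database."""
    return raw.translate(QUOTES_TO_146)


if __name__ == "__main__":
    print(replace_quotes(b'He said "don\'t"'))  # b'He said \x92don\x92t\x92'
```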
I did some research on character #146, which looks like a right-slanted quote in Windows. Let's see how this displays in vBulletin.
Character 0x92, #146 in various character sets:
ASCII
Undefined.
ISO 8859-1, ISO 8859-2, etc. (note: one hyphen in the name)
Undefined.
ISO-8859-1 (note: two hyphens), code page 819
PRIVATE USE TWO (PU2).
Code Page 437, CP437 (IBM PC, DOS)
Equivalent to Unicode U+00C6, #0198, and entity 'AElig'.
&#x00C6; [Æ]
&#0198; [Æ]
&AElig; [Æ]
MacRoman
Equivalent to Unicode U+00ED, #0237, and entity 'iacute'.
&#x00ED; [í]
&#0237; [í]
&iacute; [í]
Windows-1252, Code Page 1252, "ANSI Character Set"
Equivalent to Unicode U+2019, #8217, and entity 'rsquo'.
&#x2019; [’]
&#8217; [’]
&rsquo; [’]
Unicode
U+0092 (decimal 146) is the C1 control character PRIVATE USE TWO, so it has no printable glyph.
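A quick way to double-check the list is to decode the single byte 0x92 with each character set and look up the resulting character. This Python sketch uses Python's own codec names (`cp437`, `mac_roman`, `cp1252`, `latin-1`), which are my mapping of the names above. Python's `latin-1` codec follows the two-hyphen ISO-8859-1 behavior and yields the control character U+0092. `describe_0x92` is a hypothetical helper name.

```python
import unicodedata

# Python codec names assumed to match the character sets listed above.
CODECS = ["cp437", "mac_roman", "cp1252", "latin-1"]


def describe_0x92(codec: str) -> str:
    """Decode the single byte 0x92 with a codec and name the resulting character."""
    ch = b"\x92".decode(codec)
    # Control characters such as U+0092 have no name in the Unicode database,
    # so unicodedata.name() falls back to the default given here.
    name = unicodedata.name(ch, "<control>")
    return f"{codec:10} U+{ord(ch):04X} &#{ord(ch)}; {name}"


if __name__ == "__main__":
    for codec in CODECS:
        print(describe_0x92(codec))
    try:
        b"\x92".decode("ascii")
    except UnicodeDecodeError:
        print("ascii      undefined (decode error)")
```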
What about the UTF-8 encoding of Unicode?
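Well, UTF-8 is robust, of course: no UTF-8 character starts with byte #146 (0x92). Its bit pattern, 10010010, marks it as a continuation byte, which only appears inside a multi-byte sequence.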
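The continuation-byte rule is easy to check: any byte of the form 10xxxxxx can only continue a multi-byte UTF-8 sequence, never start one. A small Python sketch (the helper name `is_utf8_continuation` is hypothetical):

```python
def is_utf8_continuation(byte: int) -> bool:
    """True if the byte matches 10xxxxxx, i.e. it can only continue a multi-byte sequence."""
    return byte & 0xC0 == 0x80


if __name__ == "__main__":
    print(is_utf8_continuation(0x92))  # True: 0x92 is 1001 0010
    # The right single quote U+2019 in UTF-8 is three bytes, none of them 0x92.
    print("\u2019".encode("utf-8"))    # b'\xe2\x80\x99'
    try:
        b"\x92".decode("utf-8")
    except UnicodeDecodeError as err:
        print("0x92 on its own is not valid UTF-8:", err.reason)
```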
Here's what one Wikipedia article (http://en.wikipedia.org/wiki/Table_of_Unicode_characters%2C_128_to_999) says, which sounds sensible: "Even if characters in the range 128 to 159 display something sensible on your browser, they cannot be relied upon to display the same thing — or anything at all — on any other browser, so they should never be used."
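These 128 to 159 characters typically appear when Windows-1252 bytes are read as ISO-8859-1, so a byte like 0x92 turns into an invisible control character instead of a quote. A Python sketch for spotting them follows. `c1_positions` is a made-up name and the sample text is invented. The last line assumes the bytes really came from a Windows-1252 source.

```python
import unicodedata


def c1_positions(text: str) -> list[int]:
    """Indexes of characters in U+0080..U+009F (decimal 128-159), the C1 control range."""
    return [i for i, ch in enumerate(text) if 0x80 <= ord(ch) <= 0x9F]


if __name__ == "__main__":
    # Bytes written by a Windows-1252 program but read back as ISO-8859-1.
    text = b"don\x92t".decode("latin-1")
    print(c1_positions(text))             # [3]
    print(unicodedata.category(text[3]))  # Cc (control character)
    # Assumption: the bytes were really Windows-1252, so reinterpret them.
    print(text.encode("latin-1").decode("cp1252"))  # don’t
```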
So I'm thinking of changing the Perl code to do this: store #146 in the database, but display it as a quote, either a single quote or &quot;. I don't want to rewrite the whole script.
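Here is one way the display side could look, sketched in Python rather than Perl. It assumes the stored text is ISO-8859-1 apart from the 0x92 bytes. `display_146` and its `as_entity` switch are hypothetical names. Because the script collapsed both ' and " into 0x92, the display code can't know which one was typed, so whichever quote you pick will be wrong for some text.

```python
import html


def display_146(stored: bytes, as_entity: bool = True) -> str:
    """Render stored bytes as HTML text, showing each 0x92 (#146) as a plain quote.

    Hypothetical helper: assumes the stored data is ISO-8859-1 plus 0x92 bytes.
    """
    # latin-1 maps every byte to exactly one character, so 0x92 becomes U+0092.
    text = stored.decode("latin-1")
    # Escape &, <, > first so the entity inserted below is not escaped again.
    text = html.escape(text, quote=False)
    replacement = "&quot;" if as_entity else "'"
    return text.replace("\x92", replacement)


if __name__ == "__main__":
    print(display_146(b"\x92hello\x92"))              # &quot;hello&quot;
    print(display_146(b"don\x92t", as_entity=False))  # don't
```

For example, with `as_entity=True` a stored `don\x92t` would come out as don"t, which is the trade-off described above.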