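1. Encoding conversion (to Unicode)
(The code below was collected from the Internet.)
JavaScript version
<script>
// Convert every UTF-16 code unit of a string to a \uXXXX escape.
var test = "你好abc";
var str = "";
for (var i = 0; i < test.length; i++) {
  var temp = test.charCodeAt(i).toString(16);
  // Left-pad the hex value with zeros to four digits.
  str += "\\u" + new Array(5 - temp.length).join("0") + temp;
}
document.write(str); // \u4f60\u597d\u0061\u0062\u0063
</script>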
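The JavaScript snippet above only goes one way. As a minimal sketch of the reverse direction, the following pairs the same encoding loop with a decoder. The function names toUnicodeEscapes and fromUnicodeEscapes are made up for this illustration and are not part of the original code; it also assumes the input contains only well-formed four-digit \uXXXX escapes.

```javascript
// Hypothetical helper: encode each UTF-16 code unit as a \uXXXX escape.
function toUnicodeEscapes(text) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    var hex = text.charCodeAt(i).toString(16);
    // Left-pad to four hex digits.
    out += "\\u" + ("0000" + hex).slice(-4);
  }
  return out;
}

// Hypothetical helper: decode \uXXXX escapes back into characters.
function fromUnicodeEscapes(escaped) {
  return escaped.replace(/\\u([0-9a-fA-F]{4})/g, function (match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}

var escaped = toUnicodeEscapes("你好abc");
var restored = fromUnicodeEscapes(escaped);
```

Because both helpers work on UTF-16 code units, characters outside the Basic Multilingual Plane come out as two escapes (a surrogate pair) and are reassembled correctly on decoding.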
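VBScript version
Function Unicode(str1)
  ' Convert every UTF-16 code unit of str1 to a \uXXXX escape.
  Dim str, temp, i
  str = ""
  For i = 1 To Len(str1)
    temp = Hex(AscW(Mid(str1, i, 1)))
    ' Left-pad the hex value with zeros to four digits.
    temp = Right("0000" & temp, 4)
    str = str & "\u" & temp
  Next
  Unicode = str
End Function
Function htmlentities(str)
  ' Replace every non-ASCII character with a decimal entity (&#NNNN;).
  Dim i, char, code
  For i = 1 To Len(str)
    char = Mid(str, i, 1)
    code = AscW(char)
    ' AscW returns a signed 16-bit value; map negatives back to 0..65535.
    If code < 0 Then code = code + 65536
    If code > 127 Then
      htmlentities = htmlentities & "&#" & code & ";"
    Else
      htmlentities = htmlentities & char
    End If
  Next
End Function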
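ColdFusion version
function nochaoscode(str)
{
  // Replace every non-ASCII character with a decimal entity (&#NNNN;).
  var new_str = "";
  var i = 1;
  for (i = 1; i lte len(str); i = i + 1) {
    if (asc(mid(str, i, 1)) lt 128) {
      new_str = new_str & mid(str, i, 1);
    } else {
      // "##" is an escaped "#" in a ColdFusion string; the trailing ";" closes the entity.
      new_str = new_str & "&##" & asc(mid(str, i, 1)) & ";";
    }
  }
  return new_str;
}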
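Appendix:
In PHP, the mb_convert_encoding function from the mbstring extension can do this conversion in both directions. For example:
mb_convert_encoding("你好", "HTML-ENTITIES", "gb2312"); // output: &#20320;&#22909;
mb_convert_encoding("&#20320;&#22909;", "gb2312", "HTML-ENTITIES"); // output: 你好
To convert a whole page, add these three lines at the top of the PHP file:
mb_internal_encoding("gb2312"); // gb2312 here is your site's original encoding
mb_http_output("HTML-ENTITIES");
ob_start('mb_output_handler');
If the mbstring extension is not enabled, see these two articles on coolcode.cn: "在任意字符集下正常显示网页的方法" (a way to display web pages correctly under any character set) and its follow-up, "在任意字符集下正常显示网页的方法(续)".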
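For comparison with the PHP calls above, here is a rough JavaScript sketch of the same two-way idea: non-ASCII characters become decimal entities, and decimal entities become characters again. The names toNumericEntities and fromNumericEntities are invented for this example. It emits only numeric entities and is not claimed to reproduce mbstring's output exactly; it also assumes every number in the input is a valid Unicode code point.

```javascript
// Hypothetical helper: replace every non-ASCII character with &#NNNN;.
function toNumericEntities(text) {
  var out = "";
  // for...of walks the string by code point, so surrogate pairs stay together.
  for (var ch of text) {
    var code = ch.codePointAt(0);
    out += code > 127 ? "&#" + code + ";" : ch;
  }
  return out;
}

// Hypothetical helper: turn decimal entities (&#NNNN;) back into characters.
// Assumes each number is a valid code point; String.fromCodePoint throws otherwise.
function fromNumericEntities(html) {
  return html.replace(/&#(\d+);/g, function (match, digits) {
    return String.fromCodePoint(parseInt(digits, 10));
  });
}

var encoded = toNumericEntities("你好abc");
var decoded = fromNumericEntities("&#20320;&#22909;");
```

Working on code points rather than UTF-16 code units is a deliberate choice here: a character outside the Basic Multilingual Plane becomes one entity instead of two surrogate entities.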
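2. HTML entities
HTML 4.01 supports the ISO 8859-1 (Latin-1) character set.
Tip: entity names are case-sensitive.
Note: the same symbol can be referenced either by entity name or by entity number. Entity names are easier to remember, but not every browser is guaranteed to recognize all of them. Entity numbers do not have that problem, but they are hard to remember.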
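Entity names for some ASCII characters
Character | Description | Entity name | Entity number
" | quotation mark | &quot; | &#34;
' | apostrophe | &apos; (not recognized by older versions of IE) | &#39;
& | ampersand | &amp; | &#38;
< | less-than | &lt; | &#60;
> | greater-than | &gt; | &#62;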
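As a small illustration of how the five ASCII entities above are typically used, the sketch below escapes text before it is placed into HTML. The names ENTITY_BY_CHAR and escapeHtml are hypothetical. The apostrophe is written with its entity number instead of its entity name, since the table flags the name as unreliable in IE.

```javascript
// Hypothetical lookup table: reserved character -> entity reference.
var ENTITY_BY_CHAR = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;" // entity number, because the name is not safe everywhere
};

// Hypothetical helper: escape the five reserved ASCII characters.
function escapeHtml(text) {
  return text.replace(/[&<>"']/g, function (ch) {
    return ENTITY_BY_CHAR[ch];
  });
}

var escapedHtml = escapeHtml('<a href="x">Tom & Jerry\'s</a>');
```

The replacement runs in a single pass over the input, so the ampersands that the function itself emits are never escaped a second time.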
ISO 8859-1 symbol entities
Character | Description | Entity name | Entity number
(space) | non-breaking space | &nbsp; | &#160;
¡ | inverted exclamation mark | &iexcl; | &#161;
¤ | currency | &curren; | &#164;
¢ | cent | &cent; | &#162;
£ | pound | &pound; | &#163;
¥ | yen | &yen; | &#165;
¦ | broken vertical bar | &brvbar; | &#166;
§ | section | &sect; | &#167;
¨ | spacing diaeresis | &uml; | &#168;
© | copyright | &copy; | &#169;
ª | feminine ordinal indicator | &ordf; | &#170;
« | angle quotation mark (left) | &laquo; | &#171;
¬ | negation | &not; | &#172;
(invisible) | soft hyphen | &shy; | &#173;
® | registered trademark | &reg; | &#174;
™ | trademark | &trade; | &#8482;
¯ | spacing macron | &macr; | &#175;
° | degree | &deg; | &#176;
± | plus-or-minus | &plusmn; | &#177;
² | superscript 2 | &sup2; | &#178;
³ | superscript 3 | &sup3; | &#179;
´ | spacing acute | &acute; | &#180;
µ | micro | &micro; | &#181;
¶ | paragraph | &para; | &#182;
· | middle dot | &middot; | &#183;
¸ | spacing cedilla | &cedil; | &#184;
¹ | superscript 1 | &sup1; | &#185;
º | masculine ordinal indicator | &ordm; | &#186;
» | angle quotation mark (right) | &raquo; | &#187;
¼ | fraction 1/4 | &frac14; | &#188;
½ | fraction 1/2 | &frac12; | &#189;
¾ | fraction 3/4 | &frac34; | &#190;
¿ | inverted question mark | &iquest; | &#191;
× | multiplication | &times; | &#215;
÷ | division | &divide; | &#247;
ISO 8859-1 character entities
Character | Description | Entity name | Entity number
À | capital a, grave accent | &Agrave; | &#192;
Á | capital a, acute accent | &Aacute; | &#193;
 | capital a, circumflex accent | &Acirc; | &#194;
à | capital a, tilde | &Atilde; | &#195;
Ä | capital a, umlaut mark | &Auml; | &#196;
Å | capital a, ring | &Aring; | &#197;
Æ | capital ae | &AElig; | &#198;
Ç | capital c, cedilla | &Ccedil; | &#199;
È | capital e, grave accent | &Egrave; | &#200;
É | capital e, acute accent | &Eacute; | &#201;
Ê | capital e, circumflex accent | &Ecirc; | &#202;
Ë | capital e, umlaut mark | &Euml; | &#203;
Ì | capital i, grave accent | &Igrave; | &#204;
Í | capital i, acute accent | &Iacute; | &#205;
Î | capital i, circumflex accent | &Icirc; | &#206;
Ï | capital i, umlaut mark | &Iuml; | &#207;
Ð | capital eth, Icelandic | &ETH; | &#208;
Ñ | capital n, tilde | &Ntilde; | &#209;
Ò | capital o, grave accent | &Ograve; | &#210;
Ó | capital o, acute accent | &Oacute; | &#211;
Ô | capital o, circumflex accent | &Ocirc; | &#212;
Õ | capital o, tilde | &Otilde; | &#213;
Ö | capital o, umlaut mark | &Ouml; | &#214;
Ø | capital o, slash | &Oslash; | &#216;
Ù | capital u, grave accent | &Ugrave; | &#217;
Ú | capital u, acute accent | &Uacute; | &#218;
Û | capital u, circumflex accent | &Ucirc; | &#219;
Ü | capital u, umlaut mark | &Uuml; | &#220;
Ý | capital y, acute accent | &Yacute; | &#221;
Þ | capital THORN, Icelandic | &THORN; | &#222;
ß | small sharp s, German | &szlig; | &#223;
à | small a, grave accent | &agrave; | &#224;
á | small a, acute accent | &aacute; | &#225;
â | small a, circumflex accent | &acirc; | &#226;
ã | small a, tilde | &atilde; | &#227;
ä | small a, umlaut mark | &auml; | &#228;
å | small a, ring | &aring; | &#229;
æ | small ae | &aelig; | &#230;
ç | small c, cedilla | &ccedil; | &#231;
è | small e, grave accent | &egrave; | &#232;
é | small e, acute accent | &eacute; | &#233;
ê | small e, circumflex accent | &ecirc; | &#234;
ë | small e, umlaut mark | &euml; | &#235;
ì | small i, grave accent | &igrave; | &#236;
í | small i, acute accent | &iacute; | &#237;
î | small i, circumflex accent | &icirc; | &#238;
ï | small i, umlaut mark | &iuml; | &#239;
ð | small eth, Icelandic | &eth; | &#240;
ñ | small n, tilde | &ntilde; | &#241;
ò | small o, grave accent | &ograve; | &#242;
ó | small o, acute accent | &oacute; | &#243;
ô | small o, circumflex accent | &ocirc; | &#244;
õ | small o, tilde | &otilde; | &#245;
ö | small o, umlaut mark | &ouml; | &#246;
ø | small o, slash | &oslash; | &#248;
ù | small u, grave accent | &ugrave; | &#249;
ú | small u, acute accent | &uacute; | &#250;
û | small u, circumflex accent | &ucirc; | &#251;
ü | small u, umlaut mark | &uuml; | &#252;
ý | small y, acute accent | &yacute; | &#253;
þ | small thorn, Icelandic | &thorn; | &#254;
ÿ | small y, umlaut mark | &yuml; | &#255;
Other entities supported by HTML
Character | Description | Entity name | Entity number
Œ | capital ligature OE | &OElig; | &#338;
œ | small ligature oe | &oelig; | &#339;
Š | capital S with caron | &Scaron; | &#352;
š | small s with caron | &scaron; | &#353;
Ÿ | capital Y with diaeresis | &Yuml; | &#376;
ˆ | modifier letter circumflex accent | &circ; | &#710;
˜ | small tilde | &tilde; | &#732;
(space) | en space | &ensp; | &#8194;
(space) | em space | &emsp; | &#8195;
(space) | thin space | &thinsp; | &#8201;
(invisible) | zero width non-joiner | &zwnj; | &#8204;
(invisible) | zero width joiner | &zwj; | &#8205;
(invisible) | left-to-right mark | &lrm; | &#8206;
(invisible) | right-to-left mark | &rlm; | &#8207;
– | en dash | &ndash; | &#8211;
— | em dash | &mdash; | &#8212;
‘ | left single quotation mark | &lsquo; | &#8216;
’ | right single quotation mark | &rsquo; | &#8217;
‚ | single low-9 quotation mark | &sbquo; | &#8218;
“ | left double quotation mark | &ldquo; | &#8220;
” | right double quotation mark | &rdquo; | &#8221;
„ | double low-9 quotation mark | &bdquo; | &#8222;
† | dagger | &dagger; | &#8224;
‡ | double dagger | &Dagger; | &#8225;
… | horizontal ellipsis | &hellip; | &#8230;
‰ | per mille | &permil; | &#8240;
‹ | single left-pointing angle quotation | &lsaquo; | &#8249;
› | single right-pointing angle quotation | &rsaquo; | &#8250;
€ | euro | &euro; | &#8364;