Linux Commands: iconv

Reposted article, 2012-05-08 20:33

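Purpose

The iconv command converts the encoding of files (Convert encoding of given files from one encoding to another). For example, it can convert UTF-8 text to GB18030, and vice versa. The JDK provides a similar tool, native2ascii. On Linux, the iconv development library provides C functions such as iconv_open, iconv_close, and iconv, which make it easy to convert character encodings in C/C++ programs. This is very useful in programs that fetch web pages, and the iconv command comes in handy when debugging such programs.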

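Common options

First, we need to know which character encodings are supported. The -l option lists them (List known coded character sets).

Format: iconv -l

Next, how to convert:

Format: iconv -f from-encoding -t to-encoding inputfile

Invoked this way, iconv prints the converted text to the screen (standard output). To write it to a file instead, add -o as follows:

Format: iconv -f from-encoding -t to-encoding inputfile -o outputfile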
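
As a minimal sketch of the invocation with -o, the following creates a small UTF-8 sample file, converts it to GB18030, and converts it back. The file names and sample text are made up for illustration, and it assumes an iconv that accepts -o (as the one used in this article does).

```shell
# Hypothetical scratch files, created only for this illustration.
workdir=$(mktemp -d)
src="$workdir/sample.utf8.txt"
dst="$workdir/sample.gb18030.txt"
back="$workdir/sample.roundtrip.txt"

# The two characters "中文" as UTF-8 bytes (octal escapes), plus a newline.
printf '\344\270\255\346\226\207\n' > "$src"

# UTF-8 -> GB18030, written to a file with -o.
iconv -f UTF-8 -t GB18030 "$src" -o "$dst"

# GB18030 -> UTF-8 again, to check that nothing was lost.
iconv -f GB18030 -t UTF-8 "$dst" -o "$back"

# cmp -s is silent and succeeds only if the two files are byte-identical.
cmp -s "$src" "$back" && echo "round trip OK"
```

Comparing the round-tripped file with the original is a quick way to confirm that the target encoding can represent everything in the input.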

Usage examples

Example 1: List the supported character encodings

[root@new55 ~]# iconv -l
The following list contain all the coded character sets known. This does
not necessarily mean that all combinations of these names can be used for
the FROM and TO command line parameters. One coded character set can be
listed with several different names (aliases).
437, 500, 500V1, 850, 851, 852, 855, 856, 857, 860, 861, 862, 863, 864, 865,
866, 866NAV, 869, 874, 904, 1026, 1046, 1047, 8859_1, 8859_2, 8859_3, 8859_4,
8859_5, 8859_6, 8859_7, 8859_8, 8859_9, 10646-1:1993, 10646-1:1993/UCS4,
ANSI_X3.4-1968, ANSI_X3.4-1986, ANSI_X3.4, ANSI_X3.110-1983, ANSI_X3.110,
ARABIC, ARABIC7, ARMSCII-8, ASCII, ASMO-708, ASMO_449, BALTIC, BIG-5,
BIG-FIVE, BIG5-HKSCS, BIG5, BIG5HKSCS, BIGFIVE, BS_4730, CA, CN-BIG5, CN-GB,
(output omitted)
EUCJP-OPEN, EUCJP-WIN, EUCJP, EUCKR, EUCTW, FI, FR, GB, GB2312, GB13000,
GB18030, GBK, GB_1988-80, GB_198880, GEORGIAN-ACADEMY, GEORGIAN-PS,
GOST_19768-74, GOST_19768, GOST_1976874, GREEK-CCITT, GREEK, GREEK7-OLD,
GREEK7, GREEK7OLD, GREEK8, GREEKCCITT, HEBREW, HP-ROMAN8, HPROMAN8, HU,
(output omitted)
TIS620.2529-1, TIS620.2533-0, TIS620, TS-5881, TSCII, UCS-2, UCS-2BE,
UCS-2LE, UCS-4, UCS-4BE, UCS-4LE, UCS2, UCS4, UHC, UJIS, UK, UNICODE,
UNICODEBIG, UNICODELITTLE, US-ASCII, US, UTF-7, UTF-8, UTF-16, UTF-16BE,
UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, UTF7, UTF8, UTF16, UTF16BE, UTF16LE,
UTF32, UTF32BE, UTF32LE, VISCII, WCHAR_T, WIN-SAMI-2, WINBALTRIM,
WINDOWS-31J, WINDOWS-874, WINDOWS-936, WINDOWS-1250, WINDOWS-1251,
WINDOWS-1252, WINDOWS-1253, WINDOWS-1254, WINDOWS-1255, WINDOWS-1256,
WINDOWS-1257, WINDOWS-1258, WINSAMI2, WS2, YU

That is too many. I only want to know which Chinese encodings are supported.
[root@new55 ~]# iconv -l | grep GB
CN-GB//
CSGB2312//
CSISO58GB1988//
EBCDIC-CP-GB//
GB//
GB2312//
GB13000//
GB18030//
GBK//
GB_1988-80//
GB_198880//
ISO646-GB//

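Notice something odd: the names are now printed one per line, each followed by two slashes. The listing format apparently changes when the output goes to a pipe rather than to a terminal.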
[root@new55 ~]#
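
Because the listing comes in two shapes (comma-separated on a terminal, one name per line with a trailing // in a pipe), a small filter can normalize it before searching. This is only a sketch: normalize_names is a hypothetical helper name, and it assumes names are separated by commas, spaces, or newlines.

```shell
# normalize_names (hypothetical helper): read an encoding listing on stdin and
# print one name per line, with separators and the trailing "//" removed.
normalize_names() {
    # Turn commas and spaces into newlines, strip "//", drop empty lines.
    tr ', ' '\n\n' | sed -e 's|//$||' -e '/^$/d'
}

# Piped form, as in the grep example above.
printf 'GB18030//\nGBK//\n' | normalize_names

# Comma-separated form, as in the terminal listing.
printf 'BIG5, BIG5HKSCS,\n' | normalize_names

# Possible use against the real listing:
#   iconv -l | normalize_names | grep -i '^GB'
```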

Example 2: Convert the Big5-encoded Google Hong Kong page to GBK

[root@new55 ~]# curl -s http://www.google.com.hk/ | iconv -f big5 -t gbk
<!doctype html><html><head><meta http-equiv="content-type" content="text/html; charset=Big5"><title>Google</title><script>window.google={kEI:"tFXZTNHKDcGTkAXpvOHhCA",kEXPI:"26637,27404",kCSI:{e:"26637,27404",ei:"tFXZTNHKDcGTkAXpvOHhCA",expi:"26637,27404"},ml:function(){},kHL:"zh-TW",time:function(){return(new Date).getTime()},log:function(b,d,c){var a=new Image,e=google,g=e.lc,f=e.li;a.onerror=(a.onload=(a.onabort=function(){delete g[f]}));g[f]=a;c=c||"/gen_204?atyp=i&ct="+b+"&cad="+d+"&zx="+google.time();a.src=c;e.li=f+1},lc:[],li:0,Toolbelt:{}};
id=ghead><div id=gbar><nobr><b class="gb1">所有網頁</b> <a onclick=gbar.qs(this) href="http://www.google.com.hk/imghp?hl=zh-tw&tab=wi" class="gb1">圖片</a> <a onclick=gbar.qs(this) href="http://video.google.com.hk/?hl=zh-tw&tab=wv" class="gb1">影片</a> <a onclick=gbar.qs(this) href="http://maps.google.com.hk/maps?hl=zh-tw&tab=wl" class="gb1">地圖</a> <a onclick=gbar.qs(this) f||document.f||document.gs;google.ac.i(form,form.q,'','','',{o:1,sw:1});google.mc = [[14,{}],[64,{}],[105,{}],[22,{"m_error":"\u003Cfont color=red\u003E錯誤：\u003C/font\u003E 伺服器無法完成您的要求。請在 30 秒後再試一次。","m_tip":"按一下以取得詳細資訊。"}],[84,{}]];google.med('init');google.History&&google.History.initialize('/')});if(google.j&&google.j.en&&google.j.xi){window.setTimeout(google.j.xi,0);google.fade=null;}</script></div><script>(function(){
(output omitted)
})();
</script>[root@new55 ~]#
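
The page above declares its own encoding in a meta tag (charset=Big5). As a rough sketch, that declaration can be extracted and passed to -f instead of hard-coding the source encoding. The function name detect_charset and the file name page.html are hypothetical, and the pattern assumes a lowercase charset=... declaration, so this is an illustration rather than a robust HTML parser.

```shell
# detect_charset (hypothetical helper): print the first charset=... value
# found on stdin. Assumes the declaration is spelled "charset=" in lowercase,
# optionally followed by a quote character.
detect_charset() {
    sed -n "s/.*charset=[\"']\{0,1\}\([A-Za-z0-9_-]*\).*/\1/p" | head -n 1
}

# The meta tag from the Google Hong Kong output above.
echo '<meta http-equiv="content-type" content="text/html; charset=Big5">' | detect_charset

# Possible use with a saved page (page.html is a placeholder file name):
#   iconv -f "$(detect_charset < page.html)" -t gbk page.html
```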

Example 3: Convert my JavaEye blog home page from UTF-8 to GBK

[root@new55 ~]# curl -s http://codingstandards.javaeye.com/ | iconv -f utf8 -t gbk
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="zh-CN" dir="ltr">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <title>Bash @ Linux - JavaEye技術網站</title>
    <meta name="description" content="" />
    <meta name="keywords" content="codingstandards Bash @ Linux" />
(output omitted)
<div class="blog_main">
<div class="blog_title">
<div class="date"><span class='year'>2010</span><span class='sep_year'>-</span><span class='month'>10</span><span class='sep_month'>-</span><span class='day'>17</span></div>
<div class="show_full_flag"><a href='?show_full=true'>全文顯示</a></div>
<h3><a href='/blog/786653'>[置頂] 我使用過的Linux命令系列總目錄</a></h3>
<strong>文章分類:<a href="http://www.javaeye.com/blogs/category/os" style="text-decoration:none;padding-right:10px;">操作系統</a></strong>
</div>
<div class="blog_content">
    我使用過的Linux命令系列總目錄
本文鏈接： http://codingstandards.javaeye.com/blog/786653
iconv: 未知 3345 處的非法輸入序列

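The last line is an error from iconv (an illegal input sequence at position 3345). Switching the target encoding as shown below makes the conversion succeed.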
[root@new55 ~]# curl -s http://codingstandards.javaeye.com/ | iconv -f utf8 -t gb18030

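Output omitted here. Interested readers can try it; the source of the whole page is displayed in full. This works because GBK is a subset of GB18030, which contains more characters.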

[root@new55 ~]#
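
The fallback from GBK to GB18030 can be sketched as a small filter. The function name to_gb is hypothetical, and the sketch assumes UTF-8 input on standard input. The output is buffered in a temporary file so that a failed GBK attempt does not leave partial output behind.

```shell
# to_gb (hypothetical helper): convert UTF-8 on stdin, preferring GBK and
# falling back to GB18030 when GBK cannot represent some character.
to_gb() {
    to_gb_in=$(mktemp)
    to_gb_out=$(mktemp)
    cat > "$to_gb_in"            # stdin can only be read once, so save it
    if iconv -f UTF-8 -t GBK "$to_gb_in" > "$to_gb_out" 2>/dev/null; then
        cat "$to_gb_out"         # GBK was sufficient
    else
        iconv -f UTF-8 -t GB18030 "$to_gb_in"
    fi
    to_gb_status=$?
    rm -f "$to_gb_in" "$to_gb_out"
    return $to_gb_status
}

# U+1F600 (UTF-8 bytes F0 9F 98 80) lies outside GBK, so this input takes
# the GB18030 branch; od shows the resulting bytes.
printf '\360\237\230\200' | to_gb | od -An -tx1
```

Note that the caller cannot tell from the output alone which encoding was used; a real script would probably want to report that as well.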

Example 4: Convert the 夢之都 (dreamdu.com) page from UTF-8 to GBK

[root@new55 ~]# curl -s http://www.dreamdu.com/ | iconv -futf8 -t gbk
iconv: 未知 0 處的非法輸入序列

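The error is at position 0, so let's try removing the first three bytes (presumably a UTF-8 byte order mark, which GBK cannot represent). Sure enough, the conversion now goes through.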

[root@new55 ~]# curl -s http://www.dreamdu.com/ | cut -b 4- | iconv -futf8 -t gbk
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
ml xmlns="http://www.w3.org/1999/xhtml" xml:lang="zh-CN" dir="ltr">
ead>
meta http-equiv="content-type" content="text/html; charset=utf-8" />
meta http-equiv="content-language" content="zh-CN" />
link rel="stylesheet" type="text/css" href="/style.css?v=1" media="screen" />
script type="text/javascript" src="/js.js"></script>
title>夢之都 - 網站設計與開發教程</title>
head>
ody>

(output omitted)
body>
tml>

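Notice the problem? The first few characters of every line are gone! cut -b 4- is applied to each line separately, so it strips the first three bytes of every line rather than only the first three bytes of the input.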
[root@new55 ~]#
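
To drop only the first three bytes of the whole stream, tail -c +4 is one option, since it starts output at the fourth byte of its input instead of working line by line. The sketch below uses made-up sample input and assumes the three leading bytes are a UTF-8 byte order mark (EF BB BF); the helper names sample and strip_first3 are hypothetical.

```shell
# sample (hypothetical): a UTF-8 byte order mark followed by two lines.
sample() { printf '\357\273\277line1\nline2\n'; }

# cut works line by line: bytes 1-3 of EVERY line are dropped,
# so the second line loses "lin" as well.
sample | cut -b 4-

# tail -c +4 starts at the 4th byte of the whole stream,
# so only the three leading bytes are dropped.
strip_first3() { tail -c +4; }
sample | strip_first3

# The command from this example would then become:
#   curl -s http://www.dreamdu.com/ | tail -c +4 | iconv -futf8 -t gbk
```

This removes three bytes unconditionally, so it is only appropriate when the input is known to start with a byte order mark.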
