What is character encoding?

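Character encoding is the format in which a computer stores text. English letters, for example, are stored in ASCII, one byte per character. Other encodings include UTF-8 (a universal encoding) and regional encodings such as the ISO-8859 family (ISO-8859-1 covers Western European languages), windows-1251 (Russian and other Cyrillic-script languages), and the Chinese GB encodings.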
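To make the idea of a storage format concrete, the short Lua sketch below prints the byte values behind a string. The helper name bytes_of is made up for this illustration, and the byte sequences are written with decimal escapes so the example does not depend on the encoding of the source file.

```lua
-- Return the byte values of a string as a space-separated list.
local function bytes_of(s)
  local out = {}
  for i = 1, #s do
    out[#out + 1] = tostring(s:byte(i))
  end
  return table.concat(out, " ")
end

-- "A" is a single byte in ASCII (and in UTF-8).
print(bytes_of("A"))          -- 65
-- The Cyrillic letter "Ya" (U+042F) is one byte in windows-1251 ...
print(bytes_of("\223"))       -- 223
-- ... but two bytes in UTF-8.
print(bytes_of("\208\175"))   -- 208 175
```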

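Why is conversion needed?

Because different regions use different encodings, exchanging information requires converting the same characters from one encoding to another.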

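UTF-8 is the common interchange encoding. It can represent every character in Unicode, which covers practically all writing systems in use, so files are usually encoded in UTF-8 for portability (for example, web pages that must display several languages), and text in other encodings is usually converted to UTF-8.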
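How UTF-8 manages to cover such a large character set with a variable number of bytes can be sketched in a few lines of Lua. The function name utf8_encode is hypothetical; the code uses plain arithmetic instead of bit operators so that it should also run on Lua 5.1, and it does not reject invalid code points such as surrogates.

```lua
-- Encode one Unicode code point (0 .. 0x10FFFF) as a UTF-8 byte string.
local function utf8_encode(cp)
  local floor, char = math.floor, string.char
  if cp < 0x80 then
    -- 1 byte: plain ASCII
    return char(cp)
  elseif cp < 0x800 then
    -- 2 bytes: 110xxxxx 10xxxxxx
    return char(0xC0 + floor(cp / 0x40),
                0x80 + cp % 0x40)
  elseif cp < 0x10000 then
    -- 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return char(0xE0 + floor(cp / 0x1000),
                0x80 + floor(cp / 0x40) % 0x40,
                0x80 + cp % 0x40)
  else
    -- 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return char(0xF0 + floor(cp / 0x40000),
                0x80 + floor(cp / 0x1000) % 0x40,
                0x80 + floor(cp / 0x40) % 0x40,
                0x80 + cp % 0x40)
  end
end

-- U+042F (Cyrillic capital letter Ya) becomes the two bytes 208 175.
print(utf8_encode(0x042F):byte(1, -1))
```

ASCII text keeps its one-byte form, which is why converting English text to UTF-8 leaves it unchanged.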

Conversion library: LIBICONV

http://www.gnu.org/software/libiconv/#introduction

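The GNU project provides an open-source conversion library that supports conversion between many encodings and Unicode. It can be used on systems that do not provide encoding conversion themselves.

Project page: http://savannah.gnu.org/projects/libiconv/

Recent Linux C libraries already provide iconv conversion, so libiconv does not need to be installed: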

http://davidgao.github.io/LFSCN/chapter06/glibc.html

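Some packages outside LFS recommend installing GNU libiconv for converting text encodings. The project's home page (http://www.gnu.org/software/libiconv/) says: "This library provides an iconv() implementation, for use on systems which don't have one, or whose implementation cannot convert from/to Unicode." Glibc provides an iconv() implementation and can convert from/to Unicode, so libiconv is not required on an LFS system.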

LUAICONV

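Lua, being a mature language, has a dedicated library that wraps the iconv functionality and makes it available to Lua application scripts.

Official documentation: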

http://ittner.github.io/lua-iconv/#download-and-installation

 local iconv = require("iconv")

  cd = iconv.new(to, from)
  cd = iconv.open(to, from)

  nstr, err = cd:iconv(str)

    Converts the 'str' string to the desired charset. This method always
    returns two arguments: the converted string and an error code, which
    may have any of the following values:

    nil
        No error. Conversion was successful.

    iconv.ERROR_NO_MEMORY
        Failed to allocate enough memory in the conversion process.

    iconv.ERROR_INVALID
        An invalid character was found in the input sequence.

    iconv.ERROR_INCOMPLETE
        An incomplete character was found in the input sequence.

    iconv.ERROR_FINALIZED
        Trying to use an already-finalized converter. This usually means
        that the user was tweaking the garbage collector private methods.

    iconv.ERROR_UNKNOWN
        There was an unknown error.
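The calls documented above might be combined as in the following sketch. The function name convert and the message strings are hypothetical; only iconv.new, cd:iconv and the ERROR_* constants come from the documentation quoted above. The module is passed in as an argument, and the final call is guarded with pcall, so the file still runs on a machine where lua-iconv is not installed.

```lua
-- Convert `text` from charset `from` to charset `to`.
-- `iconv` is the module returned by require("iconv").
-- Returns the converted string, or nil plus a message on failure.
local function convert(iconv, to, from, text)
  local cd = iconv.new(to, from)
  if not cd then
    return nil, "cannot create converter " .. from .. " -> " .. to
  end
  local out, err = cd:iconv(text)
  if err == nil then
    return out
  elseif err == iconv.ERROR_NO_MEMORY then
    return nil, "out of memory"
  elseif err == iconv.ERROR_INVALID then
    return nil, "invalid character in input"
  elseif err == iconv.ERROR_INCOMPLETE then
    return nil, "incomplete character in input"
  elseif err == iconv.ERROR_FINALIZED then
    return nil, "converter already finalized"
  else
    return nil, "unknown error"
  end
end

-- Usage, only if lua-iconv is actually installed.
-- The input is a six-letter Russian word encoded in windows-1251.
local ok, iconv = pcall(require, "iconv")
if ok then
  print(convert(iconv, "utf-8", "windows-1251", "\207\240\232\226\229\242"))
end
```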

For Lua 5.1, the lua-iconv-5 release is recommended; the latest release, lua-iconv-7, is compatible with Lua 5.2.

https://github.com/ittner/lua-iconv/releases/tag/lua-iconv-5

Running the test after installation reported an error:

:~/share_windows/openSource/lua/lua-iconv-lua-iconv-5$ lua test_iconv.lua
lua: error loading module 'iconv' from file './iconv.so':
   ./iconv.so: undefined symbol: libiconv_open
stack traceback:
   [C]: ?
   [C]: in function 'require'
   test_iconv.lua:1: in main chunk
   [C]: ?

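After some investigation (prompted by this article: http://tonybai.com/2013/04/25/a-libiconv-linkage-problem/), the cause turned out to be as follows.

libiconv had been installed first, which copied its iconv.h to /usr/local/include/iconv.h.

When the luaiconv project was built afterwards, gcc found /usr/local/include/iconv.h first while compiling the C source file and used the function declarations in that header to produce iconv.so.

The build should instead have used the system-provided iconv.h, which is /usr/include/iconv.h.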

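Search order (the command below prints the linker's library search directories; for headers, gcc by default likewise searches /usr/local/include before /usr/include):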

:~/share_windows/openSource/lua/lua-iconv-lua-iconv-5$ ld -verbose | grep SEARCH
SEARCH_DIR("=/usr/i686-linux-gnu/lib32"); SEARCH_DIR("=/usr/local/lib32"); SEARCH_DIR("=/lib32"); SEARCH_DIR("=/usr/lib32"); SEARCH_DIR("=/usr/i686-linux-gnu/lib"); SEARCH_DIR("=/usr/local/lib/i386-linux-gnu"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib/i386-linux-gnu"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib/i386-linux-gnu"); SEARCH_DIR("=/usr/lib");

libiconv: /usr/local/include/iconv.h

#ifndef LIBICONV_PLUG
#define iconv_open libiconv_open
#endif
extern LIBICONV_DLL_EXPORTED iconv_t iconv_open (const char* tocode, const char* fromcode);

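libiconv: in iconv.c the definition of libiconv_open is guarded by macros, which were presumably not enabled; alternatively, luaiconv was built without linking against the libiconv library.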

#if defined __FreeBSD__ && !defined __gnu_freebsd__
/* GNU libiconv is the native FreeBSD iconv implementation since 2002.
It wants to define the symbols 'iconv_open', 'iconv', 'iconv_close'. */
#define strong_alias(name, aliasname) _strong_alias(name, aliasname)
#define _strong_alias(name, aliasname) \
extern __typeof (name) aliasname __attribute__ ((alias (#name)));
#undef iconv_open
#undef iconv
#undef iconv_close
strong_alias (libiconv_open, iconv_open)
strong_alias (libiconv, iconv)
strong_alias (libiconv_close, iconv_close)
#endif

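Fix: in the implementation file, change how iconv.h is included, from the standard angle-bracket form to a quoted include with the full path /usr/include/iconv.h.

Then run make && make install again; the test now runs OK.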

vim luaiconv.c

#include <lua.h>
#include <lauxlib.h>
#include <stdlib.h>

#include "/usr/include/iconv.h"
#include <errno.h>
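After rebuilding, a quick way to confirm that the module now loads is a guarded require. This is only a sketch; the helper name try_load is hypothetical, and it reports the loader error instead of aborting the script.

```lua
-- Try to load a module; return true plus the module,
-- or false plus the error message.
local function try_load(name)
  local ok, mod_or_err = pcall(require, name)
  if ok then
    return true, mod_or_err
  end
  return false, tostring(mod_or_err)
end

local ok, err = try_load("iconv")
if ok then
  print("iconv loaded")
else
  -- An "undefined symbol: libiconv_open" message here would suggest
  -- that the module was still compiled against libiconv's header.
  print("iconv failed to load: " .. err)
end
```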

For other installation and runtime errors, see:

https://github.com/ittner/lua-iconv/issues/3

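Experiment: generating a conversion table

Some embedded systems have neither libiconv installed nor an iconv implementation in libc, yet still need character conversion.

In that case, luaiconv can be installed on the build server and the system's iconv used to generate a mapping table from one encoding to another; the conversion is then implemented with that table.

For example, converting windows-1251 to UTF-8.

windows-1251 character encoding reference: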

http://www.science.co.il/language/Character-code.asp?s=1251

Lua code that generates the table:

function serializeTable(val, name, skipnewlines, depth)
    skipnewlines = skipnewlines or false
    depth = depth or 0
    local tmp = string.rep(" ", depth)
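    if name then
        -- Bracket numeric keys so the output is valid Lua table syntax.
        local key = type(name) == "number" and ("[" .. name .. "]") or name
        tmp = tmp .. key .. " = "
    end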
    if type(val) == "table" then
        tmp = tmp .. "{" .. (not skipnewlines and "\n" or "")
        for k, v in pairs(val) do
            tmp = tmp .. serializeTable(v, k, skipnewlines, depth + 1) .. "," .. (not skipnewlines and "\n" or "")
        end
        tmp = tmp .. string.rep(" ", depth) .. "}"
    elseif type(val) == "number" then
        tmp = tmp .. tostring(val)
    elseif type(val) == "string" then
        tmp = tmp .. string.format("%q", val)
    elseif type(val) == "boolean" then
        tmp = tmp .. (val and "true" or "false")
    else
        tmp = tmp .. "\"[unserializable datatype:" .. type(val) .. "]\""
    end
    return tmp
end

local iconv = require("iconv")
-- Set your terminal encoding here
-- local termcs = "iso-8859-1"
local termcs = "utf-8"

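-- Convert `text` from charset `from` to charset `to` and print the result.
-- "//TRANSLIT" asks iconv to approximate characters that the target
-- charset cannot represent.
function check_one(to, from, text)
  print("\n-- Testing conversion from " .. from .. " to " .. to)
  local cd = iconv.new(to .. "//TRANSLIT", from)
  assert(cd, "Failed to create a converter object.")
  local ostr, err = cd:iconv(text)
  if err == iconv.ERROR_INCOMPLETE then
    print("ERROR: Incomplete input.")
  elseif err == iconv.ERROR_INVALID then
    print("ERROR: Invalid input.")
  elseif err == iconv.ERROR_NO_MEMORY then
    print("ERROR: Failed to allocate memory.")
  elseif err == iconv.ERROR_FINALIZED then
    print("ERROR: Converter already finalized.")
  elseif err == iconv.ERROR_UNKNOWN then
    print("ERROR: There was an unknown error.")
  end

  -- cd:iconv always returns a string, so ostr may be empty after an error.
  print(ostr)
  return ostr
end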
 
local result = {}
local num = 255
for i = 0, num do
  print("----------------------------------- i="..i)
  local char = string.char(i)
  local ostr = check_one(termcs, "windows-1251", char)
  print(string.len(ostr))
  local byteStr = ""
  for j = 1, string.len(ostr) do
      local byteVal = string.byte(ostr,j)
      print("byte j=" ..j .. " byteVal=".. byteVal)
      byteStr = byteStr .. "\\" .. byteVal
  end
  print("char i=" ..i .. " byteStr=".. byteStr)
  table.insert(result, byteStr)  -- result[i + 1] describes windows-1251 byte i
end

print("-----------------------------------!!")
local s = serializeTable(result)
print(s)
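The generator above writes each output byte as a backslash followed by its decimal value. The sketch below isolates that step and checks that the escaped text evaluates back to the original bytes. The helper names escape_bytes and round_trip are hypothetical, and unlike the script above the escapes are padded to three digits, which keeps them unambiguous even when a literal digit follows.

```lua
-- Write every byte of `s` as a three-digit decimal escape, e.g. \208\175.
local function escape_bytes(s)
  return (s:gsub(".", function(c)
    return string.format("\\%03d", c:byte())
  end))
end

-- Build the source text of a Lua string literal and evaluate it again.
local function round_trip(s)
  local loader = loadstring or load   -- loadstring on 5.1, load on 5.2+
  local chunk = assert(loader('return "' .. escape_bytes(s) .. '"'))
  return chunk()
end

local ya = "\208\175"                 -- UTF-8 bytes of U+042F
print(escape_bytes(ya))               -- \208\175
assert(round_trip(ya) == ya)
```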

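The cleaned-up table for converting windows-1251 to UTF-8. Entry i + 1 holds the UTF-8 bytes for windows-1251 byte i; entry 153 (byte 0x98) is empty because that byte is unassigned in windows-1251.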

local transTbl_1251toutf8 = {
 [1] = "\0",
 [2] = "\1",
 [3] = "\2",
 [4] = "\3",
 [5] = "\4",
 [6] = "\5",
 [7] = "\6",
 [8] = "\7",
 [9] = "\8",
 [10] = "\9",
 [11] = "\10",
 [12] = "\11",
 [13] = "\12",
 [14] = "\13",
 [15] = "\14",
 [16] = "\15",
 [17] = "\16",
 [18] = "\17",
 [19] = "\18",
 [20] = "\19",
 [21] = "\20",
 [22] = "\21",
 [23] = "\22",
 [24] = "\23",
 [25] = "\24",
 [26] = "\25",
 [27] = "\26",
 [28] = "\27",
 [29] = "\28",
 [30] = "\29",
 [31] = "\30",
 [32] = "\31",
 [33] = "\32",
 [34] = "\33",
 [35] = "\34",
 [36] = "\35",
 [37] = "\36",
 [38] = "\37",
 [39] = "\38",
 [40] = "\39",
 [41] = "\40",
 [42] = "\41",
 [43] = "\42",
 [44] = "\43",
 [45] = "\44",
 [46] = "\45",
 [47] = "\46",
 [48] = "\47",
 [49] = "\48",
 [50] = "\49",
 [51] = "\50",
 [52] = "\51",
 [53] = "\52",
 [54] = "\53",
 [55] = "\54",
 [56] = "\55",
 [57] = "\56",
 [58] = "\57",
 [59] = "\58",
 [60] = "\59",
 [61] = "\60",
 [62] = "\61",
 [63] = "\62",
 [64] = "\63",
 [65] = "\64",
 [66] = "\65",
 [67] = "\66",
 [68] = "\67",
 [69] = "\68",
 [70] = "\69",
 [71] = "\70",
 [72] = "\71",
 [73] = "\72",
 [74] = "\73",
 [75] = "\74",
 [76] = "\75",
 [77] = "\76",
 [78] = "\77",
 [79] = "\78",
 [80] = "\79",
 [81] = "\80",
 [82] = "\81",
 [83] = "\82",
 [84] = "\83",
 [85] = "\84",
 [86] = "\85",
 [87] = "\86",
 [88] = "\87",
 [89] = "\88",
 [90] = "\89",
 [91] = "\90",
 [92] = "\91",
 [93] = "\92",
 [94] = "\93",
 [95] = "\94",
 [96] = "\95",
 [97] = "\96",
 [98] = "\97",
 [99] = "\98",
 [100] = "\99",
 [101] = "\100",
 [102] = "\101",
 [103] = "\102",
 [104] = "\103",
 [105] = "\104",
 [106] = "\105",
 [107] = "\106",
 [108] = "\107",
 [109] = "\108",
 [110] = "\109",
 [111] = "\110",
 [112] = "\111",
 [113] = "\112",
 [114] = "\113",
 [115] = "\114",
 [116] = "\115",
 [117] = "\116",
 [118] = "\117",
 [119] = "\118",
 [120] = "\119",
 [121] = "\120",
 [122] = "\121",
 [123] = "\122",
 [124] = "\123",
 [125] = "\124",
 [126] = "\125",
 [127] = "\126",
 [128] = "\127",
 [129] = "\208\130",
 [130] = "\208\131",
 [131] = "\226\128\154",
 [132] = "\209\147",
 [133] = "\226\128\158",
 [134] = "\226\128\166",
 [135] = "\226\128\160",
 [136] = "\226\128\161",
 [137] = "\226\130\172",
 [138] = "\226\128\176",
 [139] = "\208\137",
 [140] = "\226\128\185",
 [141] = "\208\138",
 [142] = "\208\140",
 [143] = "\208\139",
 [144] = "\208\143",
 [145] = "\209\146",
 [146] = "\226\128\152",
 [147] = "\226\128\153",
 [148] = "\226\128\156",
 [149] = "\226\128\157",
 [150] = "\226\128\162",
 [151] = "\226\128\147",
 [152] = "\226\128\148",
 [153] = "",
 [154] = "\226\132\162",
 [155] = "\209\153",
 [156] = "\226\128\186",
 [157] = "\209\154",
 [158] = "\209\156",
 [159] = "\209\155",
 [160] = "\209\159",
 [161] = "\194\160",
 [162] = "\208\142",
 [163] = "\209\158",
 [164] = "\208\136",
 [165] = "\194\164",
 [166] = "\210\144",
 [167] = "\194\166",
 [168] = "\194\167",
 [169] = "\208\129",
 [170] = "\194\169",
 [171] = "\208\132",
 [172] = "\194\171",
 [173] = "\194\172",
 [174] = "\194\173",
 [175] = "\194\174",
 [176] = "\208\135",
 [177] = "\194\176",
 [178] = "\194\177",
 [179] = "\208\134",
 [180] = "\209\150",
 [181] = "\210\145",
 [182] = "\194\181",
 [183] = "\194\182",
 [184] = "\194\183",
 [185] = "\209\145",
 [186] = "\226\132\150",
 [187] = "\209\148",
 [188] = "\194\187",
 [189] = "\209\152",
 [190] = "\208\133",
 [191] = "\209\149",
 [192] = "\209\151",
 [193] = "\208\144",
 [194] = "\208\145",
 [195] = "\208\146",
 [196] = "\208\147",
 [197] = "\208\148",
 [198] = "\208\149",
 [199] = "\208\150",
 [200] = "\208\151",
 [201] = "\208\152",
 [202] = "\208\153",
 [203] = "\208\154",
 [204] = "\208\155",
 [205] = "\208\156",
 [206] = "\208\157",
 [207] = "\208\158",
 [208] = "\208\159",
 [209] = "\208\160",
 [210] = "\208\161",
 [211] = "\208\162",
 [212] = "\208\163",
 [213] = "\208\164",
 [214] = "\208\165",
 [215] = "\208\166",
 [216] = "\208\167",
 [217] = "\208\168",
 [218] = "\208\169",
 [219] = "\208\170",
 [220] = "\208\171",
 [221] = "\208\172",
 [222] = "\208\173",
 [223] = "\208\174",
 [224] = "\208\175",
 [225] = "\208\176",
 [226] = "\208\177",
 [227] = "\208\178",
 [228] = "\208\179",
 [229] = "\208\180",
 [230] = "\208\181",
 [231] = "\208\182",
 [232] = "\208\183",
 [233] = "\208\184",
 [234] = "\208\185",
 [235] = "\208\186",
 [236] = "\208\187",
 [237] = "\208\188",
 [238] = "\208\189",
 [239] = "\208\190",
 [240] = "\208\191",
 [241] = "\209\128",
 [242] = "\209\129",
 [243] = "\209\130",
 [244] = "\209\131",
 [245] = "\209\132",
 [246] = "\209\133",
 [247] = "\209\134",
 [248] = "\209\135",
 [249] = "\209\136",
 [250] = "\209\137",
 [251] = "\209\138",
 [252] = "\209\139",
 [253] = "\209\140",
 [254] = "\209\141",
 [255] = "\209\142",
 [256] = "\209\143",
}
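On a target without iconv, a table like the one above can drive the conversion directly. The following is a minimal sketch under stated assumptions: cp1251_to_utf8 and sample_tbl are hypothetical names, the sample table contains only six entries copied from the full table, and replacing unmapped bytes with "?" is an arbitrary choice.

```lua
-- A small excerpt of the windows-1251 -> UTF-8 table, indexed by
-- byte value + 1 as in the full table.
local sample_tbl = {
  [208] = "\208\159",  -- byte 207, U+041F
  [227] = "\208\178",  -- byte 226, U+0432
  [230] = "\208\181",  -- byte 229, U+0435
  [233] = "\208\184",  -- byte 232, U+0438
  [241] = "\209\128",  -- byte 240, U+0440
  [243] = "\209\130",  -- byte 242, U+0442
}

-- Convert a windows-1251 string to UTF-8 using a lookup table.
-- Bytes below 128 are plain ASCII and are copied unchanged; bytes that
-- are missing from the table are replaced with "?".
local function cp1251_to_utf8(s, tbl)
  return (s:gsub(".", function(c)
    local b = c:byte()
    if b < 128 then
      return c
    end
    return tbl[b + 1] or "?"
  end))
end

-- A six-letter Russian word encoded in windows-1251.
local src = "\207\240\232\226\229\242"
local dst = cp1251_to_utf8(src, sample_tbl)
print(#src, #dst)   -- 6 bytes in, 12 bytes out
```

Passing the table as an argument keeps the function reusable for other single-byte encodings whose tables are generated the same way.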
