C++ String Encoding Conversion

This article is a repost (view original). 2021-09-21 21:58, C++

C++ has many string types; for details see "String types in C++". This article uses string (std::string) as the example for discussing string encoding, mainly because:

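A byte is the smallest unit of a string's binary encoding; a string is essentially a byte array.
C++ had no byte type before C++17 (which added std::byte); third-party byte types are usually implemented as char or unsigned char.
A char sequence converts directly to string, so raw bytes can be stored in a string as-is.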
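
The "string is a byte array" point can be made concrete by dumping the bytes a std::string holds. The sketch below is only an illustration: to_hex is a hypothetical helper name, not something from the original article, and it simply prints each byte as two hex digits.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original article): render every byte
// stored in a std::string as two lowercase hex digits, separated by spaces.
std::string to_hex(const std::string& s)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned char b = static_cast<unsigned char>(s[i]);
        if (i != 0) out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

For example, to_hex("\xE4\xB8\xAD") yields "e4 b8 ad", the three UTF-8 bytes of the character 中, while the GB2312/GBK bytes of the same character, "\xD6\xD0", yield "d6 d0". The std::string type is the same in both cases; only the byte content differs.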

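The code is taken from "utf8與std::string字符編碼轉換" (encoding conversion between UTF-8 and std::string). Conversions between other encodings work the same way: first convert to wide-character (UTF-16) Unicode, then convert the wide string to the target multibyte encoding. The code follows: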

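#include <windows.h>
#include <string>

// Convert a UTF-8 string to the system ANSI code page (CP_ACP).
// Returns an empty string if the input is empty or a conversion fails.
std::string UTF8_To_string(const std::string& str)
{
    if (str.empty()) return std::string();
    const int srcLen = static_cast<int>(str.size());

    // Step 1: UTF-8 -> UTF-16. The first call only queries the required length.
    int nwLen = ::MultiByteToWideChar(CP_UTF8, 0, str.data(), srcLen, NULL, 0);
    if (nwLen <= 0) return std::string();
    std::wstring wide(nwLen, L'\0');
    ::MultiByteToWideChar(CP_UTF8, 0, str.data(), srcLen, &wide[0], nwLen);

    // Step 2: UTF-16 -> ANSI, again querying the length first.
    int nLen = ::WideCharToMultiByte(CP_ACP, 0, wide.data(), nwLen, NULL, 0, NULL, NULL);
    if (nLen <= 0) return std::string();
    std::string retStr(nLen, '\0');
    ::WideCharToMultiByte(CP_ACP, 0, wide.data(), nwLen, &retStr[0], nLen, NULL, NULL);

    return retStr;
}


// Convert a string in the system ANSI code page (CP_ACP) to UTF-8.
// Returns an empty string if the input is empty or a conversion fails.
std::string string_To_UTF8(const std::string& str)
{
    if (str.empty()) return std::string();
    const int srcLen = static_cast<int>(str.size());

    // Step 1: ANSI -> UTF-16. The first call only queries the required length.
    int nwLen = ::MultiByteToWideChar(CP_ACP, 0, str.data(), srcLen, NULL, 0);
    if (nwLen <= 0) return std::string();
    std::wstring wide(nwLen, L'\0');
    ::MultiByteToWideChar(CP_ACP, 0, str.data(), srcLen, &wide[0], nwLen);

    // Step 2: UTF-16 -> UTF-8, again querying the length first.
    int nLen = ::WideCharToMultiByte(CP_UTF8, 0, wide.data(), nwLen, NULL, 0, NULL, NULL);
    if (nLen <= 0) return std::string();
    std::string retStr(nLen, '\0');
    ::WideCharToMultiByte(CP_UTF8, 0, wide.data(), nwLen, &retStr[0], nLen, NULL, NULL);

    return retStr;
}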

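Note: std::string itself carries no encoding information; in this code it is assumed to hold text in the system ANSI code page. On Simplified Chinese Windows the ANSI code page is 936 (GBK, a superset of GB2312).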
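
To see what the intermediate wide-character step produces, the UTF-8 to UTF-16 half of the conversion can be sketched in portable C++. This is only an illustration under simplifying assumptions, not how MultiByteToWideChar is implemented: utf8_to_utf16 is a hypothetical helper, malformed or truncated sequences become U+FFFD, and overlong forms, encoded surrogates, and values above U+10FFFF are not rejected.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper for illustration only: decode UTF-8 bytes into UTF-16
// code units, the same kind of wide string that Windows stores in wchar_t.
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;   // code point being assembled
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }  // invalid lead byte

        // Each continuation byte must look like 10xxxxxx and adds 6 bits.
        bool ok = (i + extra < in.size());
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) { out.push_back(0xFFFD); ++i; continue; }

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above U+FFFF are stored as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += extra + 1;
    }
    return out;
}
```

With this sketch, the three bytes E4 B8 AD decode to the single code unit 0x4E2D (中), and the four bytes F0 9F 98 80 decode to the surrogate pair 0xD83D 0xDE00. The second half of the conversion, from UTF-16 to a code page such as GBK, needs mapping tables and is best left to the operating system or a library.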

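For the usage of MultiByteToWideChar and WideCharToMultiByte, see "MultiByteToWideChar和WideCharToMultiByte用法詳解". The first parameter of both functions (CodePage) specifies the encoding of the multibyte (char) string: the source string for MultiByteToWideChar and the destination string for WideCharToMultiByte. The possible values are: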

Value	Description
CP_ACP	ANSI code page
CP_MACCP	Not supported
CP_OEMCP	OEM code page
CP_SYMBOL	Not supported
CP_THREAD_ACP	Not supported
CP_UTF7	UTF-7 code page
CP_UTF8	UTF-8 code page

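Each function is called twice. In the first call the output buffer size (cchWideChar, the last parameter of MultiByteToWideChar; cbMultiByte, the sixth parameter of WideCharToMultiByte) is 0 and the output pointer is NULL, so the function only returns the required size of the destination buffer: in wide characters for MultiByteToWideChar, in bytes for WideCharToMultiByte. When the input length is passed as -1, that size includes the terminating null character. In the second call, pass the allocated buffer together with the size returned by the first call, and the converted string is written directly into the buffer.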

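Note: Linux has two similar functions, mbstowcs() and wcstombs(), which convert between the current locale's multibyte encoding and wchar_t. For usage, see https://blog.csdn.net/yiyaaixuexi/article/details/6174971.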
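
The same two-call pattern (query the length, then convert) can be sketched with these standard functions. The helper names multibyte_to_wide and wide_to_multibyte are hypothetical, and the sketch assumes the POSIX behavior in which a null destination pointer makes the function return the required length. Both functions use the encoding of the current C locale, so call setlocale first for non-ASCII text, and both stop at the first embedded null character.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: multibyte string (current C locale) -> wide string.
std::wstring multibyte_to_wide(const std::string& s)
{
    // First call: null destination, so only the required length (in wide
    // characters, excluding the terminator) is returned. Assumes POSIX behavior.
    const std::size_t n = std::mbstowcs(NULL, s.c_str(), 0);
    if (n == static_cast<std::size_t>(-1)) return std::wstring();  // invalid input
    std::wstring out(n, L'\0');
    // Second call: write exactly n wide characters into the buffer.
    if (n > 0) std::mbstowcs(&out[0], s.c_str(), n);
    return out;
}

// Hypothetical helper: wide string -> multibyte string (current C locale).
std::string wide_to_multibyte(const std::wstring& ws)
{
    // First call: query the required length in bytes, excluding the terminator.
    const std::size_t n = std::wcstombs(NULL, ws.c_str(), 0);
    if (n == static_cast<std::size_t>(-1)) return std::string();  // unconvertible character
    std::string out(n, '\0');
    // Second call: write exactly n bytes into the buffer.
    if (n > 0) std::wcstombs(&out[0], ws.c_str(), n);
    return out;
}
```

In the default "C" locale only ASCII text is guaranteed to convert; after a call such as setlocale(LC_ALL, "") on a system configured for UTF-8, multibyte_to_wide would be expected to decode UTF-8 input. Unlike the Windows functions, there is no code page parameter, so the source and target encodings cannot be chosen per call.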
