C++ Character Encoding Conversion



Overview

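Character encoding has long been one of the more troublesome parts of software development. Most projects today use UTF-8, and Linux also defaults to UTF-8, whereas Windows defaults to the system's ANSI code page, which is GBK (code page 936) on Simplified Chinese systems. Software that has to run correctly on Windows therefore needs to deal with encoding conversion.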

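C++11 added a number of localization facilities, including character encoding conversion, which is done mainly by combining std::wstring_convert with a codecvt facet. The approach is described below for study (or copy and paste 😉). Note that std::wstring_convert and the <codecvt> header have been deprecated since C++17, so newer compilers may emit deprecation warnings.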
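As a minimal sketch of the wstring_convert plus codecvt combination described above, the block below decodes UTF-8 bytes into a wide string and encodes them back. The helper names decode_utf8, encode_utf8 and check_round_trip are hypothetical and are not part of the article's code; the sample text stays inside the Basic Multilingual Plane, so the result should not depend on the width of wchar_t.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into a wide string.
// std::wstring_convert and <codecvt> are deprecated since C++17,
// so expect deprecation warnings on newer compilers.
std::wstring decode_utf8(const std::string& bytes) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.from_bytes(bytes);
}

// Hypothetical helper: encode a wide string as UTF-8 bytes.
std::string encode_utf8(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.to_bytes(wide);
}

// Self-check: two CJK characters (U+4F60, U+597D) occupy six bytes
// in UTF-8 and two wide characters after decoding.
void check_round_trip() {
    const std::string utf8 = "\xE4\xBD\xA0\xE5\xA5\xBD";
    const std::wstring wide = decode_utf8(utf8);
    assert(wide.size() == 2);
    assert(encode_utf8(wide) == utf8);
}
```

Calling check_round_trip() from main is enough to exercise both directions.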

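Windows: narrow strings use GBK (on Simplified Chinese systems); wchar_t is 16 bits wide and holds UTF-16 code units, so std::wstring has the same layout as std::u16string, although wchar_t and char16_t are distinct types.

Linux: narrow strings use UTF-8; wchar_t is 32 bits wide and holds UTF-32 code points, so std::wstring has the same layout as std::u32string, although wchar_t and char32_t are distinct types.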
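The platform difference above can be observed directly. The following is a small sketch, assuming a compiler that encodes wide literals as UTF-16 when wchar_t is 16 bits and as UTF-32 when it is 32 bits; the function names units_for_non_bmp, banana and check_wchar_width are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of wchar_t code units needed for one character outside the BMP.
// 16-bit wchar_t (UTF-16, e.g. Windows) needs a surrogate pair: 2 units.
// 32-bit wchar_t (UTF-32, e.g. Linux) needs a single unit.
constexpr std::size_t units_for_non_bmp() {
    return sizeof(wchar_t) == 2 ? 2 : 1;
}

// U+1F34C lies outside the Basic Multilingual Plane.
inline std::wstring banana() {
    return L"\U0001F34C";
}

// Self-check: the wide literal has the length the platform width implies.
inline void check_wchar_width() {
    assert(banana().size() == units_for_non_bmp());
}
```

This is why code that counts characters with std::wstring::size() can give different answers on the two platforms for the same text.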

Encoding conversion

  • Required headers:

    #include <codecvt>
    #include <locale>
    
  • Conversion functions:

    coding.h

    #ifndef TE_TEST_CODING_H
    #define TE_TEST_CODING_H
    
    #include <string>
    
    
    namespace coding {
    
    #ifdef _WIN32
        // GBK locale name on Windows (code page 936).
        // Note: inline variables require C++17.
        inline constexpr const char * GBK_LOCALE_NAME = ".936";
    #else
        inline constexpr const char * GBK_LOCALE_NAME = "zh_CN.GBK";
    #endif
    
        /**
         * utf-8 --> wchar
         * @param _utf8 std::string that must be UTF-8 encoded
         * @return wide string
         */
        std::wstring utf8_to_wstr(const std::string& _utf8);
    
        /**
         * wchar --> utf-8
         * @param _wstr wide string
         * @return UTF-8 encoded string
         */
        std::string wstr_to_utf8(const std::wstring& _wstr);
    
        /**
         * utf-8 --> gbk
         * @param _utf8 UTF-8 encoded string
         * @return GBK encoded string
         */
        std::string utf8_to_gbk(const std::string& _utf8);
    
        /**
         * gbk --> utf-8
         * @param _gbk GBK encoded string
         * @return UTF-8 encoded string
         */
        std::string gbk_to_utf8(const std::string& _gbk);
    
        /**
         * gbk --> std::wstring
         * @param _gbk GBK encoded string
         * @return wide string
         */
        std::wstring gbk_to_wstr(const std::string& _gbk);
    
        /**
         * std::wstring --> gbk
         * @param _wstr wide string
         * @return GBK encoded string
         */
        std::string wstr_to_gbk(const std::wstring& _wstr);
    }
    
    
    #endif //TE_TEST_CODING_H
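
The locale names in the header are platform-specific, and a GBK locale is not guaranteed to be installed (on Linux, `locale -a` lists the available ones). A minimal sketch for probing a name before relying on it, assuming the hypothetical helper name locale_available:

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>

// Hypothetical helper: returns true if a locale with this name can be
// constructed. std::locale throws std::runtime_error for unknown names.
bool locale_available(const char* name) {
    assert(name != nullptr);
    try {
        std::locale probe(name);
        (void)probe;  // only the construction matters
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

A caller could check locale_available(coding::GBK_LOCALE_NAME) once at startup and fall back or report a clear error if it returns false, rather than letting a conversion fail later.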
    

    coding.cpp

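    #include "coding.h"
    
    #include <codecvt>
    #include <cwchar>   // std::mbstate_t
    #include <locale>
    #include <utility>  // std::forward
    
    
    // Utility wrapper that adapts a locale-bound facet for use with wstring_convert/wbuffer_convert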
    template<class Facet>
    struct deletable_facet : Facet
    {
        template<class ...Args>
        explicit deletable_facet(Args&& ...args) : Facet(std::forward<Args>(args)...) {}
        ~deletable_facet() override = default;
    };
    
    
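    namespace {
    #ifdef _WIN32
        // wchar_t is 16 bits on Windows: convert UTF-8 <-> UTF-16 so that
        // characters outside the BMP become surrogate pairs instead of errors
        using utf8_facet = std::codecvt_utf8_utf16<wchar_t>;
    #else
        // wchar_t is 32 bits on Linux: convert UTF-8 <-> UCS-4
        using utf8_facet = std::codecvt_utf8<wchar_t>;
    #endif
    }
    
    std::wstring coding::utf8_to_wstr(const std::string &_utf8) {
        std::wstring_convert<utf8_facet> converter;
        return converter.from_bytes(_utf8);
    }
    
    std::string coding::wstr_to_utf8(const std::wstring &_wstr) {
        std::wstring_convert<utf8_facet> convert;
        return convert.to_bytes(_wstr);
    }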
    
    std::string coding::utf8_to_gbk(const std::string &_utf8) {
        std::wstring tmp_wstr = utf8_to_wstr(_utf8);
        return wstr_to_gbk(tmp_wstr);
    }
    
    std::string coding::gbk_to_utf8(const std::string &_gbk) {
        std::wstring tmp_wstr = gbk_to_wstr(_gbk);
        return wstr_to_utf8(tmp_wstr);
    }
    
    std::wstring coding::gbk_to_wstr(const std::string &_gbk) {
        // The facet is looked up by locale name; construction fails if the locale is unavailable
        using codecvt = deletable_facet<std::codecvt_byname<wchar_t, char, std::mbstate_t>>;
        std::wstring_convert<codecvt> convert(new codecvt(GBK_LOCALE_NAME));
        return convert.from_bytes(_gbk);
    }
    
    std::string coding::wstr_to_gbk(const std::wstring& _wstr) {
        using codecvt = deletable_facet<std::codecvt_byname<wchar_t, char, std::mbstate_t>>;
        std::wstring_convert<codecvt> convert(new codecvt(GBK_LOCALE_NAME));
        return convert.to_bytes(_wstr);
    }
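
The functions above do not say what happens on malformed input. By default, std::wstring_convert throws std::range_error when a conversion fails; constructing it with error strings makes it return a fallback instead. The following sketch illustrates both behaviours; the names utf8_to_wstr_strict, utf8_to_wstr_or and throws_on_invalid are hypothetical and not part of the coding namespace.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Strict variant: throws std::range_error on malformed UTF-8.
std::wstring utf8_to_wstr_strict(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.from_bytes(utf8);
}

// Lenient variant: returns `fallback` instead of throwing.
std::wstring utf8_to_wstr_or(const std::string& utf8,
                             const std::wstring& fallback) {
    // Constructor arguments: byte-error string, then wide-error string.
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv("", fallback);
    return conv.from_bytes(utf8);
}

// Self-check: 0xFF can never appear in valid UTF-8.
bool throws_on_invalid() {
    try {
        utf8_to_wstr_strict("\xFF");
    } catch (const std::range_error&) {
        return true;
    }
    return false;
}
```

Which variant fits depends on the caller: throwing suits input that is expected to be valid, while a fallback string suits text from untrusted sources such as file names or user input.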
    

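Additional notes

The deletable_facet struct exists to make the destructor of the codecvt_byname class template publicly accessible, since that destructor is protected by default. Some standard library implementations let wstring_convert destroy a facet whose destructor is protected, but others (such as GNU libstdc++) require a user-defined class that derives from the facet and has a public destructor; otherwise compilation fails with an error like the following: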

In file included from /usr/include/c++/6.2.1/bits/locale_conv.h:41:0,
                 from /usr/include/c++/6.2.1/locale:43,
                 from main.cpp:3:
/usr/include/c++/6.2.1/bits/unique_ptr.h: In instantiation of ‘void std::default_delete<_Tp>::operator()(_Tp*) const [with _Tp = std::codecvt<wchar_t, char, __mbstate_t>]’:
/usr/include/c++/6.2.1/bits/unique_ptr.h:236:17:   required from ‘std::unique_ptr<_Tp, _Dp>::~unique_ptr() [with _Tp = std::codecvt<wchar_t, char, __mbstate_t>; _Dp = std::default_delete<std::codecvt<wchar_t, char, __mbstate_t> >]’
/usr/include/c++/6.2.1/bits/locale_conv.h:218:7:   required from here
/usr/include/c++/6.2.1/bits/unique_ptr.h:76:2: error: ‘virtual std::codecvt<wchar_t, char, __mbstate_t>::~codecvt()’ is protected within this context
delete __ptr;
^~~~~~
In file included from /usr/include/c++/6.2.1/codecvt:41:0,
                 from main.cpp:1:
/usr/include/c++/6.2.1/bits/codecvt.h:426:7: note: declared protected here
       ~codecvt();
       ^

See the official documentation for details.
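
The same wrapper works for any facet with a protected destructor, not only codecvt_byname. As a minimal sketch, the block below wraps std::codecvt<char32_t, char, std::mbstate_t>, which converts between UTF-8 and UTF-32; the alias utf32_facet and the function utf8_to_utf32 are names chosen for this illustration. This specialization is itself deprecated since C++20, so it is shown only to demonstrate the wrapper.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <string>
#include <utility>

// Derives from the facet and exposes a public destructor so that
// std::wstring_convert can delete the object it owns.
template <class Facet>
struct deletable_facet : Facet {
    template <class... Args>
    explicit deletable_facet(Args&&... args)
        : Facet(std::forward<Args>(args)...) {}
    ~deletable_facet() override = default;
};

// Illustrative alias: a UTF-8 <-> UTF-32 facet made deletable.
using utf32_facet =
    deletable_facet<std::codecvt<char32_t, char, std::mbstate_t>>;

// Decode UTF-8 bytes into UTF-32 code points.
std::u32string utf8_to_utf32(const std::string& utf8) {
    std::wstring_convert<utf32_facet, char32_t> conv;
    std::u32string out = conv.from_bytes(utf8);
    assert(out.size() <= utf8.size());  // never more code points than bytes
    return out;
}
```

Without the wrapper, the std::wstring_convert line is where libstdc++ reports the "is protected within this context" error.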

This article builds on an earlier blog post, extending it and fixing some of its problems.

