C++ Character Encoding Conversion
Overview
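Character encoding has long been a headache in software development. Most projects today use UTF-8, which is also the default encoding on Linux, while Windows defaults to the system ANSI code page, which is GBK (code page 936) on Simplified Chinese installations. Software that must run correctly on Windows therefore has to handle encoding conversion explicitly.
C++11 added a number of localization facilities, including character encoding conversion, which is done mainly by combining std::wstring_convert with the codecvt facets. The concrete approach is described below for study (or copy and paste 😉). Note that std::wstring_convert and the <codecvt> header have been deprecated since C++17.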
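Windows: GBK narrow encoding; wchar_t is 16 bits wide and holds UTF-16, so std::wstring is laid out like std::u16string (the two are still distinct types)
Linux: UTF-8 narrow encoding; wchar_t is 32 bits wide and holds UTF-32, so std::wstring is laid out like std::u32string (the two are still distinct types)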
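The difference in wchar_t width can be observed directly. The following is a minimal sketch; the helper names emoji_code_units and wide_encoding_name are made up for illustration and are not part of the article's code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of wchar_t code units needed to store U+1F609,
// a character outside the Basic Multilingual Plane.
inline std::size_t emoji_code_units() {
    const std::wstring wink = L"\U0001F609";
    return wink.size();
}

// Hypothetical helper: the UTF form implied by the width of wchar_t.
inline const char* wide_encoding_name() {
    return sizeof(wchar_t) == 2 ? "UTF-16" : "UTF-32";
}
```

Where wchar_t is 16 bits (Windows), emoji_code_units() returns 2 because the character is stored as a surrogate pair; where it is 32 bits (typical Linux), it returns 1.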
Encoding conversion
- Required headers:
#include <codecvt>
#include <locale>
- Conversion functions:
coding.h
#ifndef TE_TEST_CODING_H
#define TE_TEST_CODING_H

#include <string>

namespace coding {

// Note: inline variables require C++17.
#ifdef _WIN32
// GBK locale name on Windows
inline constexpr const char* GBK_LOCALE_NAME = ".936";
#else
// GBK locale name on other systems (the locale must be installed)
inline constexpr const char* GBK_LOCALE_NAME = "zh_CN.GBK";
#endif

/**
 * utf-8 --> wchar
 * @param _utf8 std::string that must be UTF-8 encoded
 * @return wide string
 */
std::wstring utf8_to_wstr(const std::string& _utf8);

/**
 * wchar --> utf-8
 * @param _wstr wide string
 * @return the string converted to UTF-8
 */
std::string wstr_to_utf8(const std::wstring& _wstr);

/**
 * utf-8 --> gbk
 * @param _utf8 utf-8
 * @return gbk
 */
std::string utf8_to_gbk(const std::string& _utf8);

/**
 * gbk --> utf-8
 * @param _gbk gbk
 * @return utf-8
 */
std::string gbk_to_utf8(const std::string& _gbk);

/**
 * gbk --> std::wstring
 * @param _gbk gbk
 * @return wide string
 */
std::wstring gbk_to_wstr(const std::string& _gbk);

/**
 * std::wstring --> gbk
 * @param _wstr wide string
 * @return gbk
 */
std::string wstr_to_gbk(const std::wstring& _wstr);

}  // namespace coding

#endif //TE_TEST_CODING_H

coding.cpp
#include "coding.h"

#include <codecvt>
#include <cwchar>   // std::mbstate_t
#include <locale>
#include <utility>  // std::forward

// Utility wrapper that adapts locale-bound facets for use with
// wstring_convert/wbuffer_convert by giving them a public destructor.
template<class Facet>
struct deletable_facet : Facet
{
    template<class ...Args>
    explicit deletable_facet(Args&& ...args) : Facet(std::forward<Args>(args)...) {}
    ~deletable_facet() override = default;
};

// On Windows, where wchar_t is 16 bits, codecvt_utf8<wchar_t> produces UCS-2
// and cannot represent characters outside the BMP;
// codecvt_utf8_utf16<wchar_t> handles those.
std::wstring coding::utf8_to_wstr(const std::string &_utf8)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.from_bytes(_utf8);
}

std::string coding::wstr_to_utf8(const std::wstring &_wstr)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> convert;
    return convert.to_bytes(_wstr);
}

// utf-8 --> gbk goes through a wide string as the intermediate form.
std::string coding::utf8_to_gbk(const std::string &_utf8)
{
    std::wstring tmp_wstr = utf8_to_wstr(_utf8);
    return wstr_to_gbk(tmp_wstr);
}

// gbk --> utf-8 goes through a wide string as the intermediate form.
std::string coding::gbk_to_utf8(const std::string &_gbk)
{
    std::wstring tmp_wstr = gbk_to_wstr(_gbk);
    return wstr_to_utf8(tmp_wstr);
}

// wstring_convert takes ownership of the facet pointer and deletes it.
std::wstring coding::gbk_to_wstr(const std::string &_gbk)
{
    using codecvt = deletable_facet<std::codecvt_byname<wchar_t, char, std::mbstate_t>>;
    std::wstring_convert<codecvt> convert(new codecvt(GBK_LOCALE_NAME));
    return convert.from_bytes(_gbk);
}

std::string coding::wstr_to_gbk(const std::wstring& _wstr)
{
    using codecvt = deletable_facet<std::codecvt_byname<wchar_t, char, std::mbstate_t>>;
    std::wstring_convert<codecvt> convert(new codecvt(GBK_LOCALE_NAME));
    return convert.to_bytes(_wstr);
}
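Two failure modes are worth checking before relying on these functions: the GBK locale may not be installed (on Linux it usually has to be generated first), and std::wstring_convert throws std::range_error on input it cannot convert. The sketch below is one way to probe for both. The names kGbkLocale, gbk_locale_available and utf8_round_trips are hypothetical, and it assumes that a successfully constructed std::locale for the name is a fair proxy for codecvt_byname accepting the same name.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Same locale names as in coding.h; whether they resolve depends on the system.
#ifdef _WIN32
constexpr const char* kGbkLocale = ".936";
#else
constexpr const char* kGbkLocale = "zh_CN.GBK";
#endif

// Hypothetical helper: true if the named GBK locale can be constructed.
// std::locale throws std::runtime_error when the name is not recognised.
inline bool gbk_locale_available() {
    try {
        std::locale probe(kGbkLocale);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}

// Hypothetical helper: true if the input survives UTF-8 -> wide -> UTF-8 unchanged.
inline bool utf8_round_trips(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    try {
        return conv.to_bytes(conv.from_bytes(utf8)) == utf8;
    } catch (const std::range_error&) {
        // Malformed input, or a character the facet cannot represent.
        return false;
    }
}
```

A caller could test gbk_locale_available() once at startup and fall back to another conversion path, or report a clear error, instead of letting the exception escape from gbk_to_wstr or wstr_to_gbk.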
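Additional notes
The deletable_facet struct exists to make the destructor of the codecvt_byname class template public; by default that destructor is protected. Some implementations accept a facet whose destructor is protected, but others (such as GNU) require a custom class that derives from the Facet and has a public destructor. Otherwise compilation fails with errors like the following: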
In file included from /usr/include/c++/6.2.1/bits/locale_conv.h:41:0,
from /usr/include/c++/6.2.1/locale:43,
from main.cpp:3:
/usr/include/c++/6.2.1/bits/unique_ptr.h: In instantiation of ‘void std::default_delete<_Tp>::operator()(_Tp*) const [with _Tp = std::codecvt<wchar_t, char, __mbstate_t>]’:
/usr/include/c++/6.2.1/bits/unique_ptr.h:236:17: required from ‘std::unique_ptr<_Tp, _Dp>::~unique_ptr() [with _Tp = std::codecvt<wchar_t, char, __mbstate_t>; _Dp = std::default_delete<std::codecvt<wchar_t, char, __mbstate_t> >]’
/usr/include/c++/6.2.1/bits/locale_conv.h:218:7: required from here
/usr/include/c++/6.2.1/bits/unique_ptr.h:76:2: error: ‘virtual std::codecvt<wchar_t, char, __mbstate_t>::~codecvt()’ is protected within this context
delete __ptr;
^~~~~~
In file included from /usr/include/c++/6.2.1/codecvt:41:0,
from main.cpp:1:
/usr/include/c++/6.2.1/bits/codecvt.h:426:7: note: declared protected here
~codecvt();
^
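The effect of the wrapper can be checked at compile time. The sketch below repeats the deletable_facet definition so that it is self-contained; the alias names base_facet and wrapped_facet and the helper widen_ascii are made up for illustration, and it uses the plain std::codecvt facet of the "C" locale rather than a GBK one so that it does not depend on installed locales.

```cpp
#include <cwchar>
#include <locale>
#include <string>
#include <type_traits>
#include <utility>

template <class Facet>
struct deletable_facet : Facet {
    template <class... Args>
    explicit deletable_facet(Args&&... args) : Facet(std::forward<Args>(args)...) {}
    ~deletable_facet() override = default;  // public, unlike the base destructor
};

using base_facet = std::codecvt<wchar_t, char, std::mbstate_t>;
using wrapped_facet = deletable_facet<base_facet>;

// The base facet's destructor is protected, so outside code cannot destroy it...
static_assert(!std::is_destructible<base_facet>::value, "expected a protected destructor");
// ...while the wrapper can be destroyed, which is what wstring_convert needs.
static_assert(std::is_destructible<wrapped_facet>::value, "expected a public destructor");

// Hypothetical helper: widens a string using the wrapped "C" locale facet.
inline std::wstring widen_ascii(const std::string& s) {
    std::wstring_convert<wrapped_facet> conv;
    return conv.from_bytes(s);
}
```

Replacing wrapped_facet with base_facet in widen_ascii should reproduce the kind of error shown above on GCC, because wstring_convert deletes the facet it owns.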
This article is based on a blog post, which it extends and supplements, fixing several problems along the way.
