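[C] Using TCHAR across platforms: bringing tchar.h to Linux and other platforms, solving the format-specifier problems that appear when porting, and displaying several languages at once (works with VC/GCC/BCB on Windows/Linux/Mac)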

Reposted from the original article, published 2013-01-17 16:40.

Author: zyl910

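When porting a Windows program to Linux and other platforms, two problems come up again and again: the missing tchar.h, and the format specifiers needed to print char, wchar_t, and TCHAR strings side by side. This article looks at how to solve both.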

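I. Background

1.1 History

Traditional C programs use char strings and rely on ANSI plus DBCS code pages to support the local language, so they cannot display several languages at the same time.

When Microsoft designed Windows NT it planned for internationalization and made the kernel Unicode-based, using the wchar_t type. Unicode was a 16-bit code at the time, so wchar_t on Windows is 16 bits wide.
To stay compatible with older programs, string-related APIs generally come in two flavors: names ending in A are the ANSI version and take char strings; names ending in W are the Unicode version and take wchar_t strings.
Two sets of APIs are awkward to use, so Microsoft designed tchar.h, which defines the TCHAR type and switches between the two with macros. One code base can then be compiled as either an ANSI build or a Unicode build, targeting old systems (Win9x) and new systems (WinNT) respectively.

Linux and similar platforms adopted Unicode later, by which time the mature UTF-8 encoding was available, and UTF-8 is compatible with the traditional char type. These platforms therefore made UTF-8 the default encoding. That gives Unicode multilingual support while the traditional C standard library, POSIX, and other APIs keep working. There is no need for two sets of APIs, and so no need for tchar.h.
UTF-8 is a variable-length encoding in which a character takes 1 to 4 bytes, which is inconvenient for some processing. Linux and similar platforms therefore also provide wchar_t, but there it is 32 bits wide.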

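Why 32 bits? The answer lies in how Unicode evolved. Unicode had far more to encode than anticipated, and 16 bits soon proved insufficient.
UCS-4 proposed a 31-bit code space, along with UTF-32 and a UTF-8 form of up to 6 bytes, but that scheme was costly.
As a compromise, the Unicode Consortium extended the code space from the 16-bit range 0 to FFFF to the 21-bit range 0 to 10FFFF. The 16-bit encoding form is called UTF-16, and it provides surrogate pairs: two UTF-16 code units together encode a character that does not fit in 16 bits.
In other words, if wchar_t is 16 bits wide it actually holds UTF-16: a character between U+0000 and U+FFFF takes one wchar_t, and a character between U+10000 and U+10FFFF takes two.
To guarantee that every character takes exactly one wchar_t, wchar_t has to be 32 bits wide. That is the UTF-32 encoding.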
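
The surrogate-pair arithmetic described above can be sketched in a few lines of C. This is only an illustration; the function name utf16_encode is made up for this example and does not come from any header discussed in this article.

```c
#include <stdint.h>

/* Encode one Unicode code point as UTF-16.
 * Writes 1 or 2 code units to out and returns the count,
 * or 0 if cp is not a valid Unicode scalar value. */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                                /* out of range, or a lone surrogate */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                   /* U+0000..U+FFFF: one code unit */
        return 1;
    }
    cp -= 0x10000;                               /* 20 significant bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, U+6F22 stays a single unit 0x6F22, while U+1F600 becomes the pair 0xD83D 0xDE00. With a 16-bit wchar_t the second case occupies two array elements, which is exactly what a 32-bit wchar_t avoids.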

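The UTF-8 scheme itself can express a very large code space (the 6-byte form covers 31 bits), but for the sake of standardization RFC 3629 limits UTF-8 to 4 bytes, that is, 21 bits at most; code points above 10FFFF are invalid.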
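
As a rough sketch of the 4-byte limit, the encoder below follows the RFC 3629 bit layout and rejects anything above 10FFFF. The name utf8_encode is hypothetical and chosen only for this example.

```c
#include <stdint.h>

/* Encode one code point as UTF-8 under the RFC 3629 limits.
 * Returns the number of bytes written (1 to 4), or 0 if cp is invalid. */
int utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                                   /* beyond the 21-bit range, or a surrogate */
    if (cp < 0x80) {                                /* 7 bits: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                               /* 11 bits: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                             /* 16 bits: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

The largest valid code point, U+10FFFF, encodes as F4 8F BF BF; U+110000 is rejected.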

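1.2 Why make tchar.h available on Linux and other platforms?

Many people think Linux and similar platforms have no need for tchar.h, mainly because of problems with wchar_t:
1. char strings encoded as UTF-8 already meet the needs of Unicode internationalization.
2. char is easier to port. The wchar_t type itself dates from C89, but the wide-character library (<wchar.h>, <wctype.h>) arrived with the C95 amendment and only became reasonably complete in C99, so some old compilers support wchar_t poorly or not at all.
3. The width of wchar_t is not fixed. It is 16 bits on Windows and 32 bits on Linux and similar platforms. C99 does not mandate a specific width.
4. The wchar_t functions do not mirror the char functions. In the C99 standard library only some string functions have a wchar_t counterpart. Windows has the symmetric A and W API sets, but other platforms have only one set.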
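
Point 3 is easy to check on any given compiler. The helpers below are a minimal sketch with made-up names (wchar_bits, wchar_units_for); they only report what the platform does.

```c
#include <limits.h>
#include <stddef.h>

/* Width of wchar_t in bits on the current platform. */
int wchar_bits(void)
{
    return (int)(sizeof(wchar_t) * CHAR_BIT);
}

/* Number of wchar_t units needed for one code point:
 * 2 only when wchar_t is 16 bits wide and cp lies above U+FFFF. */
int wchar_units_for(unsigned long cp)
{
    return (wchar_bits() == 16 && cp > 0xFFFFUL) ? 2 : 1;
}
```

On the platforms covered in this article, wchar_bits() would be expected to return 16 with the Windows compilers and 32 with GCC on Linux and Mac.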

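I used to agree with these points, but I now think having tchar.h is a real convenience, for these reasons:
1. It eases porting of Windows programs. Many console programs do only simple string handling and never run into the shortcomings of wchar_t. Rewriting their code merely because tchar.h is missing costs too much.
2. It has no side effects. On Linux and other platforms with a single API set, leave the UNICODE and _UNICODE macros undefined; tchar.h then maps TCHAR to char and uses the traditional narrow-string functions.
3. It avoids bugs from mixing printf and wprintf. A C stream takes on a byte or wide orientation at its first I/O call, and calling the other family on the same stream afterwards is not supported, so mixing the two causes bugs. Using TCHAR consistently avoids this.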
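
The printf/wprintf hazard in point 3 can be observed through the standard fwide function, which reports whether a stream is byte-oriented or wide-oriented. The sketch below uses a temporary file instead of stdout so that it does not disturb the console; the function name is hypothetical.

```c
#include <stdio.h>
#include <wchar.h>

/* Report the orientation a fresh stream takes on after its first write.
 * wide_first != 0 writes with fwprintf, otherwise with fprintf.
 * Returns >0 for wide, <0 for byte, 0 if unset or if tmpfile() failed. */
int orientation_after_first_write(int wide_first)
{
    FILE *fp = tmpfile();
    int result;

    if (fp == NULL)
        return 0;
    if (wide_first)
        fwprintf(fp, L"wide\n");
    else
        fprintf(fp, "narrow\n");
    result = fwide(fp, 0);   /* mode 0 only queries; it never changes the orientation */
    fclose(fp);
    return result;
}
```

Once the first call has fixed the orientation, output from the other function family on the same stream is not reliable. Routing everything through the _tprintf family keeps a program on one side.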

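1.3 Format specifiers for strings

Besides tchar.h, cross-platform string code also runs into format-specifier questions such as these:
1. Which specifier prints a char character/string in printf?
2. Which specifier prints a wchar_t character/string in printf?
3. Which specifier prints a TCHAR character/string in printf?
4. Which specifier prints a char character/string in wprintf?
5. Which specifier prints a wchar_t character/string in wprintf?
6. Which specifier prints a TCHAR character/string in wprintf?

C99 is conservative here and does not fully answer these questions. For c and s it defines only the "l" length modifier: without "l" the argument is a char character or string, with "l" it is a wchar_t character or string. See "7.24.2.1 The fwprintf function" in the C99 standard.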
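
The C99 rule can be tried out without touching the console by formatting into buffers. This is a sketch that assumes a C99-conforming library such as glibc; the two function names are invented for the example. With the Microsoft runtime the wide call would behave differently, because there a plain s in a wide format string refers to a wchar_t string.

```c
#include <stdio.h>
#include <wchar.h>

/* Narrow printf family under C99:
 * "%s" takes char*, "%ls" takes wchar_t* (converted to multibyte). */
int format_narrow(char *buf, size_t n, const char *a, const wchar_t *w)
{
    return snprintf(buf, n, "%s|%ls", a, w);
}

/* Wide printf family under C99:
 * "%s" still takes char* (converted to wide), "%ls" takes wchar_t*. */
int format_wide(wchar_t *buf, size_t n, const char *a, const wchar_t *w)
{
    return swprintf(buf, n, L"%s|%ls", a, w);
}
```

Both calls accept the same argument types, which is the C99 answer to questions 1, 2, 4, and 5. C99 has nothing to say about TCHAR, so questions 3 and 6 remain open.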

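VC++ has to deal with both string API sets, so its support here is very thorough. In VC++ the answers to the six questions above are:
1. hc/hs.
2. lc/ls.
3. c/s.
4. hc/hs.
5. lc/ls.
6. c/s.

BCB, MinGW, and other compilers on Windows follow the VC++ convention and support these specifiers.

GCC on Linux and similar platforms sticks to C99 and does not support as many specifiers.

I tested this earlier; see:
http://www.cnblogs.com/zyl910/archive/2012/07/30/wcharfmt.html
[C] Format specifiers for wchar_t (VC, BCB, GCC, and the C99 standard)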

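1.4 The _tmain entry point

Standard C uses main as the program entry point, declared as:
int main(int argc, char* argv[])

To pass command-line arguments as TCHAR, VC++ also defines the _tmain entry point, declared as:
int _tmain(int argc, TCHAR* argv[])

VC++ currently supports _tmain well, while MinGW and some other compilers support it poorly, and some accept only the standard main.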

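II. Solution

2.1 auto_tchar.h: making every compiler compatible with tchar.h

I wrote auto_tchar.h, which uses preprocessor checks to decide whether the compiler ships tchar.h. If it does, the compiler's tchar.h is included; if not, auto_tchar.h supplies its own implementation, based on the tchar.h from MinGW (http://www.mingw.org/).

Testing showed that the tchar.h in BCB6 defines only _TCHAR, not TCHAR; TCHAR is defined in winnt.h. I therefore added this fix:

    // Fix: the tchar.h in BCB6 defines _TCHAR but not TCHAR.
    #if defined(__BORLANDC__) && !defined(_TCHAR_DEFINED)
        typedef _TCHAR    TCHAR, *PTCHAR;
        typedef _TCHAR    TBYTE, *PTBYTE;
        #define _TCHAR_DEFINED
    #endif    // #if defined(__BORLANDC__) && !defined(_TCHAR_DEFINED)

Usage:
1. Put "auto_tchar.h" in the project's include directory.
2. Replace the original #include <tchar.h> with #include "auto_tchar.h".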

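2.2 prichar.h: solving the string format-specifier problem

How can the differences in format specifiers between compilers be handled?
The inttypes.h header from C99 gave me the idea. It defines a family of macros starting with PRI that solve the same problem for the various integer types.
The same can be done for strings: write a header that defines a family of PRI macros for strings, and use preprocessor checks to pick the right constant for each compiler.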
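
For readers who have not used inttypes.h, here is a minimal sketch of the pattern being borrowed. PRId64 is a standard C99 macro; the wrapper function format_i64 is made up for this example.

```c
#include <inttypes.h>
#include <stdio.h>

/* Format a 64-bit integer portably. PRId64 expands to whatever length
 * modifier and conversion the platform needs (for example "ld" or "lld"),
 * and string-literal concatenation splices it into the format. */
int format_i64(char *buf, size_t n, int64_t value)
{
    return snprintf(buf, n, "value=%" PRId64, value);
}
```

The caller writes the "%" itself and the macro supplies the rest. The PRIsA, PRIsW, and PRIsT macros in prichar.h follow the same convention.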

I wrote prichar.h, which defines these macros:
SCNcA
SCNsA
SCNcW
SCNsW
SCNcT
SCNsT
PRIcA
PRIsA
PRIcW
PRIsW
PRIcT
PRIsT

Prefix:
PRI: print, output.
SCN: scan, input.

Infix:
c: char, a character.
s: string, a string.

Suffix:
A: char, narrow version.
W: wchar_t, wide version.
T: TCHAR, TCHAR version.

Usage:
1. Put "prichar.h" in the project's include directory.
2. Include the header (#include "prichar.h").
3. Sample code:

char* psa = "A漢字ABC_Welcome_歡迎_ようこそ_환영.";
wchar_t* psw = L"W漢字ABC_Welcome_歡迎_ようこそ_환영.";
TCHAR* pst = _T("T漢字ABC_Welcome_歡迎_ようこそ_환영.");

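    _tprintf(_T("%")_T(PRIsA)_T("\n"), psa);    // print a narrow string.
    _tprintf(_T("%")_T(PRIsW)_T("\n"), psw);    // print a wide string.
    _tprintf(_T("%")_T(PRIsT)_T("\n"), pst);    // print a TCHAR string.

Note: the _T macro must be applied to each piece separately; none can be dropped. If the format string is written as _T("%"PRIsA"\n"), then in a Unicode build the compiler expands it to L"%" "hs" "\n" and reports that a wide string cannot be concatenated with a narrow string (VC++, for example, reports "error C2308: concatenating mismatched strings").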
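
Why the piecewise form works can be shown with stand-in macros. The DEMO_* names below are hypothetical and exist only to avoid clashing with the real headers; they mirror the two-level structure of __TEXT and _T in the Unicode branch of tchar.h.

```c
#include <wchar.h>

/* Hypothetical stand-ins for __TEXT/_T and PRIsW. */
#define DEMO_TEXT_(q)  L##q             /* pastes L onto the literal it receives */
#define DEMO_T(x)      DEMO_TEXT_(x)    /* expands x first, then hands it on */
#define DEMO_PRIsW     "ls"

/* Each piece is widened on its own; the compiler then concatenates
 * the adjacent wide literals into L"%ls\n". */
const wchar_t *demo_wide_format(void)
{
    return DEMO_T("%") DEMO_T(DEMO_PRIsW) DEMO_T("\n");
}
```

The extra level matters: DEMO_T(DEMO_PRIsW) first expands its argument to "ls" and only then pastes the L prefix. Pasting can only reach the first token, so wrapping the whole sequence in one call would widen "%" and leave the remaining literals narrow.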

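2.3 auto_tmain.h: solving the _tmain entry-point problem

Preprocessor checks decide whether the compiler supports _tmain. If it does, nothing extra is done; if not, the header adds what is needed to make _tmain work.
It is based on https://github.com/coderforlife/mingw-unicode-main/blob/master/mingw-unicode.c

Usage:
1. Put "auto_tmain.h" in the project's include directory.
2. Include the header in the main source file (#include "auto_tmain.h").
3. _tmain can now be used normally (int _tmain(int argc, TCHAR* argv[])).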

III. Module source code

3.1 auto_tchar.h

Full source:

auto_tchar.h

////////////////////////////////////////////////////////////
/*
auto_tchar.h: makes every compiler compatible with tchar.h.
Author: zyl910
Blog: http://www.cnblogs.com/zyl910
URL: http://www.cnblogs.com/zyl910/archive/2013/01/17/tcharall.html
Version: V1.00
Updated: 2013-01-17


Tested compilers:
VC: 6, 2003, 2005, 2008, 2010, 2012.
BCB: 6.
GCC: 4.7.1(MinGW-w64), 4.7.0(Fedora 17), 4.6.2(MinGW), llvm-gcc-4.2(Mac OS X Lion 10.7.4, Xcode 4.4.1).


Update
~~~~~~

[2013-01-17] V1.00
* V1.0 released.
* Renamed to auto_tchar.h (formerly tchar.h) to avoid include-directory conflicts.
* Fixed the BCB6 TCHAR problem (its tchar.h defines only _TCHAR, not TCHAR; TCHAR is defined in winnt.h).

[2012-11-08] V0.01
* First working version.
* Based on the tchar.h from MinGW. http://www.mingw.org/


*/
////////////////////////////////////////////////////////////


#ifndef __AUTO_TCHAR_H_INCLUDED
#define __AUTO_TCHAR_H_INCLUDED

// __AUTO_TCHAR_H_USESYS: whether the compiler provides <tchar.h>
#undef __AUTO_TCHAR_H_USESYS
#if defined(_MSC_VER)    // MSVC.
    #define __AUTO_TCHAR_H_USESYS
#elif defined(__BORLANDC__)    // BCB.
    #define __AUTO_TCHAR_H_USESYS
#elif defined(_WIN32)||defined(_WIN64)||defined(__MINGW32__)||defined(__MINGW64__)||defined(__CYGWIN__)
    // Assume every compiler on Windows provides <tchar.h>.
    #define __AUTO_TCHAR_H_USESYS
#else
    // Assume other compilers do not provide <tchar.h>.
#endif    // __AUTO_TCHAR_H_USESYS


#ifdef __AUTO_TCHAR_H_USESYS
// Use the tchar.h supplied by the compiler.
    #include <tchar.h>
    // Fix: the tchar.h in BCB6 defines _TCHAR but not TCHAR.
    #if defined(__BORLANDC__) && !defined(_TCHAR_DEFINED)
        typedef _TCHAR    TCHAR, *PTCHAR;
        typedef _TCHAR    TBYTE, *PTBYTE;
        #define _TCHAR_DEFINED
    #endif    // #if defined(__BORLANDC__) && !defined(_TCHAR_DEFINED)
#else
// Use our own tchar.h, based on the tchar.h from MinGW. http://www.mingw.org/

#ifndef    _TCHAR_H_
#define _TCHAR_H_

///* All the headers include this file. */
//#include <_mingw.h>

/*
 * NOTE: This tests _UNICODE, which is different from the UNICODE define
 *       used to differentiate Win32 API calls.
 */
#ifdef    _UNICODE

/*
 * Include <wchar.h> for wchar_t and WEOF if _UNICODE.
 */
#include <wchar.h>

/*
 * Use TCHAR instead of char or wchar_t. It will be appropriately translated
 * if _UNICODE is correctly defined (or not).
 */
#ifndef _TCHAR_DEFINED
#ifndef RC_INVOKED
typedef    wchar_t    TCHAR;
typedef wchar_t _TCHAR;
#endif    /* Not RC_INVOKED */
#define _TCHAR_DEFINED
#endif

/*
 * Use _TEOF instead of EOF or WEOF. It will be appropriately translated if
 * _UNICODE is correctly defined (or not).
 */
#define _TEOF WEOF

/*
 * __TEXT is a private macro whose specific use is to force the expansion of a
 * macro passed as an argument to the macros _T or _TEXT.  DO NOT use this
 * macro within your programs.  It's name and function could change without
 * notice.
 */
#define    __TEXT(q)    L##q

/*  for porting from other Windows compilers */
#if 0  /* no  wide startup module */
#define _tmain      wmain
#define _tWinMain   wWinMain
#define _tenviron   _wenviron
#define __targv     __wargv
#endif

/*
 * Unicode functions
 */
#define    _tprintf    wprintf
#define    _ftprintf    fwprintf
#define    _stprintf    swprintf
#define    _sntprintf    _snwprintf
#define    _vtprintf    vwprintf
#define    _vftprintf    vfwprintf
#define _vstprintf    vswprintf
#define    _vsntprintf    _vsnwprintf
#define    _vsctprintf    _vscwprintf
#define    _tscanf        wscanf
#define    _ftscanf    fwscanf
#define    _stscanf    swscanf
#define    _fgettc        fgetwc
#define    _fgettchar    _fgetwchar
#define    _fgetts        fgetws
#define    _fputtc        fputwc
#define    _fputtchar    _fputwchar
#define    _fputts        fputws
#define    _gettc        getwc
#define    _getts        _getws
#define    _puttc        putwc
#define _puttchar       putwchar
#define    _putts        _putws
#define    _ungettc    ungetwc
#define    _tcstod        wcstod
#define    _tcstol        wcstol
#define _tcstoul    wcstoul
#define    _itot        _itow
#define    _ltot        _ltow
#define    _ultot        _ultow
#define    _ttoi        _wtoi
#define    _ttol        _wtol
#define    _tcscat        wcscat
#define _tcschr        wcschr
#define _tcscmp        wcscmp
#define _tcscpy        wcscpy
#define _tcscspn    wcscspn
#define    _tcslen        wcslen
#define    _tcsncat    wcsncat
#define    _tcsncmp    wcsncmp
#define    _tcsncpy    wcsncpy
#define    _tcspbrk    wcspbrk
#define    _tcsrchr    wcsrchr
#define _tcsspn        wcsspn
#define    _tcsstr        wcsstr
#define _tcstok        wcstok
#define    _tcsdup        _wcsdup
#define    _tcsicmp    _wcsicmp
#define    _tcsnicmp    _wcsnicmp
#define    _tcsnset    _wcsnset
#define    _tcsrev        _wcsrev
#define _tcsset        _wcsset
#define    _tcslwr        _wcslwr
#define    _tcsupr        _wcsupr
#define    _tcsxfrm    wcsxfrm
#define    _tcscoll    wcscoll
#define    _tcsicoll    _wcsicoll
#define    _istalpha    iswalpha
#define    _istupper    iswupper
#define    _istlower    iswlower
#define    _istdigit    iswdigit
#define    _istxdigit    iswxdigit
#define    _istspace    iswspace
#define    _istpunct    iswpunct
#define    _istalnum    iswalnum
#define    _istprint    iswprint
#define    _istgraph    iswgraph
#define    _istcntrl    iswcntrl
#define    _istascii    iswascii
#define _totupper    towupper
#define    _totlower    towlower
#define _tcsftime    wcsftime
/* Macro functions */ 
#define _tcsdec     _wcsdec
#define _tcsinc     _wcsinc
#define _tcsnbcnt   _wcsncnt
#define _tcsnccnt   _wcsncnt
#define _tcsnextc   _wcsnextc
#define _tcsninc    _wcsninc
#define _tcsspnp    _wcsspnp
#define _wcsdec(_wcs1, _wcs2) ((_wcs1)>=(_wcs2) ? NULL : (_wcs2)-1)
#define _wcsinc(_wcs)  ((_wcs)+1)
#define _wcsnextc(_wcs) ((unsigned int) *(_wcs))
#define _wcsninc(_wcs, _inc) (((_wcs)+(_inc)))
#define _wcsncnt(_wcs, _cnt) ((wcslen(_wcs)>(_cnt)) ? (_cnt) : wcslen(_wcs))
#define _wcsspnp(_wcs1, _wcs2) ((*((_wcs1)+wcsspn(_wcs1,_wcs2))) ? ((_wcs1)+wcsspn(_wcs1,_wcs2)) : NULL)

#if 1  /* defined __MSVCRT__ */
/*
 *   These wide functions not in crtdll.dll.
 *   Define macros anyway so that _wfoo rather than _tfoo is undefined
 */
#define _ttoi64     _wtoi64
#define _i64tot     _i64tow
#define _ui64tot    _ui64tow
#define    _tasctime    _wasctime
#define    _tctime        _wctime
#if __MSVCRT_VERSION__ >= 0x0800
#define    _tctime32    _wctime32
#define    _tctime64    _wctime64
#endif /* __MSVCRT_VERSION__ >= 0x0800 */
#define    _tstrdate    _wstrdate
#define    _tstrtime    _wstrtime
#define    _tutime        _wutime
#if __MSVCRT_VERSION__ >= 0x0800
#define    _tutime64    _wutime64
#define    _tutime32    _wutime32
#endif /* __MSVCRT_VERSION__ > 0x0800 */
#define _tcsnccoll  _wcsncoll
#define _tcsncoll   _wcsncoll
#define _tcsncicoll _wcsnicoll
#define _tcsnicoll  _wcsnicoll
#define _taccess    _waccess
#define _tchmod     _wchmod
#define _tcreat     _wcreat
#define _tfindfirst _wfindfirst
#define _tfindnext  _wfindnext
#if __MSVCRT_VERSION__ >= 0x0800
#define _tfindfirst64 _wfindfirst64
#define _tfindfirst32 _wfindfirst32
#define _tfindnext64  _wfindnext64
#define _tfindnext32  _wfindnext32
#endif /* __MSVCRT_VERSION__ > 0x0800 */
#define _tfdopen    _wfdopen
#define _tfopen     _wfopen
#define _tfreopen   _wfreopen
#define _tfsopen    _wfsopen
#define _tgetenv    _wgetenv
#define _tputenv    _wputenv
#define _tsearchenv _wsearchenv
#define  _tsystem    _wsystem
#define _tmakepath  _wmakepath
#define _tsplitpath _wsplitpath
#define _tfullpath  _wfullpath
#define _tmktemp    _wmktemp
#define _topen      _wopen
#define _tremove    _wremove
#define _trename    _wrename
#define _tsopen     _wsopen
#define _tsetlocale _wsetlocale
#define _tunlink    _wunlink
#define _tfinddata_t    _wfinddata_t
#define _tfindfirsti64  _wfindfirsti64
#define _tfindnexti64   _wfindnexti64
#define _tfinddatai64_t _wfinddatai64_t
#if __MSVCRT_VERSION__ >= 0x0601
#define _tfinddata64_t    _wfinddata64_t
#endif
#if __MSVCRT_VERSION__ >= 0x0800
#define _tfinddata32_t    _wfinddata32_t
#define _tfinddata32i64_t _wfinddata32i64_t
#define _tfinddata64i32_t _wfinddata64i32_t
#define _tfindfirst32i64  _wfindfirst32i64
#define _tfindfirst64i32  _wfindfirst64i32
#define _tfindnext32i64   _wfindnext32i64
#define _tfindnext64i32   _wfindnext64i32
#endif /* __MSVCRT_VERSION__ > 0x0800 */
#define _tchdir        _wchdir
#define _tgetcwd    _wgetcwd
#define _tgetdcwd    _wgetdcwd
#define _tmkdir        _wmkdir
#define _trmdir        _wrmdir
#define _tstat        _wstat
#define _tstati64    _wstati64
#define _tstat64    _wstat64
#if __MSVCRT_VERSION__ >= 0x0800
#define _tstat32    _wstat32
#define _tstat32i64    _wstat32i64
#define _tstat64i32    _wstat64i32
#endif /* __MSVCRT_VERSION__ > 0x0800 */
#endif  /* __MSVCRT__ */

/* dirent structures and functions */
#define _tdirent    _wdirent
#define _TDIR         _WDIR
#define _topendir    _wopendir
#define _tclosedir    _wclosedir
#define _treaddir    _wreaddir
#define _trewinddir    _wrewinddir
#define _ttelldir    _wtelldir
#define _tseekdir    _wseekdir

#else    /* Not _UNICODE */

/*
 * TCHAR, the type you should use instead of char.
 */
#ifndef _TCHAR_DEFINED
#ifndef RC_INVOKED
typedef char    TCHAR;
typedef char    _TCHAR;
#endif
#define _TCHAR_DEFINED
#endif

/*
 * _TEOF, the constant you should use instead of EOF.
 */
#define _TEOF EOF

/*
 * __TEXT is a private macro whose specific use is to force the expansion of a
 * macro passed as an argument to the macros _T or _TEXT.  DO NOT use this
 * macro within your programs.  It's name and function could change without
 * notice.
 */
#define    __TEXT(q)    q

/*  for porting from other Windows compilers */
#define _tmain      main
#define _tWinMain   WinMain
#define _tenviron  _environ
#define __targv     __argv

/*
 * Non-unicode (standard) functions
 */

#define    _tprintf    printf
#define _ftprintf    fprintf
#define    _stprintf    sprintf
#define    _sntprintf    _snprintf
#define    _vtprintf    vprintf
#define    _vftprintf    vfprintf
#define _vstprintf    vsprintf
#define    _vsntprintf    _vsnprintf
#define    _vsctprintf    _vscprintf
#define    _tscanf        scanf
#define    _ftscanf    fscanf
#define    _stscanf    sscanf
#define    _fgettc        fgetc
#define    _fgettchar    _fgetchar
#define    _fgetts        fgets
#define    _fputtc        fputc
#define    _fputtchar    _fputchar
#define    _fputts        fputs
#define _tfdopen    _fdopen
#define    _tfopen        fopen
#define _tfreopen    freopen
#define    _tfsopen    _fsopen
#define    _tgetenv    getenv
#define    _tputenv    _putenv
#define    _tsearchenv    _searchenv
#define  _tsystem       system
#define    _tmakepath    _makepath
#define    _tsplitpath    _splitpath
#define    _tfullpath    _fullpath
#define    _gettc        getc
#define    _getts        gets
#define    _puttc        putc
#define _puttchar       putchar
#define    _putts        puts
#define    _ungettc    ungetc
#define    _tcstod        strtod
#define    _tcstol        strtol
#define _tcstoul    strtoul
#define    _itot        _itoa
#define    _ltot        _ltoa
#define    _ultot        _ultoa
#define    _ttoi        atoi
#define    _ttol        atol
#define    _tcscat        strcat
#define _tcschr        strchr
#define _tcscmp        strcmp
#define _tcscpy        strcpy
#define _tcscspn    strcspn
#define    _tcslen        strlen
#define    _tcsncat    strncat
#define    _tcsncmp    strncmp
#define    _tcsncpy    strncpy
#define    _tcspbrk    strpbrk
#define    _tcsrchr    strrchr
#define _tcsspn        strspn
#define    _tcsstr        strstr
#define _tcstok        strtok
#define    _tcsdup        _strdup
#define    _tcsicmp    _stricmp
#define    _tcsnicmp    _strnicmp
#define    _tcsnset    _strnset
#define    _tcsrev        _strrev
#define _tcsset        _strset
#define    _tcslwr        _strlwr
#define    _tcsupr        _strupr
#define    _tcsxfrm    strxfrm
#define    _tcscoll    strcoll
#define    _tcsicoll    _stricoll
#define    _istalpha    isalpha
#define    _istupper    isupper
#define    _istlower    islower
#define    _istdigit    isdigit
#define    _istxdigit    isxdigit
#define    _istspace    isspace
#define    _istpunct    ispunct
#define    _istalnum    isalnum
#define    _istprint    isprint
#define    _istgraph    isgraph
#define    _istcntrl    iscntrl
#define    _istascii    isascii
#define _totupper    toupper
#define    _totlower    tolower
#define    _tasctime    asctime
#define    _tctime        ctime
#if __MSVCRT_VERSION__ >= 0x0800
#define    _tctime32    _ctime32
#define    _tctime64    _ctime64
#endif /* __MSVCRT_VERSION__ >= 0x0800 */
#define    _tstrdate    _strdate
#define    _tstrtime    _strtime
#define    _tutime        _utime
#if __MSVCRT_VERSION__ >= 0x0800
#define    _tutime64    _utime64
#define    _tutime32    _utime32
#endif /* __MSVCRT_VERSION__ > 0x0800 */
#define _tcsftime    strftime
/* Macro functions */ 
#define _tcsdec     _strdec
#define _tcsinc     _strinc
#define _tcsnbcnt   _strncnt
#define _tcsnccnt   _strncnt
#define _tcsnextc   _strnextc
#define _tcsninc    _strninc
#define _tcsspnp    _strspnp
#define _strdec(_str1, _str2) ((_str1)>=(_str2) ? NULL : (_str2)-1)
#define _strinc(_str)  ((_str)+1)
#define _strnextc(_str) ((unsigned int) *(_str))
#define _strninc(_str, _inc) (((_str)+(_inc)))
#define _strncnt(_str, _cnt) ((strlen(_str)>(_cnt)) ? (_cnt) : strlen(_str))
#define _strspnp(_str1, _str2) ((*((_str1)+strspn(_str1,_str2))) ? ((_str1)+strspn(_str1,_str2)) : NULL)

#define _tchmod     _chmod
#define _tcreat     _creat
#define _tfindfirst _findfirst
#define _tfindnext  _findnext
#if __MSVCRT_VERSION__ >= 0x0800
#define _tfindfirst64 _findfirst64
#define _tfindfirst32 _findfirst32
#define _tfindnext64  _findnext64
#define _tfindnext32  _findnext32
#endif /* __MSVCRT_VERSION__ > 0x0800 */
#define _tmktemp    _mktemp
#define _topen      _open
#define _taccess    _access
#define _tremove    remove
#define _trename    rename
#define _tsopen     _sopen
#define _tsetlocale setlocale
#define _tunlink    _unlink
#define _tfinddata_t    _finddata_t
#define _tchdir        _chdir
#define _tgetcwd    _getcwd
#define _tgetdcwd   _getdcwd
#define _tmkdir        _mkdir
#define _trmdir        _rmdir
#define _tstat      _stat

#if 1  /* defined __MSVCRT__ */
/* Not in crtdll.dll. Define macros anyway? */
#define _ttoi64     _atoi64
#define _i64tot     _i64toa
#define _ui64tot    _ui64toa
#define _tcsnccoll  _strncoll
#define _tcsncoll   _strncoll
#define _tcsncicoll _strnicoll
#define _tcsnicoll  _strnicoll
#define _tfindfirsti64  _findfirsti64
#define _tfindnexti64   _findnexti64
#define _tfinddatai64_t _finddatai64_t
#if __MSVCRT_VERSION__ >= 0x0601
#define _tfinddata64_t    _finddata64_t
#endif
#if __MSVCRT_VERSION__ >= 0x0800
#define _tfinddata32_t    _finddata32_t
#define _tfinddata32i64_t _finddata32i64_t
#define _tfinddata64i32_t _finddata64i32_t
#define _tfindfirst32i64  _findfirst32i64
#define _tfindfirst64i32  _findfirst64i32
#define _tfindnext32i64   _findnext32i64
#define _tfindnext64i32   _findnext64i32
#endif /* __MSVCRT_VERSION__ > 0x0800 */
#define _tstati64   _stati64
#define _tstat64    _stat64
#if __MSVCRT_VERSION__ >= 0x0800
#define _tstat32    _stat32
#define _tstat32i64    _stat32i64
#define _tstat64i32    _stat64i32
#endif /* __MSVCRT_VERSION__ > 0x0800 */
#endif  /* __MSVCRT__ */

/* dirent structures and functions */
#define _tdirent    dirent
#define _TDIR         DIR
#define _topendir    opendir
#define _tclosedir    closedir
#define _treaddir    readdir
#define _trewinddir    rewinddir
#define _ttelldir    telldir
#define _tseekdir    seekdir

#endif    /* Not _UNICODE */

/*
 * UNICODE a constant string when _UNICODE is defined else returns the string
 * unmodified.  Also defined in w32api/winnt.h.
 */
#define _TEXT(x)    __TEXT(x)
#define    _T(x)        __TEXT(x)

#endif    /* Not _TCHAR_H_ */


#endif // #ifdef __AUTO_TCHAR_H_USESYS

#endif // #ifndef __AUTO_TCHAR_H_INCLUDED

3.2 prichar.h

Full source:

prichar.h

////////////////////////////////////////////////////////////
/*
prichar.h : format specifiers for strings.
Author: zyl910
Blog: http://www.cnblogs.com/zyl910
URL: http://www.cnblogs.com/zyl910/archive/2013/01/17/tcharall.html
Version: V1.00
Updated: 2013-01-17


Tested compilers:
VC: 6, 2003, 2005, 2008, 2010, 2012.
BCB: 6.
GCC: 4.7.1(MinGW-w64), 4.7.0(Fedora 17), 4.6.2(MinGW), llvm-gcc-4.2(Mac OS X Lion 10.7.4, Xcode 4.4.1).


Update
~~~~~~

[2013-01-17] V1.00
* V1.0 released.


Manual
~~~~~~

Format strings for character types, modeled on "inttypes.h" from C99.

Prefix:
PRI: print, output.
SCN: scan, input.

Infix:
c: char, a character.
s: string, a string.

Suffix:
A: char, narrow version.
W: wchar_t, wide version.
T: TCHAR, TCHAR version.


*/
////////////////////////////////////////////////////////////

#ifndef __PRICHAR_H_INCLUDED
#define __PRICHAR_H_INCLUDED


//#include "tchar.h"

#if defined __cplusplus
extern "C" {
#endif

////////////////////////////////////////
// char
////////////////////////////////////////

#if defined(_MSC_VER)||defined(__BORLANDC__)
    // VC and BCB both treat hc/hs as always narrow.
    #define PRIcA    "hc"
    #define PRIsA    "hs"
#elif defined(__GNUC__)||defined(_WIN32)||defined(_WIN64)
    // GCC's narrow functions sometimes fail to recognize hc/hs, while its wide functions always accept them. Assume other Windows compilers behave the same way.
    #if defined(_UNICODE)
        #define PRIcA    "hc"
        #define PRIsA    "hs"
    #else
        #define PRIcA    "c"
        #define PRIsA    "s"
    #endif
#else
    // Assume other platforms support only c/s.
    #define PRIcA    "c"
    #define PRIsA    "s"
#endif


////////////////////////////////////////
// wchar_t
////////////////////////////////////////

// C99 specifies that lc/ls always denote wide characters.
#define PRIcW    "lc"
#define PRIsW    "ls"


////////////////////////////////////////
// TCHAR
////////////////////////////////////////

#if defined(_WIN32)||defined(_WIN64)||defined(_MSC_VER)
    // VC, BCB, MinGW, and other Windows compilers treat c/s as adaptive: char in narrow functions, wchar_t in wide functions.
    #define PRIcT    "c"
    #define PRIsT    "s"
#else
    // Other platforms.
    #if defined(_UNICODE)
        #define PRIcT    PRIcW
        #define PRIsT    PRIsW
    #else
        #define PRIcT    PRIcA
        #define PRIsT    PRIsA
    #endif
#endif


////////////////////////////////////////
// SCN
////////////////////////////////////////

#define SCNcA    PRIcA
#define SCNsA    PRIsA
#define SCNcW    PRIcW
#define SCNsW    PRIsW
#define SCNcT    PRIcT
#define SCNsT    PRIsT


#if defined __cplusplus
}
#endif

#endif    // #ifndef __PRICHAR_H_INCLUDED

3.3 auto_tmain.h

Full source:

auto_tmain.h

////////////////////////////////////////////////////////////
/*
auto_tmain.h : makes every compiler compatible with _tmain.
Author: zyl910
Blog: http://www.cnblogs.com/zyl910
URL: http://www.cnblogs.com/zyl910/archive/2013/01/17/tcharall.html
Version: V1.00
Updated: 2013-01-17


Update
~~~~~~

[2013-01-17] V1.00
* V1.0 released.


Manual
~~~~~~

Makes _tmain available automatically.
Just add one line to the main source file:
#include "auto_tmain.h"


Compatible with VC, GCC, and BCB.

Based on https://github.com/coderforlife/mingw-unicode-main/blob/master/mingw-unicode.c

*/
////////////////////////////////////////////////////////////

#ifndef __AUTO_TMAIN_H_INCLUDED
#define __AUTO_TMAIN_H_INCLUDED

#if defined(__GNUC__) && defined(_UNICODE)

#ifndef __MSVCRT__
#error Unicode main function requires linking to MSVCRT
#endif

#include <wchar.h>
#include <stdlib.h>
#include "tchar.h"

#undef _tmain
#ifdef _UNICODE
#define _tmain wmain
#else
#define _tmain main
#endif


extern int _CRT_glob;
extern 
#ifdef __cplusplus
"C" 
#endif
void __wgetmainargs(int*,wchar_t***,wchar_t***,int,int*);

#ifdef MAIN_USE_ENVP
int wmain(int argc, wchar_t *argv[], wchar_t *envp[]);
#else
int wmain(int argc, wchar_t *argv[]);
#endif

int main(void)
{
    wchar_t **envp, **argv;
    int argc=0, si = 0;
    __wgetmainargs(&argc, &argv, &envp, _CRT_glob, &si); // this also creates the global variable __wargv
#ifdef MAIN_USE_ENVP
    return wmain(argc, argv, envp);
#else
    return wmain(argc, argv);
#endif    // #ifdef MAIN_USE_ENVP
}

#endif    // #if defined(__GNUC__) && defined(_UNICODE)

#endif    // #ifndef __AUTO_TMAIN_H_INCLUDED

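IV. Tests with UTF-8 encoding

4.1 Notes

To keep code portable, I recommend saving source files as UTF-8.
Linux and other Unix-like platforms now default to UTF-8, and so do compilers such as gcc. They accept both UTF-8 without a BOM (byte order mark) and UTF-8 with a BOM.
VC++ 2003 and later accept source files saved as UTF-8 with a BOM. They do not handle UTF-8 without a BOM: such files are misread as the system default encoding (GBK on a Simplified Chinese system, for example).

To make source files work with as many compilers as possible, I suggest the following:
1. For source files (c, cpp), use UTF-8 with a BOM, so that VC++, gcc, and other compilers all compile them correctly. If you are sure every string constant in the program stays within ASCII, UTF-8 without a BOM is also worth trying.
2. For header files (h, hpp), use UTF-8 without a BOM. Headers are pasted into the source during preprocessing, and a stray BOM there may break compilation.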
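
To check which variant a file uses, it is enough to look at its first three bytes. The helpers below are a small sketch with invented names; they work on a buffer that the caller has already read from the file.

```c
#include <stddef.h>

/* Return 1 if the buffer starts with the UTF-8 byte order mark EF BB BF. */
int has_utf8_bom(const unsigned char *data, size_t len)
{
    return len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
}

/* Skip the BOM if present and return a pointer to the first content byte;
 * *len is reduced by the three bytes skipped. */
const unsigned char *skip_utf8_bom(const unsigned char *data, size_t *len)
{
    if (has_utf8_bom(data, *len)) {
        *len -= 3;
        return data + 3;
    }
    return data;
}
```

A build script could use such a check to verify that .c files carry a BOM and headers do not, following the two rules above.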

To change a file's encoding in VC++, choose File -> Advanced Save Options, pick the encoding in the Encoding drop-down list, and click OK.

4.2 Test code

File list:
auto_tchar.h
auto_tmain.h
makefile
prichar.h
Release
tcharall.c
tcharall_2003.sln
tcharall_2003.vcproj
tcharall_2005.sln
tcharall_2005.vcproj
tcharall_2008.sln
tcharall_2008.vcproj
tcharall_2010.sln
tcharall_2010.vcxproj
tcharall_2010.vcxproj.filters
tcharall_2010.vcxproj.user
tcharall_2012.sln
tcharall_2012.vcxproj
tcharall_2012.vcxproj.filters

tcharall.c is saved as UTF-8 with a BOM, and the three header files as UTF-8 without a BOM.

tcharall.c:

////////////////////////////////////////////////////////////
/*
tcharall.c : tests tchar with various compilers (UTF-8 encoding).
Author: zyl910
Blog: http://www.cnblogs.com/zyl910
URL: http://www.cnblogs.com/zyl910/archive/2013/01/17/tcharall.html
Version: V1.00
Updated: 2013-01-17


Update
~~~~~~

[2013-01-17] V1.00
* V1.0 released.

[2012-11-08] V0.01
* First working version.

*/
////////////////////////////////////////////////////////////

#include <stdio.h>
#include <locale.h>
#include <wchar.h>

#include "auto_tchar.h"

#include "prichar.h"
#include "auto_tmain.h"



// Compiler name
#define MACTOSTR(x)    #x
#define MACROVALUESTR(x)    MACTOSTR(x)
#if defined(__ICL)    // Intel C++
#  if defined(__VERSION__)
#    define COMPILER_NAME    "Intel C++ " __VERSION__
#  elif defined(__INTEL_COMPILER_BUILD_DATE)
#    define COMPILER_NAME    "Intel C++ (" MACROVALUESTR(__INTEL_COMPILER_BUILD_DATE) ")"
#  else
#    define COMPILER_NAME    "Intel C++"
#  endif    // #  if defined(__VERSION__)
#elif defined(_MSC_VER)    // Microsoft VC++
#  if defined(_MSC_FULL_VER)
#    define COMPILER_NAME    "Microsoft VC++ (" MACROVALUESTR(_MSC_FULL_VER) ")"
#  elif defined(_MSC_VER)
#    define COMPILER_NAME    "Microsoft VC++ (" MACROVALUESTR(_MSC_VER) ")"
#  else
#    define COMPILER_NAME    "Microsoft VC++"
#  endif    // #  if defined(_MSC_FULL_VER)
#elif defined(__GNUC__)    // GCC
#  if defined(__CYGWIN__)
#    define COMPILER_NAME    "GCC(Cygwin) " __VERSION__
#  elif defined(__MINGW32__)
#    define COMPILER_NAME    "GCC(MinGW) " __VERSION__
#  else
#    define COMPILER_NAME    "GCC " __VERSION__
#  endif    // #  if defined(__CYGWIN__)
#elif defined(__TURBOC__)    // Borland C++
#  if defined(__BCPLUSPLUS__)
#    define COMPILER_NAME    "Borland C++ (" MACROVALUESTR(__BCPLUSPLUS__) ")"
#  elif defined(__BORLANDC__)
#    define COMPILER_NAME    "Borland C (" MACROVALUESTR(__BORLANDC__) ")"
#  else
#    define COMPILER_NAME    "Turbo C (" MACROVALUESTR(__TURBOC__) ")"
#  endif    // #  if defined(__BCPLUSPLUS__)
#else
#  define COMPILER_NAME    "Unknown Compiler"
#endif    // #if defined(__ICL)    // Intel C++


char* psa = "A漢字ABC_Welcome_歡迎_ようこそ_환영.";    // the second half contains "welcome" in Traditional Chinese, Japanese, and Korean.
wchar_t* psw = L"W漢字ABC_Welcome_歡迎_ようこそ_환영.";
TCHAR* pst = _T("T漢字ABC_Welcome_歡迎_ようこそ_환영.");


int _tmain(int argc, TCHAR* argv[])
{
    // init.
    setlocale(LC_ALL, "");    // use the default locale of the user's environment.

    // title.
    _tprintf(_T("tcharall v1.00 (%dbit)\n"), (int)(8*sizeof(int*)));
    _tprintf(_T("Compiler: %")_T(PRIsA)_T("\n"), COMPILER_NAME);
    _tprintf(_T("\n"));

    // show
    _tprintf(_T("%")_T(PRIsA)_T("\n"), psa);    // print a narrow string.
    _tprintf(_T("%")_T(PRIsW)_T("\n"), psw);    // print a wide string.
    _tprintf(_T("%")_T(PRIsT)_T("\n"), pst);    // print a TCHAR string.
    

    return 0;
}

makefile:

# flags
CC = gcc
CFS = -Wall

# args
RELEASE =0
UNICODE =0
BITS =
CFLAGS =

# [args] Build mode. 0 means debug, 1 means release. make RELEASE=1.
ifeq ($(RELEASE),0)
    # debug
    CFS += -g
else
    # release
    CFS += -O3 -DNDEBUG
    # CFS += -O3 -g -DNDEBUG
endif

# [args] UNICODE mode. 0 means ANSI, 1 means Unicode. make UNICODE=1.
ifeq ($(UNICODE),0)
    # ansi
    CFS +=
else
    # unicode
    CFS += -D_UNICODE -DUNICODE
endif

# [args] Target width. 32 builds a 32-bit program, 64 builds a 64-bit program, anything else uses the default. make BITS=32.
ifeq ($(BITS),32)
    CFS += -m32
else
    ifeq ($(BITS),64)
        CFS += -m64
    else
    endif
endif

# [args] Use CFLAGS to pass extra options. make CFLAGS="-mavx".
CFS += $(CFLAGS)


.PHONY : all clean

# files
TARGETS = tcharall
OBJS = tcharall.o

all : $(TARGETS)

tcharall : $(OBJS)
	$(CC) -o $@ $^ $(CFS)


tcharall.o : tcharall.c
	$(CC) -c $< $(CFS)


clean :
	rm -f $(OBJS) $(TARGETS) $(addsuffix .exe,$(TARGETS))

4.3 Test results

The code compiled successfully with the following compilers:
VC2003: x86 build. Unicode=0.
VC2005: x86 and x64 builds. Unicode=1.
VC2008: x86 build. Unicode=1.
VC2010: x86 and x64 builds. Unicode=1.
VC2012: x86 and x64 builds. Unicode=1.
GCC 4.6.2 (MinGW (20120426)): x86 build. Unicode=0 and Unicode=1.
GCC 4.7.1 (TDM-GCC (MinGW-w64)): x64 build. Unicode=0 and Unicode=1.
GCC 4.7.0 (Fedora 17 x64): x86 and x64 builds. Unicode=0.
llvm-gcc-4.2 (Mac OS X Lion 10.7.4, Xcode 4.4.1): x86 and x64 builds. Unicode=0.

Test output:

[VC2003, Unicode=0]
tcharall v1.00 (32bit)
Compiler: Microsoft VC++ (13106030)

A奼夊瓧ABC_Welcome_姝¤繋_銈堛亞銇撱仢_頇橃榿.
W漢字ABC_Welcome_歡迎_ようこそ_
T奼夊瓧ABC_Welcome_姝¤繋_銈堛亞銇撱仢_頇橃榿.


[VC2005, Unicode=1]
tcharall v1.00 (32bit)
Compiler: Microsoft VC++ (140050727)

A漢字ABC_Welcome_歡迎_ようこそ_??.
W漢字ABC_Welcome_歡迎_ようこそ_??.
T漢字ABC_Welcome_歡迎_ようこそ_??.


[VC2008, Unicode=1]
tcharall v1.00 (64bit)
Compiler: Microsoft VC++ (160040219)

A漢字ABC_Welcome_歡迎_ようこそ_??.
W漢字ABC_Welcome_歡迎_ようこそ_??.
T漢字ABC_Welcome_歡迎_ようこそ_??.


[VC2010, Unicode=1]
tcharall v1.00 (64bit)
Compiler: Microsoft VC++ (160040219)

A漢字ABC_Welcome_歡迎_ようこそ_??.
W漢字ABC_Welcome_歡迎_ようこそ_??.
T漢字ABC_Welcome_歡迎_ようこそ_??.


[VC2012, Unicode=1]
tcharall v1.00 (64bit)
Compiler: Microsoft VC++ (170051106)

A漢字ABC_Welcome_歡迎_ようこそ_??.
W漢字ABC_Welcome_歡迎_ようこそ_??.
T漢字ABC_Welcome_歡迎_ようこそ_??.


[GCC 4.6.2 (MinGW (20120426)), Unicode=0]
tcharall v1.00 (32bit)
Compiler: GCC(MinGW) 4.6.2

A奼夊瓧ABC_Welcome_姝¤繋_銈堛亞銇撱仢_頇橃榿.
W漢字ABC_Welcome_歡迎_ようこそ_
T奼夊瓧ABC_Welcome_姝¤繋_銈堛亞銇撱仢_頇橃榿.


[GCC 4.6.2 (MinGW (20120426)), Unicode=1]
tcharall v1.00 (32bit)
Compiler: GCC(MinGW) 4.6.2

A奼夊瓧ABC_Welcome_姝¤繋_銈堛亞銇撱仢_頇橃榿.
W漢字ABC_Welcome_歡迎_ようこそ_    T漢字ABC_Welcome_歡迎_ようこそ_


[GCC 4.7.1 (TDM-GCC (MinGW-w64)), Unicode=0]
tcharall v1.00 (64bit)
Compiler: GCC(MinGW) 4.7.1

A奼夊瓧ABC_Welcome_姝¤繋_銈堛亞銇撱仢_頇橃榿.
W漢字ABC_Welcome_歡迎_ようこそ_T奼夊瓧ABC_Welcome_姝¤繋_銈堛亞銇撱仢_頇橃榿.


[GCC 4.7.1 (TDM-GCC (MinGW-w64)), Unicode=1]
tcharall v1.00 (64bit)
Compiler: GCC(MinGW) 4.7.1

A奼夊瓧ABC_Welcome_姝¤繋_銈堛亞銇撱仢_頇橃榿.
W漢字ABC_Welcome_歡迎_ようこそ_.
T漢字ABC_Welcome_歡迎_ようこそ_.


[GCC 4.7.0 (Fedora 17 x64), Unicode=0]
tcharall v1.00 (64bit)
Compiler: GCC 4.7.0 20120507 (Red Hat 4.7.0-5)

A漢字ABC_Welcome_歡迎_ようこそ_환영.
W漢字ABC_Welcome_歡迎_ようこそ_환영.
T漢字ABC_Welcome_歡迎_ようこそ_환영.


[llvm-gcc-4.2 (Mac OS X Lion 10.7.4, Xcode 4.4.1), Unicode=0]
tcharall v1.00 (64bit)
Compiler: GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)

A漢字ABC_Welcome_歡迎_ようこそ_환영.
W漢字ABC_Welcome_歡迎_ようこそ_환영.
T漢字ABC_Welcome_歡迎_ようこそ_환영.

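4.4 Analysis of the test results

VC2003 does not convert string literals to an execution character set. For narrow string literals it copies the UTF-8 bytes straight from the source file, while the system default character set is GBK (Simplified Chinese Windows), which produces garbled output such as “A奼夊瓧ABC_Welcome_姝¤繋_銈堛亞銇撱仢_頇橃榿.”.
Execution character set conversion is supported from VC2005 onward. For narrow string literals, the compiler converts the UTF-8 text in the source file to the execution character set (GBK on a Simplified Chinese system), so narrow strings containing Chinese display correctly.
Because Simplified Chinese Windows uses GBK by default and the Korean text “환영” cannot be converted to GBK, it is printed as “??”.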

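MinGW and MinGW-w64 also garble narrow strings, because their execution character set defaults to UTF-8. That problem is discussed in detail in the next section.
For now, focus on wide string output. MinGW and MinGW-w64 print a wide string correctly as long as it can be converted to a narrow string (“W漢字ABC_Welcome_歡迎_ようこそ_” converts to GBK). When a character cannot be converted (the Korean “환영” has no GBK encoding), output stops at that character, and the two toolchains differ slightly:
a) Narrow-character build (UNICODE not defined, printf-family functions): MinGW stops the output but still emits the line break. MinGW-w64 stops the output and also loses the line break.
b) Wide-character build (UNICODE defined, wprintf-family functions): MinGW stops the output and also loses the line break. MinGW-w64 stops the output but still emits the line break.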

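Linux and Mac use UTF-8 by default, so English, Chinese, Japanese and Korean are all displayed at the same time. In detail:
a) Narrow string literals. The source file is saved as UTF-8, so the narrow string literals are UTF-8 as well. At run time the terminal also uses UTF-8, so the encodings match and the output is correct.
b) Wide string literals. The compiler converts the UTF-8 source text to UTF-32 to build the wide string literal. At run time, because the terminal uses UTF-8, the C standard library converts the UTF-32 wide string to a UTF-8 narrow string before writing it, so the encodings match and the output is correct.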
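The UTF-32 to UTF-8 step described in b) can be illustrated with a small sketch. This is not the C library's actual code: `utf8_encode` is a hypothetical helper name, and the function simply follows the RFC 3629 bit layout for one code point at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 (RFC 3629).
 * Writes up to 4 bytes to out and returns the byte count,
 * or 0 if cp is a surrogate or lies above U+10FFFF.
 * utf8_encode is an illustrative name, not a libc function. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                         /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                        /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                      /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                        /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                    /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                                /* outside the Unicode range */
}
```

For instance, the Korean syllable U+D658 from the test string becomes the three bytes ED 99 98, which is what a UTF-8 terminal expects to receive.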

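Summary:
1. Linux, Mac and similar platforms use UTF-8 by default and display multilingual text in the terminal without problems.
2. Console programs on Windows use the local encoding by default (GBK on a Simplified Chinese system), so only text within that encoding can be displayed. For text outside it, the VC++ library functions print “?”, while the MinGW library functions stop the output.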
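When the stop-on-failure behaviour is unwanted, a program can do the conversion itself and substitute “?” for unconvertible characters, similar to what the VC++ runtime was observed to do above. The following is only a sketch under that assumption: `wcs_to_mbs_lossy` is a hypothetical helper built on the standard `wcrtomb`, and it uses whatever locale is current, so a real program would call `setlocale(LC_ALL, "")` first.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Convert ws to the current locale's multibyte encoding, writing at most
 * cap-1 bytes plus a terminating NUL into out. Characters the locale
 * cannot represent are written as '?'. Returns how many characters were
 * substituted. Output is silently truncated if out is too small.
 * wcs_to_mbs_lossy is an illustrative name, not a standard function. */
size_t wcs_to_mbs_lossy(const wchar_t *ws, char *out, size_t cap)
{
    mbstate_t st;
    char buf[MB_LEN_MAX];
    size_t used = 0, replaced = 0;

    assert(ws != NULL && out != NULL && cap > 0);
    memset(&st, 0, sizeof st);                /* initial conversion state */

    for (; *ws != L'\0'; ++ws) {
        size_t n = wcrtomb(buf, *ws, &st);
        if (n == (size_t)-1) {                /* not representable (EILSEQ) */
            buf[0] = '?';
            n = 1;
            ++replaced;
            memset(&st, 0, sizeof st);        /* state is unspecified after an error */
        }
        if (used + n >= cap)
            break;                            /* keep room for the NUL */
        memcpy(out + used, buf, n);
        used += n;
    }
    out[used] = '\0';
    return replaced;
}
```

The converted buffer can then be printed with an ordinary `%s`, so one failed character no longer cuts off the rest of the line. Stateful encodings that need a closing shift sequence are not handled by this sketch.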

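4.5 Fixing garbled narrow strings with MinGW

As mentioned above, the execution character set of MinGW and MinGW-w64 defaults to UTF-8, while the system default character set on Windows is GBK (Simplified Chinese system), so narrow strings come out garbled.
There are two ways to solve this:
1. Change the Command Prompt encoding to UTF-8.
2. Make MinGW generate GBK-encoded narrow strings.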

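4.5.1 Changing the Command Prompt encoding to UTF-8

Open a Command Prompt and run:
chcp 65001
Note: the chcp command changes the code page of the Command Prompt. 65001 is the code page for UTF-8.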

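After setting the encoding, the font must also be changed before the text displays correctly.
Right-click the title bar of the Command Prompt and choose “Properties” from the context menu to open the properties dialog.
Switch to the “Font” page, select the “Lucida Console” font, and click “OK” to save the settings. If another dialog appears, click “OK” again.

The Command Prompt is now set up for UTF-8. Running the executable built earlier with MinGW gives: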

tcharall v1.00 (32bit)
Compiler: GCC(MinGW) 4.6.2

A漢字ABC_Welcome_歡迎_ようこそ_환영.
WººؖABC_Welcome_gӭ_¤褦¤³¤½_
T漢字ABC_Welcome_歡迎_ようこそ_환영.

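As shown, the narrow strings now print every character. The “Lucida Console” font has no Korean glyphs, so the Korean text appears as boxes on screen.
Unexpectedly, the wide string is now garbled. This is because the C library still converts wide characters to a GBK narrow string, while the console is actually expecting UTF-8 narrow strings. The next subsection analyses this in depth.

After the test, run “chcp 936” to switch the Command Prompt code page back to GBK.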

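4.5.1.1 In-depth analysis of the garbled output in a UTF-8 Command Prompt

When chcp changes the code page of the Command Prompt, it calls the Windows APIs SetConsoleCP and SetConsoleOutputCP to set the console input and output code pages (65001: UTF-8).
However, the system ANSI code page (ACP) does not change: GetACP still returns the original value (936: Simplified Chinese GBK).

When a wide string is printed, the C library converts it to a narrow string. Because the program has called “setlocale(LC_ALL, "")” to use the default locale of the user environment, the C library calls the Windows API WideCharToMultiByte with CP_ACP as the code page, that is, the value returned by GetACP (936: Simplified Chinese GBK). The wide string is therefore converted to a GBK narrow string.
The Command Prompt, however, is now using UTF-8 for input and output (GetConsoleCP and GetConsoleOutputCP return 65001). The encodings do not match, hence the garbled text.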
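The mismatch can be made concrete by checking whether a byte sequence is well-formed UTF-8 at all. The sketch below is an illustration only: `utf8_is_valid` is a hypothetical helper, written against the RFC 3629 rules. GBK output generally fails this check, which is why a console set to code page 65001 cannot render it.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the n bytes at s form well-formed UTF-8 (RFC 3629), else 0.
 * utf8_is_valid is an illustrative name, not a library function. */
int utf8_is_valid(const unsigned char *s, size_t n)
{
    size_t i = 0;

    assert(s != NULL || n == 0);
    while (i < n) {
        unsigned char c = s[i];
        unsigned long cp;
        size_t len, k;

        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return 0;               /* stray continuation byte or 0xF8..0xFF */

        if (len > n - i)
            return 0;                /* sequence is cut off */
        for (k = 1; k < len; ++k) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;            /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        /* reject overlong forms, surrogates and values above U+10FFFF */
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;
        i += len;
    }
    return 1;
}
```

As an example, the GBK bytes BA BA D7 D6 (the simplified characters 汉字) are rejected because BA cannot start a UTF-8 sequence, while the UTF-8 bytes E6 B1 89 E5 AD 97 for the same two characters are accepted.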

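4.5.2 Making MinGW generate GBK-encoded narrow strings

Passing “-fexec-charset=<charset>” to gcc sets the execution character set.

A Simplified Chinese system uses GBK by default, so “-fexec-charset=GBK” would be the natural choice.
In practice, however, gcc reports compile errors: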
gcc -c tcharall.c -Wall -g -fexec-charset=GBK
tcharall.c:74:13: error: converting to execution character set: Illegal byte sequence
tcharall.c:76:65: error: converting to execution character set: Illegal byte sequence
make: *** [tcharall.o] Error 1

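This happens because the source contains Korean characters, which are outside the GBK repertoire, so gcc cannot convert them. An encoding with a larger repertoire is needed.

A brief overview of the Chinese character encoding standards:
GB2312: the earliest national standard for Chinese characters. It is a double-byte encoding covering 6,763 simplified Chinese characters.
GB13000.1: identical to the CJK (unified Chinese, Japanese and Korean ideographs) subset of the international standard ISO/IEC 10646.1:1993, “Information technology, Universal Multiple-Octet Coded Character Set (UCS), Part 1: Architecture and Basic Multilingual Plane”. It focuses on ideographs and contains 20,902 of them (a unified repertoire of simplified, traditional, Japanese and Korean characters in common use).
GBK: a concrete encoding of the GB13000.1 repertoire. It is backward compatible with GB2312 and is still a double-byte encoding, but with a larger code space that holds more than 20,000 ideographs. Simplified Chinese Windows uses GBK, which is why simplified and traditional characters can be used together.
GB18030: the newest Chinese character encoding standard. It is backward compatible with GBK and GB2312 and adds a four-byte form to the traditional double-byte form, which extends the code space by roughly 1.6 million positions. It also adds the CJK Extension A and CJK Extension B ideographs, for a current total of 70,244 ideographs. Beyond ideographs, it maps the non-ideographic characters of Unicode as well, so Korean characters, for example, are supported.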
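The byte layouts behind these standards can be summarised in a short sketch. It only classifies the structure of one sequence (single byte, GBK-compatible double byte, or GB18030 four-byte) and does not check whether the code position is actually assigned; `gb18030_seq_len` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Return the length in bytes (1, 2 or 4) of the GB18030 sequence that
 * starts at s, given n available bytes, or 0 if the bytes are malformed
 * or cut off. Only the byte ranges are checked, not character assignment.
 * gb18030_seq_len is an illustrative name, not a library function. */
size_t gb18030_seq_len(const unsigned char *s, size_t n)
{
    assert(s != NULL);
    if (n == 0)
        return 0;
    if (s[0] <= 0x7F)
        return 1;                           /* ASCII */
    if (s[0] < 0x81 || s[0] > 0xFE)
        return 0;                           /* 0x80 and 0xFF are not lead bytes */
    if (n < 2)
        return 0;
    /* double byte: trail byte 0x40..0x7E or 0x80..0xFE (the GBK layout) */
    if (s[1] >= 0x40 && s[1] <= 0xFE && s[1] != 0x7F)
        return 2;
    /* four byte: lead, digit 0x30..0x39, 0x81..0xFE, digit 0x30..0x39 */
    if (s[1] >= 0x30 && s[1] <= 0x39 && n >= 4 &&
        s[2] >= 0x81 && s[2] <= 0xFE &&
        s[3] >= 0x30 && s[3] <= 0x39)
        return 4;
    return 0;
}
```

The second byte decides the form: a digit (0x30 to 0x39) can never be a GBK trail byte, which is how GB18030 adds the four-byte space while staying compatible with existing GBK text.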

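We can therefore use GB18030 and pass “-fexec-charset=GB18030” to gcc.
Test results:

Because the system default on Simplified Chinese Windows is GBK, the four-byte GB18030 sequences cannot be displayed and turn into “?”.
Text normally stays within the GBK repertoire, so this approach works in practice.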

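5. Testing with GBK encoding

5.1 Overview

Some older compilers do not support UTF-8 source files, so the only option is the local default encoding. I use Simplified Chinese Windows, so the default encoding for source files is GBK.

When the source file is not UTF-8, the input character set and the execution character set must be configured correctly to avoid garbled text:
Input character set: the encoding the compiler uses to convert the contents of the source file to Unicode. VC (VC2005 or later) detects the input character set from the BOM and falls back to the local encoding (936: GBK) when there is no BOM. gcc defaults to UTF-8 and is configured with “-finput-charset=<charset>”.
Execution character set: the encoding the compiler uses to convert Unicode strings to narrow strings. VC uses the local encoding (936: GBK) by default; VC2010 (or later) can be configured by writing “#pragma execution_character_set("utf-8")” in the source. gcc defaults to UTF-8 and is configured with “-fexec-charset=<charset>”.

For VC++, it is enough to save the source file in the local default encoding, which is what VC++ does by default when saving. If the encoding differs, change it through the menu “File” -> “Advanced Save Options”.
For gcc, both the input character set and the execution character set default to UTF-8, so both must be set, that is, pass “-finput-charset=gbk -fexec-charset=gbk” to gcc.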
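A quick way to see which execution character set a build actually used is to dump the bytes of a narrow string literal. The helper below is a sketch for that purpose; `bytes_to_hex` is a hypothetical name and is not part of the tcharall sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write the bytes of s as upper-case hex pairs separated by single spaces
 * into out (capacity cap, including the terminating NUL).
 * Returns out, or NULL if the buffer is too small.
 * bytes_to_hex is an illustrative name, not a library function. */
char *bytes_to_hex(const char *s, char *out, size_t cap)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t len, i;
    char *p = out;

    assert(s != NULL && out != NULL);
    len = strlen(s);
    /* each byte needs 2 digits plus a separator or the final NUL */
    if (cap < (len > 0 ? len * 3 : 1))
        return NULL;
    for (i = 0; i < len; ++i) {
        unsigned char b = (unsigned char)s[i];
        *p++ = digits[b >> 4];
        *p++ = digits[b & 0x0F];
        if (i + 1 < len)
            *p++ = ' ';
    }
    *p = '\0';
    return out;
}
```

Printing `bytes_to_hex("\u6C49", buf, sizeof buf)` for the character 汉 should show E6 B1 89 when the execution character set is UTF-8 (the gcc default) and BA BA when the program is built with “-fexec-charset=gbk”.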

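Make sure the source file and the header files all use the same encoding; otherwise the build may fail because of the mismatch. For example, gcc reports errors like these: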
tcharall_gbk.c:22:19: error: failure to convert gbk to UTF-8
tcharall_gbk.c:24:24: error: failure to convert gbk to UTF-8
tcharall_gbk.c:62:1: error: unknown type name 'TCHAR'

When using the “\u” escape, it is advisable to pass “-std=c99” to gcc; otherwise the following warning appears:
tcharall_gbk.c:61:16: warning: universal character names are only valid in C++ and C99 [enabled by default]

5.2 Test code

File list:
auto_tchar.h
auto_tmain.h
makefile
prichar.h
tcharall_gbk.c
tcharall_gbk.dsp
tcharall_gbk.dsw
tcharall_gbk_2003.sln
tcharall_gbk_2003.vcproj
tcharall_gbk_2005.sln
tcharall_gbk_2005.vcproj
tcharall_gbk_bcb6.bpf
tcharall_gbk_bcb6.bpr
tcharall_gbk_bcb6.res

tcharall_gbk.c and the 3 header files are GBK encoded.

tcharall_gbk.c (GBK cannot represent Korean characters, so the string constants are changed slightly):

////////////////////////////////////////////////////////////
/*
tcharall_gbk.c : test tchar with various compilers (GBK encoding).
Author: zyl910
Blog: http://www.cnblogs.com/zyl910
URL: http://www.cnblogs.com/zyl910/archive/2013/01/17/tcharall.html
Version: V1.00
Updated: 2013-01-17


Update
~~~~~~

[2013-01-17] V1.00
* V1.0 released.

[2012-11-08] V0.01
* Initial version.

*/
////////////////////////////////////////////////////////////

#include <stdio.h>
#include <locale.h>
#include <wchar.h>

#include "auto_tchar.h"
#include "prichar.h"

#include "auto_tmain.h"


// Compiler name
#define MACTOSTR(x)    #x
#define MACROVALUESTR(x)    MACTOSTR(x)
#if defined(__ICL)    // Intel C++
#  if defined(__VERSION__)
#    define COMPILER_NAME    "Intel C++ " __VERSION__
#  elif defined(__INTEL_COMPILER_BUILD_DATE)
#    define COMPILER_NAME    "Intel C++ (" MACROVALUESTR(__INTEL_COMPILER_BUILD_DATE) ")"
#  else
#    define COMPILER_NAME    "Intel C++"
#  endif    // #  if defined(__VERSION__)
#elif defined(_MSC_VER)    // Microsoft VC++
#  if defined(_MSC_FULL_VER)
#    define COMPILER_NAME    "Microsoft VC++ (" MACROVALUESTR(_MSC_FULL_VER) ")"
#  elif defined(_MSC_VER)
#    define COMPILER_NAME    "Microsoft VC++ (" MACROVALUESTR(_MSC_VER) ")"
#  else
#    define COMPILER_NAME    "Microsoft VC++"
#  endif    // #  if defined(_MSC_FULL_VER)
#elif defined(__GNUC__)    // GCC
#  if defined(__CYGWIN__)
#    define COMPILER_NAME    "GCC(Cygwin) " __VERSION__
#  elif defined(__MINGW32__)
#    define COMPILER_NAME    "GCC(MinGW) " __VERSION__
#  else
#    define COMPILER_NAME    "GCC " __VERSION__
#  endif    // #  if defined(__CYGWIN__)
#elif defined(__TURBOC__)    // Borland C++
#  if defined(__BCPLUSPLUS__)
#    define COMPILER_NAME    "Borland C++ (" MACROVALUESTR(__BCPLUSPLUS__) ")"
#  elif defined(__BORLANDC__)
#    define COMPILER_NAME    "Borland C (" MACROVALUESTR(__BORLANDC__) ")"
#  else
#    define COMPILER_NAME    "Turbo C (" MACROVALUESTR(__TURBOC__) ")"
#  endif    // #  if defined(__BCPLUSPLUS__)
#else
#  define COMPILER_NAME    "Unknown Compiler"
#endif    // #if defined(__ICL)    // Intel C++


const char* psa = "A漢字ABC_Welcome_歡迎_ようこそ.";
const wchar_t* psw = L"W漢字ABC_Welcome_歡迎_ようこそ_\uD658\uC601.";    // \uD658\uC601 is "welcome" in Korean.
const TCHAR* pst = _T("T漢字ABC_Welcome_歡迎_ようこそ.");


int _tmain(int argc, TCHAR* argv[])
{
    // init.
    setlocale(LC_ALL, "");    // use the default locale of the user environment.

    _tprintf(_T("tcharall_gbk v1.00 (%dbit)\n"), (int)(8*sizeof(int*)));
    _tprintf(_T("Compiler: %")_T(PRIsA)_T("\n"), COMPILER_NAME);
    _tprintf(_T("\n"));

    // show
    _tprintf(_T("%")_T(PRIsA)_T("\n"), psa);    // print the narrow string.
    _tprintf(_T("%")_T(PRIsW)_T("\n"), psw);    // print the wide string.
    _tprintf(_T("%")_T(PRIsT)_T("\n"), pst);    // print the TCHAR string.
    

    return 0;
}

makefile:

# flags
CC = gcc
CFS = -Wall -std=c99 -finput-charset=gbk -fexec-charset=gbk

# args
RELEASE =0
UNICODE =0
BITS =
CFLAGS =

# [args] Build mode. 0 means debug, 1 means release. make RELEASE=1.
ifeq ($(RELEASE),0)
    # debug
    CFS += -g
else
    # release
    CFS += -static -O3 -DNDEBUG
    #CFS += -O3 -g -DNDEBUG
endif

# [args] UNICODE mode. 0 means ansi, 1 means unicode. make UNICODE=1.
ifeq ($(UNICODE),0)
    # ansi
    CFS +=
else
    # unicode
    CFS += -D_UNICODE -DUNICODE
endif

# [args] Target bitness. 32 means a 32-bit program, 64 means a 64-bit program, anything else uses the default. make BITS=32.
ifeq ($(BITS),32)
    CFS += -m32
else
    ifeq ($(BITS),64)
        CFS += -m64
    else
    endif
endif

# [args] Use CFLAGS to add extra options. make CFLAGS="-mavx".
CFS += $(CFLAGS)


.PHONY : all clean

# files
TARGETS = tcharall_gbk
OBJS = tcharall_gbk.o

all : $(TARGETS)

tcharall_gbk : $(OBJS)
	$(CC) -o $@ $^ $(CFS)


tcharall_gbk.o : tcharall_gbk.c
	$(CC) -c $< $(CFS)


clean :
	rm -f $(OBJS) $(TARGETS) $(addsuffix .exe,$(TARGETS))

5.3 Test results

Compiled successfully with the following compilers:
VC6: x86 build. Unicode=0.
VC2003: x86 build. Unicode=0.
VC2005: x86 and x64 builds. Unicode=1.
BCB6: x86 build. Unicode=0.
GCC 4.6.2 (MinGW (20120426)): x86 build. Unicode=0 and Unicode=1.
GCC 4.7.1 (TDM-GCC (MinGW-w64)): x86 and x64 builds. Unicode=0 and Unicode=1.
GCC 4.7.0 (Fedora 17 x64): x64 build. Unicode=0.
llvm-gcc-4.2 (Mac OS X Lion 10.7.4, Xcode 4.4.1): x64 build. Unicode=0.

Test results:

【VC6，Unicode=0】
tcharall v1.00 (32bit)
Compiler: Microsoft VC++ (12008804)

A漢字ABC_Welcome_歡迎_ようこそ.
W漢字ABC_Welcome_歡迎_ようこそ_uD658uC601.
T漢字ABC_Welcome_歡迎_ようこそ.


【VC2003，Unicode=0】
tcharall v1.00 (32bit)
Compiler: Microsoft VC++ (13106030)

A漢字ABC_Welcome_歡迎_ようこそ.
W漢字ABC_Welcome_歡迎_ようこそ_
T漢字ABC_Welcome_歡迎_ようこそ.


【VC2005，Unicode=1】
tcharall_gbk v1.00 (32bit)
Compiler: Microsoft VC++ (140050727)

A漢字ABC_Welcome_歡迎_ようこそ.
W漢字ABC_Welcome_歡迎_ようこそ_??.
T漢字ABC_Welcome_歡迎_ようこそ.


【BCB6，Unicode=0】
tcharall_gbk v1.00 (32bit)
Compiler: Borland C (0x0564)

A漢字ABC_Welcome_歡迎_ようこそ.
W漢字ABC_Welcome_歡迎_ようこそ_
T漢字ABC_Welcome_歡迎_ようこそ.


【GCC 4.6.2（MinGW (20120426)），Unicode=0】
tcharall_gbk v1.00 (32bit)
Compiler: GCC(MinGW) 4.6.2

A漢字ABC_Welcome_歡迎_ようこそ.
W漢字ABC_Welcome_歡迎_ようこそ_
T漢字ABC_Welcome_歡迎_ようこそ.


【GCC 4.6.2（MinGW (20120426)），Unicode=1】
tcharall_gbk v1.00 (32bit)
Compiler: GCC(MinGW) 4.6.2

A漢字ABC_Welcome_歡迎_ようこそ.
W漢字ABC_Welcome_歡迎_ようこそ_T漢字ABC_Welcome_歡迎_ようこそ.


【GCC 4.7.1（TDM-GCC(MinGW-w64)），Unicode=0】
tcharall_gbk v1.00 (64bit)
Compiler: GCC(MinGW) 4.7.1

A漢字ABC_Welcome_歡迎_ようこそ.
W漢字ABC_Welcome_歡迎_ようこそ_T漢字ABC_Welcome_歡迎_ようこそ.


【GCC 4.7.1（TDM-GCC(MinGW-w64)），Unicode=1】
tcharall_gbk v1.00 (64bit)
Compiler: GCC(MinGW) 4.7.1

A漢字ABC_Welcome_歡迎_ようこそ.
W漢字ABC_Welcome_歡迎_ようこそ_.
T漢字ABC_Welcome_歡迎_ようこそ.


【GCC 4.7.0（Fedora 17 x64），Unicode=0】
tcharall_gbk v1.00 (64bit)
Compiler: GCC 4.7.0 20120507 (Red Hat 4.7.0-5)

A����ABC_Welcome_�gӭ_�褦����.
W漢字ABC_Welcome_歡迎_ようこそ_환영.
T����ABC_Welcome_�gӭ_�褦����.


【llvm-gcc-4.2（Mac OS X Lion 10.7.4, Xcode 4.4.1），Unicode=0】
tcharall_gbk v1.00 (64bit)
Compiler: GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)

A????ABC_Welcome_?gӭ_?褦????.
W漢字ABC_Welcome_歡迎_ようこそ_환영.
T????ABC_Welcome_?gӭ_?褦????.

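5.4 Analysis of the test results

With GBK encoding, the compilers on Windows mostly print the characters correctly without garbling. VC++ 6.0 does not support the “\u” escape.
On Linux and Mac, narrow strings are garbled because the GBK bytes do not match the UTF-8 terminal, but wide strings still print correctly.

References: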
http://www.unicode.org/
《[RFC 3629] UTF-8, a transformation format of ISO 10646》。F. Yergeau，2003-11。http://tools.ietf.org/html/rfc3629
《GB18030-2005 信息技術中文編碼字符集》。國家標准化管理委員會。中國標准出版社，2006-05。
《ISO/IEC 9899:1999 (C99)》。ISO/IEC，1999。www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
《C語言參考手冊(原書第5版)》。Samuel P.Harbison Ⅲ,Guy L.Steele。機械工業出版社，2003-08。
《[C/C++] 各種C/C++編譯器對UTF-8源碼文件的兼容性測試（VC、GCC、BCB）》。http://www.cnblogs.com/zyl910/archive/2012/07/26/cfile_utf8.html
《[C] wchar_t的格式控制字符（VC、BCB、GCC、C99標准）》。http://www.cnblogs.com/zyl910/archive/2012/07/30/wcharfmt.html

Source download:
http://files.cnblogs.com/zyl910/tcharall.rar
