[C] Format specifiers for wchar_t (VC, BCB, GCC, and the C99 standard)


Author: zyl910

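  Since the wchar_t type was introduced into C, string handling has become more complicated. String output, for example, has both printf and wprintf. When the arguments mix char strings and wchar_t strings, which format specifiers should be used? This article looks into that question.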


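I. Reading the documentation

  First, let us look at the documentation of each compiler and at the C99 standard to see how they describe the format specifiers.


1.1 VC documentation

  On MSDN, the format string of printf and wprintf is described in "Format Specification Fields: printf and wprintf Functions" (http://msdn.microsoft.com/en-us/library/56e442dc(v=vs.110).aspx). Excerpt: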
A format specification, which consists of optional and required fields, has the following form:
% [flags] [width] [.precision] [{h | l | ll | I | I32 | I64}]type

  Start with the "type" link, which leads to the "printf Type Field Characters" page (http://msdn.microsoft.com/en-us/library/hf4y5e3w(v=vs.110).aspx). Excerpt:
printf Type Field Characters

Character  Type           Output format
c          int or wint_t  When used with printf functions, specifies a single-byte character; when used with wprintf functions, specifies a wide character.
C          int or wint_t  When used with printf functions, specifies a wide character; when used with wprintf functions, specifies a single-byte character.
s          String         When used with printf functions, specifies a single-byte-character string; when used with wprintf functions, specifies a wide-character string. Characters are displayed up to the first null character or until the precision value is reached.
S          String         When used with printf functions, specifies a wide-character string; when used with wprintf functions, specifies a single-byte-character string. Characters are displayed up to the first null character or until the precision value is reached.

 


  Go back and follow the "Size Specification" link (http://msdn.microsoft.com/en-us/library/tcxf1dw6(v=vs.110).aspx). Excerpt:

To specify                                            Use prefix  With type specifier
Single-byte character with printf functions           h           c or C
Single-byte character with wprintf functions          h           c or C
Wide character with printf functions                  l           c or C
Wide character with wprintf functions                 l           c or C
Single-byte-character string with printf functions    h           s or S
Single-byte-character string with wprintf functions   h           s or S
Wide-character string with printf functions           l           s or S
Wide-character string with wprintf functions          l           s or S
Wide character                                        w           c
Wide-character string                                 w           s
 

 

Thus to print single-byte or wide-characters with printf functions and wprintf functions, use format specifiers as follows.

To print character as  Use function  With format specifier
single byte            printf        c, hc, or hC
single byte            wprintf       C, hc, or hC
wide                   wprintf       c, lc, lC, or wc
wide                   printf        C, lc, lC, or wc
 

To print strings with printf functions and wprintf functions, use the prefixes h and l analogously with format type-specifiers s and S.


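  That is a lot of specifiers. After sorting them out, three turn out to be the most useful for strings:
hs: a char string in both printf and wprintf.
ls: a wchar_t string in both printf and wprintf.
s: a char string in printf but a wchar_t string in wprintf, which pairs conveniently with TCHAR.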


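1.2 BCB documentation

  Open "C Runtime Library Reference" in the BCB6 help file and type "printf" into the index. The description of the format specifiers is easy to find there.

  On inspection it is compatible with VC: hs, ls, and s handle char, wchar_t, and TCHAR strings respectively.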


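1.3 GCC documentation

  My machine runs Fedora 17 with GCC 4.7.0 installed.
  Open a terminal and run "man 3 wprintf" to read the documentation of the wprintf function (strictly speaking, the page documents the C library, glibc, which is what implements the function). Excerpt: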
c
If no l modifier is present, the int argument is converted to a wide character by a call to the btowc(3) function, and the resulting wide character is written. If an l modifier is present, the wint_t (wide character) argument is written.

s
If no l modifier is present: The const char * argument is expected to be a pointer to an array of character type (pointer to a string) containing a multibyte character sequence beginning in the initial shift state. Characters from the array are converted to wide characters (each by a call to the mbrtowc(3) function with a conversion state starting in the initial state before the first byte). The resulting wide characters are written up to (but not including) the terminating null wide character. If a precision is specified, no more wide characters than the number specified are written. Note that the precision determines the number of wide characters written, not the number of bytes or screen positions. The array must contain a terminating null byte, unless a precision is given and it is so small that the number of converted wide characters reaches it before the end of the array is reached.
If an l modifier is present: The const wchar_t * argument is expected to be a pointer to an array of wide characters. Wide characters from the array are written up to (but not including) a terminating null wide character. If a precision is specified, no more than the number specified are written. The array must contain a terminating null wide character, unless a precision is given and it is smaller than or equal to the number of wide characters in the array.


  According to this description, GCC appears to support only two format specifiers for strings:
s: a char string in both printf and wprintf.
ls: a wchar_t string in both printf and wprintf.
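
  The rule above can be tried out without touching the terminal by formatting into buffers. The following is a minimal sketch, assuming a C99 library such as glibc and ASCII-only text in the default "C" locale; the function names format_narrow and format_wide are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: formats a char string and a wchar_t string
   into a narrow buffer. Returns what snprintf returns. */
int format_narrow(char *out, size_t size, const char *s, const wchar_t *ws)
{
    assert(out != NULL && size > 0);
    /* %s takes a const char *; %ls takes a const wchar_t *,
       which is converted to a multibyte sequence on output. */
    return snprintf(out, size, "%s %ls", s, ws);
}

/* Hypothetical helper: the same two arguments, written to a wide buffer.
   Returns what swprintf returns (negative on error). */
int format_wide(wchar_t *out, size_t size, const char *s, const wchar_t *ws)
{
    assert(out != NULL && size > 0);
    /* The specifiers keep their meaning in the wide family:
       %s is still a char string, now converted to wide characters. */
    return swprintf(out, size, L"%s %ls", s, ws);
}
```

  Called as format_narrow(buf, sizeof buf, "CHAR", L"WCHAR"), the sketch should leave "CHAR WCHAR" in the buffer; format_wide produces the wide equivalent. Non-ASCII text would additionally need a suitable setlocale call.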


1.4 C99 standard

  Section "7.24.2.1 The fwprintf function" of the C99 standard describes the format specifiers of fwprintf and the other wide-character functions. Excerpt:
7 The length modifiers and their meanings are:

h
Specifies that a following d, i, o, u, x, or X conversion specifier applies to a short int or unsigned short int argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to short int or unsigned short int before printing); or that a following n conversion specifier applies to a pointer to a short int argument.

l (ell)
Specifies that a following d, i, o, u, x, or X conversion specifier applies to a long int or unsigned long int argument; that a following n conversion specifier applies to a pointer to a long int argument; that a following c conversion specifier applies to a wint_t argument; that a following s conversion specifier applies to a pointer to a wchar_t argument; or has no effect on a following a, A, e, E, f, F, g, or G conversion specifier.

……

8 The conversion specifiers and their meanings are:

c
If no l length modifier is present, the int argument is converted to a wide character as if by calling btowc and the resulting wide character is written.
If an l length modifier is present, the wint_t argument is converted to wchar_t and written.

s
If no l length modifier is present, the argument shall be a pointer to the initial element of a character array containing a multibyte character sequence beginning in the initial shift state. Characters from the array are converted as if by repeated calls to the mbrtowc function, with the conversion state described by an mbstate_t object initialized to zero before the first multibyte character is converted, and written up to (but not including) the terminating null wide character. If the precision is specified, no more than that many wide characters are written. If the precision is not specified or is greater than the size of the converted array, the converted array shall contain a null wide character.
If an l length modifier is present, the argument shall be a pointer to the initial element of an array of wchar_t type. Wide characters from the array are written up to (but not including) a terminating null wide character. If the precision is specified, no more than that many wide characters are written. If the precision is not specified or is greater than the size of the array, the array shall contain a null wide character.


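  So in the C99 standard, c and s accept only the "l" length modifier: without "l" the argument is a char character or string, and with "l" it is a wchar_t character or string.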
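
  The same distinction applies to single characters. The sketch below is only an illustration under the assumption of a C99 library and ASCII characters in the default "C" locale; format_chars_narrow and format_chars_wide are hypothetical names, not library functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: writes one char and one wchar_t to a narrow buffer. */
int format_chars_narrow(char *out, size_t size, char c, wchar_t wc)
{
    assert(out != NULL && size > 0);
    /* %c expects an int; %lc expects a wint_t, hence the cast. */
    return snprintf(out, size, "%c%lc", c, (wint_t)wc);
}

/* Hypothetical helper: the same pair, written to a wide buffer. */
int format_chars_wide(wchar_t *out, size_t size, char c, wchar_t wc)
{
    assert(out != NULL && size > 0);
    /* In the wide family %c widens the int as if by btowc,
       while %lc writes the wint_t argument as a wide character. */
    return swprintf(out, size, L"%c%lc", c, (wint_t)wc);
}
```

  The explicit cast to wint_t is there because the arguments pass through a variable argument list, where the type has to match what the specifier expects.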


1.5 Recap

  The material above can be summarized in a table:

        VC and BCB            GCC and C99 standard
        printf    wprintf     printf    wprintf
s       char      wchar_t     char      char
S       wchar_t   char        *         *
hs      char      char        *         *
ls      wchar_t   wchar_t     wchar_t   wchar_t

*: undefined.


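II. Test program

  With the documentation in hand, I decided to write a test program to check how well each compiler actually supports the wchar_t format specifiers.
  The code of the test program is as follows: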

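#include <stdio.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

const char* psa = "CHAR";    // Single-byte (char) string.
const wchar_t* psw = L"WCHAR";    // Wide-character string.
const wchar_t* pst = L"TCHAR";    // String whose type matches the function family (wide here, because wprintf is used).

int main(void)
{
    setlocale(LC_ALL, "");    // Use the system's current locale (code page).

    // test
    wprintf(L"A:\t%hs\n", psa);
    wprintf(L"W:\t%ls\n", psw);
    wprintf(L"T:\t%s\n", pst);    // Only correct where wprintf treats %s as a wchar_t string.

    return 0;
}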

 

  If everything works, the program should print:
A: CHAR
W: WCHAR
T: TCHAR


III. Test results

3.1 VC6 and BCB6

  As expected, both VC6 and BCB6 print the correct output:
A: CHAR
W: WCHAR
T: TCHAR


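3.2 GCC on Fedora

  Tested on Fedora 17 with GCC 4.7.0.

  The third line of output is wrong, which is easy to understand: both the GCC documentation and the C99 standard say that s without "l" means a char string, while pst is actually a wchar_t string.
  The correct first line is the more puzzling one, since neither the GCC documentation nor the C99 standard lists an "h" length modifier for s. On reflection, the documentation says that s without "l" means a char string, and "hs" contains no "l", so it is treated as a char string. Note, however, that C99 leaves the behavior undefined when a length modifier is combined with a conversion specifier it is not defined for, so accepting "hs" is observed glibc behavior rather than something the standard guarantees.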
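
  One way to see why the third line goes wrong is to look at a wchar_t string the way a char-string consumer would. This is only a sketch for illustration: it assumes wchar_t is wider than char (true on the platforms discussed here), and length_seen_as_char_string is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: how many bytes a char-string routine would see
   before the first zero byte if it were handed a wchar_t string.
   Reading an object's bytes through a char pointer is permitted in C. */
size_t length_seen_as_char_string(const wchar_t *ws)
{
    assert(ws != NULL);
    return strlen((const char *)ws);
}
```

  For L"TCHAR" the result is at most 1 whenever wchar_t is wider than char: each ASCII wide character carries zero bytes, so on a little-endian machine a char-string reader stops right after 'T', and on a big-endian one it stops immediately. That is consistent with a truncated third line rather than the full "TCHAR".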


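3.3 GCC under MinGW

  Tested with MinGW (20120426), GCC 4.6.2.

  Although MinGW also uses the GCC compiler, its format specifier rules match VC for compatibility with the Windows environment (its programs normally link against Microsoft's C runtime, msvcrt, by default).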


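IV. Summary

  Based on the test results, the earlier table is revised as follows:

        VC, BCB, MinGW        GCC on Linux, C99 standard
        printf    wprintf     printf    wprintf
s       char      wchar_t     char      char
S       wchar_t   char        *         *
hs      char      char        char      char
ls      wchar_t   wchar_t     wchar_t   wchar_t

*: undefined.

  In summary:
1) To output a char string, use "hs".
2) To output a wchar_t string, use "ls".
3) To output a TCHAR string, use "s". This only works with compilers on the Windows platform, such as VC, BCB, and MinGW.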
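
  Because "hs" is a Microsoft-style extension that glibc merely tolerates (C99 does not define "h" for s), a cautious variant of rule 1 is to select the char-string specifier per platform. The sketch below is one possible approach, not something from the documentation quoted above: WFMT_CHAR_STR and format_pair are hypothetical names, and the code assumes the C99 signature of swprintf, which takes a buffer size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical macro: the specifier for a char string inside a WIDE
   format string. On Windows runtimes plain %s would mean wchar_t there. */
#if defined(_WIN32)
#  define WFMT_CHAR_STR L"%hs"
#else
#  define WFMT_CHAR_STR L"%s"
#endif

/* Hypothetical helper: writes "A:<char string> W:<wide string>" to out. */
int format_pair(wchar_t *out, size_t size, const char *s, const wchar_t *ws)
{
    assert(out != NULL && size > 0);
    /* Adjacent wide string literals are concatenated by the compiler;
       %ls means a wchar_t string on every platform discussed here. */
    return swprintf(out, size, L"A:" WFMT_CHAR_STR L" W:%ls", s, ws);
}
```

  With "CHAR" and L"WCHAR" as arguments, the buffer should end up holding L"A:CHAR W:WCHAR". _WIN32 is also defined by MinGW, which matches the test result that MinGW follows the VC rules.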

 

References:
"ISO/IEC 9899:1999 (C99)". ISO/IEC, 1999. www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
"C99 standard" (in Chinese). yourtommy. http://blog.csdn.net/yourtommy/article/details/7495033
"[VS2012] Format Specification Fields: printf and wprintf Functions". http://msdn.microsoft.com/en-us/library/56e442dc(v=vs.110).aspx
"[VS2012] printf Type Field Characters". http://msdn.microsoft.com/en-us/library/hf4y5e3w(v=vs.110).aspx
"[VS2012] Size Specification". http://msdn.microsoft.com/en-us/library/tcxf1dw6(v=vs.110).aspx
"wprintf(3) - Linux manual page". http://www.kernel.org/doc/man-pages/online/pages/man3/wprintf.3.html

 

Source code download:
http://files.cnblogs.com/zyl910/wcharfmt.rar

