C language character set
When a compiler translates source code, it runs in the translation environment; when the compiled program runs, it does so in the execution environment. In C, the two environments need not be the same. C therefore defines two character sets: the source character set and the execution character set. The source character set is the set of characters that may be used to write C source code, and the execution character set is the set of characters that the running program can interpret. In many C implementations the two are identical. Where they differ, the compiler converts the characters in character constants and string literals in the source code into the corresponding members of the execution character set.
Each of the two character sets consists of a basic character set and may add extended characters. The C standard does not specify the extended characters; they usually depend on the local language. The basic character set together with the extended characters forms the extended character set.
The basic source character set and the basic execution character set both contain the following kinds of characters:
The 26 uppercase and 26 lowercase letters of the Latin alphabet, and the 10 decimal digits
The following 29 graphic characters:
! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ] ^ _ { | } ~
The five whitespace characters:
space, horizontal tab, vertical tab, newline, form feed
The basic execution character set additionally defines four non-printable characters:
the null character \0 (used as the string terminator), alert \a, backspace \b, and carriage return \r