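Environment: Windows 10; MySQL 8.0.
1. Characters (Char), Bytes (Byte), and Bits (Bit)
Note: the conversion between bytes and bits is fixed (1 byte = 8 bits), but the conversion between characters and bytes is not; it depends on the character encoding. In MySQL, the char_length, length, and bit_length functions return the number of characters, bytes, and bits in a string, respectively.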
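To make the character/byte/bit distinction concrete outside of MySQL, here is a minimal Python sketch. The helper name `measure` is hypothetical, and Python's `len` on a `str` counts code points, which is assumed here to be a reasonable stand-in for char_length on this input.

```python
def measure(text, encoding):
    """Return (characters, bytes, bits) for text under the given encoding."""
    encoded = text.encode(encoding)   # the bytes this encoding produces
    chars = len(text)                 # analogous to char_length
    nbytes = len(encoded)             # analogous to length
    nbits = nbytes * 8                # analogous to bit_length (1 byte = 8 bits)
    return chars, nbytes, nbits


for enc in ("gbk", "utf-8"):
    print(enc, measure("我愛你", enc))
```

The character count stays the same in both encodings; only the byte and bit counts change.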
1.1 First, check MySQL's character set settings by opening the MySQL 8.0 Command Line Client window
mysql> show variables like 'character%';
+--------------------------+---------------------------------------------------------+
| Variable_name            | Value                                                   |
+--------------------------+---------------------------------------------------------+
| character_set_client     | gbk                                                     |
| character_set_connection | gbk                                                     |
| character_set_database   | utf8mb4                                                 |
| character_set_filesystem | binary                                                  |
| character_set_results    | gbk                                                     |
| character_set_server     | utf8mb4                                                 |
| character_set_system     | utf8                                                    |
| character_sets_dir       | C:\Program Files\MySQL\MySQL Server 8.0\share\charsets\ |
+--------------------------+---------------------------------------------------------+
8 rows in set, 1 warning (0.01 sec)
Here character_set_client, character_set_connection, and character_set_results default to gbk for the client on Chinese-language Windows. Let's first look at what the three functions return under gbk:
mysql> select char_length('我愛你');
+-----------------------+
| char_length('我愛你') |
+-----------------------+
| 3 |
+-----------------------+
1 row in set (0.02 sec)
mysql> select length('我愛你');
+------------------+
| length('我愛你') |
+------------------+
| 6 |
+------------------+
1 row in set (0.00 sec)
mysql> select bit_length('我愛你');
+----------------------+
| bit_length('我愛你') |
+----------------------+
| 48 |
+----------------------+
1 row in set (0.00 sec)
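The numbers above follow from simple arithmetic: 3 characters at 2 bytes each give 6 bytes, and 6 bytes at 8 bits each give 48 bits. The per-character widths can be checked with a short Python sketch; `byte_widths` is a hypothetical helper name, and the ASCII letter is added only to show that the width also depends on the character, not just the encoding.

```python
def byte_widths(text, encoding):
    """Return a list of (character, byte count) pairs under the given encoding."""
    return [(ch, len(ch.encode(encoding))) for ch in text]


# Each Chinese character takes 2 bytes in gbk.
print(byte_widths("我愛你", "gbk"))

# An ASCII letter takes 1 byte in both encodings; the Chinese character does not.
print(byte_widths("a我", "gbk"))
print(byte_widths("a我", "utf-8"))
```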
1.2 Resetting the client character set
Enter the following statement to set the character set for character_set_client, character_set_connection, and character_set_results:
set names 'utf8';
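Note: the command above is session-scoped and no longer applies once the client is closed, but here we only need it for testing.
Check the character set settings again: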
mysql> show variables like 'character%';
+--------------------------+---------------------------------------------------------+
| Variable_name            | Value                                                   |
+--------------------------+---------------------------------------------------------+
| character_set_client     | utf8                                                    |
| character_set_connection | utf8                                                    |
| character_set_database   | utf8mb4                                                 |
| character_set_filesystem | binary                                                  |
| character_set_results    | utf8                                                    |
| character_set_server     | utf8mb4                                                 |
| character_set_system     | utf8                                                    |
| character_sets_dir       | C:\Program Files\MySQL\MySQL Server 8.0\share\charsets\ |
+--------------------------+---------------------------------------------------------+
As shown, character_set_client, character_set_connection, and character_set_results are now all set to utf8. Now check the results of char_length(), length(), and bit_length() again:
mysql> select char_length('我愛你');
+-----------------------+
| char_length('我愛你') |
+-----------------------+
| 5 |
+-----------------------+
1 row in set, 1 warning (0.00 sec)
mysql> select length('我愛你');
+------------------+
| length('我愛你') |
+------------------+
| 6 |
+------------------+
1 row in set, 1 warning (0.00 sec)
mysql> select bit_length('我愛你');
+----------------------+
| bit_length('我愛你') |
+----------------------+
| 48 |
+----------------------+
1 row in set, 1 warning (0.00 sec)
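WTF? Let's try the test again, this time in MySQL Workbench.
1.3 In MySQL Workbench
Check the client character set; the result is as follows:
Then test with char_length(), length(), and bit_length() in turn:
Result: taking 1.1 and 1.3 together, one Chinese character takes 2 bytes under gbk encoding and 3 bytes under utf8 encoding. As for 1.2, I have no reasonable explanation yet; additions are welcome!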
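One possible explanation for 1.2, offered as an assumption rather than something the tests above confirm: set names only changes the character set the server assumes for the bytes it receives, and it does not change how the console encodes what is typed. If the Windows console kept sending gbk bytes, the server would receive the same 6 bytes as in 1.1 (hence length 6 and bit_length 48) and try to read them as utf8, where they are malformed, which may also account for the warning. The Python sketch below only illustrates that mismatch; the function names are hypothetical, and Python's lenient decoder is not confirmed to count malformed sequences the same way MySQL does.

```python
def gbk_bytes(text):
    """Bytes a gbk console would send for the given text (assumed scenario)."""
    return text.encode("gbk")


def is_valid_utf8(raw):
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def lenient_char_count(raw):
    """Count characters when malformed UTF-8 sequences are replaced, not rejected."""
    return len(raw.decode("utf-8", errors="replace"))


raw = gbk_bytes("我愛你")
print(len(raw), len(raw) * 8)      # byte and bit counts are unchanged by the mislabel
print(is_valid_utf8(raw))          # the gbk bytes are not well-formed UTF-8
print(lenient_char_count(raw))     # the character count no longer matches the real text
```

On this input the lenient decode yields 5 characters, the same figure char_length reported in 1.2, which is at least consistent with the mismatch hypothesis.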