操作环境:Windows 10;MySQL8.0.
1.字符(Char)、字节(Byte)与位(Bit)
说明:字节(Byte)与位(Bit)的单位换算关系是固定的(1Byte=8Bit),但是字符与字节的换算关系确实不确定的,而是取决于字符的编码方式。在MySQL中可以分别使用char_length、length和bit_length函数来查看字符串所包含的字符数、字节数及位数。
1.1首先查看下MySQL的编码方式,进入MySQL8.0Command Line Client窗口
mysql> show variables like 'character%'; +--------------------------+---------------------------------------------------------+ | Variable_name | Value | +--------------------------+---------------------------------------------------------+ | character_set_client | gbk | | character_set_connection | gbk | | character_set_database | utf8mb4 | | character_set_filesystem | binary | | character_set_results | gbk | | character_set_server | utf8mb4 | | character_set_system | utf8 | | character_sets_dir | C:\Program Files\MySQL\MySQL Server 8.0\share\charsets\ | +--------------------------+---------------------------------------------------------+ 8 rows in set, 1 warning (0.01 sec)
其中,character_set_client、character_set_connection和character_set_results在windows中文版下的客户端默认的编码方式是gbk。接着我们先来看gbk编码下,三个函数的输出结果:
mysql> select char_length('我爱你');
+-----------------------+
| char_length('我爱你') |
+-----------------------+
| 3 |
+-----------------------+
1 row in set (0.02 sec)
mysql> select length('我爱你');
+------------------+
| length('我爱你') |
+------------------+
| 6 |
+------------------+
1 row in set (0.00 sec)
mysql> select bit_length('我爱你');
+----------------------+
| bit_length('我爱你') |
+----------------------+
| 48 |
+----------------------+
1 row in set (0.00 sec)
1.2重设客户端字符集
输入如下语句来设置character_set_client、character_set_connection和character_set_results的字符集:
set names 'uft8';
备注:以上命令是会话级的,关闭客户端后就会shibi失效,不过我们这里只是用来测试。
再次查看字符集设置:
mysql> show variables like 'character%'; +--------------------------+---------------------------------------------------------+ | Variable_name | Value | +--------------------------+---------------------------------------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8mb4 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | utf8mb4 | | character_set_system | utf8 | | character_sets_dir | C:\Program Files\MySQL\MySQL Server 8.0\share\charsets\ | +-----------
可以看到character_set_client、character_set_connection和character_set_results编码的字符集都设置成了utf8,然后再来查看char_length()、length()和bit_length()函数的结果:
mysql> select char_length('我爱你'); +-----------------------+ | char_length('我爱你') | +-----------------------+ | 5 | +-----------------------+ 1 row in set, 1 warning (0.00 sec) mysql> select length('我爱你'); +------------------+ | length('我爱你') | +------------------+ | 6 | +------------------+ 1 row in set, 1 warning (0.00 sec) mysql> select bit_length('我爱你'); +----------------------+ | bit_length('我爱你') | +----------------------+ | 48 | +----------------------+ 1 row in set, 1 warning (0.00 sec)
WTF?再次尝试在MySQL Workbench中进行测试
1.3MySQL Workbench中
查看客户端字符集,结果如下:
依次使用char_length()、length()和bit_length()函数进行测试:
Result:结合1.1与1.3来看,一个汉字使用gbk编码占两个字节(Byte),使用utf8编码占3个字节(Byte)。至于1.2,尚无合理解释,欢迎补充!