【MySQL】MySQL 8.0支持utf8mb4


utf8mb4 utf8mb3 utf8

Refer to

  MySQL在5.5.3之后增加了utf8mb4字符編碼,mb4即most bytes 4。簡單說utf8mb4是utf8的超集並完全兼容utf8,能夠用四個字節存儲更多的字符。而utf8是utf8mb3的別名。標准的UTF-8字符集編碼是可以用1~4個字節去編碼21位字符,但是MySQL其實實現的utf8只是使用3個字節而已,utf8mb4才是真正意義上的utf8

  如果數據庫表字段設置的字符集不是utf8mb4,卻插入類似emjoy表情的時候:

  • 嚴格模式下會出現Incorrect string value: /xF0/xA1/x8B/xBE/xE5/xA2… for column 'name'這樣的錯誤
  • 非嚴格模式下此后的數據會被截斷

排序字符集

Refer to: Collation Naming Conventions

Suffix Meaning Remark
_ai Accent insensitive  
_as Accent sensitive  
_ci Case insensitive 不分區大小寫
_cs case-sensitive 區分大小寫
_bin Binary 二進制存儲,區分大小寫

utf8mb4_ unicode_ ci VS utf8mb4_ general_ ci

Refer to: What's the difference between utf8_general_ci and utf8_unicode_ci

  • utf8_general_ci校對速度快,但准確度稍差。
  • utf8_unicode_ci准確度高,但校對速度稍慢。

數據庫一般默認選擇utf8mb4_general_ci;
如果應用有德語、法語或者俄語,請一定使用utf8mb4_unicode_ci

 

配置

vim /etc/my.cnf [client] default-character-set = utf8mb4 [mysql] default-character-set = utf8mb4 [mysqld] # character-set-client-handshake = FALSE character-set-server = utf8mb4 collation-server = utf8mb4_general_ci init_connect='SET NAMES utf8mb4' 

檢查目前MySQL的字符集

mysql> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'; +--------------------------+-----------------+ | Variable_name | Value | +--------------------------+-----------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | utf8 | | character_set_system | utf8 | | collation_connection | utf8_general_ci | | collation_database | utf8_general_ci | | collation_server | utf8_general_ci | +--------------------------+-----------------+ 10 rows in set (0.00 sec)

總結

  • MySQL版本5.5.3+
  • MySQL Connector/J Java驅動5.1.13+
  • /etc/my.cnf配置中添加配置,詳見上面
  • 排序字符集選用utf8mb4_unicode_ci
  • 列(字段)> 表 > 數據庫

備注:
  其實只要數據庫支持utfbmb4(show char set),及時MySQL配置(/etc/my.cnf)中配置的默認字符集是utf8,也可以直接指定數據庫、表、字段的字符集為utf8mb4,然后在連接的時候指定字符集為utf8mb4即可。
比如設置Java的連接參數中characterEncoding=utf8mb4

 

查看MySQL支持的字符集列表:

root@localhost:3306.sock [test]> show char set;
+----------+---------------------------------+---------------------+--------+
| Charset  | Description                     | Default collation   | Maxlen |
+----------+---------------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese        | big5_chinese_ci     |      2 |
| dec8     | DEC West European               | dec8_swedish_ci     |      1 |
| cp850    | DOS West European               | cp850_general_ci    |      1 |
| hp8      | HP West European                | hp8_english_ci      |      1 |
| koi8r    | KOI8-R Relcom Russian           | koi8r_general_ci    |      1 |
| latin1   | cp1252 West European            | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European     | latin2_general_ci   |      1 |
| swe7     | 7bit Swedish                    | swe7_swedish_ci     |      1 |
| ascii    | US ASCII                        | ascii_general_ci    |      1 |
| ujis     | EUC-JP Japanese                 | ujis_japanese_ci    |      3 |
| sjis     | Shift-JIS Japanese              | sjis_japanese_ci    |      2 |
| hebrew   | ISO 8859-8 Hebrew               | hebrew_general_ci   |      1 |
| tis620   | TIS620 Thai                     | tis620_thai_ci      |      1 |
| euckr    | EUC-KR Korean                   | euckr_korean_ci     |      2 |
| koi8u    | KOI8-U Ukrainian                | koi8u_general_ci    |      1 |
| gb2312   | GB2312 Simplified Chinese       | gb2312_chinese_ci   |      2 |
| greek    | ISO 8859-7 Greek                | greek_general_ci    |      1 |
| cp1250   | Windows Central European        | cp1250_general_ci   |      1 |
| gbk      | GBK Simplified Chinese          | gbk_chinese_ci      |      2 |
| latin5   | ISO 8859-9 Turkish              | latin5_turkish_ci   |      1 |
| armscii8 | ARMSCII-8 Armenian              | armscii8_general_ci |      1 |
| utf8     | UTF-8 Unicode                   | utf8_general_ci     |      3 |
| ucs2     | UCS-2 Unicode                   | ucs2_general_ci     |      2 |
| cp866    | DOS Russian                     | cp866_general_ci    |      1 |
| keybcs2  | DOS Kamenicky Czech-Slovak      | keybcs2_general_ci  |      1 |
| macce    | Mac Central European            | macce_general_ci    |      1 |
| macroman | Mac West European               | macroman_general_ci |      1 |
| cp852    | DOS Central European            | cp852_general_ci    |      1 |
| latin7   | ISO 8859-13 Baltic              | latin7_general_ci   |      1 |
| utf8mb4  | UTF-8 Unicode                   | utf8mb4_general_ci  |      4 |
| cp1251   | Windows Cyrillic                | cp1251_general_ci   |      1 |
| utf16    | UTF-16 Unicode                  | utf16_general_ci    |      4 |
| utf16le  | UTF-16LE Unicode                | utf16le_general_ci  |      4 |
| cp1256   | Windows Arabic                  | cp1256_general_ci   |      1 |
| cp1257   | Windows Baltic                  | cp1257_general_ci   |      1 |
| utf32    | UTF-32 Unicode                  | utf32_general_ci    |      4 |
| binary   | Binary pseudo charset           | binary              |      1 |
| geostd8  | GEOSTD8 Georgian                | geostd8_general_ci  |      1 |
| cp932    | SJIS for Windows Japanese       | cp932_japanese_ci   |      2 |
| eucjpms  | UJIS for Windows Japanese       | eucjpms_japanese_ci |      3 |
| gb18030  | China National Standard GB18030 | gb18030_chinese_ci  |      4 |
+----------+---------------------------------+---------------------+--------+
41 rows in set (0.00 sec)
 

建議

數據庫在設置字符集的時候,設置成utf8mb4格式!


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM