I flunked a big-tech interview because of a character collation problem


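Summary: A character set is a set of symbols and their encodings. A collation is a set of rules for comparing characters within a character set.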

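This article is shared from the Huawei Cloud Community post "A Bloody Case Caused by a Collation Rule" (《一個字符校對規則引發的血案》). Original author: DRS技術快客.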

The Problem

Let's start with a CREATE TABLE statement:

CREATE TABLE collate_test (
val1 char(32) COLLATE utf8mb4_general_ci,
val2 char(32)
) CHARACTER SET utf8mb4;

The table can be created on both MySQL 5.7 and MySQL 8.0. Once it exists, however, the query SELECT * FROM collate_test WHERE val1=val2 behaves differently on the two versions.
On 5.7:

mysql> SELECT * FROM collate_test WHERE val1=val2;
Empty set (0.00 sec)

On 8.0:

mysql> SELECT * FROM collate_test WHERE val1=val2;
ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_0900_ai_ci,IMPLICIT) for operation '='

Strange: where does utf8mb4_0900_ai_ci come from?

The MySQL documentation at https://dev.mysql.com/doc/refman/8.0/en/charset-mysql.html explains it. In MySQL 8.0, the default collation for the utf8mb4 character set is utf8mb4_0900_ai_ci. In 5.7 it was utf8mb4_general_ci.
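The version difference can be made concrete with a minimal Python sketch. It models how a column's collation is chosen when the DDL does not name one: an explicit column COLLATE wins, then the table's COLLATE, then the default collation of the character set. The two-entry table holds the documented utf8mb4 defaults of 5.7 and 8.0. `column_collation` and `UTF8MB4_DEFAULT` are hypothetical names, not a MySQL API. The model also ignores database-level defaults and column-level CHARACTER SET clauses.

```python
# Documented default collation for utf8mb4, by MySQL version.
UTF8MB4_DEFAULT = {
    "5.7": "utf8mb4_general_ci",
    "8.0": "utf8mb4_0900_ai_ci",
}


def column_collation(version, column_collate=None, table_collate=None):
    """Simplified model: column COLLATE > table COLLATE > charset default.

    Assumes the table is declared with CHARACTER SET utf8mb4; database-level
    defaults and column-level CHARACTER SET clauses are not modelled.
    """
    if column_collate is not None:
        return column_collate
    if table_collate is not None:
        return table_collate
    return UTF8MB4_DEFAULT[version]


# collate_test: val1 names its collation, val2 and the table do not.
for version in ("5.7", "8.0"):
    val1 = column_collation(version, column_collate="utf8mb4_general_ci")
    val2 = column_collation(version)
    print(version, val1, val2, "same" if val1 == val2 else "DIFFERENT")
```

On 5.7 both columns end up with utf8mb4_general_ci. On 8.0 only val1 does, which sets up the clash.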

Analysis

Next, compare the table definitions with SHOW CREATE TABLE collate_test.
On 5.7:

mysql> show create table collate_test;
+--------------+--------------------------------------------------------------------------------------------------------------------------------------+
| Table        | Create Table                                                                                                                         |
+--------------+--------------------------------------------------------------------------------------------------------------------------------------+
| collate_test | CREATE TABLE `collate_test` (
  `val1` char(32) DEFAULT NULL,
  `val2` char(32) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 |
+--------------+--------------------------------------------------------------------------------------------------------------------------------------+

On 8.0:

mysql> show create table collate_test;
+--------------+--------------------------------------------------------------------------------------------------------------------------------------+
| Table        | Create Table                                                                                                                         |
+--------------+--------------------------------------------------------------------------------------------------------------------------------------+
| collate_test | CREATE TABLE `collate_test` (
  `val1` char(32) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `val2` char(32) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+--------------+--------------------------------------------------------------------------------------------------------------------------------------+

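There is the difference. On 8.0, a COLLATE attribute was added when the table was created: val1 keeps the utf8mb4_general_ci it was given explicitly, while val2 and the table itself get utf8mb4_0900_ai_ci.
On 8.0, run: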

mysql> SHOW CHARACTER SET WHERE Charset="utf8mb4";
+---------+---------------+--------------------+--------+
| Charset | Description   | Default collation  | Maxlen |
+---------+---------------+--------------------+--------+
| utf8mb4 | UTF-8 Unicode | utf8mb4_0900_ai_ci |      4 |
+---------+---------------+--------------------+--------+
1 row in set (0.01 sec)

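So in 8.0, a table created with the utf8mb4 character set and no explicit collation gets utf8mb4_0900_ai_ci by default. val1 and val2 are both columns, so both have IMPLICIT coercibility. Their collations differ (utf8mb4_general_ci vs utf8mb4_0900_ai_ci) and neither side outranks the other. MySQL therefore cannot decide which collation to compare with, and it rejects the comparison with error 1267.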
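What looks like "mutual exclusion" is MySQL's collation coercibility rule. Each operand carries a coercibility level: 0 for an explicit COLLATE clause, 2 (IMPLICIT) for a column or local variable, 4 (COERCIBLE) for a literal, and so on. The lower value wins. When both sides tie at the same level with different collations, MySQL raises error 1267.

The Python sketch below models only that same-character-set case. It leaves out the documented tie-breakers for Unicode vs non-Unicode character sets and for _bin collations. `Operand`, `resolve` and `IllegalMixOfCollations` are hypothetical names used for illustration.

```python
from dataclasses import dataclass

# Coercibility levels as reported by MySQL's COERCIBILITY() function
# (only the levels used below; a lower value wins).
EXPLICIT = 0    # expression with an explicit COLLATE clause
IMPLICIT = 2    # table column, stored-routine parameter or local variable
COERCIBLE = 4   # string literal


class IllegalMixOfCollations(Exception):
    """Stand-in for MySQL ERROR 1267 (HY000)."""


@dataclass
class Operand:
    collation: str
    coercibility: int


def resolve(a: Operand, b: Operand) -> str:
    """Choose the collation for `a = b`, assuming one shared character set."""
    if a.collation == b.collation:
        return a.collation
    if a.coercibility != b.coercibility:
        # The operand with the lower coercibility value decides.
        return min(a, b, key=lambda op: op.coercibility).collation
    # Same level, different collations: neither side can win, so refuse.
    raise IllegalMixOfCollations(
        f"Illegal mix of collations ({a.collation}) and ({b.collation})"
    )


val1 = Operand("utf8mb4_general_ci", IMPLICIT)   # column, explicit in DDL
val2 = Operand("utf8mb4_0900_ai_ci", IMPLICIT)   # column, 8.0 default

try:
    resolve(val1, val2)                          # WHERE val1 = val2
except IllegalMixOfCollations as err:
    print("error:", err)

# WHERE val1 = val2 COLLATE utf8mb4_general_ci
print(resolve(val1, Operand("utf8mb4_general_ci", EXPLICIT)))
# WHERE val1 = 'abc' (literal in the connection collation yields to the column)
print(resolve(val1, Operand("utf8mb4_0900_ai_ci", COERCIBLE)))
```

In the model, as in MySQL, attaching an explicit COLLATE clause to one side of the comparison (for example `WHERE val1 = val2 COLLATE utf8mb4_general_ci`) breaks the tie.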

Related Problems

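The case above is fairly simple: we rarely give different columns of the same table the same character set but different collations. Across different tables, though, this mistake is easy to make without noticing, for example in joins and triggers.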

Joins

Take the following two tables:

CREATE TABLE collate_general(
val1 char(32)
) COLLATE utf8mb4_general_ci;

CREATE TABLE collate_0900 (
val2 char(32)
) COLLATE utf8mb4_0900_ai_ci;

When we join them:

mysql> select * from collate_general,collate_0900 where val1=val2;
ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_0900_ai_ci,IMPLICIT) for operation '='
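This failure only shows up when the query runs. One way to catch it earlier is to compare the collations of the columns used as join keys. The Python sketch below assumes you have already fetched (TABLE_NAME, COLUMN_NAME, COLLATION_NAME) rows, for example from information_schema.COLUMNS. Here the rows and the list of join conditions are hard-coded from the two example tables, and `mismatched_join_keys` is a hypothetical helper. A mismatch means a plain column-to-column comparison like the one above will fail with error 1267.

```python
# Rows shaped like (TABLE_NAME, COLUMN_NAME, COLLATION_NAME) from
# information_schema.COLUMNS; hard-coded here from the example tables.
COLUMNS = [
    ("collate_general", "val1", "utf8mb4_general_ci"),
    ("collate_0900", "val2", "utf8mb4_0900_ai_ci"),
]

# Join conditions used by the application: ((table, column), (table, column)).
JOINS = [
    (("collate_general", "val1"), ("collate_0900", "val2")),
]


def mismatched_join_keys(columns, joins):
    """Return join conditions whose two columns have different collations."""
    collation = {(table, col): coll for table, col, coll in columns}
    problems = []
    for left, right in joins:
        left_coll, right_coll = collation[left], collation[right]
        if left_coll != right_coll:
            problems.append((left, left_coll, right, right_coll))
    return problems


for left, lc, right, rc in mismatched_join_keys(COLUMNS, JOINS):
    print(f"{left[0]}.{left[1]} ({lc}) = {right[0]}.{right[1]} ({rc})")
```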

Triggers

For example, first create a table and a trigger (the trigger is only for illustration and does nothing useful):

CREATE TABLE collate_trigger(
val1 char(32)
) COLLATE utf8mb4_general_ci;

DELIMITER ||
CREATE TRIGGER trigger_0900 AFTER INSERT ON collate_trigger FOR EACH ROW
BEGIN
    DECLARE val2 VARCHAR(32);
    SET val2=new.val1;
    SELECT val1 into val2 from collate_trigger WHERE val1=val2;
END||
DELIMITER ;

When we insert a row into the table:

mysql> insert into collate_trigger values ('abc');
ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_0900_ai_ci,IMPLICIT) for operation '='

Now look at the table definition:

mysql> show create table collate_trigger;
+-----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table           | Create Table                                                                                                                                                  |
+-----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| collate_trigger | CREATE TABLE `collate_trigger` (
  `val1` char(32) COLLATE utf8mb4_general_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci |
+-----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

Then look at the trigger:

mysql> show create trigger trigger_0900\G
*************************** 1. row ***************************
               Trigger: trigger_0900
              sql_mode: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
SQL Original Statement: CREATE DEFINER=`root`@`localhost` TRIGGER `trigger_0900` AFTER INSERT ON `collate_trigger` FOR EACH ROW BEGIN
    DECLARE val2 VARCHAR(32);
    SET val2=new.val1;
    SELECT val1 into val2 from collate_trigger WHERE val1=val2;
END
  character_set_client: gbk
  collation_connection: gbk_chinese_ci
    Database Collation: utf8mb4_0900_ai_ci
               Created: 2021-05-31 15:24:44.40

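Notice that the trigger's Database Collation is utf8mb4_0900_ai_ci. In the trigger's comparison, val1 is a column of collate_trigger with collation utf8mb4_general_ci. val2 is a local variable declared inside trigger_0900, and it was declared without COLLATE. It therefore takes the database collation in effect when the trigger was created, utf8mb4_0900_ai_ci. Both sides are IMPLICIT, so the same error 1267 is raised.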
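A local variable declared without COLLATE takes the database collation recorded for the trigger. A rough pre-upgrade check is therefore to flag triggers whose DATABASE_COLLATION differs from the collation of the table they fire on.

The Python sketch below assumes rows shaped like information_schema.TRIGGERS (TRIGGER_NAME, EVENT_OBJECT_TABLE, DATABASE_COLLATION) and information_schema.TABLES (TABLE_NAME, TABLE_COLLATION), hard-coded from the example. `suspicious_triggers` is a hypothetical helper. A hit only marks a trigger for review, since the trigger body may never compare a local variable with a column.

```python
# Rows shaped like information_schema.TRIGGERS
# (TRIGGER_NAME, EVENT_OBJECT_TABLE, DATABASE_COLLATION).
TRIGGERS = [
    ("trigger_0900", "collate_trigger", "utf8mb4_0900_ai_ci"),
]

# Rows shaped like information_schema.TABLES (TABLE_NAME, TABLE_COLLATION).
TABLES = [
    ("collate_trigger", "utf8mb4_general_ci"),
]


def suspicious_triggers(triggers, tables):
    """Flag triggers whose local variables would default to a collation
    other than the default collation of the table they fire on."""
    table_collation = dict(tables)
    flagged = []
    for name, table, db_collation in triggers:
        if table not in table_collation:
            continue  # table metadata not supplied; nothing to compare
        if table_collation[table] != db_collation:
            flagged.append((name, db_collation, table_collation[table]))
    return flagged


for name, var_coll, table_coll in suspicious_triggers(TRIGGERS, TABLES):
    print(f"{name}: local variables default to {var_coll}, table uses {table_coll}")
```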

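The examples in this article are deliberately simple and direct. Real customer workloads are usually more complex, but the root cause is the same. When upgrading from an earlier MySQL version to 8.0, pay close attention to collations. On 8.0, any utf8mb4 object created without an explicit collation gets utf8mb4_0900_ai_ci. That can clash with columns carrying utf8mb4_general_ci, whether those columns came over from 5.7 or were declared that way explicitly.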

 
