Storing Rare Chinese Characters with the UTF8 Character Set


I. Related blog posts

https://www.cnblogs.com/jyzhao/p/8654412.html
http://blog.itpub.net/781883/viewspace-1411259/
https://www.qqxiuzi.cn/bianma/zifuji.php
https://blog.csdn.net/iteye_7853/article/details/82516888

 

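II. Requirement details
The customer reported that after the name 氮卓斯汀 was changed to 氮䓬斯汀, the system displays the new name as garbled text.
Cause: the Oracle database character set is ZHS16GBK, which cannot store some rare characters, including 䓬.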
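
The limitation can be reproduced outside the database. The following is a minimal sketch, assuming that Python's built-in `gbk` codec is a reasonable stand-in for Oracle's ZHS16GBK (the two are not guaranteed to be identical). The helper name `can_encode` is hypothetical.

```python
def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of text can be represented in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


old_name = "氮卓斯汀"  # original name, common characters only
new_name = "氮䓬斯汀"  # new name, contains the rare character 䓬 (U+44EC)

for name in (old_name, new_name):
    print(name, "gbk:", can_encode(name, "gbk"), "utf-8:", can_encode(name, "utf-8"))
```

If the assumption holds, only the new name fails under `gbk`, while both names encode under `utf-8`.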

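III. Solutions proposed by the customer
1. Change the database character set to UTF-8. This requires altering the Oracle character set, and after the change all existing data may turn into garbled text.
2. Modify the application: for every field that may contain rare characters (for example product name and generic name), encode the value as hexadecimal before storing it, then decode it on read before displaying it on the page. This involves a large amount of code change, leaves the database contents hard to read, and makes manual data fixes and data exports difficult.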
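
To make option 2 concrete, here is a minimal sketch of the hexadecimal round trip. The function names `to_hex` and `from_hex` are hypothetical, and UTF-8 is assumed as the intermediate encoding; the customer's proposal does not specify one.

```python
def to_hex(text: str) -> str:
    """Encode text as UTF-8 and return the bytes as an uppercase hex string."""
    return text.encode("utf-8").hex().upper()


def from_hex(stored: str) -> str:
    """Reverse to_hex: parse the hex string and decode the bytes as UTF-8."""
    return bytes.fromhex(stored).decode("utf-8")


name = "氮䓬斯汀"
stored = to_hex(name)  # pure ASCII, so any database character set can hold it
print(stored)
print(from_hex(stored))
```

The stored value is an opaque string of hex digits, which illustrates the readability cost described above: anyone querying the table directly has to decode every value by hand.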

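IV. Approach
1) Changing the database character set directly is not recommended unless the change is from a subset to a superset. The blogs linked above show that after the db character set is forcibly changed from GBK to UTF8, logging in through PL/SQL reports a character set mismatch.
2) Modifying the application requires a lot of code and leaves the data hard to read and write.
3) Recommended: migrate the business tables that hold rare characters to a UTF8 database. According to the developers, only about 20 tables actually store rare characters, so these tables can be migrated on their own. The application can then change its query code to go through a db_link, connect directly to the new db, or stay unchanged by querying through a db_link plus synonyms that point to the migrated remote tables (no application change, transparent to the application).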
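
The db_link plus synonym variant of option 3 can be sketched as a small script that generates the DDL. This is only an illustration: the link name `utf8_link` and the TNS alias `utf8db` are hypothetical, and only `scott.test` appears in this article.

```python
def db_link_ddl(link: str, user: str, password: str, tns_alias: str) -> str:
    """Build a CREATE DATABASE LINK statement pointing at the remote UTF8 database."""
    return (
        f"CREATE DATABASE LINK {link} "
        f"CONNECT TO {user} IDENTIFIED BY {password} USING '{tns_alias}';"
    )


def synonym_ddl(owner: str, table: str, link: str) -> str:
    """Build a synonym so SQL that names the table resolves to the remote copy."""
    return f"CREATE SYNONYM {table} FOR {owner}.{table}@{link};"


# Hypothetical values; in practice this list would hold the 20 or so migrated tables.
migrated_tables = ["test"]
statements = [db_link_ddl("utf8_link", "scott", "tiger", "utf8db")]
statements += [synonym_ddl("scott", t, "utf8_link") for t in migrated_tables]
print("\n".join(statements))
```

The local table has to be renamed or dropped before a synonym with the same name can be created. This sketch does not show whether rare characters survive a query issued through the link from the ZHS16GBK database; that would need to be tested separately.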

V. Experiment
1. Export the business table from the test environment.
2. Import it into a UTF8 environment and run read/write tests.


5.1 Export from the source environment

Changing the character set directly fails with an error:
SQL> alter database character set al32utf8;
alter database character set al32utf8
*
ERROR at line 1:
ORA-12712: new character set must be a superset of old character set

SQL> select * from nls_database_parameters where parameter like '%CHARACTERSET%';
PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_CHARACTERSET               ZHS16GBK
NLS_NCHAR_CHARACTERSET         AL16UTF16

SQL> conn scott/tiger
SQL> create table test(id int,c_name varchar2(200));
Table created.
SQL> insert into test values(1,'板藍根');
SQL> insert into test values(2,'氮䓬斯汀');
SQL> commit;
SQL> insert into test values(3,'氮卓斯汀');
SQL> commit;

SQL> select * from test;
        ID C_NAME
---------- --------------------
         1 板藍根
         2 氮?斯汀
         3 氮卓斯汀

C:\Users\Thinkpad>exp scott/tiger FILE=C:\Users\Thinkpad\Desktop\temp\hr_test.dmp TABLES=test
Export: Release 11.2.0.4.0 - Production on Wed Jun 26 13:20:58 2019
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in ZHS16GBK character set and AL16UTF16 NCHAR character set
About to export specified tables via Conventional Path ...
. . exporting table                           TEST          3 rows exported
Export terminated successfully without warnings.

 

 

5.2 Import into the target environment

SQL> select * from nls_database_parameters where parameter like '%CHARACTERSET%';
PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_CHARACTERSET               AL32UTF8
NLS_NCHAR_CHARACTERSET         AL16UTF16

$ env | grep LANG
NLS_LANG=american_america.ZHS16GBK
LANG=en_US.UTF-8

enmo:/home/oracle$ imp scott/tiger file=/home/oracle/hr_test.dmp full=y
Import: Release 11.2.0.4.0 - Production on Wed Jun 26 01:27:22 2019
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export file created by EXPORT:V11.02.00 via conventional path
import done in ZHS16GBK character set and AL16UTF16 NCHAR character set
import server uses AL32UTF8 character set (possible charset conversion)
. importing SCOTT's objects into SCOTT
. importing SCOTT's objects into SCOTT
. . importing table "TEST" 3 rows imported
Import terminated successfully without warnings.

SQL> select * from test;
ID C_NAME
---------- ------------------------------
1 
2 
3 ˹͡
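After Oracle performs the character set conversion above, the Chinese text shows up blank or garbled. Note that NLS_LANG is ZHS16GBK while the terminal's LANG is en_US.UTF-8, so this may be a client-side display mismatch rather than the values actually being stored as NULL.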
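
One way to check whether the blank output is a display problem is to replay the bytes. The sketch below is an illustration under assumptions, not a diagnosis: it takes the GBK bytes that DUMP reports for 氮卓斯汀 on the ZHS16GBK database (shown later in this article) and decodes them the way a UTF-8 terminal would, dropping invalid sequences.

```python
# GBK bytes of '氮卓斯汀' as reported by DUMP on the ZHS16GBK database.
gbk_bytes = bytes([181, 170, 215, 191, 203, 185, 205, 161])

# What a UTF-8 terminal can make of those bytes: invalid sequences are skipped.
as_seen_on_terminal = gbk_bytes.decode("utf-8", errors="ignore")
print([f"U+{ord(ch):04X}" for ch in as_seen_on_terminal])

# Decoding with the matching character set recovers the original text.
print(gbk_bytes.decode("gbk"))
```

The decoded result ends with U+02F9 and U+0361, which appear to match the residue displayed for row 3. That is consistent with the rows holding data that the terminal cannot render, although only a DUMP of the imported column would confirm it.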

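Change NLS_LANG to match the database character set, so that no conversion is needed between the import client and the server: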
export NLS_LANG=american_america.AL32UTF8

enmo:/home/oracle$ imp scott/tiger file=/home/oracle/hr_test.dmp full=y
Export file created by EXPORT:V11.02.00 via conventional path
import done in AL32UTF8 character set and AL16UTF16 NCHAR character set
export client uses ZHS16GBK character set (possible charset conversion)
. importing SCOTT's objects into SCOTT
. importing SCOTT's objects into SCOTT
. . importing table "TEST" 3 rows imported
Import terminated successfully without warnings.
enmo:/home/oracle$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Wed Jun 26 02:40:32 2019
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

 

Data verification

SQL> conn scott/tiger
Connected.
SQL> select * from test;
        ID C_NAME
---------- ------------------------------------
         1 板藍根
         2 氮?斯汀
         3 氮卓斯汀
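This time the data displays correctly. Row 2 still shows "?" because the rare character had already been lost when it was inserted into the ZHS16GBK source database.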

UTF8 character set:
SQL> select dump('氮卓斯汀') from dual;
DUMP('氮卓斯汀')
--------------------------------------------------------------
Typ=96 Len=12: 230,176,174,229,141,147,230,150,175,230,177,128

GBK character set:
SQL> select dump('氮卓斯汀') from dual;
DUMP('氮卓斯汀')
---------------------------------------------
Typ=96 Len=8: 181,170,215,191,203,185,205,161
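
The two DUMP results can be cross-checked without a database. This is a small sketch that assumes Python's `utf-8` and `gbk` codecs produce the same bytes as AL32UTF8 and ZHS16GBK for these four characters.

```python
name = "氮卓斯汀"

utf8_dump = list(name.encode("utf-8"))
gbk_dump = list(name.encode("gbk"))

# Print in a layout similar to Oracle's DUMP output for easy comparison.
print(f"utf-8 Len={len(utf8_dump)}: {','.join(map(str, utf8_dump))}")
print(f"gbk   Len={len(gbk_dump)}: {','.join(map(str, gbk_dump))}")
```

The byte lists can then be compared line by line with the two DUMP outputs above.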

SQL> desc scott.test
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                                 NUMBER(38)
 C_NAME                                             VARCHAR2(200)

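Shrinking the c_name column of the test table in both environments shows that the UTF8 database actually uses three bytes to store each Chinese character.
UTF8: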
SQL> alter table scott.test modify c_name varchar2(8);
alter table scott.test modify c_name varchar2(8)
*
ERROR at line 1:
ORA-01441: cannot decrease column length because some value is too big 
SQL> alter table scott.test modify c_name varchar2(12);
Table altered.

GBK:
GBK stores each Chinese character in two bytes, so the same column can be shrunk to 8 bytes:
SQL> alter table scott.test modify c_name varchar2(8);
Table altered.
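
When sizing columns for the migrated tables, the same arithmetic can be done up front. The following is a rough sketch, assuming the columns use byte length semantics and that Python's codecs match the Oracle character sets for this data. The helper name `required_bytes` is hypothetical.

```python
def required_bytes(values, charset):
    """Smallest VARCHAR2 byte length that fits every value in the given charset."""
    return max(len(v.encode(charset)) for v in values)


# The three rows of scott.test as they exist after the import
# (row 2 already contains '?' in place of the rare character).
rows = ["板藍根", "氮?斯汀", "氮卓斯汀"]

print("gbk  :", required_bytes(rows, "gbk"))
print("utf-8:", required_bytes(rows, "utf-8"))
```

Under these assumptions the results agree with the ALTER TABLE experiments: 8 bytes is enough in the GBK database, while the UTF8 database needs 12.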

 

