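Introduction to Implicit Type Conversion
Oracle Database supports two kinds of type conversion: explicit datatype conversion and implicit datatype conversion. When two values that are compared or combined in an operation have different data types (a source type and a target type) and no conversion function is specified, Oracle must convert one of the values so that the operation can proceed. This is implicit type conversion. It happens automatically, but only when the conversion makes sense.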
Data Conversion
Generally an expression cannot contain values of different datatypes. For example, an expression cannot multiply 5 by 10 and then add 'JAMES'. However, Oracle supports both implicit and explicit conversion of values from one datatype to another.
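For implicit type conversion, the "Data Type Comparison Rules" chapter of the official documentation is the recommended reading. That chapter includes an implicit type conversion matrix, which shows at a glance which data types can be converted to which.
Rules of implicit conversion:
Implicit type conversion happens in many places that usually go unnoticed. Rather than giving an example for each case, the most common rules are quoted below from the official documentation; see the documentation itself for the full details.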
The following rules govern implicit data type conversions:
- During INSERT and UPDATE operations, Oracle converts the value to the data type of the affected column.
- During SELECT FROM operations, Oracle converts the data from the column to the type of the target variable.
- When manipulating numeric values, Oracle usually adjusts precision and scale to allow for maximum capacity. In such cases, the numeric data type resulting from such operations can differ from the numeric data type found in the underlying tables.
- When comparing a character value with a numeric value, Oracle converts the character data to a numeric value.
- Conversions between character values or NUMBER values and floating-point number values can be inexact, because the character types and NUMBER use decimal precision to represent the numeric value, and the floating-point numbers use binary precision.
- When converting a CLOB value into a character data type such as VARCHAR2, or converting BLOB to RAW data, if the data to be converted is larger than the target data type, then the database returns an error.
- During conversion from a timestamp value to a DATE value, the fractional seconds portion of the timestamp value is truncated. This behavior differs from earlier releases of Oracle Database, when the fractional seconds portion of the timestamp value was rounded.
- Conversions from BINARY_FLOAT to BINARY_DOUBLE are exact.
- Conversions from BINARY_DOUBLE to BINARY_FLOAT are inexact if the BINARY_DOUBLE value uses more bits of precision than supported by the BINARY_FLOAT.
- When comparing a character value with a DATE value, Oracle converts the character data to DATE.
- When you use a SQL function or operator with an argument of a data type other than the one it accepts, Oracle converts the argument to the accepted data type.
- When making assignments, Oracle converts the value on the right side of the equal sign (=) to the data type of the target of the assignment on the left side.
- During concatenation operations, Oracle converts from noncharacter data types to CHAR or NCHAR.
- During arithmetic operations on and comparisons between character and noncharacter data types, Oracle converts from any character data type to a numeric, date, or rowid, as appropriate. In arithmetic operations between CHAR/VARCHAR2 and NCHAR/NVARCHAR2, Oracle converts to a NUMBER.
- Most SQL character functions are enabled to accept CLOBs as parameters, and Oracle performs implicit conversions between CLOB and character types. Therefore, functions that are not yet enabled for CLOBs can accept CLOBs through implicit conversion. In such cases, Oracle converts the CLOBs to CHAR or VARCHAR2 before the function is invoked. If the CLOB is larger than 4000 bytes, then Oracle converts only the first 4000 bytes to CHAR.
- When converting RAW or LONG RAW data to or from character data, the binary data is represented in hexadecimal form, with one hexadecimal character representing every four bits of RAW data. Refer to "RAW and LONG RAW Data Types" for more information.
- Comparisons between CHAR and VARCHAR2 and between NCHAR and NVARCHAR2 types may entail different character sets. The default direction of conversion in such cases is from the database character set to the national character set. Table 2-9 shows the direction of implicit conversions between different character types.
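The rules above can be paraphrased as follows (corrections are welcome):
1. For INSERT and UPDATE, Oracle converts the inserted or updated value to the data type of the target column.
2. For SELECT, Oracle converts the column data to the data type of the target variable.
3. When manipulating numeric values, Oracle usually adjusts precision and scale to allow maximum capacity, so the resulting numeric type can differ from the type stored in the underlying table.
4. When a character value is compared with a numeric value, the character value is converted to a number.
5. Conversions between character or NUMBER values and floating-point values can be inexact, because character types and NUMBER use decimal precision while floating-point numbers use binary precision.
6. Converting a CLOB to a character type such as VARCHAR2, or a BLOB to RAW, raises an error if the data is larger than the target type.
7. When a timestamp is converted to a DATE (typically on assignment, such as an INSERT into a DATE column), the fractional seconds are truncated. Earlier releases rounded them instead, so the result depends on the database version.
8. Conversions from BINARY_FLOAT to BINARY_DOUBLE are exact.
9. Conversions from BINARY_DOUBLE to BINARY_FLOAT are inexact when the BINARY_DOUBLE value uses more bits of precision than BINARY_FLOAT supports.
10. When a character value is compared with a DATE value, the character value is converted to a DATE.
11. When a function, procedure, or operator is called with an argument whose data type is not the one it accepts, Oracle converts the argument to the accepted data type.
12. In an assignment, the value on the right side of the equal sign is converted to the data type of the target on the left side.
13. In concatenation (usually the || operator), Oracle converts non-character values to a character type.
14. In arithmetic operations and comparisons between character and non-character data (such as number, date, or rowid), Oracle converts the character data to the appropriate type: number, date, or rowid.
In arithmetic operations between CHAR/VARCHAR2 and NCHAR/NVARCHAR2, Oracle converts both operands to NUMBER before performing the operation.
15. When CHAR/VARCHAR2 is compared with NCHAR/NVARCHAR2 and the two character sets differ, the default conversion is from the database character set to the national character set.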
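A few of these rules can be observed directly against DUAL. The statements below are a minimal sketch for any Oracle session; the column aliases are illustrative only and are not taken from the original text.

```sql
-- Rule 4: the string '010' is converted to a number, so this compares 10 with 10.
select case when '010' = 10 then 'equal' else 'not equal' end as cmp
  from dual;

-- Rule 13: the number 42 is converted to a character string before concatenation.
select 'id-' || 42 as label
  from dual;

-- Rule 14: the string '5' is converted to a number before the addition.
select '5' + 10 as total
  from dual;
```

Each statement runs without any conversion function in the text, which is exactly why such conversions are easy to overlook.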
The following two simple examples show where implicit conversion takes place.
Example:
SQL> create table test(object_id varchar2(12), object_name varchar2(64));
Table created.
SQL> insert into test
2 select object_id, object_name from dba_objects;
63426 rows created.
SQL> commit;
Commit complete.
SQL> create index ix_test_n1 on test(object_id);
Index created.
SQL> select count(*) from test where object_id=20;
COUNT(*)
----------
1
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
SQL_ID 4bh7yzj5ma0ks, child number 0
-------------------------------------
select count(*) from test where object_id=20
Plan hash value: 1950795681
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    45 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| TEST |     3 |    24 |    45  (20)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(TO_NUMBER("OBJECT_ID")=20)
Note
-----
   - dynamic sampling used for this statement
23 rows selected.
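As shown above, the implicit conversion follows from the rule "When comparing a character value with a numeric value, Oracle converts the character data to a numeric value." (The SELECT FROM rule, which converts column data to the type of a target variable, concerns fetching results into variables and is not the rule at work here.) Because the conversion is applied to the OBJECT_ID column itself, as TO_NUMBER("OBJECT_ID"), the index IX_TEST_N1 on the VARCHAR2 column cannot serve the predicate and the plan falls back to a full table scan. If the literal is written as a string instead, the plan switches to an INDEX RANGE SCAN, as shown below: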
SQL> select count(*) from test where object_id='20';
COUNT(*)
----------
1
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID 7800f6da7c909, child number 0
-------------------------------------
select count(*) from test where object_id='20'
Plan hash value: 4037411162
--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |       |       |     1 (100)|          |
|   1 |  SORT AGGREGATE   |            |     1 |     6 |            |          |
|*  2 |   INDEX RANGE SCAN| IX_TEST_N1 |     1 |     6 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_ID"='20')
19 rows selected.
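When the value arrives as a number and the literal cannot simply be rewritten, for instance through a numeric bind variable, the conversion can be moved onto the value instead of the column. The following is a sketch only: it assumes the TEST table and the IX_TEST_N1 index created above, and the bind variable name v_id is hypothetical.

```sql
-- SQL*Plus: declare a numeric bind variable (v_id is an illustrative name).
variable v_id number
exec :v_id := 20

-- Convert the bind value explicitly, so that the VARCHAR2 column OBJECT_ID
-- is compared as-is and the predicate stays eligible for IX_TEST_N1.
select count(*) from test where object_id = to_char(:v_id);

-- Check which access path was chosen.
select * from table(dbms_xplan.display_cursor);
```

One caveat of this sketch: TO_CHAR(20) yields '20', so it matches only rows stored in exactly that form. A row stored as '020' would match the TO_NUMBER version of the predicate but not this one.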
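The next case illustrates the rule "When comparing a character value with a DATE value, Oracle converts the character data to DATE." This conversion works most of the time, but it is a hidden logic bomb: the string is interpreted according to the session's NLS_DATE_FORMAT setting, so when that setting changes, the same statement can raise an error or silently return wrong results.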
SQL> SELECT *
2 FROM scott.emp
3 WHERE hiredate between '01-JAN-1981' and '01-APR-1981';
     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
      7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
SQL_ID czyc76busj56d, child number 0
-------------------------------------
SELECT * FROM scott.emp WHERE hiredate between '01-JAN-1981' and
'01-APR-1981'
Plan hash value: 3956160932
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS FULL| EMP  |     2 |    74 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("HIREDATE"<=TO_DATE(' 1981-04-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "HIREDATE">=TO_DATE(' 1981-01-01 00:00:00',
              'syyyy-mm-dd hh24:mi:ss')))
21 rows selected.
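One way to defuse this particular logic bomb is to state the format in the query itself, so that the result no longer depends on session settings. The sketch below assumes the same scott.emp table; the format mask and the NLS_DATE_LANGUAGE argument are chosen to match the literals used above.

```sql
-- Explicit TO_DATE with a format mask; the third argument pins the language
-- used to read the month abbreviation, independent of the session.
select *
  from scott.emp
 where hiredate between to_date('01-JAN-1981', 'DD-MON-YYYY', 'NLS_DATE_LANGUAGE=AMERICAN')
                    and to_date('01-APR-1981', 'DD-MON-YYYY', 'NLS_DATE_LANGUAGE=AMERICAN');

-- Equivalent predicate with ANSI date literals, which always use YYYY-MM-DD.
select *
  from scott.emp
 where hiredate between date '1981-01-01' and date '1981-04-01';
```

Both forms keep the HIREDATE column unconverted, just as in the plan above, and neither is affected by a change to NLS_DATE_FORMAT.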
Problems with Implicit Type Conversion
Implicit and Explicit Data Conversion
Oracle recommends that you specify explicit conversions, rather than rely on implicit or automatic conversions, for these reasons:
· SQL statements are easier to understand when you use explicit datatype conversion functions.
· Implicit datatype conversion can have a negative impact on performance, especially if the datatype of a column value is converted to that of a constant rather than the other way around.
· Implicit conversion depends on the context in which it occurs and may not work the same way in every case. For example, implicit conversion from a datetime value to a VARCHAR2 value may return an unexpected year depending on the value of the NLS_DATE_FORMAT parameter.
· Algorithms for implicit conversion are subject to change across software releases and among Oracle products. Behavior of explicit conversions is more predictable.
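Although implicit conversion happens automatically in many places, relying on it is not recommended. Oracle advises specifying explicit conversions instead of depending on implicit or automatic ones, for the following reasons:
· SQL statements are easier to understand when explicit conversion functions are used.
· Implicit conversion can hurt performance, especially when the column value is converted to the data type of the constant rather than the constant to the data type of the column.
· Implicit conversion depends on the context in which it occurs and may not behave the same way in every case. For example, implicitly converting a datetime value to VARCHAR2 may return an unexpected year, depending on the value of NLS_DATE_FORMAT.
· The algorithms for implicit conversion may change between software releases and between Oracle products, whereas explicit conversions behave more predictably. Relying on implicit conversion can therefore leave a trap for the future.
· If implicit type conversion occurs in an index expression, Oracle Database might not use the index, because the index is defined for the pre-conversion data type. This can also hurt performance.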
Tom Kyte's article "On Implicit Conversions and More" also summarizes the problems that implicit data type conversion brings:
- The resulting code typically has logic bombs in it. The code appears to work in certain circumstances but will not work in others.
- The resulting code relies on default settings. If someone changes the default settings, the way the code works will be subject to change. (A DBA changing a setting can make your code work entirely differently from the way it does today.)
- The resulting code can very well be subject to SQL injection bugs.
- The resulting code may end up performing numerous unnecessary repeated conversions (negatively affecting performance and consuming many more resources than necessary).
- The implicit conversion may be precluding certain access paths from being available to the optimizer, resulting in suboptimal query plans. (In fact, this is exactly what is happening to you!)
- Implicit conversions may prevent partition elimination.
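The examples above already touch on this. The next example shows that implicit type conversion does not necessarily stop the plan from using an index. A serious performance problem arises only when the implicit conversion function is applied to the indexed column in the query predicate, that is, when the left-hand value (the column) is converted to the type of the right-hand value.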
SQL> drop table test;
Table dropped.
SQL> create table test
2 as
3 select * from dba_objects;
Table created.
SQL> create index ix_test_n1 on test(object_id);
Index created.
SQL> select count(*) from test where object_id='20';
COUNT(*)
----------
1
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID 29jmhh43kkrv4, child number 0
-------------------------------------
select count(*) from test where object_id='20'
Plan hash value: 4037411162
--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |       |       |     1 (100)|          |
|   1 |  SORT AGGREGATE   |            |     1 |    13 |            |          |
|*  2 |   INDEX RANGE SCAN| IX_TEST_N1 |    10 |   130 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_ID"=20)
Note
-----
   - dynamic sampling used for this statement
23 rows selected.
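An implicit conversion did take place in this statement. Here TEST is created from DBA_OBJECTS, so OBJECT_ID is a NUMBER column, and the conversion was applied to the string '20', which became the number 20. The OBJECT_ID column itself was left unconverted, so the access path through IX_TEST_N1 was not affected.
So when faced with questions such as "Does implicit conversion always bypass the index?" or "Does implicit type conversion always make the index unusable?", analyze the specific case rather than generalizing.
The following example shows implicit type conversion involving a bind variable (drop the TEST table from the previous example before running it):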
SQL> create table test
2 as
3 select * from dba_objects;
Table created.
SQL> commit;
Commit complete.
SQL> create index ix_test_object_name on test(object_name);
Index created.
SQL> variable v_object_name nvarchar2(30);
SQL> exec :v_object_name :='I_OBJ1';
PL/SQL procedure successfully completed.
SQL> select count(*) from test where object_name=:v_object_name;
COUNT(*)
----------
1
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID ft05prnxtpk9u, child number 0
-------------------------------------
select count(*) from test where object_name=:v_object_name
Plan hash value: 1950795681
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   113 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    66 |            |          |
|*  2 |   TABLE ACCESS FULL| TEST |    10 |   660 |   113  (11)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(SYS_OP_C2C("OBJECT_NAME")=:V_OBJECT_NAME)
Note
-----
   - dynamic sampling used for this statement
23 rows selected.
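The implicit conversion here follows the rule that comparisons between CHAR/VARCHAR2 and NCHAR/NVARCHAR2 convert, by default, from the database character set to the national character set. OBJECT_NAME is a VARCHAR2 column and the bind variable is NVARCHAR2, so it is the column that gets converted, and the conversion is carried out by the internal function SYS_OP_C2C. Because the function wraps the indexed column, IX_TEST_OBJECT_NAME cannot be used.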
SYS_OP_C2C is an internal function which does an implicit conversion of varchar2 to national character set using TO_NCHAR function. Thus, the filter completely changes as compared to the filter using normal comparison.
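The usual remedy is to make the bind variable's type match the column. The sketch below assumes the TEST table and IX_TEST_OBJECT_NAME index from this example; the second variable name, v_object_name_n, is hypothetical and stands for a bind that has to remain NVARCHAR2.

```sql
-- Option 1: declare the bind variable as VARCHAR2, the same type as OBJECT_NAME.
variable v_object_name varchar2(30)
exec :v_object_name := 'I_OBJ1'
select count(*) from test where object_name = :v_object_name;

-- Option 2: if the bind must stay NVARCHAR2, convert the bind value, not the column.
-- TO_CHAR converts national character data to the database character set.
variable v_object_name_n nvarchar2(30)
exec :v_object_name_n := 'I_OBJ1'
select count(*) from test where object_name = to_char(:v_object_name_n);

-- Inspect the plan of the last statement.
select * from table(dbms_xplan.display_cursor);
```

Option 2 can lose characters that the database character set cannot represent, so it only fits when the values are known to be representable in both character sets.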
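How to find SQL statements with implicit conversion?
Some organizations audit every SQL statement before release and can eliminate most implicit type conversions at the source, but most do not have the capacity or resources to do so. The practical question is then how to find the statements in the database that involve implicit conversion. The following two queries are commonly used: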
-- Plans whose filter predicates contain INTERNAL_FUNCTION
SELECT
    SQL_ID,
    PLAN_HASH_VALUE
FROM
    V$SQL_PLAN X
WHERE
    X.FILTER_PREDICATES LIKE '%INTERNAL_FUNCTION%'
GROUP BY
    SQL_ID,
    PLAN_HASH_VALUE;
-- Plans whose filter predicates contain SYS_OP_C2C
SELECT
    SQL_ID,
    PLAN_HASH_VALUE
FROM
    V$SQL_PLAN X
WHERE
    X.FILTER_PREDICATES LIKE '%SYS_OP_C2C%'
GROUP BY
    SQL_ID,
    PLAN_HASH_VALUE;
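Note, however, that INTERNAL_FUNCTION in an execution plan does not necessarily mean the statement performs an implicit data type conversion; see my blog post "Is INTERNAL_FUNCTION in an ORACLE execution plan always an implicit conversion?" (original title: ORACLE數據庫中執行計划出現INTERNAL_FUNCTION一定是隱式轉換嗎?). The statements found this way therefore still need to be examined carefully, one by one.
The blog post "The internal function SYS_OP_C2C and implicit type conversion in ORACLE" (original title: ORACLE中內部函數SYS_OP_C2C和隱式類型轉換) is also worth reading for anyone not yet familiar with implicit type conversion.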
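The two queries above inspect only FILTER_PREDICATES. A variant that also looks at ACCESS_PREDICATES, and at TO_NUMBER applied to a column, might look like the sketch below. The TO_NUMBER pattern is an assumption of this sketch, and it also matches conversions written explicitly in the SQL text, so the output is a list of candidates to review rather than a verdict.

```sql
-- Candidate plans: a conversion function shows up in an access or filter predicate.
SELECT DISTINCT
       p.sql_id,
       p.plan_hash_value,
       p.object_owner,
       p.object_name
  FROM v$sql_plan p
 WHERE p.access_predicates LIKE '%SYS_OP_C2C%'
    OR p.filter_predicates LIKE '%SYS_OP_C2C%'
    OR p.access_predicates LIKE '%INTERNAL_FUNCTION%'
    OR p.filter_predicates LIKE '%INTERNAL_FUNCTION%'
    OR p.filter_predicates LIKE '%TO_NUMBER("%'
 ORDER BY p.sql_id, p.plan_hash_value;

-- Review one candidate in full; the SQL_ID here is the one from the first example.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('4bh7yzj5ma0ks', NULL));
```

Passing NULL as the child number displays every child cursor of that SQL_ID that is still in the cursor cache.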
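How to avoid implicit type conversion?
1: Follow consistent conventions during database design and when writing SQL, so that unnecessary data type conversions do not arise.
When modeling, keep column data types uniform; in particular, columns used to join to other tables must have the same data type on both sides. This avoids unnecessary implicit conversions.
In queries, make the predicate values match the column's data type, and make sure each bind variable is declared with the same data type as the column it is compared with.
2: Use conversion functions to perform explicit type conversion.
Common conversion functions include:
· TO_CHAR: converts a DATE or NUMBER to a character string;
· TO_DATE: converts a NUMBER, CHAR, or VARCHAR2 to a DATE. For timestamps, use TO_TIMESTAMP or TO_TIMESTAMP_TZ.
· TO_NUMBER: converts a CHAR or VARCHAR2 to a NUMBER.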
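The conversion functions listed above are most useful when the format is spelled out rather than left to session defaults. The query below is a small sketch against DUAL; the sample values, format masks, and column aliases are illustrative only.

```sql
select to_char(sysdate, 'YYYY-MM-DD HH24:MI:SS')                 as date_as_text,
       to_char(1234.5, 'FM9990.00')                              as number_as_text,
       to_date('2024-02-29', 'YYYY-MM-DD')                       as text_as_date,
       to_timestamp('2024-02-29 13:45:10.123',
                    'YYYY-MM-DD HH24:MI:SS.FF3')                 as text_as_timestamp,
       -- The third argument fixes the decimal and group separators for this call.
       to_number('1,234.50', '9G999D99',
                 'NLS_NUMERIC_CHARACTERS=''.,''')                as text_as_number
  from dual;
```

With explicit masks, the statement reads the same way in every session, whatever NLS_DATE_FORMAT or NLS_NUMERIC_CHARACTERS happens to be set to.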
3: Create a function-based index on SYS_OP_C2C.
This approach is rarely used, but it is a valid optimization in special situations.
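For the bind variable example above, such an index might be created as in the sketch below. This assumes the database accepts SYS_OP_C2C in an index expression; the function is internal and undocumented, so its behavior may differ between releases, and the index name is hypothetical.

```sql
-- Function-based index matching the predicate SYS_OP_C2C("OBJECT_NAME")=:V_OBJECT_NAME.
create index ix_test_object_name_c2c on test (sys_op_c2c(object_name));

-- Re-run the NVARCHAR2 bind variable query and check whether the new index is used.
select count(*) from test where object_name = :v_object_name;
select * from table(dbms_xplan.display_cursor);
```

Matching the bind variable's type to the column remains the simpler fix. An index like this is mainly of interest when the application code that supplies the NVARCHAR2 binds cannot be changed.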
References:
https://blogs.oracle.com/oraclemagazine/on-implicit-conversions-and-more
https://docs.oracle.com/cd/E21764_01/apirefs.1111/e12048/cql_elements.htm#CQLLR290
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Data-Type-Comparison-Rules.html#GUID-98BE3A78-6E33-4181-B5CB-D96FD9DC1694