Mycat Sharding Rules Explained

Reposted article, dated 2019-03-19 10:31. Tags: database and table sharding / mycat

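1. Sharding by enumeration

The possible enumeration ids are listed in a configuration file and each one is mapped to a shard by hand. This rule suits specific scenarios, for example a business that stores data by province or district: the set of provinces and districts in a country is fixed, so such data can use this rule. The configuration is as follows: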

<tableRule name="sharding-by-intfile">
  <rule>
    <columns>user_id</columns>
    <algorithm>hash-int</algorithm>
  </rule>
</tableRule>
<function name="hash-int" class="io.mycat.route.function.PartitionByFileMap">
  <property name="mapFile">partition-hash-int.txt</property>
  <property name="type">0</property>
  <property name="defaultNode">0</property>
</function>

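Configuration notes

Tag / property	Description
columns	The table column used for sharding
algorithm	The sharding function
mapFile	Name of the mapping file
type	Defaults to 0; 0 means Integer keys, non-zero means String keys
defaultNode	Default node: a value below 0 means no default node; a value of 0 or above sets the default node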

Contents of partition-hash-int.txt:

10000=0
10010=1
DEFAULT_NODE=1      // default node

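Note
The default node exists so that an unrecognised enumeration value is routed to it.
If no default node is configured (a defaultNode below 0 means none), an unrecognised enumeration value raises an error
such as: can't find datanode for sharding column:column_name val:ffffffff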
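The lookup described above can be sketched in a few lines of Python. This is only an illustration of the routing logic, not Mycat's implementation; the function names `load_enum_map` and `route_by_enum` are hypothetical, and the sketch models only the `defaultNode` property, not a `DEFAULT_NODE` entry inside the file.

```python
def load_enum_map(text):
    """Parse lines of the form value=node into a dict (hypothetical helper)."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("//")[0].strip()  # drop trailing // remarks
        if not line or line.startswith("#"):
            continue
        key, _, node = line.partition("=")
        mapping[key.strip()] = int(node)
    return mapping


def route_by_enum(mapping, value, default_node=-1):
    """Return the node for an enumerated value, or fall back to the default node."""
    key = str(value)
    if key in mapping:
        return mapping[key]
    if default_node >= 0:
        return default_node
    raise ValueError(f"can't find datanode for sharding column value: {value}")


ENUM_MAP = load_enum_map("10000=0\n10010=1\n")
print(route_by_enum(ENUM_MAP, 10010))                  # 1
print(route_by_enum(ENUM_MAP, 99999, default_node=0))  # 0
```

With `default_node=-1` the last call would raise instead of returning a node, which mirrors the error described in the note.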

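2. Fixed-slot hash sharding

This rule resembles a decimal modulo, except that it works on the binary representation: it takes the low 10 bits of the id, that is, id & 1111111111 in binary (id & 1023).
With a decimal modulo, consecutive inserts of ids 1 to 10 are spread across up to 10 different shards, which makes transaction control for the insert harder. Because this rule maps contiguous slot ranges to shards, consecutive ids tend to land on the same shard, which makes such inserts easier to handle in one transaction.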

<tableRule name="rule1">
  <rule>
    <columns>user_id</columns>
    <algorithm>func1</algorithm>
  </rule>
</tableRule>
<function name="func1" class="io.mycat.route.function.PartitionByLong">
  <property name="partitionCount">2,1</property>
  <property name="partitionLength">256,512</property>
</function>

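Configuration notes:

Tag / property	Description
columns	The table column used for sharding
algorithm	The sharding function
partitionCount	List of shard counts
partitionLength	List of shard lengths (slot ranges)

Partition length:
The total defaults to the maximum 2^n = 1024 (n = 10), so at most 1024 partitions are supported.

Constraints:
The count and length arrays must have the same number of elements;
1024 = sum((count[i]*length[i]))
In other words, the dot product of the count and length vectors must always equal 1024.

For an even split, for example 4 equal shards, choose values so that partitionCount*partitionLength=1024: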

<function name="func1" class="io.mycat.route.function.PartitionByLong">
    <property name="partitionCount">4</property>
    <property name="partitionLength">256</property>
</function>
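To make the slot arithmetic concrete, here is a minimal Python sketch of the rule as this section describes it. It is not Mycat's source; `build_slots` and `route_by_long` are hypothetical names. It expands the count and length lists into a 1024-entry slot table and routes by the low 10 bits of the id.

```python
def build_slots(partition_count, partition_length, total=1024):
    """Expand count/length lists into a slot table mapping slot -> shard index."""
    if len(partition_count) != len(partition_length):
        raise ValueError("count and length lists must have the same size")
    if sum(c * n for c, n in zip(partition_count, partition_length)) != total:
        raise ValueError("sum(count[i] * length[i]) must equal 1024")
    slots, shard = [], 0
    for count, length in zip(partition_count, partition_length):
        for _ in range(count):
            slots.extend([shard] * length)  # this shard owns `length` slots
            shard += 1
    return slots


def route_by_long(slots, value):
    """Route by the low bits of the id (value & 1023 for a 1024-slot table)."""
    return slots[value & (len(slots) - 1)]


SLOTS = build_slots([2, 1], [256, 512])
print(route_by_long(SLOTS, 255))   # 0
print(route_by_long(SLOTS, 256))   # 1
print(route_by_long(SLOTS, 512))   # 2
print(route_by_long(SLOTS, 1024))  # 0, because 1024 & 1023 == 0
```

With partitionCount 2,1 and partitionLength 256,512 this yields three shards: slots 0-255, 256-511 and 512-1023.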

3. Range mapping

This rule fits cases where you plan in advance which range of the sharding column belongs to which shard.

<tableRule name="auto-sharding-long">
    <rule>
        <columns>user_id</columns>
        <algorithm>rang-long</algorithm>
    </rule>
</tableRule>
<function name="rang-long" class="io.mycat.route.function.AutoPartitionByLong">
    <property name="mapFile">autopartition-long.txt</property>
    <property name="defaultNode">0</property>
</function>

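Configuration notes:

Tag / property	Description
columns	The table column used for sharding
algorithm	The sharding function
mapFile	Name of the mapping file
defaultNode	Default node used when a value is out of range

All node numbering starts at 0, that is, 0 stands for the first node. The configuration is simple: each possible id range is assigned to a shard in advance: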

# range start-end ,data node index
# K=1000,M=10000.
0-500M=0
500M-1000M=1
1000M-1500M=2
or

0-10000000=0
10000001-20000000=1
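The range lookup can be sketched as follows. This is an illustrative reading of the mapping file, not Mycat's code: `parse_bound`, `load_ranges` and `route_by_range` are hypothetical helpers, the K and M factors follow the comment in the file (K=1000, M=10000), and taking the first matching range when boundaries overlap is an assumption.

```python
def parse_bound(token):
    """Parse a bound such as 500M or 200M1 using K=1000 and M=10000."""
    token = token.strip().upper()
    for suffix, factor in (("M", 10000), ("K", 1000)):
        if suffix in token:
            head, _, tail = token.partition(suffix)
            return int(head) * factor + (int(tail) if tail else 0)
    return int(token)


def load_ranges(text):
    """Turn 'start-end=node' lines into (start, end, node) tuples."""
    ranges = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        span, _, node = line.partition("=")
        start, _, end = span.partition("-")
        ranges.append((parse_bound(start), parse_bound(end), int(node)))
    return ranges


def route_by_range(ranges, value, default_node=-1):
    """Return the node of the first range containing value, else the default."""
    for start, end, node in ranges:
        if start <= value <= end:
            return node
    if default_node >= 0:
        return default_node
    raise ValueError(f"no range configured for value {value}")


RANGES = load_ranges("0-500M=0\n500M-1000M=1\n1000M-1500M=2\n")
print(route_by_range(RANGES, 5000001))       # 1
print(route_by_range(RANGES, 99999999, 0))   # 0 (out of range, default node)
```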

4. Modulo

This rule applies a modulo operation to the sharding column.

<tableRule name="mod-long">
    <rule>
        <columns>user_id</columns>
        <algorithm>mod-long</algorithm>
    </rule>
</tableRule>
<function name="mod-long" class="io.mycat.route.function.PartitionByMod">
    <!-- how many data nodes -->
    <property name="count">3</property>
</function>

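Configuration notes:

Tag / property	Description
columns	The table column used for sharding
algorithm	The sharding function
count	Number of shards

The id is reduced with a decimal modulo. Compared with fixed-slot hashing, a batch insert in a single transaction is likely to touch several data shards, which makes transactional consistency harder to guarantee.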
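The difference from fixed-slot hashing is easy to see on a small batch. The sketch below (plain arithmetic, with hypothetical variable names, assuming 3 modulo shards and 4 fixed-hash shards of 256 slots each) routes ids 1 to 10 under both rules.

```python
def route_by_mod(value, count):
    """Decimal modulo routing: shard index is value % count."""
    return value % count


ids = range(1, 11)

# Modulo with count=3: consecutive ids rotate through every shard.
mod_shards = sorted({route_by_mod(i, 3) for i in ids})

# Fixed-slot hash with 4 shards of 256 slots: consecutive ids share a slot range.
fixed_shards = sorted({(i & 1023) // 256 for i in ids})

print(mod_shards)    # [0, 1, 2]
print(fixed_shards)  # [0]
```

A single transaction inserting these ten rows touches all three shards under modulo, but only one shard under the fixed-slot rule.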

5. Sharding by date (day)

This rule shards by day.

<tableRule name="sharding-by-date">
    <rule>
        <columns>create_time</columns>
        <algorithm>sharding-by-date</algorithm>
    </rule>
</tableRule>
<function name="sharding-by-date" class="io.mycat.route.function.PartitionByDate">
    <property name="dateFormat">yyyy-MM-dd</property>
    <property name="sBeginDate">2014-01-01</property>
    <property name="sEndDate">2014-01-02</property>
    <property name="sPartionDay">10</property>
</function>

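Configuration notes:

Tag / property	Description
columns	The table column used for sharding
algorithm	The sharding function
dateFormat	Date format
sBeginDate	Begin date
sEndDate	End date
sPartionDay	Partition length in days: counting from the begin date, every 10 days (the value configured above) form one partition

If sEndDate is configured, then once the data passes the shard for that date, inserts wrap around and start again from the first shard.
Note
When querying a time range, use between ... and; using >= or <= queries all shards.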
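A rough Python sketch of the day arithmetic follows. It is an assumption-laden illustration rather than Mycat's code: `route_by_date` is a hypothetical name, the shard count is assumed to be (end - begin) / sPartionDay + 1, and wrap-around is assumed to apply only to dates after sEndDate.

```python
from datetime import datetime


def route_by_date(value, begin, partition_days, end=None, fmt="%Y-%m-%d"):
    """Return the shard index for a date string (sketch, see assumptions above)."""
    begin_dt = datetime.strptime(begin, fmt)
    target_dt = datetime.strptime(value, fmt)
    index = (target_dt - begin_dt).days // partition_days
    if end is not None:
        end_dt = datetime.strptime(end, fmt)
        shard_count = (end_dt - begin_dt).days // partition_days + 1
        if target_dt > end_dt:
            index %= shard_count  # wrap around to the first shards
    return index


print(route_by_date("2014-01-15", "2014-01-01", 10))                    # 1
print(route_by_date("2014-02-05", "2014-01-01", 10, end="2014-01-30"))  # 0
```

In the second call the end date gives three shards (days 0-9, 10-19, 20-29), and day 35 wraps back to shard 0. The end date 2014-01-30 is a made-up value chosen to show the wrap; with the configuration above (sEndDate 2014-01-02) only one shard would exist under these assumptions.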

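6. Modulo with range constraint

This rule combines a modulo operation with range constraints. It mainly prepares for later data migration, because you decide yourself how the post-modulo values are distributed across nodes.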

<tableRule name="sharding-by-pattern">
    <rule>
        <columns>user_id</columns>
        <algorithm>sharding-by-pattern</algorithm>
    </rule>
</tableRule>
<function name="sharding-by-pattern" class="io.mycat.route.function.PartitionByPattern">
    <property name="patternValue">256</property>
    <property name="defaultNode">2</property>
    <property name="mapFile">partition-pattern.txt</property>
</function>

partition-pattern.txt

# id partition range start-end ,data node index
###### first host configuration
1-32=0
33-64=1
65-96=2
97-128=3
######## second host configuration
129-160=4
161-192=5
193-224=6
225-256=7
0-0=7

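Configuration notes:

Tag / property	Description
columns	The table column used for sharding
algorithm	The sharding function
patternValue	Modulus base
defaultNode	Default node
mapFile	Path of the mapping file

In the mapping file, 1-32 is a range of values of id % 256; a result within 1-32 goes to the first partition (data node index 0), and so on for the other ranges.
If the id is not numeric, the row is assigned to the default node, defaultNode.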
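As a sketch of this lookup, under the assumption that an unmatched remainder also falls back to the default node: the Python below rebuilds the ranges of partition-pattern.txt as tuples and routes by id % patternValue. `route_by_pattern` and `PATTERN_RANGES` are hypothetical names, not part of Mycat.

```python
# (start, end, node) tuples equivalent to partition-pattern.txt above:
# 1-32=0, 33-64=1, ..., 225-256=7, plus the special line 0-0=7.
PATTERN_RANGES = [(32 * i + 1, 32 * i + 32, i) for i in range(8)] + [(0, 0, 7)]


def route_by_pattern(ranges, value, pattern_value=256, default_node=2):
    """Route by value % pattern_value; non-numeric values use the default node."""
    text = str(value)
    if not (text.isascii() and text.isdigit()):
        return default_node
    remainder = int(text) % pattern_value
    for start, end, node in ranges:
        if start <= remainder <= end:
            return node
    return default_node  # assumption: unmatched remainders use the default too


print(route_by_pattern(PATTERN_RANGES, 300))    # 300 % 256 = 44 -> node 1
print(route_by_pattern(PATTERN_RANGES, 256))    # 256 % 256 = 0  -> node 7
print(route_by_pattern(PATTERN_RANGES, "abc"))  # non-numeric    -> node 2
```

Because the node for each remainder range is chosen in the file, a range can later be pointed at a different node without changing the modulus.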

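7. Prefix hash with modulo range constraint

This rule is similar to the modulo range constraint above, but it also works on values made up of digits, symbols and letters.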

<tableRule name="sharding-by-prefixpattern">
    <rule>
        <columns>user_id</columns>
        <algorithm>sharding-by-prefixpattern</algorithm>
    </rule>
</tableRule>
<function name="sharding-by-prefixpattern" class="io.mycat.route.function.PartitionByPrefixPattern">
    <property name="patternValue">256</property>
    <property name="prefixLength">5</property>
    <property name="mapFile">partition-pattern.txt</property>
</function>

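Configuration notes:

Tag / property	Description
columns	The table column used for sharding
algorithm	The sharding function
patternValue	Modulus base
prefixLength	Number of leading characters whose ASCII codes are used
mapFile	Path of the mapping file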

partition-pattern.txt

# range start-end ,data node index
# ASCII
# 48-57=0-9 (Arabic digits)
# 64, 65-90=@, A-Z
# 97-122=a-z
###### first host configuration
1-4=0
5-8=1
9-12=2
13-16=3
###### second host configuration
17-20=4
21-24=5
25-28=6
29-32=7
0-0=7

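In the mapping file, each range (for example 1-4) refers to the value obtained after the modulo; a result within 1-4 goes to the first partition (data node index 0), and so on for the other ranges.
The approach is similar to the modulo range constraint, except that the value being reduced is the sum of the ASCII codes of the first prefixLength characters of the column.
The result of sum % patternValue is then looked up in the ranges to find the shard.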
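The ASCII-sum step can be sketched like this. The code is only an illustration: `route_by_prefix_pattern` is a hypothetical name, the `default_node` fallback for unmatched remainders is an assumption, and the example call uses a modulus of 32 (rather than the 256 configured above) purely so that the result falls inside the 0-32 ranges of the mapping file.

```python
# (start, end, node) tuples equivalent to the mapping file above:
# 1-4=0, 5-8=1, ..., 29-32=7, plus the special line 0-0=7.
PREFIX_RANGES = [(4 * i + 1, 4 * i + 4, i) for i in range(8)] + [(0, 0, 7)]


def route_by_prefix_pattern(ranges, value, prefix_length, pattern_value,
                            default_node=0):
    """Sum the ASCII codes of the prefix, reduce modulo pattern_value, look up."""
    prefix = str(value)[:prefix_length]
    total = sum(ord(ch) for ch in prefix)
    remainder = total % pattern_value
    for start, end, node in ranges:
        if start <= remainder <= end:
            return node
    return default_node  # hypothetical fallback for unmatched remainders


# 'a'..'e' are 97..101, so the sum is 495 and 495 % 32 == 15 -> range 13-16.
print(route_by_prefix_pattern(PREFIX_RANGES, "abcde-0001", 5, 32))  # 3
```

With a modulus of 256 the same value gives 495 % 256 = 239, which no range in this file covers, so the modulus and the ranges need to be planned together.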

8. Application-specified

With this rule, the application decides at run time which shard a row is routed to.

<tableRule name="sharding-by-substring">
    <rule>
        <columns>user_id</columns>
        <algorithm>sharding-by-substring</algorithm>
    </rule>
</tableRule>
<function name="sharding-by-substring" class="io.mycat.route.function.PartitionDirectBySubString">
    <property name="startIndex">0</property><!-- zero-based -->
    <property name="size">2</property>
    <property name="partitionCount">8</property>
    <property name="defaultPartition">0</property>
</function>

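Configuration notes:

Tag / property	Description
columns	The table column used for sharding
algorithm	The sharding function
partitionCount	Number of partitions
defaultPartition	Default partition

This rule computes the partition number directly from a substring of the value (which must be numeric); the application passes it in and thereby names the partition explicitly.

For example, with id=05-100000002 and the configuration above, size=2 characters are taken from the id starting at startIndex=0, giving 05, so 05 is the partition. If no partition is passed, the row goes to defaultPartition.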
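A small sketch of this substring rule, using the property names from the configuration above as parameters. `route_by_substring` is a hypothetical name, and wrapping the number with a modulo when it exceeds partitionCount is an assumption, not something the text states.

```python
def route_by_substring(value, start_index=0, size=2, partition_count=8,
                       default_partition=0):
    """Read the partition number straight out of the value's substring."""
    piece = str(value)[start_index:start_index + size]
    if not (piece.isascii() and piece.isdigit()):
        return default_partition  # missing or non-numeric partition prefix
    partition = int(piece)
    # Assumption: keep the result inside 0..partition_count-1.
    return partition % partition_count if partition_count > 0 else partition


print(route_by_substring("05-100000002"))  # 5
print(route_by_substring("xx-100000002"))  # 0 (default partition)
```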

9. Substring hash

This rule takes a slice of the string (for example the integer part of the value) and shards by the hash of that slice.

<tableRule name="sharding-by-stringhash">
    <rule>
        <columns>user_id</columns>
        <algorithm>sharding-by-stringhash</algorithm>
    </rule>
</tableRule>
<function name="sharding-by-stringhash" class="io.mycat.route.function.PartitionByString">
    <property name="partitionLength">512</property><!-- zero-based -->
    <property name="partitionCount">2</property>
    <property name="hashSlice">0:2</property>
</function>

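Configuration notes:

Tag / property	Description
columns	The table column used for sharding
algorithm	The sharding function
partitionLength	Modulus base of the string hash
partitionCount	Number of partitions
hashSlice	The hash slice, that is, the substring whose value is hashed. 0 means str.length(), -1 means str.length()-1

Note
hashSlice can be read as substring(start, end); a start of 0 simply means position 0.
Example 1: for the value "45abc" with hash slice 0:2, the characters 45 are used in the calculation.
Example 2: for the value "aaaabbb2345" with hash slice -4:0, the characters 2345 are used in the calculation.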
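The slice semantics can be checked with a short sketch. `resolve_slice` and `route_by_string` are hypothetical names; the slice resolution follows the notes above (a non-positive end counts from the string length), while the hash itself, a Java-style 31-multiplier polynomial, is an assumption used only to make the example complete.

```python
def resolve_slice(text, hash_slice):
    """Return the substring selected by a 'start:end' hash slice."""
    start_s, end_s = hash_slice.split(":")
    start, end = int(start_s), int(end_s)
    if start < 0:
        start += len(text)  # negative start counts from the end
    if end <= 0:
        end += len(text)    # 0 means len(text), -1 means len(text) - 1
    return text[max(start, 0):min(end, len(text))]


def route_by_string(text, hash_slice="0:2", partition_count=2,
                    partition_length=512):
    """Hash the slice and map the low bits of the hash onto the slot ranges."""
    h = 0
    for ch in resolve_slice(text, hash_slice):
        h = 31 * h + ord(ch)  # assumed polynomial string hash
    slot = h & (partition_count * partition_length - 1)
    return slot // partition_length


print(resolve_slice("45abc", "0:2"))          # 45
print(resolve_slice("aaaabbb2345", "-4:0"))   # 2345
print(route_by_string("45abc", "0:2"))        # 1
```

For "45" the assumed hash is 52 * 31 + 53 = 1665, and 1665 & 1023 = 641, which lies in the second 512-slot range.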

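10. Consistent hashing

Consistent hashing eases the problem of scaling out distributed data, because adding a node relocates only part of the keys.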

<tableRule name="sharding-by-murmur">
    <rule>
        <columns>user_id</columns>
        <algorithm>murmur</algorithm>
    </rule>
</tableRule>
<function name="murmur" class="io.mycat.route.function.PartitionByMurmurHash">
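    <!-- defaults to 0 -->
    <property name="seed">0</property>
    <!-- number of database nodes to shard across; required, otherwise sharding is impossible -->
    <property name="count">2</property>
    <!-- each physical database node is mapped to this many virtual nodes; the default is 160, so there are 160 times as many virtual nodes as physical nodes -->
    <property name="virtualBucketTimes">160</property>
    <!-- node weights; a node without a weight defaults to 1. Written in properties-file format: the key is the node index, an integer from 0 to count-1, and the value is the node weight. Every weight must be a positive integer, otherwise 1 is used instead -->
    <property name="weightMapFile">weightMapFile</property>
    <!-- used in testing to inspect how virtual nodes map to physical nodes; if set, the murmur hash of each virtual node and its physical node are written to this file line by line. There is no default; if unset, nothing is written -->
    <property name="bucketMapPath">/etc/mycat/bucketMapPath</property>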
</function>
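The scaling behaviour can be demonstrated with a generic hash ring. This is a sketch of the general technique, not Mycat's implementation: it substitutes MD5 for MurmurHash (which is not in the Python standard library), the virtual-node naming is made up, and `build_ring` and `locate` are hypothetical names.

```python
import bisect
import hashlib


def _hash(text):
    """Stand-in hash: first 8 bytes of MD5 as an integer (Mycat uses MurmurHash)."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "big")


def build_ring(count, virtual_bucket_times=160, weights=None):
    """Place virtual_bucket_times * weight virtual nodes per physical node."""
    weights = weights or {}
    ring = {}
    for node in range(count):
        weight = max(int(weights.get(node, 1)), 1)  # invalid weights become 1
        for replica in range(virtual_bucket_times * weight):
            ring[_hash(f"SHARD-{node}-NODE-{replica}")] = node
    return sorted(ring), ring


def locate(ring_pair, key):
    """Walk clockwise from the key's hash to the next virtual node."""
    points, ring = ring_pair
    idx = bisect.bisect_left(points, _hash(str(key)))
    if idx == len(points):
        idx = 0  # wrap around the ring
    return ring[points[idx]]


ring2, ring3 = build_ring(2), build_ring(3)
moved = [k for k in range(1000) if locate(ring2, k) != locate(ring3, k)]
print(len(moved))  # roughly a third of the keys; every one moved to node 2
```

Growing from 2 to 3 nodes moves keys only onto the new node, whereas a plain modulo would reshuffle most keys among all nodes.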

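11. Sharding by hour within a single month

This rule splits data by hour within a single month. The finest granularity is one hour, so a day can have at most 24 shards and at least 1. When a month ends, the next month starts over from the first shard, so the data must be cleaned up manually at the end of every month.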

<tableRule name="sharding-by-hour">
    <rule>
        <columns>create_time</columns>
        <algorithm>sharding-by-hour</algorithm>
    </rule>
</tableRule>
<function name="sharding-by-hour" class="io.mycat.route.function.LatestMonthPartion">
    <property name="splitOneDay">24</property>
</function>

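Configuration notes:

Tag / property	Description
columns	The table column used for sharding (a string in yyyyMMddHH format)
algorithm	The sharding function
splitOneDay	Number of shards per day

Note
The sharding column must be a string; otherwise sharding fails and the row is stored in the first shard by default.
The stored time must be exactly in 'yyyyMMddHH' format, with no extra or missing characters; otherwise sharding fails and the row is stored in the first shard by default.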
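One plausible reading of this rule as arithmetic is sketched below. It is an assumption, not taken from Mycat's source: `route_by_hour` is a hypothetical name, the index is assumed to be (day - 1) * splitOneDay plus the hour bucket, and splitOneDay is assumed to divide 24 evenly.

```python
def route_by_hour(value, split_one_day=24):
    """Map a yyyyMMddHH string to a shard index within the month."""
    text = str(value)
    if len(text) != 10 or not (text.isascii() and text.isdigit()):
        return 0  # malformed values land in the first shard, as noted above
    day = int(text[6:8])
    hour = int(text[8:10])
    hours_per_shard = 24 // split_one_day
    return (day - 1) * split_one_day + hour // hours_per_shard


print(route_by_hour("2019011523"))                   # (15 - 1) * 24 + 23 = 359
print(route_by_hour("2019010107", split_one_day=4))  # 6-hour buckets -> 1
print(route_by_hour("2019-01-15"))                   # malformed -> 0
```

Under this reading a 31-day month with splitOneDay=24 needs up to 744 shards, which is why the month wraps around and old data has to be cleared.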

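12. Range-then-modulo sharding

A range lookup first selects the shard group, and a modulo then selects the shard within the group.
This avoids data migration when scaling out and, to some extent, avoids the hot-spot problem of range sharding.
It combines the strengths of range sharding and modulo sharding: the modulo keeps data within a group fairly even, while the groups themselves are ranges, so range queries are still served well.
It is best to plan the number of shards in advance. When scaling out, add whole shard groups so that the data of existing groups does not need to move. Because data within a group is fairly even, hot spots inside a group are avoided.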

<tableRule name="auto-sharding-rang-mod">
    <rule>
        <columns>id</columns>
        <algorithm>rang-mod</algorithm>
    </rule>
</tableRule>
<function name="rang-mod" class="io.mycat.route.function.PartitionByRangeMod">
    <property name="mapFile">partition-range-mod.txt</property>
    <property name="defaultNode">21</property>
</function>

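Configuration notes:

Tag / property	Description
columns	The table column used for sharding
algorithm	The sharding function
mapFile	Path of the mapping file
defaultNode	Index of the default node used when a value is out of range; node indexes start at 0.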

partition-range-mod.txt

# Each range below is one shard group; the number after = is how many shards that group owns.
# range start-end ,data node group size
0-200M=5 // this group has 5 shard nodes
200M1-400M=1
400M1-600M=4
600M1-800M=4
800M1-1000M=6

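Note
As above, 0-200M is stored across 5 shards; the format is start-end=number of shards in that shard group. Values beyond the configured ranges require adding another shard group.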

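13. Date-range hash sharding

The idea is the same as range-then-modulo, but because taking a modulo of dates tends to concentrate data, a hash is used instead.
Rows are first grouped by date and then hashed by time, so that data from a short period is spread more evenly.
This avoids data migration when scaling out and, to some extent, avoids the hot-spot problem of range sharding. The date format should be as precise as possible; otherwise the data will not be evenly spread within a group.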

<tableRule name="range-date-hash">
    <rule>
        <columns>col_date</columns>
        <algorithm>range-date-hash</algorithm>
    </rule>
</tableRule>
<function name="range-date-hash" class="io.mycat.route.function.PartitionByRangeDateHash">
    <property name="sBeginDate">2014-01-01 00:00:00</property>
    <property name="sPartionDay">365</property>
    <property name="dateFormat">yyyy-MM-dd HH:mm:ss</property>
    <property name="groupPartionSize">3</property>
</function>

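Configuration notes:

Tag / property	Description
columns	The table column used for sharding
algorithm	The sharding function
sBeginDate	Begin date
sPartionDay	Number of days per shard group
dateFormat	Date format
groupPartionSize	Size of a shard group

Note
Counting from sBeginDate, the data of every sPartionDay days forms one shard group, and each shard group can be spread over groupPartionSize shards. According to the original text, the example above can shard at most three days; beyond that, when the computed index has no matching data node, the following exception is thrown.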

Cause: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Can't find a valid data node for specified node index :ALAN_TEST -> RANGE_DATE -> 2019-01-11 12:00:00 -> Index : 4
The error may involve com.mycat.test.model.AlanTest.insert-Inline
The error occurred while setting parameters
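The group-then-hash arithmetic can be sketched as follows. This is not Mycat's code: `route_by_range_date_hash` is a hypothetical name, and the inner hash is replaced by a plain modulo of the elapsed seconds as a stand-in, so the exact shard inside a group will differ from what Mycat computes.

```python
from datetime import datetime


def route_by_range_date_hash(value, begin="2014-01-01 00:00:00",
                             partition_days=365, group_size=3,
                             fmt="%Y-%m-%d %H:%M:%S"):
    """Return the shard index: group by date range, then spread inside the group."""
    begin_dt = datetime.strptime(begin, fmt)
    target_dt = datetime.strptime(value, fmt)
    elapsed = target_dt - begin_dt
    group = elapsed.days // partition_days
    inner = int(elapsed.total_seconds()) % group_size  # stand-in for the hash
    return group * group_size + inner


print(route_by_range_date_hash("2014-06-01 12:00:01"))  # group 0 -> shard 1
print(route_by_range_date_hash("2015-03-01 00:00:00"))  # group 1 -> shard 3
```

Each 365-day group owns three consecutive shard indexes, so a date several years after sBeginDate needs correspondingly more data nodes; an index without a data node produces the exception shown above.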

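14. Hot/cold data sharding

Queries on log data are routed by date into hot and cold data: data from the most recent n months is queried in the real-time transaction database, and data older than n months is sharded every m days.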

<tableRule name="sharding-by-date">
    <rule>
        <columns>create_time</columns>
        <algorithm>sharding-by-hotdate</algorithm>
    </rule>
</tableRule>
<function name="sharding-by-hotdate" class="io.mycat.route.function.PartitionByHotDate">
    <property name="dateFormat">yyyy-MM-dd</property>
    <property name="sLastDay">10</property>
    <property name="sPartionDay">30</property>
</function>

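Configuration notes:

Tag / property	Description
columns	The table column used for sharding
algorithm	The sharding function
dateFormat	Date format
sLastDay	Time span of hot data
sPartionDay	Number of days per cold-data shard (cold data is sharded by days)

Note
Cold data is sharded according to this range. With the configuration above, if today is 2019-01-21, going back 10 days gives 2019-01-12, so data before 2019-01-12 is cold. Cold data is sharded 30 days per shard: data from 2018-12-12 to 2019-01-11 goes into the 1st cold shard, data from 2018-11-12 to 2018-12-11 into the 2nd, and so on. If there are not enough database partitions, saving a row throws the following exception.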

Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Can't find a valid data node for specified node index :ALAN_TEST -> CREATE_DATE -> 2018-11-09 12:00:00 -> Index : 3
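A possible reading of the hot/cold arithmetic is sketched below. The details are assumptions rather than Mycat's code: `route_by_hot_date` is a hypothetical name, hot data is assumed to use index 0 with cold shards numbered from 1, and the exact day on which a row crosses a boundary may differ from Mycat by one.

```python
from datetime import date


def route_by_hot_date(target, today, last_day=10, partition_day=30):
    """Return 0 for hot data, otherwise 1 + the number of full cold periods."""
    age_days = (today - target).days
    if age_days < last_day:
        return 0  # still inside the hot window
    return 1 + (age_days - last_day) // partition_day


TODAY = date(2019, 1, 21)
print(route_by_hot_date(date(2019, 1, 15), TODAY))  # 0 (hot)
print(route_by_hot_date(date(2019, 1, 5), TODAY))   # 1 (first cold shard)
print(route_by_hot_date(date(2018, 12, 1), TODAY))  # 2 (second cold shard)
print(route_by_hot_date(date(2018, 11, 9), TODAY))  # 3
```

Under these assumptions a row dated 2018-11-09 maps to index 3, so a table with fewer than four data nodes cannot store it.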

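15. Sharding by calendar month

Partitions by a month column, one shard per calendar month; the rule also serves as an example of how between queries are resolved.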

<tableRule name="sharding-by-month">
    <rule>
        <columns>create_time</columns>
        <algorithm>sharding-by-month</algorithm>
    </rule>
</tableRule>
<function name="sharding-by-month" class="io.mycat.route.function.PartitionByMonth">
    <property name="dateFormat">yyyy-MM-dd</property>
    <property name="sBeginDate">2014-01-01</property>
</function>

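Configuration notes:

Tag / property	Description
columns	The table column used for sharding
algorithm	The sharding function
dateFormat	Date format
sBeginDate	Begin date (no default)
sEndDate	End date (no default)

Note

With the default settings there must be 12 nodes, and inserts cycle back to the first shard every 12 months.
With sBeginDate="2019-01" configured, that month is shard 0 and the index increases by one each month from then on, with no upper limit on the number of nodes.
With sBeginDate = "2015-01-01" and sEndDate = "2015-12-01", the configuration behaves like the first case.
With sBeginDate = "2015-01-01" and sEndDate = "2015-03-01", there are only 3 nodes; shards can hardly be matched to particular months, and the data is spread across the 3 nodes.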
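The four cases in the note can be expressed as a short sketch. It is an interpretation of the note rather than Mycat's code: `route_by_month` is a hypothetical name, and the shard count with an end date is assumed to be the number of months from sBeginDate to sEndDate inclusive.

```python
from datetime import datetime


def route_by_month(value, begin=None, end=None, fmt="%Y-%m-%d"):
    """Return the shard index for a date string under the month rule."""
    target = datetime.strptime(value, fmt)
    if begin is None:
        return target.month - 1  # default: 12 nodes, January is shard 0
    begin_dt = datetime.strptime(begin, fmt)
    index = (target.year - begin_dt.year) * 12 + target.month - begin_dt.month
    if end is not None:
        end_dt = datetime.strptime(end, fmt)
        shard_count = ((end_dt.year - begin_dt.year) * 12
                       + end_dt.month - begin_dt.month + 1)
        index %= shard_count  # cycle through the configured months
    return index


print(route_by_month("2019-07-04"))                              # 6
print(route_by_month("2016-02-10", begin="2015-01-01"))          # 13
print(route_by_month("2015-05-10", "2015-01-01", "2015-03-01"))  # 4 % 3 = 1
```

The last call shows why a 3-node setup no longer lines up with calendar months: May lands on the same shard as February.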
