Mycat Sharding Rules

Reposted article, 2017-11-01 12:17. Category: database middleware.

 
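Configuration: the rule field in the schema file must match the name field of a tableRule in the rule file.

(1) Enumeration sharding: sharding-by-intfile
(2) Primary-key range: auto-sharding-long
(3) Consistent hash: sharding-by-murmur
(4) String hash parsing: sharding-by-stringhash
(5) Sharding by date (day): sharding-by-date
(6) Split by hour within a single month: sharding-by-hour
(7) Sharding by calendar month: sharding-by-month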
        

 
-------- 10 common sharding methods --------
        

1. Enumeration
        

 
<tableRule name="sharding-by-intfile">
    <rule>
        <columns>user_id</columns>
        <algorithm>hash-int</algorithm>
    </rule>
</tableRule>
<function name="hash-int" class="io.mycat.route.function.PartitionByFileMap">
    <property name="mapFile">partition-hash-int.txt</property>
    <property name="type">0</property>
    <property name="defaultNode">0</property>
</function>
        

 
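Explanation:
The sharding rule is driven by a file (partition-hash-int.txt). This rule can be understood as enumeration partitioning. It suits columns with a fixed set of values, such as gender (0, 1) or province (a fixed list).

Advantages:
Several values can be placed in the same partition by separating them with commas.

Disadvantages:
Not suitable for columns whose values cannot be enumerated.

Enumeration partitioning: sharding-by-intfile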
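The lookup described above can be sketched in a few lines of Python. This is only an illustration, not Mycat's implementation: the map-file contents, the function names, and the "value[,value...]=node" line format are assumptions made for the example.

```python
def load_enum_map(text):
    """Parse 'value[,value...]=node' lines into a dict of value -> node index.

    The line format is an assumption for this sketch.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        values, node = line.split("=")
        for value in values.split(","):
            mapping[int(value)] = int(node)
    return mapping


def route_enum(mapping, value, default_node=0):
    """Return the node for value, falling back to default_node if unlisted."""
    return mapping.get(value, default_node)


# Hypothetical map file: gender 0 -> node 0, gender 1 -> node 1,
# and two province codes (11 and 12) sharing node 2.
MAP_TEXT = """
0=0
1=1
11,12=2
"""

enum_map = load_enum_map(MAP_TEXT)
print(route_enum(enum_map, 1))    # a listed value
print(route_enum(enum_map, 12))   # shares a line with 11
print(route_enum(enum_map, 99))   # not listed, goes to default_node
```

Any value missing from the map falls back to the default node, which mirrors the role of defaultNode in the configuration above.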
        

2. Range convention
        

 
<tableRule name="auto-sharding-long">
    <rule>
        <columns>user_id</columns>
        <algorithm>rang-long</algorithm>
    </rule>
</tableRule>
<function name="rang-long" class="io.mycat.route.function.AutoPartitionByLong">
    <property name="mapFile">autopartition-long.txt</property>
</function>
        

 
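Explanation:
The sharding rule is driven by a file (autopartition-long.txt). This is range-based sharding: define value ranges for the sharding column, then place every row that falls in a given range on one DN (data node).

Advantages:
Suited to cases where the total data volume is known in advance or fixed.

Disadvantages:
The DNs are laid out in advance, so scaling out later is cumbersome.
A potential problem: if a huge number of sequential inserts arrive in a short time and each DN (database shard) is configured to hold a large range (say 10 million rows per DN), one DN comes under very heavy IO load while the other DNs see no IO at all. This resembles the hot block / hot disk phenomenon common in databases.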
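A minimal sketch of range routing, and of the hot-spot problem just described, might look like this. The "start-end=node" line format, the ranges themselves, and the function names are assumptions for illustration and are not taken from autopartition-long.txt.

```python
def load_ranges(text):
    """Parse 'start-end=node' lines into (start, end, node) tuples.

    The line format is an assumption for this sketch.
    """
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        span, node = line.split("=")
        start, end = span.split("-")
        ranges.append((int(start), int(end), int(node)))
    return ranges


def route_range(ranges, value):
    """Return the node whose inclusive [start, end] range contains value."""
    for start, end, node in ranges:
        if start <= value <= end:
            return node
    raise ValueError(f"no range configured for {value}")


# Hypothetical layout: 10 million ids per data node.
RANGE_TEXT = """
0-9999999=0
10000000-19999999=1
20000000-29999999=2
"""
ranges = load_ranges(RANGE_TEXT)

# A burst of sequential inserts: every id lands on the same node.
burst_nodes = {route_range(ranges, i) for i in range(10_000_000, 10_000_100)}
print(burst_nodes)
```

The set printed at the end contains a single node, which is the hot-spot effect: consecutive ids never spread across nodes until a range boundary is crossed.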
        

3. Modulo
        

 
<tableRule name="mod-long">
    <rule>
        <columns>user_id</columns>
        <algorithm>mod-long</algorithm>
    </rule>
</tableRule>
<function name="mod-long" class="io.mycat.route.function.PartitionByMod">
    <!-- how many data nodes -->
    <property name="count">3</property>
</function>
        

 
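Explanation:
The sharding rule is driven by the number n given in the configuration. The data is split into n parts (usually there are also n DNs), which spreads rows evenly across the nodes.

Advantages:
This strategy spreads database write load well and suits point (single-key) queries.

Disadvantages:
As soon as a range query appears, MyCAT has to merge results from several nodes. With large data volumes, the cross-database query plus result merging can take much longer, especially when ORDER BY is involved.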
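The trade-off can be illustrated with a short sketch. It assumes plain integer keys and uses hypothetical helper names; the merge step stands in for the kind of work a middleware has to do for a range query with ORDER BY, and is not MyCAT's actual code.

```python
import heapq

COUNT = 3  # matches <property name="count">3</property>


def route_mod(user_id, count=COUNT):
    """Node index for a key under modulo sharding."""
    return user_id % count


def nodes_for_range(low, high, count=COUNT):
    """Nodes that must be queried for user_id BETWEEN low AND high."""
    return sorted({route_mod(i, count) for i in range(low, high + 1)})


# Spread ids 1..9 over the shards.
shards = {node: [] for node in range(COUNT)}
for uid in range(1, 10):
    shards[route_mod(uid)].append(uid)

print(route_mod(10))          # a point query touches exactly one node
print(nodes_for_range(4, 6))  # three consecutive ids already span all nodes

# Each shard returns its rows in sorted order; the results still have to
# be merged to produce a globally ordered result set.
merged = list(heapq.merge(*shards.values()))
print(merged)
```

A point query resolves to one node, while even a three-id range touches every node and needs a merge pass afterwards.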
        

4. Fixed-partition hash
        

 
<tableRule name="rule1">
    <rule>
        <columns>user_id</columns>
        <algorithm>func1</algorithm>
    </rule>
</tableRule>
<function name="func1" class="io.mycat.route.function.PartitionByLong">
    <property name="partitionCount">2,1</property>
    <property name="partitionLength">256,512</property>
</function>
        

 
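Explanation:
The sharding rule is driven by the pair of value lists in the configuration. Above, columns identifies the table column to shard on, algorithm is the sharding function, partitionCount is the list of partition counts, and partitionLength is the list of partition lengths. (When the split is even, this is more flexible than the modulo method.)

Partition length: the total defaults to a maximum of 2^n = 1024, i.e. at most 1024 partitions are supported. In the example above, 2 × 256 + 1 × 512 = 1024.

Constraint: the count and length arrays must have the same number of entries.

Advantages:
This strategy is flexible: data can be distributed evenly or unevenly, and each node's share and capacity are determined by the count and length parameters.

Disadvantages:
Similar to the modulo method.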
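One way to picture the count/length pairs is as a table of 1024 slots, each owned by a partition. The sketch below makes several assumptions that are not stated in the text: that the slot is chosen as value mod 1024, that the configured lengths must fill the table exactly, and the function names themselves.

```python
TOTAL_SLOTS = 1024  # the 2^n = 1024 limit mentioned above


def build_slot_table(counts, lengths):
    """Expand (count, length) pairs into a slot -> partition table."""
    if len(counts) != len(lengths):
        raise ValueError("count and length lists must have the same size")
    table = []
    partition = 0
    for count, length in zip(counts, lengths):
        for _ in range(count):
            table.extend([partition] * length)
            partition += 1
    # Assumption for this sketch: the slots must add up to exactly 1024.
    if len(table) != TOTAL_SLOTS:
        raise ValueError(f"slots add up to {len(table)}, expected {TOTAL_SLOTS}")
    return table


def route_fixed_hash(table, value):
    """Pick the slot as value mod 1024 (an assumed hash) and look it up."""
    return table[value % TOTAL_SLOTS]


# partitionCount=2,1 and partitionLength=256,512 from the config above:
# two partitions of 256 slots each, then one partition of 512 slots.
table = build_slot_table([2, 1], [256, 512])

print(route_fixed_hash(table, 100))   # slot 100
print(route_fixed_hash(table, 300))   # slot 300
print(route_fixed_hash(table, 600))   # slot 600
```

With this layout the third partition owns half of the slots, which is the uneven distribution the advantages paragraph refers to.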
        

5. Date-column partitioning
        

 
<tableRule name="sharding-by-date">
    <rule>
        <columns>create_time</columns>
        <algorithm>sharding-by-date</algorithm>
    </rule>
</tableRule>
<function name="sharding-by-date" class="io.mycat.route.function.PartitionByDate">
    <property name="dateFormat">yyyy-MM-dd</property>
    <property name="sBeginDate">2014-01-01</property>
    <property name="sPartionDay">10</property>
</function>
        

 
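Explanation:
The sharding rule is driven by the values in the configuration: the date format (dateFormat), the start date (sBeginDate), and the number of days per partition (sPartionDay). Counting from the start date, every 10 days form one partition.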
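The arithmetic behind "every 10 days form one partition" can be sketched as follows. The function name and the handling of dates before the start date are assumptions for the example.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"   # Python equivalent of yyyy-MM-dd
BEGIN_DATE = "2014-01-01"  # sBeginDate
DAYS_PER_PARTITION = 10    # sPartionDay


def route_date(value, begin=BEGIN_DATE, days_per_partition=DAYS_PER_PARTITION):
    """Partition index = whole days since the start date // days per partition."""
    day = datetime.strptime(value, DATE_FORMAT).date()
    start = datetime.strptime(begin, DATE_FORMAT).date()
    offset = (day - start).days
    if offset < 0:
        # Assumption: dates before the start date are rejected.
        raise ValueError(f"{value} is before the start date {begin}")
    return offset // days_per_partition


print(route_date("2014-01-01"))  # first day of the first partition
print(route_date("2014-01-10"))  # last day of the first partition
print(route_date("2014-01-11"))  # first day of the second partition
print(route_date("2014-02-01"))  # 31 days after the start date
```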
        

6. Pattern modulo
        

 
<tableRule name="sharding-by-pattern">
    <rule>
        <columns>user_id</columns>
        <algorithm>sharding-by-pattern</algorithm>
    </rule>
</tableRule>
<function name="sharding-by-pattern" class="io.mycat.route.function.PartitionByPattern">
    <property name="patternValue">256</property>
    <property name="defaultNode">2</property>
    <property name="mapFile">partition-pattern.txt</property>
</function>
        

 
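Explanation:
The sharding rule is driven by the values in the configuration and a file (partition-pattern.txt). patternValue is the modulus and defaultNode is the default node; if defaultNode is not configured, it defaults to 0, the first node. In the mapping file, 1-32 stands for a range of id % 256: if the remainder falls within 1-32 the row goes to partition 1, and so on for the other ranges. If the id is not numeric, the row goes to defaultNode.

Advantages:
This strategy spreads database write load well and suits point (single-key) queries.

Disadvantages:
As soon as a range query appears, MyCAT has to merge results from several nodes. With large data volumes, the cross-database query plus result merging can take much longer, especially when ORDER BY is involved.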
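As a rough sketch of this lookup: take the id modulo patternValue, find the range that contains the remainder, and send non-numeric ids to the default node. The range table below is hypothetical and does not reproduce partition-pattern.txt; the function name is also an assumption.

```python
PATTERN_VALUE = 256  # patternValue
DEFAULT_NODE = 2     # defaultNode

# Hypothetical mapping of remainder ranges to nodes, as (low, high, node).
PATTERN_RANGES = [(0, 63, 0), (64, 127, 1), (128, 191, 2), (192, 255, 3)]


def route_pattern(value, ranges=PATTERN_RANGES,
                  pattern_value=PATTERN_VALUE, default_node=DEFAULT_NODE):
    """Route by (id % pattern_value); non-numeric ids go to default_node."""
    text = str(value)
    if not text.isdigit():
        return default_node
    remainder = int(text) % pattern_value
    for low, high, node in ranges:
        if low <= remainder <= high:
            return node
    return default_node


print(route_pattern(300))    # 300 % 256 = 44
print(route_pattern(1000))   # 1000 % 256 = 232
print(route_pattern("abc"))  # not numeric, falls back to the default node
```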
        

7. ASCII-sum modulo pattern
        

 
<tableRule name="sharding-by-prefixpattern">
    <rule>
        <columns>user_id</columns>
        <algorithm>sharding-by-prefixpattern</algorithm>
    </rule>
</tableRule>
<function name="sharding-by-prefixpattern" class="io.mycat.route.function.PartitionByPrefixPattern">
    <property name="patternValue">256</property>
    <property name="prefixLength">5</property>
    <property name="mapFile">partition-pattern.txt</property>
</function>
        

 
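Explanation:
The sharding rule is driven by the values in the configuration and a file (partition-pattern.txt). patternValue is the modulus and prefixLength is the number of leading characters whose ASCII codes are used. This works like method 6 (pattern modulo), except that it sums the ASCII codes of the first prefixLength characters of the column value and computes sum % patternValue. The pattern range that the result falls into determines the shard.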
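The ASCII-sum step can be sketched like this. The range table is again hypothetical rather than the contents of partition-pattern.txt, and the function names are assumptions.

```python
# Hypothetical mapping of remainder ranges to nodes, as (low, high, node).
PATTERN_RANGES = [(0, 63, 0), (64, 127, 1), (128, 191, 2), (192, 255, 3)]


def ascii_prefix_remainder(value, pattern_value=256, prefix_length=5):
    """Sum the character codes of the prefix, then take the modulus."""
    prefix = str(value)[:prefix_length]
    return sum(ord(ch) for ch in prefix) % pattern_value


def route_prefix_pattern(value, ranges=PATTERN_RANGES,
                         pattern_value=256, prefix_length=5):
    """Find the range that contains the ASCII-sum remainder."""
    remainder = ascii_prefix_remainder(value, pattern_value, prefix_length)
    for low, high, node in ranges:
        if low <= remainder <= high:
            return node
    raise ValueError(f"no range covers remainder {remainder}")


# 'a'..'e' are 97..101, so the sum is 495 and 495 % 256 = 239.
print(ascii_prefix_remainder("abcde"))
print(route_prefix_pattern("abcde"))
# Only the first five characters matter, so this routes the same way.
print(route_prefix_pattern("abcdeXYZ"))
```

Values that share their first prefixLength characters always land on the same shard, which is worth keeping in mind when keys carry a common prefix.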
        

8. Application-specified partition
        

 
<tableRule name="sharding-by-substring">
    <rule>
        <columns>user_id</columns>
        <algorithm>sharding-by-substring</algorithm>
    </rule>
</tableRule>
<function name="sharding-by-substring" class="io.mycat.route.function.PartitionDirectBySubString">
    <property name="startIndex">0</property> <!-- zero-based -->
    <property name="size">2</property>
    <property name="partitionCount">8</property>
    <property name="defaultPartition">0</property>
</function>
        

 
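Explanation:
This method computes the partition number directly from a substring of the column value, which must be numeric. The application passes the value and thereby specifies the partition number explicitly.

For example, with id=05-100000002 and this configuration, size=2 characters are taken starting at startIndex=0, giving 05, so 05 is the partition. If no partition is supplied, the row goes to defaultPartition.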
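A small sketch of the substring extraction, using the id from the example above. Treating an out-of-range partition number as a fallback to defaultPartition is an assumption of this sketch, as is the function name.

```python
def route_substring(value, start_index=0, size=2,
                    partition_count=8, default_partition=0):
    """Read the partition number straight out of the key."""
    piece = str(value)[start_index:start_index + size]
    if not piece.isdigit():
        return default_partition
    partition = int(piece)
    # Assumption: a number outside 0..partition_count-1 falls back
    # to the default partition.
    if partition >= partition_count:
        return default_partition
    return partition


print(route_substring("05-100000002"))  # substring "05"
print(route_substring("ab-100000002"))  # not numeric, default partition
print(route_substring("12-100000002"))  # 12 is outside the 8 partitions
```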
        

9. String-slice hash
        

 
<tableRule name="sharding-by-stringhash">
    <rule>
        <columns>user_id</columns>
        <algorithm>sharding-by-stringhash</algorithm>
    </rule>
</tableRule>
<function name="sharding-by-stringhash" class="io.mycat.route.function.PartitionByString">
    <property name="length">512</property>
    <property name="count">2</property>
    <property name="hashSlice">0:2</property>
</function>
        

 
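Explanation:
In the function, length is the modulus for the string hash, count is the number of partitions, and hashSlice is the range of characters that the hash is computed over.
In other words, the shard is chosen by hashing a substring.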
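To make the idea concrete, here is a sketch that hashes a slice of the key and maps the result onto count partitions of length slots each. The hash function (a simple base-31 polynomial hash) is a stand-in chosen for the example and is not claimed to be the one Mycat uses. Reading hashSlice 0:2 as a start:end character range is also an assumption.

```python
def slice_hash(text, start, end):
    """Base-31 polynomial hash over text[start:end] (a stand-in hash)."""
    h = 0
    for ch in text[start:end]:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def route_string_hash(value, length=512, count=2, hash_slice=(0, 2)):
    """Hash the slice, reduce it to a slot, and map the slot to a partition."""
    total_slots = length * count
    slot = slice_hash(str(value), *hash_slice) % total_slots
    return slot // length


# Only the first two characters take part in the hash.
print(route_string_hash("ab123"))
print(route_string_hash("ab999"))  # same slice "ab", same partition
print(route_string_hash("zz123"))
```

Because only the slice is hashed, keys that differ outside the slice are co-located, which can be useful for keeping related rows together.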
        

10. Consistent hash
        

 
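<tableRule name="sharding-by-murmur">
    <rule>
        <columns>user_id</columns>
        <algorithm>murmur</algorithm>
    </rule>
</tableRule>
<function name="murmur" class="io.mycat.route.function.PartitionByMurmurHash">
    <property name="seed">0</property><!-- defaults to 0 -->
    <property name="count">2</property><!-- number of database nodes to shard across; required, otherwise sharding is impossible -->
    <property name="virtualBucketTimes">160</property><!-- each physical database node is mapped to this many virtual nodes; the default is 160, i.e. there are 160 times as many virtual nodes as physical nodes -->
    <!--
    <property name="weightMapFile">weightMapFile</property>
    Node weights; a node without a specified weight defaults to 1. Written in properties-file format, with the node index (an integer from 0 to count-1) as the key and the node weight as the value. All weights must be positive integers; otherwise 1 is used instead. -->
    <!--
    <property name="bucketMapPath">/etc/mycat/bucketMapPath</property>
    Used in testing to observe how virtual nodes are distributed over physical nodes. If this property is set, the mapping from each virtual node's murmur hash value to its physical node is written to this file line by line. There is no default; if it is not set, nothing is written. -->
</function>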
        

 
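Advantages:
Consistent hashing effectively solves the scale-out problem for distributed data. Rules 1-9 above all make scaling out difficult to some degree, whereas rule 10 addresses this difficulty.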
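The scale-out property can be demonstrated with a small hash ring. This is a sketch under stated assumptions, not Mycat's implementation: MD5 from the standard library stands in for MurmurHash, the virtual-node naming scheme is made up, and seed and weights are ignored.

```python
import bisect
import hashlib


def stable_hash(key):
    """Stable 32-bit hash. MD5 stands in for MurmurHash in this sketch."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, node_count, virtual_bucket_times=160):
        points = []
        for node in range(node_count):
            for replica in range(virtual_bucket_times):
                # Hypothetical naming scheme for virtual nodes.
                points.append((stable_hash(f"node-{node}-vn-{replica}"), node))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._nodes = [n for _, n in points]

    def route(self, key):
        """Walk clockwise to the first virtual node at or after the key."""
        index = bisect.bisect_left(self._hashes, stable_hash(str(key)))
        if index == len(self._hashes):
            index = 0  # wrap around the ring
        return self._nodes[index]


keys = [f"user-{i}" for i in range(1000)]
before = HashRing(2)
after = HashRing(3)  # scale out from 2 to 3 nodes

moved = [k for k in keys if before.route(k) != after.route(k)]
print(f"{len(moved)} of {len(keys)} keys moved")
print({after.route(k) for k in moved})  # destination nodes of moved keys
```

When the third node is added, the only keys that change owner are the ones the new node takes over, so existing nodes never exchange data with each other. With plain modulo sharding, going from 2 to 3 nodes would instead remap roughly two thirds of all keys.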
        

 
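The sharding rules collected above have only been partly verified, and the detailed explanations and the lists of advantages and disadvantages are not yet complete. I hope to study and discuss them together with readers and fill in the gaps.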
        
