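Anyone with some familiarity with Swift knows that the Ring is one of its core components: it determines how data is distributed across the cluster. Swift uses the configured partition_power to fix the number of partitions in the cluster (2 to the power of partition_power), assigns those partitions to different nodes following a consistent-hashing scheme, and places each piece of data in its corresponding partition.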
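As a rough illustration of the partition arithmetic, here is a minimal Python sketch. It assumes an MD5-based mapping in the spirit of what Swift does; the function names are hypothetical, and any per-cluster hash prefix or suffix is left out, so the partition numbers will not match a real cluster.

```python
import hashlib
import struct


def partition_count(part_power):
    """Number of partitions in the ring: 2 ** part_power."""
    return 2 ** part_power


def partition_for(path, part_power):
    """Map an object path to a partition number (simplified sketch).

    Takes the first 4 bytes of the MD5 digest as a big-endian unsigned
    integer and keeps only its top `part_power` bits.
    Assumption: no cluster-specific hash prefix/suffix is mixed in.
    """
    digest = hashlib.md5(path.encode("utf-8")).digest()
    part_shift = 32 - part_power
    return struct.unpack_from(">I", digest)[0] >> part_shift
```

With part_power set to 10 there are 1024 partitions, and every path maps deterministically to a number in the range 0 to 1023.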
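Building the Ring is therefore a step that Swift initialization cannot skip. In brief:
- Creating a new Ring:
  - ring-builder uses each device's weight to calculate how many partitions that device should receive. (The total number of partitions is 2 to the power of partition_power; they are then divided up according to the weights and the number of devices.)
  - ring-builder assigns each partition's replicas to the corresponding devices.
- Creating a new ring from an old ring:
  - Recalculate the number of partitions each device should hold.
  - Gather the partitions that need to be reassigned:
    - 1) add every partition on a removed device to the gathered list;
    - 2) add the partitions that must be handed over because new devices were added to the gathered list;
    - 3) add the partitions that any device holds beyond its recalculated share to the gathered list.
  - Assign the partitions in the gathered list to devices, using the same method as in "Creating a new Ring" above.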
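The two procedures above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the RingBuilder implementation: devices are reduced to an id-to-weight mapping, the function names are hypothetical, and the sketch only counts partitions rather than choosing which ones move.

```python
def parts_wanted(devices, part_power, replicas):
    """Return {device_id: desired number of partition replicas}.

    devices maps device_id -> weight; each device's share of the
    (2 ** part_power) * replicas partition replicas is proportional
    to its weight.
    """
    total_parts = (2 ** part_power) * replicas
    total_weight = sum(devices.values())
    return {dev: total_parts * weight / total_weight
            for dev, weight in devices.items()}


def gather_counts(current, wanted):
    """Return {device_id: partition replicas that must be gathered}.

    current maps device_id -> partition replicas held now.
    wanted maps device_id -> desired share; a device missing from
    `wanted` has been removed and gives up everything it holds.
    """
    surplus = {}
    for dev, held in current.items():
        extra = int(held - wanted.get(dev, 0))
        if extra > 0:
            surplus[dev] = extra
    return surplus
```

For example, with part_power 4 and 3 replicas there are 48 partition replicas. Three equally weighted devices hold 16 each; after a fourth device of the same weight is added, each share drops to 12, so each original device contributes 4 partition replicas to the gathered list.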
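So how does the swift-ring-builder command actually run? This post is only a brief introduction to the swift-ring-builder command. Reading the source shows that nearly all of its functionality is implemented by methods on a RingBuilder instance, so the underlying principles and finer details will be summarized in a later post, after reading through the RingBuilder source. So please don't flame me for false advertising ^_~
1. What does swift-ring-builder do?
Rings are created manually with the swift-ring-builder tool. swift-ring-builder associates partitions with devices and writes that mapping into an optimized Python data structure, which is compressed, serialized, and written to disk so that the ring data can be loaded by the servers. The mechanism for updating rings is very simple: a server checks the last-modified time of the ring file to decide whether the file or its in-memory copy is newer, and therefore whether it needs to reload the ring data. The "Python data structure" mentioned in this paragraph is a dictionary like the one returned below: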
def to_dict(self):
    """
    Returns a dict that can be used later with copy_from to
    restore a RingBuilder. swift-ring-builder uses this to
    pickle.dump the dict to a file and later load that dict into
    copy_from.
    """
    return {'part_power': self.part_power,
            'replicas': self.replicas,
            'min_part_hours': self.min_part_hours,
            'parts': self.parts,
            'devs': self.devs,
            'devs_changed': self.devs_changed,
            'version': self.version,
            '_replica2part2dev': self._replica2part2dev,
            '_last_part_moves_epoch': self._last_part_moves_epoch,
            '_last_part_moves': self._last_part_moves,
            '_last_part_gather_start': self._last_part_gather_start,
            '_remove_devs': self._remove_devs}
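To make the save-and-reload idea concrete, here is a small self-contained sketch. It assumes a plain dict stands in for the output of to_dict, and the helper and class names (save_builder, load_builder, ReloadingRing) are hypothetical; Swift's own loader is more involved than this.

```python
import os
import pickle


def save_builder(builder_dict, path):
    """Serialize a builder dict (shaped like to_dict's result) to disk."""
    with open(path, "wb") as f:
        pickle.dump(builder_dict, f, protocol=2)


def load_builder(path):
    """Load a builder dict previously written by save_builder."""
    with open(path, "rb") as f:
        return pickle.load(f)


class ReloadingRing:
    """Keeps an in-memory copy and reloads only when the file changed."""

    def __init__(self, path):
        self.path = path
        self.mtime = None
        self.data = None
        self.reload_if_changed()

    def reload_if_changed(self):
        """Return True if the file's mtime changed and data was reloaded."""
        mtime = os.path.getmtime(self.path)
        if mtime == self.mtime:
            return False
        self.data = load_builder(self.path)
        self.mtime = mtime
        return True
```

A server-like process would call reload_if_changed() periodically; the call is cheap when the file's modification time is unchanged, because the file is only read again after it has been rewritten.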
The basic structure of the swift-ring-builder command is:
swift-ring-builder <builder_file> <action> [params]
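swift-ring-builder performs the operation named by <action>, stores the builder state in the file given by <builder_file>, and generates xxx.ring.gz, the file that describes the ring to the servers. Before doing so, it backs up the existing <builder_file> and xxx.ring.gz into the backups directory.
Figure 1: The builder file and ring.gz created by swift-ring-builder
Figure 2: The builder file and ring.gz backed up by swift-ring-builder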
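The backup step can be pictured with the following sketch. The function name backup_file and the timestamp-prefixed file name are assumptions made for illustration; only the existence of a backups directory comes from the text above.

```python
import os
import shutil
import time


def backup_file(path, now=None):
    """Copy `path` into a sibling backups/ directory and return the copy's path.

    Assumption: the backup is named "<unix timestamp>.<original name>".
    """
    now = time.time() if now is None else now
    backup_dir = os.path.join(os.path.dirname(os.path.abspath(path)), "backups")
    os.makedirs(backup_dir, exist_ok=True)
    target = os.path.join(backup_dir, "%d.%s" % (now, os.path.basename(path)))
    shutil.copy2(path, target)  # copy2 also preserves the modification time
    return target
```

Calling backup_file on both the builder file and the ring.gz file before rewriting them leaves a timestamped copy of each, so an earlier ring can be restored by copying the files back.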
add
create
list_parts
rebalance
remove
search
set_info
set_min_part_hours
set_weight
set_replicas
validate
write_ring
Below we go through these commands one by one and explain each. The English documentation can be obtained by running the "swift-ring-builder" command on its own.
swift-ring-builder <builder_file>
    Shows information about the ring and the devices within.
Prints information about the ring and the devices in it. swift-1.8.0 added a region attribute to each device.
swift-ring-builder <builder_file> add z<zone>-<ip>:<port>/<device_name>_<meta> <weight> [z<zone>-<ip>:<port>/<device_name>_<meta> <weight>] ...
    Adds devices to the ring with the given information. No partitions will be assigned to the new device until after running 'rebalance'. This is so you can make multiple device changes and rebalance them all just once.
In other words, add only registers the new devices. Partitions are assigned to them when 'rebalance' runs, so you can add several devices and then rebalance once for all of them.
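To show how the device argument of add breaks down into fields, here is a hedged parsing sketch. It is not the parser swift-ring-builder uses: the regular expression handles IPv4 addresses only, and the function name and dictionary keys are hypothetical.

```python
import re

# z<zone>-<ip>:<port>/<device_name>_<meta>, with the _<meta> part optional.
ADD_VALUE = re.compile(
    r"^z(?P<zone>\d+)-(?P<ip>[0-9.]+):(?P<port>\d+)"
    r"/(?P<device>[^_]+)(?:_(?P<meta>.*))?$"
)


def parse_add_value(value, weight):
    """Split an add argument and its weight into a device dict."""
    m = ADD_VALUE.match(value)
    if m is None:
        raise ValueError("invalid add value: %r" % value)
    return {
        "zone": int(m.group("zone")),
        "ip": m.group("ip"),
        "port": int(m.group("port")),
        "device": m.group("device"),
        "meta": m.group("meta") or "",
        "weight": float(weight),
    }
```

For instance, parse_add_value("z1-1.2.3.4:5678/sdb1_shiny", "100") yields zone 1, ip 1.2.3.4, port 5678, device sdb1, meta shiny, and weight 100.0.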
swift-ring-builder <builder_file> create <part_power> <replicas> <min_part_hours>
    Creates <builder_file> with 2^<part_power> partitions and <replicas>. <min_part_hours> is number of hours to restrict moving a partition more than once.
Creates <builder_file> with 2 to the power of <part_power> partitions and <replicas> replicas. <min_part_hours> is the minimum number of hours that must pass before the same partition may be moved again.
swift-ring-builder <builder_file> list_parts <search-value> [<search-value>] ..
    Returns a 2 column list of all the partitions that are assigned to any of the devices matching the search values given. The first column is the assigned partition number and the second column is the number of device matches for that partition. The list is ordered from most number of matches to least. If there are a lot of devices to match against, this command could take a while to run.
The output is a two-column list covering every partition assigned to any device that matches the search values:
- the first column is the partition number;
- the second column is the number of matching devices that hold that partition (a count, not a device id).
The list is sorted from the most matches to the fewest. When many devices match, the command can take a while to run.
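The two-column output of list_parts can be reproduced with a short sketch. It assumes the replica-to-partition-to-device table is a list of rows, one per replica, where row[part] is the id of the device holding that replica; the function name and the tie-breaking by partition number are choices made here for illustration.

```python
from collections import Counter


def list_parts(replica2part2dev, matching_dev_ids):
    """Return [(partition, match_count), ...], most matches first.

    replica2part2dev: one row per replica; row[part] is the device id
    that holds this replica of partition `part`.
    matching_dev_ids: ids of the devices selected by the search values.
    """
    matching = set(matching_dev_ids)
    counts = Counter()
    for row in replica2part2dev:
        for part, dev_id in enumerate(row):
            if dev_id in matching:
                counts[part] += 1
    # Sort by descending match count, then by partition number.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

With the table [[0, 1, 2, 3], [1, 2, 3, 0], [2, 3, 0, 1]] and matching devices 0 and 1, the result is [(0, 2), (3, 2), (1, 1), (2, 1)]: partitions 0 and 3 each have two replicas on matching devices, partitions 1 and 2 have one.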
swift-ring-builder <builder_file> rebalance
    Attempts to rebalance the ring by reassigning partitions that haven't been recently reassigned.
rebalance tries to even out the ring by reassigning only those partitions that have not been reassigned recently.
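The "not recently reassigned" rule can be illustrated with a tiny filter. This is a sketch under an assumption: last_part_moves is taken to be a list holding, for each partition, the number of hours since it was last moved, and the function name is hypothetical.

```python
def movable_parts(last_part_moves, min_part_hours):
    """Return the partitions that may be moved in this rebalance.

    last_part_moves[part] is assumed to hold the hours elapsed since
    partition `part` was last reassigned; a partition is movable once
    at least min_part_hours have passed.
    """
    return [part for part, hours in enumerate(last_part_moves)
            if hours >= min_part_hours]
```

With last_part_moves of [0, 5, 24, 1] and min_part_hours of 6, only partition 2 is eligible, so a rebalance run shortly after another one can move very little.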
swift-ring-builder <builder_file> remove <search-value> [search-value ...]
    Removes the device(s) from the ring. This should normally just be used for a device that has failed. For a device you wish to decommission, it's best to set its weight to 0, wait for it to drain all its data, then use this remove command. This will not take effect until after running 'rebalance'. This is so you can make multiple device changes and rebalance them all just once.
remove takes devices out of the ring. Normally it should only be used for devices that have failed.
To decommission a healthy device, the better approach is to set its weight to 0, wait until all of its data has drained away, and only then remove it with this command.
remove does not reassign any partitions by itself; that happens only when 'rebalance' runs. This lets you add or remove several devices and then rebalance once for all of the changes.
swift-ring-builder <builder_file> search <search-value>
    Shows information about matching devices.
Prints information about the devices that match.
swift-ring-builder <builder_file> set_info <search-value> <ip>:<port>/<device_name>_<meta> [<search-value> <ip>:<port>/<device_name>_<meta>] ...
    For each search-value, resets the matched device's information. This information isn't used to assign partitions, so you can use 'write_ring' afterward to rewrite the current ring with the newer device information. Any of the parts are optional in the final <ip>:<port>/<device_name>_<meta> parameter; just give what you want to change. For instance set_info d74 _"snet: 5.6.7.8" would just update the meta data for device id 74.
set_info resets the information of every device matched by each <search-value>. This information plays no part in partition assignment, so afterwards you can simply run 'write_ring' to rewrite the current ring with the updated device information.
Every part of the <ip>:<port>/<device_name>_<meta> parameter is optional; give only the parts you want to change.
For example, set_info d74 _"snet: 5.6.7.8" updates only the metadata of the device with id 74, setting it to "snet: 5.6.7.8".
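The "give only what you want to change" behaviour can be sketched as a partial update. This is an illustration, not Swift's parser: it handles IPv4 addresses only, the function name and the device dictionary keys are hypothetical, and the shell is assumed to have already stripped the quotes around the meta value.

```python
import re

# <ip>:<port>/<device_name>_<meta>, where every part is optional.
INFO_VALUE = re.compile(
    r"^(?P<ip>[0-9.]+)?(?::(?P<port>\d+))?"
    r"(?:/(?P<device>[^_]+))?(?:_(?P<meta>.*))?$"
)


def apply_set_info(dev, value):
    """Return a copy of `dev` with only the fields named in `value` changed."""
    m = INFO_VALUE.match(value)
    if m is None:
        raise ValueError("invalid set_info value: %r" % value)
    updated = dict(dev)
    for field in ("ip", "port", "device", "meta"):
        found = m.group(field)
        if found is not None:
            updated[field] = int(found) if field == "port" else found
    return updated
```

Applied to device 74 with the value "_snet: 5.6.7.8", only the meta field changes; the ip, port, and device name are left as they were.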
swift-ring-builder <builder_file> set_min_part_hours <hours>
    Changes the <min_part_hours> to the given <hours>. This should be set to however long a full replication/update cycle takes. We're working on a way to determine this more easily than scanning logs.
set_min_part_hours sets <min_part_hours> to the given <hours>.
The value should be at least as long as one full replication/update cycle. The Swift developers note that they are working on an easier way to determine it than scanning logs.
swift-ring-builder <builder_file> set_weight <search-value> <weight> [<search-value> <weight>] ...
    Resets the devices' weights. No partitions will be reassigned to or from the device until after running 'rebalance'. This is so you can make multiple device changes and rebalance them all just once.
Resets the weight of the matched devices. After set_weight, no partitions are moved to or from the device; that happens only when 'rebalance' runs.
This lets you change the weights of several devices and then rebalance once for all of them.
swift-ring-builder <builder_file> set_replicas <replicas>
    Changes the replica count to the given <replicas>. <replicas> may be a floating-point value, in which case some partitions will have floor(<replicas>) replicas and some will have ceiling(<replicas>) in the correct proportions. A rebalance is needed to make the change take effect.
set_replicas sets the replica count to the given <replicas>.
<replicas> may be a floating-point number. In that case some partitions get floor(<replicas>) replicas and the rest get ceiling(<replicas>), in the proportions needed to average out to <replicas>.
A rebalance is required for the new replica count to take effect. This command was added in swift-1.8.0.
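A short worked example makes the fractional case concrete. The function below is a sketch with a hypothetical name; it only computes how many partitions end up with each replica count, not which partitions those are.

```python
import math


def replica_distribution(parts, replicas):
    """Return {replica_count: number_of_partitions} for `replicas`.

    For a fractional value, the fractional part decides what share of
    the partitions receives ceiling(replicas) instead of floor(replicas).
    """
    low = math.floor(replicas)
    high = math.ceil(replicas)
    if low == high:
        return {low: parts}
    n_high = int(round(parts * (replicas - low)))
    return {low: parts - n_high, high: n_high}
```

For 8 partitions and 2.5 replicas the result is {2: 4, 3: 4}: half of the partitions have two replicas and half have three, which averages to 2.5.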
swift-ring-builder <builder_file> validate
    Just runs the validation routines on the ring.
Only runs the builder's validate method to check the ring.
swift-ring-builder <builder_file> write_ring
    Just rewrites the distributable ring file. This is done automatically after a successful rebalance, so really this is only useful after one or more 'set_info' calls when no rebalance is needed but you want to send out the new device information.
write_ring only rewrites the distributable ring file. This is done automatically after every successful rebalance.
It is therefore only useful after one or more 'set_info' calls, when no rebalance is needed but you still want to distribute the new device information.
3. Parameter format
When searching for devices, <search-value> has the following format:
d<device_id>z<zone>-<ip>:<port>/<device_name>_<meta>
Every part of this format is optional. For example:
    z1                  Matches devices in zone 1
    z1-1.2.3.4          Matches devices in zone 1 with the ip 1.2.3.4
    1.2.3.4             Matches devices in any zone with the ip 1.2.3.4
    z1:5678             Matches devices in zone 1 using port 5678
    :5678               Matches devices that use port 5678
    /sdb1               Matches devices with the device name sdb1
    _shiny              Matches devices with shiny in the meta data
    _"snet: 5.6.7.8"    Matches devices with snet: 5.6.7.8 in the meta data
    [::1]               Matches devices in any zone with the ip ::1
    z1-[::1]:5678       Matches devices in zone 1 with ip ::1 and port 5678
Here is an example that specifies every part:
d74z1-1.2.3.4:5678/sdb1_"snet: 5.6.7.8"
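The matching rules in the examples above can be sketched as follows. This is an illustration rather than Swift's own search code: the regular expression, the function names, and the device dictionary keys (id, zone, ip, port, device, meta) are all assumptions, and the shell is assumed to have stripped any quotes from the meta part.

```python
import re

# d<device_id>z<zone>-<ip>:<port>/<device_name>_<meta>, all parts optional.
SEARCH_VALUE = re.compile(
    r"^(?:d(?P<id>\d+))?"
    r"(?:z(?P<zone>\d+))?"
    r"-?(?:(?P<ip>[0-9.]+)|\[(?P<ip6>[0-9a-fA-F:]+)\])?"
    r"(?::(?P<port>\d+))?"
    r"(?:/(?P<device>[^_]+))?"
    r"(?:_(?P<meta>.*))?$"
)


def parse_search_value(value):
    """Turn a search value into a dict holding only the parts given."""
    m = SEARCH_VALUE.match(value)
    if m is None:
        raise ValueError("invalid search value: %r" % value)
    g = m.groupdict()
    query = {}
    for key in ("id", "zone", "port"):
        if g[key] is not None:
            query[key] = int(g[key])
    ip = g["ip"] or g["ip6"]
    if ip is not None:
        query["ip"] = ip
    for key in ("device", "meta"):
        if g[key] is not None:
            query[key] = g[key]
    return query


def search_devs(devs, value):
    """Return the devices matching every part present in `value`."""
    query = parse_search_value(value)

    def matches(dev):
        for key, wanted in query.items():
            if key == "meta":
                if wanted not in dev.get("meta", ""):  # substring match
                    return False
            elif dev.get(key) != wanted:
                return False
        return True

    return [dev for dev in devs if matches(dev)]
```

Each part that is present narrows the match and each part that is absent is ignored, which is why z1 alone selects a whole zone while the fully specified example selects a single device.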
4. Return codes
    0 = operation successful
    1 = operation completed with warnings
    2 = error
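A script that drives swift-ring-builder can turn these codes into messages. The sketch below uses hypothetical helper names and assumes swift-ring-builder is on the PATH; only the three code meanings come from the list above.

```python
import subprocess

# Meanings of the swift-ring-builder exit codes listed above.
EXIT_MEANINGS = {
    0: "operation successful",
    1: "operation completed with warnings",
    2: "error",
}


def describe_exit_code(code):
    """Return the documented meaning of an exit code."""
    return EXIT_MEANINGS.get(code, "unknown exit code %d" % code)


def run_ring_builder(args):
    """Run swift-ring-builder with `args`; return (exit code, meaning)."""
    result = subprocess.run(["swift-ring-builder"] + list(args))
    return result.returncode, describe_exit_code(result.returncode)
```

For example, run_ring_builder(["object.builder", "rebalance"]) returns the exit code together with its meaning, so a deployment script can treat 1 as a warning to log and 2 as a failure to stop on.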