Ceph is a reliable, self-rebalancing, self-healing distributed storage system. By use case it can be split into three major services: object storage, block devices, and a file system. Block storage is Ceph's strong suit.
Ceph's main advantage is that it is fully distributed: the location of every piece of data is computed rather than looked up, data is spread as evenly as possible across the cluster, there is no traditional single point of failure, and the system scales horizontally.
Ceph Architecture
RADOS is itself a complete distributed object store with reliability, intelligence, and distribution built in. Ceph's high reliability, scalability, performance, and automation are all provided by this layer, and user data is ultimately stored through it; RADOS can be considered the core component of Ceph.
The RADOS system consists of two main parts: OSDs and Monitors.
On top of RADOS sits LIBRADOS, a library that lets applications interact with the RADOS system directly. It supports several programming languages, such as C, C++, and Python.
Built on top of LIBRADOS there are three further layers: RADOSGW, RBD, and CephFS.
RADOSGW: a gateway based on the popular RESTful protocol, compatible with both S3 and Swift.
RBD: provides a distributed block device via a Linux kernel client and a QEMU/KVM driver.
CephFS: provides a POSIX-compatible file system via a Linux kernel client and FUSE.
Ceph core component: RADOS
The RADOS system consists of two main parts: OSDs and Monitors.
Ceph OSD: OSD stands for Object Storage Device. Its main jobs are storing, replicating, rebalancing, and recovering data, exchanging heartbeats with other OSDs, and reporting state changes to the Ceph Monitors. Normally one OSD manages one physical disk, although a partition can also back an OSD.
Ceph Monitor: as the name suggests, it monitors the Ceph cluster and maintains its health state, along with the various maps of the cluster: the OSD Map, Monitor Map, PG Map, and CRUSH Map. Together these are called the Cluster Map, the key data structure of RADOS, which records all members, relationships, and attributes of the cluster and governs data placement. For example, when a client wants to store data in the cluster, it first obtains the latest maps from a Monitor and then uses them, together with the object id, to compute where the data will ultimately be stored.
For high availability, a Ceph storage cluster should keep more than two copies of each object. Ceph OSD daemons automatically create object replicas on other Ceph nodes to guarantee data safety and availability.
Ceph Monitors maintain the master copy of the cluster map. For high availability the monitors themselves are clustered, so that the failure of a single monitor does not take the cluster down.
Ceph data placement algorithm
Ceph is designed for large-scale distributed storage, so its data placement algorithm must compute the location of data quickly and accurately even in very large clusters, while keeping data migration to a minimum when hardware fails or new hardware is added. Ceph's CRUSH algorithm was carefully designed for exactly these properties.
Before explaining how CRUSH works, a few concepts and their relationships need to be introduced.
Object: when a user stores data in a Ceph cluster, the data is split into multiple objects. Each object has an object id, and the object size is configurable (4MB by default). An object can be regarded as the smallest storage unit in Ceph.
PG: because the number of objects is huge, Ceph introduces placement groups (PGs) to manage them. Every object is mapped by CRUSH to some PG, and one PG can contain many objects.
PG and OSD: a PG is in turn mapped by CRUSH to OSDs for storage. With two replicas, each PG maps to two OSDs, for example [OSD#1, OSD#2]; OSD#1 holds the primary copy of that PG and OSD#2 holds the secondary copy, which provides data redundancy.
Mapping objects to placement groups creates a layer of indirection between OSDs and clients. A Ceph cluster must be able to grow or shrink and rebalance dynamically; if clients "knew" which OSD held which object, clients and OSDs would be tightly coupled. Instead, CRUSH maps objects to placement groups and then maps each placement group to one or more OSDs. This indirection lets Ceph rebalance dynamically as OSD daemons and underlying devices come online. The diagram below shows how CRUSH maps objects to placement groups and placement groups to OSDs.
PG and PGP: PGs hold objects, while PGP determines the number of distinct OSD permutations available for placing PGs. For example, with three OSDs (osd.1, osd.2, osd.3) and a replica count of 2: if pgp_num is 1 there is only one possible OSD combination, say [osd.1, osd.2], so every PG's primary and secondary copies land on osd.1 and osd.2; if pgp_num is 2, two combinations are possible, say [osd.1, osd.2] and [osd.1, osd.3]. It is much like the permutations and combinations from high-school math. In general pg_num and pgp_num should be set to the same value.
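As a concrete illustration of pg_num and pgp_num, the commands below inspect and raise both values on a pool; the pool name test-pool is only a placeholder for this sketch, and on a live cluster pg_num should be changed gradually:
# ceph osd pool get test-pool pg_num
# ceph osd pool get test-pool pgp_num
# ceph osd pool set test-pool pg_num 64
# ceph osd pool set test-pool pgp_num 64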
Relationship between objects, PGs, pools, OSDs, and storage disks
At its core, CRUSH computes the distribution of data objects according to the weights of the storage devices. Weights are typically set according to disk capacity and speed, for example weight 1 for a 1TB disk and weight 2 for a 2TB disk. During the calculation, CRUSH combines the Cluster Map, the data distribution policy, and a pseudo-random number to determine where the data is finally stored.
The Cluster Map describes the storage resources available in the cluster and their spatial hierarchy, for example how many racks the cluster has, how many servers sit in each rack, and how many disks per server are used as OSDs.
The data distribution policy is how the Ceph administrator expresses placement constraints through configuration. For example, setting the failure domain to Host means that data must survive the loss of a whole host; CRUSH achieves this by placing each PG's primary and secondary copies on OSDs of different hosts. Besides Host, other failure domains such as rack can be specified, and in addition to the failure domain the administrator chooses the redundancy scheme, i.e. replication or erasure coding.
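As a hedged sketch of how a failure domain is expressed in practice: since Luminous a replicated CRUSH rule can be created with a chosen bucket type (host, rack, ...) and then attached to a pool. The rule name replicated-by-host and the pool name volumes below are assumptions for illustration:
# ceph osd crush rule create-replicated replicated-by-host default host
# ceph osd pool set volumes crush_rule replicated-by-host
# ceph osd crush rule ls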
Ceph network configuration reference
Network configuration is critical for building a high-performance Ceph storage cluster. The Ceph storage cluster does not perform request routing or dispatching on behalf of clients; instead, Ceph clients (block devices, CephFS, the REST gateway) send requests directly to the OSDs, and the OSDs then replicate the data on the clients' behalf, which means replication and related traffic put extra load on the cluster network.
The quick-start configuration provides a minimal Ceph configuration file that only sets the monitor IP addresses and the hosts the daemons run on. If no cluster network is configured, Ceph assumes a single "public" network. Ceph can run with only one network, but in large clusters a separate "cluster" network improves performance significantly.
We recommend running a Ceph storage cluster on two networks: a public (front-side) network and a cluster (back-side) network. For that, each node needs more than one NIC.
The main reasons for running two separate networks are:
1. Performance: OSDs handle data replication for clients, and with multiple replicas the traffic between OSDs inevitably competes with client-to-cluster traffic, adding latency and creating performance problems; recovery and rebalancing also add significant latency on the public network. See Scalability and High Availability for how Ceph replicates data, and Monitor/OSD Interaction for heartbeat traffic.
2. Security: most people are well behaved, but a small minority enjoys mounting denial-of-service (DoS) attacks. When traffic between OSDs is disrupted, placement groups can no longer reach the active + clean state and users cannot read or write data. A good way to defeat this kind of attack is to keep the cluster network completely separate and not connected to the Internet; also consider message signing to prevent spoofing.
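For reference, a minimal sketch of the two-network layout used later in this deployment (public 10.30.1.0/24, cluster 192.168.9.0/24) is just two lines in the [global] section of ceph.conf:
[global]
public_network = 10.30.1.0/24        # front side: client, mon and osd public traffic
cluster_network = 192.168.9.0/24     # back side: osd replication, recovery and heartbeat traffic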
Deploying Ceph with ceph-deploy
Official documentation (Chinese):
http://docs.ceph.org.cn/
Lab environment
10.30.1.221 192.168.9.211 ceph-host-01
10.30.1.222 192.168.9.212 ceph-host-02
10.30.1.223 192.168.9.213 ceph-host-03
10.30.1.224 192.168.9.214 ceph-host-04
OS: CentOS 7.6
Each host has two spare disks.
The Ceph nodes run 64-bit CentOS 7.6. There are four Ceph nodes in total, each running two OSD daemons, one per physical disk.
For Ceph 10.x it is best to use a 4.x kernel. If you must use an older kernel, you should use FUSE as the client.
Upgrade the system kernel
cat >>/etc/yum.repos.d/CentOS-altarch.repo<<EOF
# CentOS-Base.repo
#
# The mirror system uses the connecting IP address of the client and the
# update status of each mirror to pick mirrors that are updated to and
# geographically close to the client. You should use this for CentOS updates
# unless you are manually picking other mirrors.
#
# If the mirrorlist= does not work for you, as a fall back you can try the
# remarked out baseurl= line instead.
#
#
[kernel]
name=CentOS-$releasever - Kernel
baseurl=https://mirrors.tuna.tsinghua.edu.cn/centos-altarch/7/kernel/x86_64/
enabled=1
gpgcheck=0
EOF
yum clean all
yum install kernel -y
Update the boot loader
grub2-mkconfig -o /boot/grub2/grub.cfg
grub2-set-default 0
System tuning
echo '* - nofile 65535' >> /etc/security/limits.conf
ulimit -SHn 65535
cat > /etc/sysctl.conf <<EOF
kernel.sysrq = 0
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
net.core.wmem_default = 8388608
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.netdev_max_backlog = 262144
net.core.somaxconn = 262144
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.ip_forward = 0
net.ipv4.ip_local_port_range = 5000 65000
net.ipv4.tcp_fin_timeout = 1
net.ipv4.tcp_keepalive_time = 30
net.ipv4.tcp_max_orphans = 3276800
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.tcp_max_tw_buckets = 6000
net.ipv4.tcp_mem = 94500000 915000000 927000000
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_sack = 1
net.ipv4.tcp_syn_retries = 1
net.ipv4.tcp_synack_retries = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_wmem = 4096 16384 16777216
fs.file-max=65536
fs.inotify.max_queued_events=99999999
fs.inotify.max_user_watches=99999999
fs.inotify.max_user_instances=65535
net.core.default_qdisc=fq
EOF
sysctl -p
Disable SELinux and the firewall
setenforce 0
sed -i 's/SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
systemctl stop firewalld
systemctl disable firewalld
systemctl disable NetworkManager
systemctl stop NetworkManager
Install a network time service
The nodes must keep their clocks synchronized with each other (this matters both for the Ceph monitors and for OpenStack); otherwise operations such as creating cloud instances may fail.
# yum install chrony -y
# vim /etc/chrony.conf    # edit the NTP configuration
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
server 3.centos.pool.ntp.org iburst
# systemctl enable chronyd.service    # start the NTP service at boot
# systemctl start chronyd.service     # start the NTP service now
# chronyc sources                     # verify time synchronization
210 Number of sources = 1
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^? ControllerNode 0 6 0 - +0ns[ +0ns] +/- 0ns
Set the time zone
timedatectl set-timezone Asia/Shanghai
Install common utility packages
yum install -y vim net-tools wget lrzsz deltarpm tree screen lsof tcpdump nmap sysstat iftop
Switch to the Aliyun CentOS mirror
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
Install the EPEL repository in advance
yum install epel-release -y
Note: using the Aliyun EPEL mirror makes installation faster
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
1. Install ceph-deploy
1.1 Configure the host names and the hosts file; in this example ceph-deploy is installed on one of the cluster nodes.
[root@ceph-host-01 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.30.1.221 ceph-host-01
10.30.1.222 ceph-host-02
10.30.1.223 ceph-host-03
10.30.1.224 ceph-host-04
Note: each node's host name must match its entry in /etc/hosts.
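If a node's hostname does not match yet, it can be set before continuing; the example below is for ceph-host-01 and should be repeated on each node with its own name:
# hostnamectl set-hostname ceph-host-01
# hostname    # verify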
1.2 Generate a key with ssh-keygen and copy it to every node with ssh-copy-id.
[root@ceph-host-01 ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:iVPfxuQVphRA8v2//XsM+PxzWjYrx5JnnHTbBdNYwTw root@ceph-host-01
The key's randomart image is:
+---[RSA 2048]----+
| ..o.o.=..|
| o o o E.|
| . . + .+.|
| o o = o+ .|
| o S . =..o |
| . .. .oo|
| o=+X|
| +o%X|
| B*X|
+----[SHA256]-----+
Example: copying the key to ceph-host-02
[root@ceph-host-01 ~]# ssh-copy-id ceph-host-02
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'ceph-host-02 (10.30.1.222)' can't be established.
ECDSA key fingerprint is SHA256:VsMfdmYFzxV1dxKZi2OSp8QluRVQ1m2lT98cJt4nAFU.
ECDSA key fingerprint is MD5:de:07:2f:5c:13:9b:ba:0b:e5:0e:c2:db:3e:b8:ab:bd.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@ceph-host-02's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'ceph-host-02'"
and check to make sure that only the key(s) you wanted were added.
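To copy the key to the remaining nodes in one pass, a simple shell loop over the host list of this environment works as well:
# for h in ceph-host-02 ceph-host-03 ceph-host-04; do ssh-copy-id root@$h; done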
1.3 Install ceph-deploy.
Before installing, configure the yum repository; here we use the relatively new Nautilus release.
[root@ceph-host-01 ~]# cat /etc/yum.repos.d/ceph.repo
[Ceph]
name=Ceph packages for $basearch
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/$basearch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
[Ceph-noarch]
name=Ceph noarch packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
[ceph-source]
name=Ceph source packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/SRPMS
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
Note: to install the official Ceph release repository directly, run:
yum install -y https://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch/ceph-release-1-0.el7.noarch.rpm
[root@ceph-host-01 ~]# yum install ceph-deploy python-setuptools python2-subprocess32 -y
2. Create the Ceph monitor role
2.1 ceph-deploy generates several configuration files as it works, so first create a working directory, for example ceph-cluster.
[root@ceph-host-01 ~]# mkdir -pv ceph-cluster
[root@ceph-host-01 ~]# cd ceph-cluster
2.2 Initialize the mon nodes in preparation for creating the cluster:
[root@ceph-host-01 ceph-cluster]# ceph-deploy new ceph-host-01 ceph-host-02 ceph-host-03
Edit the generated Ceph cluster configuration file
[root@ceph-host-01 ceph-cluster]# cat ceph.conf
[global]
fsid = a480fcef-1c4b-48cb-998d-0caed867b5eb
mon_initial_members = ceph-host-01, ceph-host-02, ceph-host-03
mon_host = 10.30.1.221,10.30.1.222,10.30.1.223
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
mon clock drift allowed = 2
mon clock drift warn backoff = 30
public_network = 10.30.1.0/24
cluster_network = 192.168.9.0/24
max_open_files = 131072
mon_pg_warn_max_per_osd = 1000
mon_max_pg_per_osd = 1000
osd pool default pg num = 256
osd pool default pgp num = 256
osd pool default size = 3
osd pool default min size = 1
mon_osd_full_ratio = .90
mon_osd_nearfull_ratio = .80
osd_deep_scrub_randomize_ratio = 0.01
[mon]
mon_allow_pool_delete = true
mon_osd_down_out_interval = 600
mon_osd_min_down_reporters = 3
[mgr]
mgr modules = dashboard
[mds]
mds cache memory limit = 10737418240
mds cache size = 250000
mds_max_export_size = 20971520
mds_bal_interval = 10
mds_bal_sample_interval = 3.000000
[osd]
osd_journal_size = 20480
osd_max_write_size = 1024
osd mkfs type = xfs
osd_recovery_op_priority = 1
osd_recovery_max_active = 1
osd_recovery_max_single_start = 1
osd_recovery_threads = 1
osd_recovery_max_chunk = 1048576
osd_max_backfills = 1
osd_scrub_begin_hour = 22
osd_scrub_end_hour = 7
osd_recovery_sleep = 0
[client]
rbd_cache = true
rbd_cache_writethrough_until_flush = true
rbd_concurrent_management_ops = 10
rbd_cache_size = 67108864
rbd_cache_max_dirty = 50331648
rbd_cache_target_dirty = 33554432
rbd_cache_max_dirty_age = 2
rbd_default_format = 2
Note: the settings above are a deliberately tuned configuration; add or remove options with care before using it in production.
2.3 Install the Ceph packages on all nodes
Use ceph-deploy to install the Ceph packages, or install Ceph manually on each node; the Ceph version installed depends on the yum repository you configured.
[root@ceph-host-01 ceph-cluster]# ceph-deploy install --no-adjust-repos ceph-host-01 ceph-host-02 ceph-host-03 ceph-host-04
# Without --no-adjust-repos, ceph-deploy keeps replacing the repos with its own default upstream source, which is a common pitfall
Tip: to install the Ceph packages independently on each cluster node instead, run:
# yum install -y https://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch/ceph-release-1-0.el7.noarch.rpm
# yum install ceph ceph-radosgw -y
2.4 Deploy the initial monitor(s) and gather all keys
[root@ceph-host-01 ceph-cluster]# ceph-deploy mon create-initial
2.5 Check that the daemon is running
# ps -ef|grep ceph
ceph 1916 1 0 12:05 ? 00:00:03 /usr/bin/ceph-mon -f --cluster ceph --id ceph-host-01 --setuser ceph --setgroup ceph
2.6 From the admin node, copy the configuration file and admin key to the admin node and all Ceph nodes
[root@ceph-host-01 ceph-cluster]# ceph-deploy admin ceph-host-01 ceph-host-02 ceph-host-03 ceph-host-04
On each node, give ceph.client.admin.keyring read permission so it can be used
# chmod +r /etc/ceph/ceph.client.admin.keyring
Or use ansible to set the permission on all Ceph nodes at once
# ansible ceph -a 'chmod +r /etc/ceph/ceph.client.admin.keyring'
3. Create the Ceph OSD role (OSD deployment)
Newer ceph-deploy releases use a single create command,
which is equivalent to prepare + activate and creates a BlueStore OSD (osd create --bluestore) by default:
ceph-deploy osd create --data /dev/vdb ceph-host-01
ceph-deploy osd create --data /dev/vdb ceph-host-02
ceph-deploy osd create --data /dev/vdb ceph-host-03
ceph-deploy osd create --data /dev/vdb ceph-host-04
Note: if a disk already contains data it must be wiped first, for example:
ceph-deploy disk zap ceph-host-02 /dev/vdb
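Before creating the OSDs it can help to double-check which devices each node actually exposes; ceph-deploy disk list queries the remote node, and lsblk can be run on it over ssh:
# ceph-deploy disk list ceph-host-02
# ssh ceph-host-02 lsblk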
4. Create the mgr role
Since Ceph 12 (Luminous) the manager daemon is mandatory. Add one mgr for each machine running a monitor, otherwise the cluster stays in the WARN state.
[root@ceph-host-01 ceph-cluster]# ceph-deploy mgr create ceph-host-01 ceph-host-02 ceph-host-03
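Because the configuration above enables the dashboard module under [mgr], a minimal sketch of switching it on and finding its URL is shown below; full dashboard setup (certificate, admin user) differs between Nautilus minor versions, so treat this only as an outline:
# ceph mgr module enable dashboard
# ceph mgr services    # prints the dashboard URL once the module is active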
5. Check cluster health
[root@ceph-host-03 ~]# ceph health
HEALTH_OK
[root@ceph-host-03 ~]# ceph -s
cluster:
id: 02e63c58-5200-45c9-b592-07624f4893a5
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-host-01,ceph-host-02,ceph-host-03 (age 59m)
mgr: ceph-host-01(active, since 4m), standbys: ceph-host-02, ceph-host-03
osd: 4 osds: 4 up (since 87m), 4 in (since 87m)
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 5.0 GiB used, 90 GiB / 95 GiB avail
pgs:
Add more OSDs
ceph-deploy osd create --data /dev/vdc ceph-host-01
ceph-deploy osd create --data /dev/vdc ceph-host-02
ceph-deploy osd create --data /dev/vdc ceph-host-03
ceph-deploy osd create --data /dev/vdc ceph-host-04
Check the status
[root@ceph-host-01 ceph-cluster]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.18585 root default
-3 0.03717 host ceph-host-01
0 hdd 0.01859 osd.0 up 1.00000 1.00000
4 hdd 0.01859 osd.4 up 1.00000 1.00000
-5 0.03717 host ceph-host-02
1 hdd 0.01859 osd.1 up 1.00000 1.00000
5 hdd 0.01859 osd.5 up 1.00000 1.00000
-7 0.03717 host ceph-host-03
2 hdd 0.01859 osd.2 up 1.00000 1.00000
6 hdd 0.01859 osd.6 up 1.00000 1.00000
-9 0.03717 host ceph-host-04
3 hdd 0.01859 osd.3 up 1.00000 1.00000
7 hdd 0.01859 osd.7 up 1.00000 1.00000
Note: view each OSD's weight and disk usage
[root@ceph-host-02 ceph-cluster]# ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
0 hdd 1.81940 1.00000 1.8 TiB 732 GiB 731 GiB 8 KiB 1.2 GiB 1.1 TiB 39.30 1.18 86 up
1 hdd 1.81940 1.00000 1.8 TiB 956 GiB 955 GiB 40 KiB 1.5 GiB 907 GiB 51.33 1.54 85 up
2 hdd 1.81940 1.00000 1.8 TiB 826 GiB 825 GiB 48 KiB 1.5 GiB 1.0 TiB 44.36 1.33 74 up
 3   hdd 5.45799  1.00000 5.5 TiB 1.0 GiB  12 MiB   0 B   1 GiB 5.5 TiB  0.02    0  90     up
4 hdd 1.81940 1.00000 1.8 TiB 939 GiB 938 GiB 39 KiB 1.5 GiB 924 GiB 50.42 1.51 89 up
5 hdd 1.81940 1.00000 1.8 TiB 1.0 TiB 1.0 TiB 3 KiB 1.9 GiB 834 GiB 55.24 1.66 109 up
6 hdd 1.81940 1.00000 1.8 TiB 808 GiB 806 GiB 52 KiB 1.4 GiB 1.0 TiB 43.36 1.30 90 up
7 hdd 1.81940 1.00000 1.8 TiB 919 GiB 917 GiB 48 KiB 1.5 GiB 945 GiB 49.30 1.48 88 up
TOTAL 18 TiB 6.1 TiB 6.1 TiB 240 KiB 11 GiB 12 TiB 33.34
MIN/MAX VAR: 0/1.66 STDDEV: 18.43
[root@ceph-host-02 ceph-cluster]# ceph osd status
+----+--------------+-------+-------+--------+---------+--------+---------+-----------+
| id | host | used | avail | wr ops | wr data | rd ops | rd data | state |
+----+--------------+-------+-------+--------+---------+--------+---------+-----------+
| 0 | ceph-host-01 | 732G | 1130G | 1 | 4096k | 0 | 0 | exists,up |
| 1 | ceph-host-01 | 956G | 906G | 4 | 19.2M | 0 | 0 | exists,up |
| 2 | ceph-host-02 | 826G | 1036G | 0 | 3276k | 0 | 0 | exists,up |
| 3 | ceph-host-02 | 1035M | 5588G | 0 | 0 | 0 | 0 | exists,up |
| 4 | ceph-host-03 | 939G | 923G | 3 | 14.4M | 0 | 0 | exists,up |
| 5 | ceph-host-03 | 1029G | 833G | 0 | 3413k | 0 | 0 | exists,up |
| 6 | ceph-host-04 | 808G | 1054G | 4 | 16.0M | 0 | 0 | exists,up |
| 7 | ceph-host-04 | 918G | 944G | 2 | 10.4M | 0 | 0 | exists,up |
+----+--------------+-------+-------+--------+---------+--------+---------+-----------+
Check the mounts
[root@ceph-host-02 ~]# df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/vda1 xfs 20G 1.5G 19G 8% /
devtmpfs devtmpfs 475M 0 475M 0% /dev
tmpfs tmpfs 496M 0 496M 0% /dev/shm
tmpfs tmpfs 496M 13M 483M 3% /run
tmpfs tmpfs 496M 0 496M 0% /sys/fs/cgroup
tmpfs tmpfs 100M 0 100M 0% /run/user/0
tmpfs tmpfs 496M 52K 496M 1% /var/lib/ceph/osd/ceph-1
tmpfs tmpfs 496M 52K 496M 1% /var/lib/ceph/osd/ceph-5
Note 1: one mon and one mgr would technically be enough, but for high availability multiple nodes are recommended. Example commands to add them:
# ceph-deploy --overwrite-conf mon add ceph-host-03
# ceph-deploy --overwrite-conf mgr create ceph-host-03
Note 2: to remove a mon node from the cluster:
ceph-deploy mon destroy ceph-host-02
Note 3: if a node cannot join the mon cluster, check ceph.conf on every mon node; the files must be consistent and must include the new node. The procedure is:
ceph-deploy mon destroy ceph-host-02
Make sure ceph-cluster/ceph.conf and /etc/ceph/ceph.conf on the deploy node look like the following:
[global]
fsid = a480fcef-1c4b-48cb-998d-0caed867b5eb
mon_initial_members = ceph-host-01, ceph-host-02, ceph-host-03
mon_host = 10.30.1.221,10.30.1.222,10.30.1.223
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
mon clock drift allowed = 2
mon clock drift warn backoff = 30
# network settings
public_network = 10.30.1.0/24
cluster_network = 192.168.9.0/24
max_open_files = 131072
mon_pg_warn_max_per_osd = 1000
mon_max_pg_per_osd = 1000
osd pool default size = 3
osd pool default min size = 2
mon_osd_full_ratio = .90
mon_osd_nearfull_ratio = .80
osd_deep_scrub_randomize_ratio = 0.01
[mon]
mon_allow_pool_delete = true
[mgr]
mgr modules = dashboard
[mds]
mds cache memory limit = 10737418240
mds cache size = 250000
[osd]
osd_max_write_size = 1024
osd_recovery_op_priority = 1
osd_recovery_max_active = 1
osd_recovery_max_single_start = 1
osd_recovery_max_chunk = 1048576
osd_recovery_threads = 1
osd_max_backfills = 1
osd_scrub_begin_hour = 22
osd_scrub_end_hour = 7
osd_recovery_sleep = 0
osd_crush_update_on_start = false
ceph-deploy --overwrite-conf mon add ceph-host-02
Note 4: push the deploy node's ceph.conf to the other machines with:
# ceph-deploy --overwrite-conf config push ceph-host-01 ceph-host-02 ceph-host-04
Restart the services so the changed parameters take effect:
systemctl restart ceph-mgr.target
systemctl restart ceph.target
6. Create and delete Ceph storage pools
6.1 Create
[root@ceph-host-01 ceph-cluster]# ceph osd pool create volumes 128
pool 'volumes' created
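On Luminous and later releases, a newly created pool should also be tagged with the application that will use it, otherwise ceph health reports an "application not enabled" warning; for a pool intended for RBD volumes that would be:
[root@ceph-host-01 ceph-cluster]# ceph osd pool application enable volumes rbd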
6.2 Delete
[root@ceph-host-02 ~]# ceph osd pool rm volumes volumes --yes-i-really-really-mean-it
pool 'volumes' removed
7. Deploy CephFS
7.1 Overview
CephFS is the POSIX-compatible file system provided by Ceph. Compared with RBD and RGW it was the last of the three to be declared production ready, and underneath it still stores its data in RADOS.
There are two ways to use CephFS:
1. cephfs kernel module
2. cephfs-fuse
As the architecture above suggests, the I/O path of cephfs-fuse is longer, so its performance is somewhat worse than the cephfs kernel module.
How a client accesses CephFS:
1. The client talks to an MDS node to obtain metadata (the metadata itself is also stored on OSDs).
2. The client writes data directly to the OSDs.
7.2 Example procedure
http://docs.ceph.com/docs/master/rados/operations/placement-groups/
Run the ceph-mds daemon on at least one node
[root@ceph-host-01 ~]# cd ceph-cluster/
[root@ceph-host-01 ceph-cluster]# ceph-deploy mds create ceph-host-01 ceph-host-02
Create the pools
[root@ceph-host-01 ceph-cluster]# ceph osd pool create data 128
[root@ceph-host-01 ceph-cluster]# ceph osd pool create metadata 128
Create the file system
[root@ceph-host-01 ceph-cluster]# ceph fs new cephfs metadata data
Inspect the file system
[root@ceph-host-01 ceph-cluster]# ceph fs ls
name: cephfs, metadata pool: metadata, data pools: [data ]
[root@ceph-host-01 ceph-cluster]# ceph mds stat
cephfs:1 {0=ceph-host-02=up:active} 2 up:standby
[root@ceph-host-01 ceph-cluster]# ceph fs status cephfs
cephfs - 0 clients
======
+------+--------+--------------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+--------------+---------------+-------+-------+
| 0 | active | ceph-host-01 | Reqs: 0 /s | 10 | 13 |
+------+--------+--------------+---------------+-------+-------+
+----------+----------+-------+-------+
| Pool | type | used | avail |
+----------+----------+-------+-------+
| metadata | metadata | 1024k | 580G |
| data | data | 0 | 580G |
+----------+----------+-------+-------+
+--------------+
| Standby MDS |
+--------------+
| ceph-host-02 |
+--------------+
MDS version: ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)
Although multiple active MDS daemons can run in parallel, the official documentation recommends keeping one active MDS and leaving the others on standby.
Note 1: creating multiple CephFS file systems
[root@ceph-host-01 ~]# ceph osd pool create nova-data 128
pool 'nova-data' created
[root@ceph-host-01 ~]# ceph osd pool create nova-metadata 128
pool 'nova-metadata' created
Creating a second CephFS directly fails with the following error:
[root@ceph-host-01 ~]# ceph fs new nova nova-metadata nova-data
Error EINVAL: Creation of multiple filesystems is disabled. To enable this experimental feature, use 'ceph fs flag set enable_multiple true'
Workaround:
[root@ceph-host-01 ~]# ceph fs flag set enable_multiple true --yes-i-really-mean-it
[root@ceph-host-01 ~]# ceph fs new nova nova-metadata nova-data
new fs with metadata pool 22 and data pool 21
Note in particular that ceph-mds is a standalone daemon that serves only one CephFS; if that file system has multiple ranks, a given daemon serves only one of them.
Check the CephFS status
[root@ceph-host-01 ~]# ceph mds stat
cephfs:1 nova:1 {cephfs:0=ceph-host-01=up:active,nova:0=ceph-host-02=up:active}
[root@ceph-host-01 ~]# ceph fs status cephfs
cephfs - 1 clients
======
+------+--------+--------------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+--------------+---------------+-------+-------+
| 0 | active | ceph-host-01 | Reqs: 0 /s | 10 | 13 |
+------+--------+--------------+---------------+-------+-------+
+----------+----------+-------+-------+
| Pool | type | used | avail |
+----------+----------+-------+-------+
| metadata | metadata | 1024k | 580G |
| data | data | 0 | 580G |
+----------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)
[root@ceph-host-01 ~]# ceph fs status nova
nova - 0 clients
====
+------+--------+--------------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+--------------+---------------+-------+-------+
| 0 | active | ceph-host-02 | Reqs: 0 /s | 10 | 13 |
+------+--------+--------------+---------------+-------+-------+
+---------------+----------+-------+-------+
| Pool | type | used | avail |
+---------------+----------+-------+-------+
| nova-metadata | metadata | 1024k | 580G |
| nova-data | data | 0 | 580G |
+---------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)
[root@ceph-host-01 ~]# ceph fs ls
name: cephfs, metadata pool: metadata, data pools: [data ]
name: nova, metadata pool: nova-metadata, data pools: [nova-data ]
[root@ceph-host-01 ~]# ceph -s
cluster:
id: 272905d2-fd66-4ef6-a772-9cd73a274683
health: HEALTH_WARN
insufficient standby MDS daemons available
services:
mon: 3 daemons, quorum ceph-host-01,ceph-host-02,ceph-host-03 (age 2h)
mgr: ceph-host-02(active, since 2h), standbys: ceph-host-03, ceph-host-01
mds: cephfs:1 nova:1 {cephfs:0=ceph-host-01=up:active,nova:0=ceph-host-02=up:active}
osd: 16 osds: 16 up (since 2h), 16 in (since 2w)
data:
pools: 7 pools, 896 pgs
objects: 2.16k objects, 8.2 GiB
usage: 34 GiB used, 1.2 TiB / 1.2 TiB avail
pgs: 896 active+clean
Note 2: MDS failover
After adding another mds daemon, it stays in standby; if one of the first two daemons fails, it takes over. The takeover rules are configurable, see: http://docs.ceph.com/docs/master/cephfs/standby/#configuring-standby-daemons
[root@ceph-host-01 ceph-cluster]# ceph-deploy mds create ceph-host-03
Status with three MDS daemons
[root@ceph-host-01 ~]# ceph mds stat
cephfs:1 nova:1 {cephfs:0=ceph-host-01=up:active,nova:0=ceph-host-02=up:active} 1 up:standby
[root@ceph-host-01 ~]# ceph -s
cluster:
id: 272905d2-fd66-4ef6-a772-9cd73a274683
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-host-01,ceph-host-02,ceph-host-03 (age 2h)
mgr: ceph-host-02(active, since 2h), standbys: ceph-host-03, ceph-host-01
mds: cephfs:1 nova:1 {cephfs:0=ceph-host-01=up:active,nova:0=ceph-host-02=up:active} 1 up:standby
osd: 16 osds: 16 up (since 2h), 16 in (since 2w)
data:
pools: 7 pools, 896 pgs
objects: 2.16k objects, 8.2 GiB
usage: 34 GiB used, 1.2 TiB / 1.2 TiB avail
pgs: 896 active+clean
io:
client: 4.2 KiB/s rd, 4 op/s rd, 0 op/s wr
[root@ceph-host-01 ~]# ceph fs status
cephfs - 0 clients
======
+------+--------+--------------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+--------------+---------------+-------+-------+
| 0 | active | ceph-host-01 | Reqs: 0 /s | 14 | 15 |
+------+--------+--------------+---------------+-------+-------+
+----------+----------+-------+-------+
| Pool | type | used | avail |
+----------+----------+-------+-------+
| metadata | metadata | 1920k | 580G |
| data | data | 0 | 580G |
+----------+----------+-------+-------+
nova - 0 clients
====
+------+--------+--------------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+--------------+---------------+-------+-------+
| 0 | active | ceph-host-02 | Reqs: 0 /s | 12 | 15 |
+------+--------+--------------+---------------+-------+-------+
+---------------+----------+-------+-------+
| Pool | type | used | avail |
+---------------+----------+-------+-------+
| nova-metadata | metadata | 1024k | 580G |
| nova-data | data | 0 | 580G |
+---------------+----------+-------+-------+
+--------------+
| Standby MDS |
+--------------+
| ceph-host-03 |
+--------------+
MDS version: ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)
Stop the active MDS
[root@ceph-host-01 ~]# systemctl stop ceph-mds@ceph-host-01.service
Check whether the standby has taken over
[root@ceph-host-01 ~]# ceph fs status
cephfs - 0 clients
======
+------+--------+--------------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+--------------+---------------+-------+-------+
| 0 | active | ceph-host-03 | Reqs: 0 /s | 14 | 15 |
+------+--------+--------------+---------------+-------+-------+
+----------+----------+-------+-------+
| Pool | type | used | avail |
+----------+----------+-------+-------+
| metadata | metadata | 1920k | 580G |
| data | data | 0 | 580G |
+----------+----------+-------+-------+
nova - 0 clients
====
+------+--------+--------------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+--------------+---------------+-------+-------+
| 0 | active | ceph-host-02 | Reqs: 0 /s | 12 | 15 |
+------+--------+--------------+---------------+-------+-------+
+---------------+----------+-------+-------+
| Pool | type | used | avail |
+---------------+----------+-------+-------+
| nova-metadata | metadata | 1024k | 580G |
| nova-data | data | 0 | 580G |
+---------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)
Note 3: multiple active MDS
Also known as multi-MDS or active-active MDS.
By default each CephFS file system is configured with a single active MDS daemon. In large systems you can configure multiple active MDS daemons to scale metadata performance; they share the metadata load.
In the Luminous release, the multi-metadata-server (multi-MDS) feature and directory fragmentation (dirfragment) were declared ready for production use.
Advantages of multiple active MDS
* When the default single MDS becomes a metadata bottleneck, configuring multiple active MDS daemons improves cluster performance.
* Multiple active MDS daemons improve performance.
* Multiple active MDS daemons enable MDS load balancing.
* Multiple active MDS daemons enable resource isolation between tenants.
Characteristics of multiple active MDS
* The file system tree is split into subtrees.
* Each subtree can be assigned to a specific MDS as its authority.
* As a result, cluster performance scales linearly with the number of metadata servers.
* Subtrees are created dynamically based on metadata hotness within a given directory tree.
* Once a subtree is created, its metadata is migrated to a lightly loaded MDS.
* Subsequent client requests to the previously authoritative MDS are forwarded.
Scale out the number of active MDS daemons
# ceph mds stat
cephfs:1 nova:1 {cephfs:0=ceph-host-01=up:active,nova:0=ceph-host-02=up:active} 1 up:standby
Set max_mds to 2
[root@ceph-host-01 ~]# ceph fs set cephfs max_mds 2
Check the multi-active MDS status
[root@ceph-host-01 ~]# ceph mds stat
cephfs:2 nova:1 {cephfs:0=ceph-host-01=up:active,cephfs:1=ceph-host-03=up:active,nova:0=ceph-host-02=up:active}
# ceph fs status
cephfs - 0 clients
======
+------+--------+--------------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+--------------+---------------+-------+-------+
| 0 | active | ceph-host-03 | Reqs: 0 /s | 14 | 15 |
| 1 | active | ceph-host-02 | Reqs: 0 /s | 10 | 13 |
+------+--------+--------------+---------------+-------+-------+
+----------+----------+-------+-------+
| Pool | type | used | avail |
+----------+----------+-------+-------+
| metadata | metadata | 1920k | 580G |
| data | data | 0 | 580G |
+----------+----------+-------+-------+
nova - 0 clients
====
+------+--------+--------------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+--------------+---------------+-------+-------+
| 0 | active | ceph-host-01 | Reqs: 0 /s | 12 | 15 |
+------+--------+--------------+---------------+-------+-------+
+---------------+----------+-------+-------+
| Pool | type | used | avail |
+---------------+----------+-------+-------+
| nova-metadata | metadata | 1024k | 580G |
| nova-data | data | 0 | 580G |
+---------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)
Note 4: the default replica count is 3, which is safer but requires at least 3 OSDs; the replica count of a pool can be changed as follows
# ceph osd pool set metadata size 2
# ceph osd pool set data size 2
# ceph osd pool set metadata min_size 2
# ceph osd pool set data min_size 2
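The change can be verified per pool with the standard get commands:
# ceph osd pool get data size
# ceph osd pool get data min_size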
Create a corresponding (read-write) client credential
[root@ceph-host-01 ~]# ceph auth get-or-create client.fsclient mon 'allow r' mds 'allow rw' osd 'allow rwx pool=data' -o ceph.client.fsclient.keyring
[root@ceph-host-01 ~]# ceph auth get client.fsclient
exported keyring for client.fsclient
[client.fsclient]
key = AQC9A4he42+qFBAA7zvVYCOsiLOJrSfjyFQcFg==
caps mds = "allow rw"
caps mon = "allow r"
caps osd = "allow rwx pool=data"
[root@ceph-host-01 ~]# cat ceph.client.fsclient.keyring
[client.fsclient]
key = AQC9A4he42+qFBAA7zvVYCOsiLOJrSfjyFQcFg==
Further examples:
1. Create a read-only CephFS credential for production
ceph auth get-or-create client.r_wk_data mon 'allow r' mds 'allow r' osd 'allow r pool=wk_data'
2. Create a read-write CephFS credential for production
ceph auth get-or-create client.wk_data mon 'allow r' mds 'allow rw' osd 'allow rwx pool=wk_data'
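If a credential already exists and only its capabilities need to change, ceph auth caps rewrites them in place; the client name wk_data follows the example above:
ceph auth caps client.wk_data mon 'allow r' mds 'allow rw' osd 'allow rwx pool=wk_data'
ceph auth get client.wk_data    # confirm the new caps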
Note 5: delete the file system nova and its pools
# ceph fs rm nova --yes-i-really-mean-it
# ceph osd pool rm nova-metadata nova-metadata --yes-i-really-mean-it
# ceph osd pool rm nova-data nova-data --yes-i-really-really-mean-it
Note 6: Ceph MDS states explained
MDS failover state transitions:
handle_mds_map state change up:boot --> up:replay
handle_mds_map state change up:replay --> up:reconnect
handle_mds_map state change up:reconnect --> up:rejoin
handle_mds_map state change up:rejoin --> up:active
8. Mount the file system
8.1 Mount CephFS with the kernel driver
Install the ceph-common package on the client in advance; the kernel mount needs it.
yum install -y https://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch/ceph-release-1-0.el7.noarch.rpm
yum install ceph-common -y
Note 1: installing ceph-common or ceph-fuse on Ubuntu
1. Add the release key:
wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -
2. Add the Ceph package source, replacing {ceph-stable-release} with a stable release name (cuttlefish, dumpling, emperor, nautilus, and so on). For example:
echo deb http://download.ceph.com/debian-{ceph-stable-release}/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
示范:安裝14.2.9版本(nautilus)的ceph
echo deb http://download.ceph.com/debian-nautilus/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
3. Update your repositories and install
3.1 Install ceph-common
sudo apt-get update && sudo apt-get install ceph-common
3.2 Install ceph-fuse
sudo apt-get update && sudo apt-get install ceph-fuse
Copy the key file to the client
[root@ceph-host-01 ~]# ceph auth print-key client.fsclient
AQC9A4he42+qFBAA7zvVYCOsiLOJrSfjyFQcFg==
[root@ceph-host-01 ~]# ceph auth print-key client.fsclient >fsclient.key
[root@ceph-host-01 ~]# scp fsclient.key root@node3:/etc/ceph/    # upload the key to /etc/ceph on the client
On the client, verify that the ceph kernel module is available
[root@node3 ~]# modinfo ceph
filename: /lib/modules/3.10.0-957.el7.x86_64/kernel/fs/ceph/ceph.ko.xz
license: GPL
description: Ceph filesystem for Linux
author: Patience Warnick <patience@newdream.net>
author: Yehuda Sadeh <yehuda@hq.newdream.net>
author: Sage Weil <sage@newdream.net>
alias: fs-ceph
retpoline: Y
rhelversion: 7.6
srcversion: 43DA49DF11334B2A5652931
depends: libceph
intree: Y
vermagic: 3.10.0-957.el7.x86_64 SMP mod_unload modversions
signer: CentOS Linux kernel signing key
sig_key: B7:0D:CF:0D:F2:D9:B7:F2:91:59:24:82:49:FD:6F:E8:7B:78:14:27
sig_hashalgo: sha256
Create a mount point on the client
[root@node3 ~]# mkdir -pv /data
Mount on the client
[root@node3 ~]# mount -t ceph ceph-host-01:6789,ceph-host-02:6789,ceph-host-03:6789:/ /data -o name=fsclient,secret=AQC9A4he42+qFBAA7zvVYCOsiLOJrSfjyFQcFg==
[root@node3 ~]# df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/centos-root xfs 360G 44G 317G 13% /
devtmpfs devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs tmpfs 3.9G 17M 3.9G 1% /run
tmpfs tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/vda1 xfs 497M 140M 358M 29% /boot
tmpfs tmpfs 782M 0 782M 0% /run/user/0
10.30.1.221:6789,10.30.1.222:6789,10.30.1.223:6789:/ ceph 581G 0 581G 0% /data
Note 1: it is safer to mount with the secretfile parameter, so the key does not end up in the shell history
[root@node3 ~]# mount -t ceph ceph-host-01:6789,ceph-host-02:6789,ceph-host-03:6789:/ /data -o name=fsclient,secretfile=/etc/ceph/fsclient.key
[root@node3 ~]# df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/centos-root xfs 360G 44G 317G 13% /
devtmpfs devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs tmpfs 3.9G 17M 3.9G 1% /run
tmpfs tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/vda1 xfs 497M 140M 358M 29% /boot
tmpfs tmpfs 782M 0 782M 0% /run/user/0
10.30.1.221:6789,10.30.1.222:6789,10.30.1.223:6789:/ ceph 581G 0 581G 0% /data
Note 2: mount automatically at boot
[root@node3 ~]# echo "ceph-host-01:6789,ceph-host-02:6789,ceph-host-03:6789:/ /data ceph name=fsclient,secretfile=/etc/ceph/fsclient.key,_netdev 0 0" >> /etc/fstab
Note 3: mounting multiple CephFS file systems
Create the credential and copy the key to the client
[root@ceph-host-01 ~]# ceph auth get-or-create client.novafsclient mon 'allow r' mds 'allow rw' osd 'allow rwx pool=nova-data'
[root@ceph-host-01 ~]# ceph auth print-key client.novafsclient | ssh node3 tee /etc/ceph/novafsclient.key
Create the mount directory
[root@node3 ~]# mkdir -pv /nova
Unmount the previously mounted CephFS
[root@node3 ~]# umount -t ceph /data
Mount both CephFS file systems on the client
[root@node3 ~]# mount -t ceph ceph-host-01:6789,ceph-host-02:6789,ceph-host-03:6789:/ /data -o mds_namespace=cephfs,name=fsclient,secretfile=/etc/ceph/fsclient.key
[root@node3 ~]# mount -t ceph ceph-host-01:6789,ceph-host-02:6789,ceph-host-03:6789:/ /nova -o mds_namespace=nova,name=novafsclient,secretfile=/etc/ceph/novafsclient.key
Persistent mounts
[root@node3 ~]# cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Mon Dec 23 04:37:50 2019
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/centos-root / xfs defaults 0 0
UUID=1262a46b-e4eb-4e25-9519-39c4f0c45c8e /boot xfs defaults 0 0
ceph-host-01:6789,ceph-host-02:6789,ceph-host-03:6789:/ /data ceph mds_namespace=cephfs,name=fsclient,secretfile=/etc/ceph/fsclient.key,_netdev 0 0
ceph-host-01:6789,ceph-host-02:6789,ceph-host-03:6789:/ /nova ceph mds_namespace=nova,name=novafsclient,secretfile=/etc/ceph/novafsclient.key,_netdev 0 0
Verify the mounts
[root@node3 ~]# stat -f /data
File: "/data"
ID: 5995b80750841c7 Namelen: 255 Type: ceph
Block size: 4194304 Fundamental block size: 4194304
Blocks: Total: 148575 Free: 148575 Available: 148575
Inodes: Total: 0 Free: -1
[root@node3 ~]# stat -f /nova
File: "/nova"
ID: 5995b80750841c7 Namelen: 255 Type: ceph
Block size: 4194304 Fundamental block size: 4194304
Blocks: Total: 148575 Free: 148575 Available: 148575
Inodes: Total: 0 Free: -1
Check the CephFS status
[root@ceph-host-01 ~]# ceph fs get nova
Filesystem 'nova' (4)
fs_name nova
epoch 1439
flags 12
created 2020-04-04 13:04:09.091835
modified 2020-04-04 13:04:11.057747
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
min_compat_client -1 (unspecified)
last_failure 0
last_failure_osd_epoch 0
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds 1
in 0
up {0=6754123}
failed
damaged
stopped
data_pools [21]
metadata_pool 22
inline_data disabled
balancer
standby_count_wanted 1
6754123: [v2:10.30.1.222:6800/1567901637,v1:10.30.1.222:6801/1567901637] 'ceph-host-02' mds.0.1438 up:active seq 1641
8.2 Mount CephFS from user space (FUSE)
Before mounting a Ceph file system from user space (FUSE), make sure the client host has a copy of the Ceph configuration and a keyring with capabilities for the Ceph metadata server.
Install the ceph-fuse package on the client in advance; it is required for user-space mounts.
yum install -y https://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch/ceph-release-1-0.el7.noarch.rpm
yum install ceph-fuse -y
8.2.1 On the client host, copy the Ceph configuration file from a monitor host to /etc/ceph/.
mkdir -p /etc/ceph
scp root@ceph-host-01:/etc/ceph/ceph.conf /etc/ceph/ceph.conf
8.2.2 On the client host, copy the Ceph keyring from a monitor host to /etc/ceph.
ceph auth get client.fsclient >/tmp/ceph.client.fsclient.keyring
scp root@ceph-host-01:/tmp/ceph.client.fsclient.keyring /etc/ceph/ceph.client.fsclient.keyring
ceph.client.fsclient.keyring配置示范
# cat /etc/ceph/ceph.client.fsclient.keyring
[client.fsclient]
key = AQDxJ5heTf20AhAA34vP0xErt2mFHQiuONWTSQ==
caps mds = "allow rw"
caps mon = "allow r"
caps osd = "allow rwx pool=cephfs-data"
8.2.3 Make sure the Ceph configuration file and keyring on the client have suitable permission bits, e.g. chmod 644.
To mount the Ceph file system as a user-space file system, use the ceph-fuse command, for example:
mkdir -pv /ceph_data
ceph-fuse -n client.fsclient /ceph_data
The fully spelled-out form of the above command:
ceph-fuse --keyring /etc/ceph/ceph.client.fsclient.keyring --name client.fsclient -m ceph-host-01:6789,ceph-host-02:6789 /ceph_data
開機掛載
echo "id=fsclient,keyring=/etc/ceph/ceph.client.fsclient.keyring /ceph_data fuse.ceph defaults 0 0" >> /etc/fstab
Or, with the alternative fstab syntax:
none /ceph_data fuse.ceph ceph.id=fsclient,ceph.conf=/etc/ceph/ceph.conf,_netdev,defaults 0 0
8.2.4 Unmount
Unmount with: fusermount -u <mount_point>
Going further:
Integrating nova with Ceph
On all compute nodes, mount the newly created Ceph file system onto the /var/lib/nova/instances directory:
mount -t ceph <CEPH集群mds節點IP>:6789/ /var/lib/nova/instances -o name=admin,secret={ceph.client.admin.key}
chown -R nova:nova /var/lib/nova/instances
示范操作
創建MDS並創建相應的存儲池
[root@ceph-host-01 ceph-cluster]# ceph-deploy mds create ceph-host-01
[root@ceph-host-01 ceph-cluster]# ceph mds stat
1 up:standby
[root@ceph-host-01 ceph-cluster]# ceph osd pool create nova-metadata 128
[root@ceph-host-01 ceph-cluster]# ceph osd pool create nova-data 128
[root@ceph-host-01 ceph-cluster]# ceph fs new nova nova-metadata nova-data
Note: the default replica count is 3, which is safer but requires at least 3 OSDs; the pool replica count can be changed with:
# ceph osd pool set nova-metadata size 2
# ceph osd pool set nova-data size 2
# ceph osd pool set nova-metadata min_size 2
# ceph osd pool set nova-data min_size 2
Mount on the compute nodes
Mount (takes effect immediately)
[root@node3 ~]# mount -t ceph 10.30.1.221:6789:/ /var/lib/nova/instances/ -o name=admin,secret=AQA8HzdeFQuPHxAAUfjHnOMSfFu7hHIoGv/x1A==
[root@node3 ~]# chown -R nova:nova /var/lib/nova/instances
Mount (persistent)
[root@node3 ~]# echo "10.30.1.221:6789:/ /var/lib/nova/instances ceph name=admin,secret=AQA8HzdeFQuPHxAAUfjHnOMSfFu7hHIoGv/x1A==,_netdev 0 0" >> /etc/fstab
Periodically check whether the mount is still alive and remount it if not
[root@node3 ~]# echo '*/3 * * * * root if [ `mount | grep ceph | wc -l` -eq 0 ] ; then mount -t ceph 10.30.1.221:6789:/ /var/lib/nova/instances/ -o name=admin,secret=AQA8HzdeFQuPHxAAUfjHnOMSfFu7hHIoGv/x1A== ; fi' >>/etc/crontab
Note: how to look up the secret value
# cat /etc/ceph/ceph.client.admin.keyring
[client.admin]
key = AQA8HzdeFQuPHxAAUfjHnOMSfFu7hHIoGv/x1A==
Check usage after creating cloud instances
[root@node3 ~]# df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/centos-root xfs 200G 3.4G 197G 2% /
devtmpfs devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs tmpfs 3.9G 17M 3.9G 1% /run
tmpfs tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/vda1 xfs 497M 140M 358M 29% /boot
tmpfs tmpfs 782M 0 782M 0% /run/user/0
10.30.1.221:6789:/ ceph 277G 2.2G 275G 1% /var/lib/nova/instances
[root@node3 ~]# tree /var/lib/nova/instances
/var/lib/nova/instances
├── 1878b03d-aa3e-4424-8325-ae3bafce0e6a
│ └── disk.info
├── 3b394c96-94a4-4b98-b55b-cac54ef31282
│ └── disk.info
├── 4dd899dc-df13-4853-b70f-2359db577b2d
│ └── disk.info
├── 52fce24f-c8bc-4bb2-8675-cc0cfe4d3678
│ └── disk.info
├── 5632d386-5cb2-4887-9f48-11bcb709ba5f
│ └── disk.info
├── 59cd7399-202c-44b8-918d-9e9acb0cc2e5
│ └── disk.info
├── 60599ade-f271-42ee-9edc-cfe59b4d2459
│ └── disk.info
├── 6937ed06-8cc0-47d0-8a36-59cbf9981337
│ └── disk.info
├── aa852ceb-700f-4e00-a338-faa137b6dbf6
│ └── disk.info
├── _base
│ ├── a36c45ee0cb50b3d5f57afcff5c9a552becfe68b.converted
│ └── a36c45ee0cb50b3d5f57afcff5c9a552becfe68b.part
├── c45a024c-d944-4135-82da-03251f694b72
│ └── disk.info
├── e4607eff-5d40-4238-ab79-903bba641dd8
│ └── disk.info
└── locks
└── nova-a36c45ee0cb50b3d5f57afcff5c9a552becfe68b
13 directories, 14 files
root@node1 ~]# openstack server list
+--------------------------------------+-----------------+--------+---------------------------------------------+------------+--------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+-----------------+--------+---------------------------------------------+------------+--------+
| 4924b0a7-aad6-447e-b340-a2116f56a4a6 | nova-create-vm9 | ACTIVE | vlan99=172.16.99.139; vlan809=192.168.9.219 | CentOS 7.5 | 1c1g |
| b60a7bd4-8515-4020-b635-00c656928dcc | nova-create-vm8 | ACTIVE | vlan99=172.16.99.138; vlan809=192.168.9.218 | CentOS 7.5 | 1c1g |
| a91c9082-72fe-4c4e-b864-6bdf4b5b3c65 | nova-create-vm7 | ACTIVE | vlan99=172.16.99.137; vlan809=192.168.9.217 | CentOS 7.5 | 1c1g |
| ce3a4dab-9e2d-4c66-8d8c-974dd30ca65a | nova-create-vm6 | ACTIVE | vlan99=172.16.99.136; vlan809=192.168.9.216 | CentOS 7.5 | 1c1g |
| 4c94d4d4-9074-405b-a570-768dc1c1b5a4 | nova-create-vm5 | ACTIVE | vlan99=172.16.99.135; vlan809=192.168.9.215 | CentOS 7.5 | 1c1g |
| a56a700e-f0e1-4845-9eb7-84d77fbf683d | nova-create-vm4 | ACTIVE | vlan99=172.16.99.134; vlan809=192.168.9.214 | CentOS 7.5 | 1c1g |
| c237cbb8-62a6-4bfd-be95-009aaa30c3bf | nova-create-vm3 | ACTIVE | vlan99=172.16.99.133; vlan809=192.168.9.213 | CentOS 7.5 | 1c1g |
| d89a137d-53c5-448e-8592-6b06eac00af7 | nova-create-vm2 | ACTIVE | vlan99=172.16.99.132; vlan809=192.168.9.212 | CentOS 7.5 | 1c1g |
| 38764d77-73ee-4030-9dc5-51effe6cfa95 | nova-create-vm1 | ACTIVE | vlan99=172.16.99.131; vlan809=192.168.9.211 | CentOS 7.5 | 1c1g |
+--------------------------------------+-----------------+--------+---------------------------------------------+------------+--------+
[root@node1 ~]# ceph df
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 1.2 TiB 1.1 TiB 5.7 GiB 21 GiB 1.75
TOTAL 1.2 TiB 1.1 TiB 5.7 GiB 21 GiB 1.75
POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL
nova-metadata 6 3.6 MiB 23 16 MiB 0 276 GiB
nova-data 7 1.2 GiB 372 5.0 GiB 0.45 276 GiB
[root@node1 ~]# ceph -s
cluster:
id: 272905d2-fd66-4ef6-a772-9cd73a274683
health: HEALTH_WARN
1 daemons have recently crashed
services:
mon: 3 daemons, quorum ceph-host-01,ceph-host-02,ceph-host-03 (age 15m)
mgr: ceph-host-01(active, since 38m), standbys: ceph-host-03, ceph-host-02
mds: nova:1 {0=ceph-host-01=up:active} 1 up:standby
osd: 15 osds: 15 up (since 13m), 15 in (since 107m)
data:
pools: 2 pools, 128 pgs
objects: 415 objects, 1.4 GiB
usage: 21 GiB used, 1.1 TiB / 1.2 TiB avail
pgs: 128 active+clean
io:
client: 3.2 MiB/s rd, 174 KiB/s wr, 123 op/s rd, 23 op/s wr
Cleaning up the environment
$ ceph-deploy purge ceph-host-01 ceph-host-02 ceph-host-03 ceph-host-04 // removes everything related to Ceph
$ ceph-deploy purgedata ceph-host-01 ceph-host-02 ceph-host-03 ceph-host-04
$ ceph-deploy forgetkeys
Troubleshooting:
Error 1:
[ceph-mon01][DEBUG ] --> Finished Dependency Resolution
[ceph-mon01][WARNIN] Error: Package: 2:librgw2-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN] Requires: liblttng-ust.so.0()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:ceph-common-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN] Requires: libbabeltrace.so.1()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:ceph-common-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN] Requires: libbabeltrace-ctf.so.1()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:ceph-mon-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN] Requires: libleveldb.so.1()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:librgw2-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN] Requires: liboath.so.0()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:ceph-osd-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN] Requires: libleveldb.so.1()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:ceph-common-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN] Requires: liboath.so.0()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:ceph-common-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN] Requires: libleveldb.so.1()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:ceph-common-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN] Requires: liboath.so.0(LIBOATH_1.10.0)(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:librbd1-14.2.9-0.el7.x86_64 (Ceph)
Fix:
yum install epel-release -y
Note: this step is very important; if you skip it and go straight to installing Ceph, you will hit the dependency errors shown above.
Error 2:
health: HEALTH_WARN
clock skew detected on mon.ceph-host-02, mon.ceph-host-03
This is caused by clock drift between the nodes.
# ansible ceph -a 'yum install ntpdate -y'
# ansible ceph -a 'systemctl stop ntpdate'
# ansible ceph -a 'ntpdate time.windows.com'
Configure every Ceph node to synchronize time at boot and on a schedule:
[root@ceph-host-01 ~]# cat /etc/rc.d/rc.local
#!/bin/bash
# THIS FILE IS ADDED FOR COMPATIBILITY PURPOSES
#
# It is highly advisable to create own systemd services or udev rules
# to run scripts during boot instead of using this file.
#
# In contrast to previous versions due to parallel execution during boot
# this script will NOT be run after all other services.
#
# Please note that you must run 'chmod +x /etc/rc.d/rc.local' to ensure
# that this script will be executed during boot.
timedatectl set-timezone Asia/Shanghai && ntpdate time1.aliyun.com && hwclock -w >/dev/null 2>&1
touch /var/lock/subsys/local
[root@ceph-host-01 ~]# chmod +x /etc/rc.d/rc.local
[root@ceph-host-01 ~]# systemctl enable rc-local
[root@ceph-host-01 ~]# echo '*/5 * * * * root timedatectl set-timezone Asia/Shanghai && ntpdate time1.aliyun.com && hwclock -w >/dev/null 2>&1' >> /etc/crontab
Note: on CentOS 7, chrony is the better choice for time synchronization; the steps are:
yum install chrony -y
systemctl start chronyd
systemctl enable chronyd
# cat /etc/chrony.conf | grep -v '^#\|^$'
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
server 3.centos.pool.ntp.org iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
Error 3:
# ceph status
cluster:
id: 04d85079-c2ef-47c8-a8bb-c6cb13db3cc4
health: HEALTH_WARN
62 daemons have recently crashed
Fix:
# ceph crash archive-all
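The related crash commands are useful for seeing what actually crashed before archiving the reports:
# ceph crash ls            # list recent crash reports
# ceph crash info <id>     # show the details of one report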
Author: Dexter_Wang, senior cloud computing and storage engineer at an Internet company. Contact email: 993852246@qq.com
