Ceph is a reliable, self-balancing, self-healing distributed storage system. By use case it offers three services: object storage, block devices, and a file system; block storage is Ceph's strong suit.
Ceph's main advantage is that it is fully distributed: the location of every piece of data is computed rather than looked up, data is spread as evenly as possible, there is no traditional single point of failure, and the cluster scales out horizontally.
Ceph Architecture



(Figure: the Ceph architecture)
RADOS is itself a complete distributed object store. It is reliable, intelligent, and distributed; Ceph's high reliability, scalability, performance, and automation all come from this layer, and user data is ultimately stored through it. RADOS is the core component of Ceph.
The RADOS system consists of two kinds of daemons: OSDs and Monitors.
On top of RADOS sits LIBRADOS, a library that lets applications interact with the RADOS cluster directly; it supports several programming languages, such as C, C++, and Python.
On top of LIBRADOS there are three services: RADOSGW, RBD, and CEPH FS.
RADOSGW: a gateway based on the popular RESTful protocol, compatible with the S3 and Swift APIs.
RBD: a distributed block device exposed through the Linux kernel client and the QEMU/KVM driver.
CEPH FS: a POSIX-compatible file system exposed through the Linux kernel client and FUSE.
Ceph core component: RADOS
The RADOS system consists of two kinds of daemons: OSDs and Monitors.
Ceph OSD: OSD stands for Object Storage Device. Its main jobs are to store data, replicate it, rebalance it, and recover it, exchange heartbeats with other OSDs, and report state changes to the Ceph Monitors. Normally one physical disk maps to one OSD, which manages that disk, although a single partition can also back an OSD.
Ceph Monitor: as the name suggests, it monitors the Ceph cluster and maintains its health state, as well as the various maps: the OSD Map, Monitor Map, PG Map, and CRUSH Map, collectively called the Cluster Map. The Cluster Map is RADOS's key data structure: it tracks all members of the cluster, their relationships and attributes, and governs data placement. For example, when a client wants to store data in the cluster, it first obtains the latest maps from a Monitor and then uses them together with the object id to compute where the data will finally be stored.
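These maps can be inspected directly with the ceph CLI from any node that has an admin keyring; a few illustrative read-only commands (output omitted):
ceph mon dump                               # the Monitor Map: monitor addresses and ranks
ceph osd dump | head -n 20                  # the OSD Map: epoch, pools, OSD states
ceph pg dump summary                        # a summary view of the PG Map
ceph osd getcrushmap -o crushmap.bin        # export the compiled CRUSH Map ...
crushtool -d crushmap.bin -o crushmap.txt   # ... and decompile it into readable text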



For high availability, a Ceph storage cluster should keep two or more copies of every object. Ceph OSD daemons automatically create object replicas on other Ceph nodes to ensure data safety and availability.
Ceph Monitors maintain the master copy of the cluster map. For high availability the monitors themselves form a cluster, so the system keeps working when an individual monitor fails.
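A quick, hedged way to see the monitor cluster and its quorum from any admin node:
ceph mon stat                             # one-line summary: which monitors are in quorum
ceph quorum_status --format json-pretty   # detailed quorum state, including the elected leader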
Ceph's data placement algorithm
Ceph is designed for very large distributed clusters, so its placement algorithm must compute object locations quickly and correctly at scale, while keeping data migration to a minimum when hardware fails or new devices are added. Ceph's CRUSH algorithm is designed precisely for these properties.
Before explaining how CRUSH works, here are a few concepts and how they relate to each other.
Object: when a user stores data in a Ceph cluster, the data is split into multiple objects, each with an object id. The object size is configurable and defaults to 4 MB; an object is the smallest unit of storage in Ceph.
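A minimal sketch of working with objects through the rados CLI, assuming a pool named volumes exists (one is created later in this article); myobject is just an arbitrary example name:
rados -p volumes put myobject /etc/hosts   # store a local file as a single RADOS object
rados -p volumes ls                        # list the objects in the pool
rados -p volumes stat myobject             # show the object's size and modification time
rados -p volumes rm myobject               # clean up the example object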



PG: because there are far too many objects to track individually, Ceph introduces placement groups (PGs) to manage them. Every object is mapped by CRUSH into some PG, and one PG contains many objects.
PG and OSD: each PG is in turn mapped by CRUSH onto OSDs. With two replicas, each PG maps to two OSDs, e.g. [OSD#1, OSD#2]; OSD#1 holds the primary copy of that PG and OSD#2 the secondary copy, which provides the redundancy.
Mapping objects to placement groups creates a layer of indirection between OSDs and clients. A Ceph cluster must be able to grow, shrink, and rebalance dynamically; if clients "knew" which OSD held which object, clients and OSDs would be tightly coupled. Instead, CRUSH maps each object to a placement group and each placement group to one or more OSDs, and this indirection lets Ceph rebalance dynamically as OSD daemons and underlying devices come and go. The diagram below shows how CRUSH maps objects to placement groups and placement groups to OSDs.
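The complete object → PG → OSD mapping for any object name can be asked of the cluster directly; a sketch reusing the hypothetical pool and object from the example above:
ceph osd map volumes myobject
# the output shows the pool id, the PG that the object name hashes into,
# and the up/acting OSD set chosen by CRUSH for that PG (primary listed first)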


PG and PGP: PGs hold objects, while PGP determines how many distinct OSD placement combinations the PGs of a pool can use. For example, with three OSDs (osd.1, osd.2, osd.3) and two replicas: if pgp_num is 1 there is only one possible OSD combination, say [osd.1, osd.2], so the primary and secondary copies of every PG land on osd.1 and osd.2; if pgp_num is 2 there can be two combinations, say [osd.1, osd.2] and [osd.1, osd.3]. It is essentially the permutations-and-combinations idea from high-school math. In general pg_num and pgp_num should be set to the same value; the sketch below shows the usual way to grow them together.
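A hedged sketch of growing a pool's PG count while keeping pg_num and pgp_num in step (testpool is a throwaway example name):
ceph osd pool create testpool 64 64     # create a pool with pg_num = pgp_num = 64
ceph osd pool set testpool pg_num 128   # raise pg_num first (existing PGs are split) ...
ceph osd pool set testpool pgp_num 128  # ... then raise pgp_num so the new PGs are actually re-placed on OSDs
ceph osd pool get testpool pg_num       # verify the result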


(Figure: the relationship between object, PG, pool, OSD, and physical disk)
At its core, CRUSH computes the distribution of data objects from the weights of the storage devices. Weights are usually derived from the disk's capacity and speed: for example, a 1 TB disk may get weight 1 and a 2 TB disk weight 2. During the computation, CRUSH combines the Cluster Map, the placement policy, and a pseudo-random hash to decide where data finally lands.
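Weights can be inspected and adjusted at runtime; a small hedged example (osd.3 is only an example id):
ceph osd tree                       # the WEIGHT column is the CRUSH weight, roughly the disk size in TiB
ceph osd crush reweight osd.3 2.0   # e.g. give osd.3 a CRUSH weight of 2, as for a 2 TB disk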
The Cluster Map describes the storage resources available in the cluster and their spatial hierarchy: how many racks the cluster has, how many servers per rack, how many disks per server are used as OSDs, and so on.
The placement policy is what the Ceph administrator configures to shape data distribution. For example, if the configured failure domain is Host, data must survive the loss of any single host; CRUSH achieves this by putting the primary and replica copies of each PG on OSDs of different hosts. The failure domain does not have to be Host: it can also be a rack or another level of the hierarchy. Besides the failure domain, the policy also chooses the redundancy scheme, i.e. a replica count or erasure coding.
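A hedged sketch of expressing such a policy on the command line; rep-by-host, ec42 and ecpool are example names, and volumes is the pool created later in this article:
# a replicated CRUSH rule with failure domain host: each copy of a PG lands on a different host
ceph osd crush rule create-replicated rep-by-host default host
ceph osd pool set volumes crush_rule rep-by-host
# erasure coding instead of replication: 4 data chunks + 2 coding chunks, also with host failure domain
ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
ceph osd pool create ecpool 64 64 erasure ec42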
Ceph network configuration reference
Network configuration matters a great deal when building a high-performance Ceph storage cluster. The Ceph storage cluster does not route or dispatch requests on behalf of Ceph clients; instead, Ceph clients (block device, CephFS, REST gateway) talk to OSDs directly, and the OSDs then replicate the data for the client, which means replication and related traffic put extra load on the cluster network.
The quick-start configuration provides a bare-bones Ceph configuration file that only sets the monitor IP addresses and the hostnames of the daemons. If no cluster network is configured, Ceph assumes a single "public" network. Ceph runs fine with one network, but in large clusters a separate "cluster" network improves performance significantly.
We recommend running a Ceph storage cluster with two networks: a public (front-side) network and a cluster (back-side) network. This requires each node to have more than one NIC (a minimal ceph.conf sketch follows the two points below).

The main reasons for running two separate networks are:
1. Performance: OSDs handle data replication for clients, and replicating data multiple times inevitably loads the OSD-to-OSD network, which adds latency to client traffic and causes performance problems; recovery and rebalancing also add significant latency on the public network. See Scalability and High Availability for how Ceph replicates, and Monitor/OSD interaction for heartbeat traffic.
2. Security: most people are well-behaved, but a small minority enjoys running denial-of-service (DoS) attacks. When OSD-to-OSD traffic is disrupted, placement groups can no longer reach the active + clean state and users cannot read or write data. A good way to defeat this kind of attack is to keep the cluster network completely separate and not directly reachable from the Internet; also consider message signing to prevent spoofing.
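In ceph.conf this split is just two settings under [global]; a minimal sketch using the two subnets of the lab environment below:
[global]
public_network  = 10.30.1.0/24     # client, monitor and MDS traffic
cluster_network = 192.168.9.0/24   # OSD replication, recovery and backfill traffic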
Deploying Ceph with the ceph-deploy tool
Official documentation (Chinese):
http://docs.ceph.org.cn/
Lab environment
10.30.1.221 192.168.9.211 ceph-host-01
10.30.1.222 192.168.9.212 ceph-host-02
10.30.1.223 192.168.9.213 ceph-host-03
10.30.1.224 192.168.9.214 ceph-host-04
OS: CentOS 7.6
Each host has two spare disks.
The Ceph nodes run 64-bit CentOS 7.6. There are four Ceph nodes in total; each node runs two OSDs, one per physical disk.
For Ceph 10.x and later, a 4.x kernel is best. If you must use an older kernel, use FUSE as the client.
Upgrade the system kernel
cat >>/etc/yum.repos.d/CentOS-altarch.repo<<EOF
# CentOS-Base.repo
#
# The mirror system uses the connecting IP address of the client and the
# update status of each mirror to pick mirrors that are updated to and
# geographically close to the client. You should use this for CentOS updates
# unless you are manually picking other mirrors.
#
# If the mirrorlist= does not work for you, as a fall back you can try the
# remarked out baseurl= line instead.
#
#
[kernel]
name=CentOS-$releasever - Kernel
baseurl=https://mirrors.tuna.tsinghua.edu.cn/centos-altarch/7/kernel/x86_64/
enabled=1
gpgcheck=0
EOF
yum clean all
yum install kernel -y
Update the boot loader
grub2-mkconfig -o /boot/grub2/grub.cfg
grub2-set-default 0
System tuning
echo '* - nofile 65535' >> /etc/security/limits.conf
ulimit -SHn 65535
cat > /etc/sysctl.conf <<EOF
kernel.sysrq = 0
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
net.core.wmem_default = 8388608
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.netdev_max_backlog = 262144
net.core.somaxconn = 262144
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.ip_forward = 0
net.ipv4.ip_local_port_range = 5000 65000
net.ipv4.tcp_fin_timeout = 1
net.ipv4.tcp_keepalive_time = 30
net.ipv4.tcp_max_orphans = 3276800
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.tcp_max_tw_buckets = 6000
net.ipv4.tcp_mem = 94500000 915000000 927000000
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_sack = 1
net.ipv4.tcp_syn_retries = 1
net.ipv4.tcp_synack_retries = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_wmem = 4096 16384 16777216
fs.file-max=65536
fs.inotify.max_queued_events=99999999
fs.inotify.max_user_watches=99999999
fs.inotify.max_user_instances=65535
net.core.default_qdisc=fq
EOF
sysctl -p
Disable SELinux and the firewall
setenforce 0
sed -i 's/SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
systemctl stop firewalld
systemctl disable firewalld
systemctl disable NetworkManager
systemctl stop NetworkManager
Install a time synchronization service
OpenStack nodes must be time-synchronized with each other, otherwise creating instances may fail.
# yum install chrony -y
# vim /etc/chrony.conf              # edit the NTP configuration
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
server 3.centos.pool.ntp.org iburst
# systemctl enable chronyd.service  # enable the NTP service at boot
# systemctl start chronyd.service   # start the NTP service
# chronyc sources                   # verify time synchronization
210 Number of sources = 1
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^? ControllerNode 0 6 0 - +0ns[ +0ns] +/- 0ns
Set the time zone
timedatectl set-timezone Asia/Shanghai
Install commonly used packages
yum install -y vim net-tools wget lrzsz deltarpm tree screen lsof tcpdump nmap sysstat iftop
Switch to a faster CentOS mirror
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
Install the EPEL repository in advance
yum install epel-release -y
Note: using Alibaba's EPEL mirror makes installation a bit faster.
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
1. Install ceph-deploy
1.1 Set the hostnames and the hosts file. In this example ceph-deploy is installed on one of the cluster nodes.
[root@ceph-host-01 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.30.1.221 ceph-host-01
10.30.1.222 ceph-host-02
10.30.1.223 ceph-host-03
10.30.1.224 ceph-host-04
Note: the hostname must match the corresponding entry in /etc/hosts.
1.2 Generate a key with ssh-keygen and copy it to every node with ssh-copy-id.
[root@ceph-host-01 ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:iVPfxuQVphRA8v2//XsM+PxzWjYrx5JnnHTbBdNYwTw root@ceph-host-01
The key's randomart image is:
+---[RSA 2048]----+
| ..o.o.=..|
| o o o E.|
| . . + .+.|
| o o = o+ .|
| o S . =..o |
| . .. .oo|
| o=+X|
| +o%X|
| B*X|
+----[SHA256]-----+
Example: copying the key to ceph-host-02
[root@ceph-host-01 ~]# ssh-copy-id ceph-host-02
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'ceph-host-02 (10.30.1.222)' can't be established.
ECDSA key fingerprint is SHA256:VsMfdmYFzxV1dxKZi2OSp8QluRVQ1m2lT98cJt4nAFU.
ECDSA key fingerprint is MD5:de:07:2f:5c:13:9b:ba:0b:e5:0e:c2:db:3e:b8:ab:bd.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@ceph-host-02's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'ceph-host-02'"
and check to make sure that only the key(s) you wanted were added.
1.3 Install ceph-deploy.
Before installing, configure the yum repository; here we use the fairly recent Nautilus release.
[root@ceph-host-01 ~]# cat /etc/yum.repos.d/ceph.repo
[Ceph]
name=Ceph packages for $basearch
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/$basearch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
[Ceph-noarch]
name=Ceph noarch packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
[ceph-source]
name=Ceph source packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/SRPMS
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
Note: to install the official Ceph release repository directly, run
yum install -y https://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch/ceph-release-1-0.el7.noarch.rpm
[root@ceph-host-01 ~]# yum install ceph-deploy python-setuptools python2-subprocess32 -y
2. Create the Ceph monitor role
2.1 ceph-deploy produces a number of files while it works, so it is best to create a working directory first, for example ceph-cluster.
[root@ceph-host-01 ~]# mkdir -pv ceph-cluster
[root@ceph-host-01 ~]# cd ceph-cluster
2.2 Initialize the mon nodes in preparation for creating the cluster:
[root@ceph-host-01 ceph-cluster]# ceph-deploy new ceph-host-01 ceph-host-02 ceph-host-03
Edit the generated Ceph cluster configuration file
[root@ceph-host-01 ceph-cluster]# cat ceph.conf
[global]
fsid = a480fcef-1c4b-48cb-998d-0caed867b5eb
mon_initial_members = ceph-host-01, ceph-host-02, ceph-host-03
mon_host = 10.30.1.221,10.30.1.222,10.30.1.223
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
mon clock drift allowed = 2
mon clock drift warn backoff = 30
public_network = 10.30.1.0/24
cluster_network = 192.168.9.0/24
max_open_files = 131072
mon_pg_warn_max_per_osd = 1000
mon_max_pg_per_osd = 1000
osd pool default pg num = 256
osd pool default pgp num = 256
osd pool default size = 3
osd pool default min size = 1
mon_osd_full_ratio = .90
mon_osd_nearfull_ratio = .80
osd_deep_scrub_randomize_ratio = 0.01
[mon]
mon_allow_pool_delete = true
mon_osd_down_out_interval = 600
mon_osd_min_down_reporters = 3
[mgr]
mgr modules = dashboard
[mds]
mds cache memory limit = 10737418240
mds cache size = 250000
mds_max_export_size = 20971520
mds_bal_interval = 10
mds_bal_sample_interval = 3.000000
[osd]
osd_journal_size = 20480
osd_max_write_size = 1024
osd mkfs type = xfs
osd_recovery_op_priority = 1
osd_recovery_max_active = 1
osd_recovery_max_single_start = 1
osd_recovery_threads = 1
osd_recovery_max_chunk = 1048576
osd_max_backfills = 1
osd_scrub_begin_hour = 22
osd_scrub_end_hour = 7
osd_recovery_sleep = 0
[client]
rbd_cache = true
rbd_cache_writethrough_until_flush = true
rbd_concurrent_management_ops = 10
rbd_cache_size = 67108864
rbd_cache_max_dirty = 50331648
rbd_cache_target_dirty = 33554432
rbd_cache_max_dirty_age = 2
rbd_default_format = 2
Note: the settings above are a deliberately tuned configuration; review, trim, or extend them and use them with caution in production.
2.3 Install the Ceph packages on all nodes
Use ceph-deploy to install the Ceph packages; you can also install ceph manually on each node. Depending on which yum repository is configured, a different Ceph version will be installed.
[root@ceph-host-01 ceph-cluster]# ceph-deploy install --no-adjust-repos ceph-host-01 ceph-host-02 ceph-host-03 ceph-host-04
# Without --no-adjust-repos, ceph-deploy keeps forcing its own default repositories, which is a trap.
Hint: to install the Ceph packages on each cluster node by hand instead, do the following:
# yum install -y https://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch/ceph-release-1-0.el7.noarch.rpm
# yum install ceph ceph-radosgw -y
2.4 Create the initial mon nodes and gather all keys
[root@ceph-host-01 ceph-cluster]# ceph-deploy mon create-initial
2.5 Check the running daemons
# ps -ef|grep ceph
ceph 1916 1 0 12:05 ? 00:00:03 /usr/bin/ceph-mon -f --cluster ceph --id ceph-host-01 --setuser ceph --setgroup ceph
2.6 From the admin node, copy the configuration file and admin keyring to the admin node and the Ceph nodes
[root@ceph-host-01 ceph-cluster]# ceph-deploy admin ceph-host-01 ceph-host-02 ceph-host-03 ceph-host-04
Grant read permission on ceph.client.admin.keyring on every node
# chmod +r /etc/ceph/ceph.client.admin.keyring
Or use ansible to set the permission on all Ceph nodes in one go
# ansible ceph -a 'chmod +r /etc/ceph/ceph.client.admin.keyring'
3. Create the Ceph OSD role (OSD deployment)
Newer versions of ceph-deploy use create directly,
which combines the old prepare and activate steps, i.e. osd create --bluestore
ceph-deploy osd create --data /dev/vdb ceph-host-01
ceph-deploy osd create --data /dev/vdb ceph-host-02
ceph-deploy osd create --data /dev/vdb ceph-host-03
ceph-deploy osd create --data /dev/vdb ceph-host-04
Note: if a disk already contains data it must be wiped first, for example:
ceph-deploy disk zap ceph-host-02 /dev/vdb
4. Create the mgr role
Since Ceph 12 (Luminous), a manager daemon is mandatory. Add one mgr for every machine that runs a monitor, otherwise the cluster stays in the WARN state.
[root@ceph-host-01 ceph-cluster]# ceph-deploy mgr create ceph-host-01 ceph-host-02 ceph-host-03
Check the cluster health
[root@ceph-host-03 ~]# ceph health
HEALTH_OK
[root@ceph-host-03 ~]# ceph -s
cluster:
id: 02e63c58-5200-45c9-b592-07624f4893a5
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-host-01,ceph-host-02,ceph-host-03 (age 59m)
mgr: ceph-host-01(active, since 4m), standbys: ceph-host-02, ceph-host-03
osd: 4 osds: 4 up (since 87m), 4 in (since 87m)
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 5.0 GiB used, 90 GiB / 95 GiB avail
pgs:
Add more OSDs
ceph-deploy osd create --data /dev/vdc ceph-host-01
ceph-deploy osd create --data /dev/vdc ceph-host-02
ceph-deploy osd create --data /dev/vdc ceph-host-03
ceph-deploy osd create --data /dev/vdc ceph-host-04
Check the status
[root@ceph-host-01 ceph-cluster]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.18585 root default
-3 0.03717 host ceph-host-01
0 hdd 0.01859 osd.0 up 1.00000 1.00000
4 hdd 0.01859 osd.4 up 1.00000 1.00000
-5 0.03717 host ceph-host-02
1 hdd 0.01859 osd.1 up 1.00000 1.00000
5 hdd 0.01859 osd.5 up 1.00000 1.00000
-7 0.03717 host ceph-host-03
2 hdd 0.01859 osd.2 up 1.00000 1.00000
6 hdd 0.01859 osd.6 up 1.00000 1.00000
-9 0.03717 host ceph-host-04
3 hdd 0.01859 osd.3 up 1.00000 1.00000
7 hdd 0.01859 osd.7 up 1.00000 1.00000
Note: view each OSD's weight and disk usage
[root@ceph-host-02 ceph-cluster]# ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
0 hdd 1.81940 1.00000 1.8 TiB 732 GiB 731 GiB 8 KiB 1.2 GiB 1.1 TiB 39.30 1.18 86 up
1 hdd 1.81940 1.00000 1.8 TiB 956 GiB 955 GiB 40 KiB 1.5 GiB 907 GiB 51.33 1.54 85 up
2 hdd 1.81940 1.00000 1.8 TiB 826 GiB 825 GiB 48 KiB 1.5 GiB 1.0 TiB 44.36 1.33 74 up
3 hdd 5.45799 1.00000 5.5 TiB 1.0 GiB 12 MiB 0 B 1 GiB 5.5 TiB 0.02 0 90 up
4 hdd 1.81940 1.00000 1.8 TiB 939 GiB 938 GiB 39 KiB 1.5 GiB 924 GiB 50.42 1.51 89 up
5 hdd 1.81940 1.00000 1.8 TiB 1.0 TiB 1.0 TiB 3 KiB 1.9 GiB 834 GiB 55.24 1.66 109 up
6 hdd 1.81940 1.00000 1.8 TiB 808 GiB 806 GiB 52 KiB 1.4 GiB 1.0 TiB 43.36 1.30 90 up
7 hdd 1.81940 1.00000 1.8 TiB 919 GiB 917 GiB 48 KiB 1.5 GiB 945 GiB 49.30 1.48 88 up
TOTAL 18 TiB 6.1 TiB 6.1 TiB 240 KiB 11 GiB 12 TiB 33.34
MIN/MAX VAR: 0/1.66 STDDEV: 18.43
[root@ceph-host-02 ceph-cluster]# ceph osd status
+----+--------------+-------+-------+--------+---------+--------+---------+-----------+
| id | host | used | avail | wr ops | wr data | rd ops | rd data | state |
+----+--------------+-------+-------+--------+---------+--------+---------+-----------+
| 0 | ceph-host-01 | 732G | 1130G | 1 | 4096k | 0 | 0 | exists,up |
| 1 | ceph-host-01 | 956G | 906G | 4 | 19.2M | 0 | 0 | exists,up |
| 2 | ceph-host-02 | 826G | 1036G | 0 | 3276k | 0 | 0 | exists,up |
| 3 | ceph-host-02 | 1035M | 5588G | 0 | 0 | 0 | 0 | exists,up |
| 4 | ceph-host-03 | 939G | 923G | 3 | 14.4M | 0 | 0 | exists,up |
| 5 | ceph-host-03 | 1029G | 833G | 0 | 3413k | 0 | 0 | exists,up |
| 6 | ceph-host-04 | 808G | 1054G | 4 | 16.0M | 0 | 0 | exists,up |
| 7 | ceph-host-04 | 918G | 944G | 2 | 10.4M | 0 | 0 | exists,up |
+----+--------------+-------+-------+--------+---------+--------+---------+-----------+
Check the mounts
[root@ceph-host-02 ~]# df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/vda1 xfs 20G 1.5G 19G 8% /
devtmpfs devtmpfs 475M 0 475M 0% /dev
tmpfs tmpfs 496M 0 496M 0% /dev/shm
tmpfs tmpfs 496M 13M 483M 3% /run
tmpfs tmpfs 496M 0 496M 0% /sys/fs/cgroup
tmpfs tmpfs 100M 0 100M 0% /run/user/0
tmpfs tmpfs 496M 52K 496M 1% /var/lib/ceph/osd/ceph-1
tmpfs tmpfs 496M 52K 496M 1% /var/lib/ceph/osd/ceph-5
Note 1: a single mon and a single mgr node would technically suffice, but for high availability multiple nodes are recommended; example commands to add them:
# ceph-deploy --overwrite-conf mon add ceph-host-03
# ceph-deploy --overwrite-conf mgr create ceph-host-03
Note 2: to remove a mon node from the cluster, for example:
ceph-deploy mon destroy ceph-host-02
Note 3: when a node cannot join the mon cluster, check ceph.conf on every mon node; the files must be consistent and must include the new node. The steps are as follows:
ceph-deploy mon destroy ceph-host-02
Make sure ceph-cluster/ceph.conf and /etc/ceph/ceph.conf on the deploy node look like the following
[global]
fsid = a480fcef-1c4b-48cb-998d-0caed867b5eb
mon_initial_members = ceph-host-01, ceph-host-02, ceph-host-03
mon_host = 10.30.1.221,10.30.1.222,10.30.1.223
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
mon clock drift allowed = 2
mon clock drift warn backoff = 30
# network settings
public_network = 10.30.1.0/24
cluster_network = 192.168.9.0/24
max_open_files = 131072
mon_pg_warn_max_per_osd = 1000
mon_max_pg_per_osd = 1000
osd pool default size = 3
osd pool default min size = 2
mon_osd_full_ratio = .90
mon_osd_nearfull_ratio = .80
osd_deep_scrub_randomize_ratio = 0.01
[mon]
mon_allow_pool_delete = true
[mgr]
mgr modules = dashboard
[mds]
mds cache memory limit = 10737418240
mds cache size = 250000
[osd]
osd_max_write_size = 1024
osd_recovery_op_priority = 1
osd_recovery_max_active = 1
osd_recovery_max_single_start = 1
osd_recovery_max_chunk = 1048576
osd_recovery_threads = 1
osd_max_backfills = 1
osd_scrub_begin_hour = 22
osd_scrub_end_hour = 7
osd_recovery_sleep = 0
osd_crush_update_on_start = false
ceph-deploy --overwrite-conf mon add ceph-host-02
Note 4: to push the deploy node's ceph.conf to the other machines, use the following command
# ceph-deploy --overwrite-conf config push ceph-host-01 ceph-host-02 ceph-host-04
Restart the services so the changed parameters take effect:
systemctl restart ceph-mgr.target
systemctl restart ceph.target
5. Creating and deleting a Ceph storage pool
5.1 Create
[root@ceph-host-01 ceph-cluster]# ceph osd pool create volumes 128
pool 'volumes' created
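Since Luminous, Ceph warns about pools that are not associated with an application; assuming this pool is intended for RBD volumes, it can be tagged (and optionally initialized) as follows:
ceph osd pool application enable volumes rbd   # clears the 'application not enabled on pool' warning
rbd pool init volumes                          # optional: initialize the pool for RBD use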
5.2 Delete
[root@ceph-host-02 ~]# ceph osd pool rm volumes volumes --yes-i-really-really-mean-it
pool 'volumes' removed
6. Deploying CephFS
6.1 Overview
CephFS is Ceph's POSIX-compatible file system. Compared with RBD and RGW, it was the last of the three to be declared production ready; underneath, it still stores its data in RADOS.


There are two ways to use CephFS:
1. cephfs kernel module
2. cephfs-fuse
As the architecture above shows, the cephfs-fuse I/O path is longer, so its performance is somewhat worse than the cephfs kernel module.
How a client accesses CephFS:

1. The client talks to an MDS node to obtain metadata (the metadata itself is also stored on OSDs).
2. The client then writes data directly to the OSDs (see the note after this list).
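Once a CephFS is mounted (section 7 mounts one at /data), the data side of this split can be observed through CephFS's virtual extended attributes; a hedged example, where /data/somefile is a hypothetical file and getfattr comes from the attr package:
getfattr -n ceph.file.layout /data/somefile   # shows the striping parameters and which data pool the file's objects go to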
6.2 Example operations
http://docs.ceph.com/docs/master/rados/operations/placement-groups/
Run the ceph-mds daemon on at least one node
[root@ceph-host-01 ~]# cd ceph-cluster/
[root@ceph-host-01 ceph-cluster]# ceph-deploy mds create ceph-host-01 ceph-host-02
Create the pools
[root@ceph-host-01 ceph-cluster]# ceph osd pool create data 128
[root@ceph-host-01 ceph-cluster]# ceph osd pool create metadata 128
Create the file system
[root@ceph-host-01 ceph-cluster]# ceph fs new cephfs metadata data
Inspect the file system
[root@ceph-host-01 ceph-cluster]# ceph fs ls
name: cephfs, metadata pool: metadata, data pools: [data ]
[root@ceph-host-01 ceph-cluster]# ceph mds stat
cephfs:1 {0=ceph-host-02=up:active} 2 up:standby
[root@ceph-host-01 ceph-cluster]# ceph fs status cephfs
cephfs - 0 clients
======
+------+--------+--------------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+--------------+---------------+-------+-------+
| 0 | active | ceph-host-01 | Reqs: 0 /s | 10 | 13 |
+------+--------+--------------+---------------+-------+-------+
+----------+----------+-------+-------+
| Pool | type | used | avail |
+----------+----------+-------+-------+
| metadata | metadata | 1024k | 580G |
| data | data | 0 | 580G |
+----------+----------+-------+-------+
+--------------+
| Standby MDS |
+--------------+
| ceph-host-02 |
+--------------+
MDS version: ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)
Multiple active MDS daemons can run in parallel, but the official documentation recommends keeping one active MDS and leaving the others as standby.
Note 1: creating multiple CephFS file systems
[root@ceph-host-01 ~]# ceph osd pool create nova-data 128
pool 'nova-data' created
[root@ceph-host-01 ~]# ceph osd pool create nova-metadata 128
pool 'nova-metadata' created
Creating a second CephFS directly fails with the following error:
[root@ceph-host-01 ~]# ceph fs new nova nova-metadata nova-data
Error EINVAL: Creation of multiple filesystems is disabled. To enable this experimental feature, use 'ceph fs flag set enable_multiple true'
Workaround:
[root@ceph-host-01 ~]# ceph fs flag set enable_multiple true --yes-i-really-mean-it
[root@ceph-host-01 ~]# ceph fs new nova nova-metadata nova-data
new fs with metadata pool 22 and data pool 21
Note in particular: ceph-mds is a standalone daemon that serves only one CephFS; if that CephFS has multiple ranks, each daemon serves only one of them.
Check the cephfs status
[root@ceph-host-01 ~]# ceph mds stat
cephfs:1 nova:1 {cephfs:0=ceph-host-01=up:active,nova:0=ceph-host-02=up:active}
[root@ceph-host-01 ~]# ceph fs status cephfs
cephfs - 1 clients
======
+------+--------+--------------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+--------------+---------------+-------+-------+
| 0 | active | ceph-host-01 | Reqs: 0 /s | 10 | 13 |
+------+--------+--------------+---------------+-------+-------+
+----------+----------+-------+-------+
| Pool | type | used | avail |
+----------+----------+-------+-------+
| metadata | metadata | 1024k | 580G |
| data | data | 0 | 580G |
+----------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)
[root@ceph-host-01 ~]# ceph fs status nova
nova - 0 clients
====
+------+--------+--------------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+--------------+---------------+-------+-------+
| 0 | active | ceph-host-02 | Reqs: 0 /s | 10 | 13 |
+------+--------+--------------+---------------+-------+-------+
+---------------+----------+-------+-------+
| Pool | type | used | avail |
+---------------+----------+-------+-------+
| nova-metadata | metadata | 1024k | 580G |
| nova-data | data | 0 | 580G |
+---------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)
[root@ceph-host-01 ~]# ceph fs ls
name: cephfs, metadata pool: metadata, data pools: [data ]
name: nova, metadata pool: nova-metadata, data pools: [nova-data ]
[root@ceph-host-01 ~]# ceph -s
cluster:
id: 272905d2-fd66-4ef6-a772-9cd73a274683
health: HEALTH_WARN
insufficient standby MDS daemons available
services:
mon: 3 daemons, quorum ceph-host-01,ceph-host-02,ceph-host-03 (age 2h)
mgr: ceph-host-02(active, since 2h), standbys: ceph-host-03, ceph-host-01
mds: cephfs:1 nova:1 {cephfs:0=ceph-host-01=up:active,nova:0=ceph-host-02=up:active}
osd: 16 osds: 16 up (since 2h), 16 in (since 2w)
data:
pools: 7 pools, 896 pgs
objects: 2.16k objects, 8.2 GiB
usage: 34 GiB used, 1.2 TiB / 1.2 TiB avail
pgs: 896 active+clean
Note 2: MDS failover
After another mds daemon is added it stays in the standby state; if either of the first two mds daemons fails, it takes over. The takeover rules are configurable, see: http://docs.ceph.com/docs/master/cephfs/standby/#configuring-standby-daemons
[root@ceph-host-01 ceph-cluster]# ceph-deploy mds create ceph-host-03
Status with three mds daemons
[root@ceph-host-01 ~]# ceph mds stat
cephfs:1 nova:1 {cephfs:0=ceph-host-01=up:active,nova:0=ceph-host-02=up:active} 1 up:standby
[root@ceph-host-01 ~]# ceph -s
cluster:
id: 272905d2-fd66-4ef6-a772-9cd73a274683
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-host-01,ceph-host-02,ceph-host-03 (age 2h)
mgr: ceph-host-02(active, since 2h), standbys: ceph-host-03, ceph-host-01
mds: cephfs:1 nova:1 {cephfs:0=ceph-host-01=up:active,nova:0=ceph-host-02=up:active} 1 up:standby
osd: 16 osds: 16 up (since 2h), 16 in (since 2w)
data:
pools: 7 pools, 896 pgs
objects: 2.16k objects, 8.2 GiB
usage: 34 GiB used, 1.2 TiB / 1.2 TiB avail
pgs: 896 active+clean
io:
client: 4.2 KiB/s rd, 4 op/s rd, 0 op/s wr
[root@ceph-host-01 ~]# ceph fs status
cephfs - 0 clients
======
+------+--------+--------------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+--------------+---------------+-------+-------+
| 0 | active | ceph-host-01 | Reqs: 0 /s | 14 | 15 |
+------+--------+--------------+---------------+-------+-------+
+----------+----------+-------+-------+
| Pool | type | used | avail |
+----------+----------+-------+-------+
| metadata | metadata | 1920k | 580G |
| data | data | 0 | 580G |
+----------+----------+-------+-------+
nova - 0 clients
====
+------+--------+--------------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+--------------+---------------+-------+-------+
| 0 | active | ceph-host-02 | Reqs: 0 /s | 12 | 15 |
+------+--------+--------------+---------------+-------+-------+
+---------------+----------+-------+-------+
| Pool | type | used | avail |
+---------------+----------+-------+-------+
| nova-metadata | metadata | 1024k | 580G |
| nova-data | data | 0 | 580G |
+---------------+----------+-------+-------+
+--------------+
| Standby MDS |
+--------------+
| ceph-host-03 |
+--------------+
MDS version: ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)
Stop the active MDS
[root@ceph-host-01 ~]# systemctl stop ceph-mds@ceph-host-01.service
Check whether the standby has taken over
[root@ceph-host-01 ~]# ceph fs status
cephfs - 0 clients
======
+------+--------+--------------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+--------------+---------------+-------+-------+
| 0 | active | ceph-host-03 | Reqs: 0 /s | 14 | 15 |
+------+--------+--------------+---------------+-------+-------+
+----------+----------+-------+-------+
| Pool | type | used | avail |
+----------+----------+-------+-------+
| metadata | metadata | 1920k | 580G |
| data | data | 0 | 580G |
+----------+----------+-------+-------+
nova - 0 clients
====
+------+--------+--------------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+--------------+---------------+-------+-------+
| 0 | active | ceph-host-02 | Reqs: 0 /s | 12 | 15 |
+------+--------+--------------+---------------+-------+-------+
+---------------+----------+-------+-------+
| Pool | type | used | avail |
+---------------+----------+-------+-------+
| nova-metadata | metadata | 1024k | 580G |
| nova-data | data | 0 | 580G |
+---------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)
Note 3: multiple active MDS
Also called multi-MDS or active-active MDS.
By default, each CephFS file system is configured with a single active MDS daemon. On large systems you can configure multiple active MDS daemons to scale metadata performance; they share the metadata load.
CephFS declared multi-MDS and directory fragmentation (dirfrag) production ready in the Luminous release.

Advantages of multiple active MDS
* When the default single MDS becomes a metadata bottleneck, configuring several active MDS daemons raises cluster performance.
* Multiple active MDS daemons improve metadata throughput.
* Multiple active MDS daemons allow MDS load balancing.
* Multiple active MDS daemons allow per-tenant resource isolation.
Characteristics of multiple active MDS
* The file system tree is split into subtrees.
* Each subtree can be delegated to a specific MDS as its authority.
* As a result, cluster performance scales roughly linearly with the number of metadata servers.
* Subtrees are created dynamically based on metadata heat within the directory tree.
* Once a subtree is created, its metadata is migrated to a lightly loaded MDS.
* Subsequent client requests to the previously authoritative MDS are forwarded (a manual subtree-pinning sketch follows this list).
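Besides this automatic, heat-based balancing, a subtree can also be pinned to a specific active MDS rank by hand; a hedged sketch, run on a client with the file system mounted at /data as in section 7 (/data/project-a is a hypothetical directory):
setfattr -n ceph.dir.pin -v 1 /data/project-a    # pin the subtree under project-a to MDS rank 1
setfattr -n ceph.dir.pin -v -1 /data/project-a   # -1 removes the pin and hands the subtree back to the balancer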
Scaling out the active MDS count
# ceph mds stat
cephfs:1 nova:1 {cephfs:0=ceph-host-01=up:active,nova:0=ceph-host-02=up:active} 1 up:standby
Set max_mds to 2
[root@ceph-host-01 ~]# ceph fs set cephfs max_mds 2
Check the multi-active MDS status
[root@ceph-host-01 ~]# ceph mds stat
cephfs:2 nova:1 {cephfs:0=ceph-host-01=up:active,cephfs:1=ceph-host-03=up:active,nova:0=ceph-host-02=up:active}
# ceph fs status
cephfs - 0 clients
======
+------+--------+--------------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+--------------+---------------+-------+-------+
| 0 | active | ceph-host-03 | Reqs: 0 /s | 14 | 15 |
| 1 | active | ceph-host-02 | Reqs: 0 /s | 10 | 13 |
+------+--------+--------------+---------------+-------+-------+
+----------+----------+-------+-------+
| Pool | type | used | avail |
+----------+----------+-------+-------+
| metadata | metadata | 1920k | 580G |
| data | data | 0 | 580G |
+----------+----------+-------+-------+
nova - 0 clients
====
+------+--------+--------------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+--------------+---------------+-------+-------+
| 0 | active | ceph-host-01 | Reqs: 0 /s | 12 | 15 |
+------+--------+--------------+---------------+-------+-------+
+---------------+----------+-------+-------+
| Pool | type | used | avail |
+---------------+----------+-------+-------+
| nova-metadata | metadata | 1024k | 580G |
| nova-data | data | 0 | 580G |
+---------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)
Note 4: the default replica count is 3 for better safety, which requires at least 3 OSDs; the replica count of a pool can be changed with the following commands
# ceph osd pool set metadata size 2
# ceph osd pool set data size 2
# ceph osd pool set metadata min_size 2
# ceph osd pool set data min_size 2
Create the corresponding (read-write) cephx identity
[root@ceph-host-01 ~]# ceph auth get-or-create client.fsclient mon 'allow r' mds 'allow rw' osd 'allow rwx pool=data' -o ceph.client.fsclient.keyring
[root@ceph-host-01 ~]# ceph auth get client.fsclient
exported keyring for client.fsclient
[client.fsclient]
key = AQC9A4he42+qFBAA7zvVYCOsiLOJrSfjyFQcFg==
caps mds = "allow rw"
caps mon = "allow r"
caps osd = "allow rwx pool=data"
[root@ceph-host-01 ~]# cat ceph.client.fsclient.keyring
[client.fsclient]
key = AQC9A4he42+qFBAA7zvVYCOsiLOJrSfjyFQcFg==
Extensions:
1. Create a read-only CephFS identity for production
ceph auth get-or-create client.r_wk_data mon 'allow r' mds 'allow r' osd 'allow r pool=wk_data'
2. Create a read-write CephFS identity for production
ceph auth get-or-create client.wk_data mon 'allow r' mds 'allow rw' osd 'allow rwx pool=wk_data'
Note 5: delete the file system nova and its pools
# ceph fs rm nova --yes-i-really-mean-it
# ceph osd pool rm nova-metadata nova-metadata --yes-i-really-really-mean-it
# ceph osd pool rm nova-data nova-data --yes-i-really-really-mean-it
Note 6: Ceph MDS states explained
MDS failover state transitions:
handle_mds_map state change up:boot --> up:replay
handle_mds_map state change up:replay --> up:reconnect
handle_mds_map state change up:reconnect --> up:rejoin
handle_mds_map state change up:rejoin --> up:active

7. Mounting the file system
7.1 Mount the Ceph file system with the kernel driver
Install the ceph-common package on the client in advance; the kernel-driver mount needs it.
yum install -y https://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch/ceph-release-1-0.el7.noarch.rpm
yum install ceph-common -y
Note 1: installing ceph-common or ceph-fuse on Ubuntu
1. Add the release key:
wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -
2. Add the Ceph package source, replacing {ceph-stable-release} with a stable Ceph release name (e.g. cuttlefish, dumpling, emperor, nautilus). For example:
echo deb http://download.ceph.com/debian-{ceph-stable-release}/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
Example: install the 14.2.9 (nautilus) release of Ceph
echo deb http://download.ceph.com/debian-nautilus/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
3. Update your repositories and install
3.1 Install ceph-common
sudo apt-get update && sudo apt-get install ceph-common
3.2 Install ceph-fuse
sudo apt-get update && sudo apt-get install ceph-fuse
Copy the key file to the client
[root@ceph-host-01 ~]# ceph auth print-key client.fsclient
AQC9A4he42+qFBAA7zvVYCOsiLOJrSfjyFQcFg==
[root@ceph-host-01 ~]# ceph auth print-key client.fsclient >fsclient.key
[root@ceph-host-01 ~]# scp fsclient.key root@node3:/etc/ceph/ # upload the key to /etc/ceph on the client
Verify that the ceph kernel module is available on the client
[root@node3 ~]# modinfo ceph
filename: /lib/modules/3.10.0-957.el7.x86_64/kernel/fs/ceph/ceph.ko.xz
license: GPL
description: Ceph filesystem for Linux
author: Patience Warnick <patience@newdream.net>
author: Yehuda Sadeh <yehuda@hq.newdream.net>
author: Sage Weil <sage@newdream.net>
alias: fs-ceph
retpoline: Y
rhelversion: 7.6
srcversion: 43DA49DF11334B2A5652931
depends: libceph
intree: Y
vermagic: 3.10.0-957.el7.x86_64 SMP mod_unload modversions
signer: CentOS Linux kernel signing key
sig_key: B7:0D:CF:0D:F2:D9:B7:F2:91:59:24:82:49:FD:6F:E8:7B:78:14:27
sig_hashalgo: sha256
Create the mount point on the client
[root@node3 ~]# mkdir -pv /data
Mount on the client
[root@node3 ~]# mount -t ceph ceph-host-01:6789,ceph-host-02:6789,ceph-host-03:6789:/ /data -o name=fsclient,secret=AQC9A4he42+qFBAA7zvVYCOsiLOJrSfjyFQcFg==
[root@node3 ~]# df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/centos-root xfs 360G 44G 317G 13% /
devtmpfs devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs tmpfs 3.9G 17M 3.9G 1% /run
tmpfs tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/vda1 xfs 497M 140M 358M 29% /boot
tmpfs tmpfs 782M 0 782M 0% /run/user/0
10.30.1.221:6789,10.30.1.222:6789,10.30.1.223:6789:/ ceph 581G 0 581G 0% /data
Note 1: mounting with the secretfile parameter is safer, because the secret does not end up in the shell history
[root@node3 ~]# mount -t ceph ceph-host-01:6789,ceph-host-02:6789,ceph-host-03:6789:/ /data -o name=fsclient,secretfile=/etc/ceph/fsclient.key
[root@node3 ~]# df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/centos-root xfs 360G 44G 317G 13% /
devtmpfs devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs tmpfs 3.9G 17M 3.9G 1% /run
tmpfs tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/vda1 xfs 497M 140M 358M 29% /boot
tmpfs tmpfs 782M 0 782M 0% /run/user/0
10.30.1.221:6789,10.30.1.222:6789,10.30.1.223:6789:/ ceph 581G 0 581G 0% /data
Note 2: mount automatically at boot
[root@node3 ~]# echo "ceph-host-01:6789,ceph-host-02:6789,ceph-host-03:6789:/ /data ceph name=fsclient,secretfile=/etc/ceph/fsclient.key,_netdev 0 0" >> /etc/fstab
Note 3: mounting multiple CephFS file systems
Create the identity and copy its key to the client
[root@ceph-host-01 ~]# ceph auth get-or-create client.novafsclient mon 'allow r' mds 'allow rw' osd 'allow rwx pool=nova-data'
[root@ceph-host-01 ~]# ceph auth print-key client.novafsclient | ssh node3 tee /etc/ceph/novafsclient.key
Create the mount directory
[root@node3 ~]# mkdir -pv /nova
Unmount the cephfs that was mounted earlier
[root@node3 ~]# umount -t ceph /data
Remount both cephfs file systems on the client
[root@node3 ~]# mount -t ceph ceph-host-01:6789,ceph-host-02:6789,ceph-host-03:6789:/ /data -o mds_namespace=cephfs,name=fsclient,secretfile=/etc/ceph/fsclient.key
[root@node3 ~]# mount -t ceph ceph-host-01:6789,ceph-host-02:6789,ceph-host-03:6789:/ /nova -o mds_namespace=nova,name=novafsclient,secretfile=/etc/ceph/novafsclient.key
Persistent mounts
[root@node3 ~]# cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Mon Dec 23 04:37:50 2019
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/centos-root / xfs defaults 0 0
UUID=1262a46b-e4eb-4e25-9519-39c4f0c45c8e /boot xfs defaults 0 0
ceph-host-01:6789,ceph-host-02:6789,ceph-host-03:6789:/ /data ceph mds_namespace=cephfs,name=fsclient,secretfile=/etc/ceph/fsclient.key,_netdev 0 0
ceph-host-01:6789,ceph-host-02:6789,ceph-host-03:6789:/ /nova ceph mds_namespace=nova,name=novafsclient,secretfile=/etc/ceph/novafsclient.key,_netdev 0 0
Verify the mounts
[root@node3 ~]# stat -f /data
File: "/data"
ID: 5995b80750841c7 Namelen: 255 Type: ceph
Block size: 4194304 Fundamental block size: 4194304
Blocks: Total: 148575 Free: 148575 Available: 148575
Inodes: Total: 0 Free: -1
[root@node3 ~]# stat -f /nova
File: "/nova"
ID: 5995b80750841c7 Namelen: 255 Type: ceph
Block size: 4194304 Fundamental block size: 4194304
Blocks: Total: 148575 Free: 148575 Available: 148575
Inodes: Total: 0 Free: -1
Check the cephfs status
[root@ceph-host-01 ~]# ceph fs get nova
Filesystem 'nova' (4)
fs_name nova
epoch 1439
flags 12
created 2020-04-04 13:04:09.091835
modified 2020-04-04 13:04:11.057747
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
min_compat_client -1 (unspecified)
last_failure 0
last_failure_osd_epoch 0
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds 1
in 0
up {0=6754123}
failed
damaged
stopped
data_pools [21]
metadata_pool 22
inline_data disabled
balancer
standby_count_wanted 1
6754123: [v2:10.30.1.222:6800/1567901637,v1:10.30.1.222:6801/1567901637] 'ceph-host-02' mds.0.1438 up:active seq 1641
7.2 Mount the Ceph file system from user space (FUSE)
Before mounting a Ceph file system from user space (FUSE), make sure the client host has a copy of the Ceph configuration file and a keyring with capabilities for the Ceph metadata server.
Install the ceph-fuse package on the client in advance; the user-space mount needs it.
yum install -y https://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch/ceph-release-1-0.el7.noarch.rpm
yum install ceph-fuse -y
7.2.1 On the client host, copy the Ceph configuration file from a monitor host to /etc/ceph/.
mkdir -p /etc/ceph
scp root@ceph-host-01:/etc/ceph/ceph.conf /etc/ceph/ceph.conf
7.2.2 On the client host, copy the Ceph keyring from a monitor host to /etc/ceph.
ceph auth get client.fsclient >/tmp/ceph.client.fsclient.keyring
scp root@ceph-host-01:/tmp/ceph.client.fsclient.keyring /etc/ceph/ceph.client.fsclient.keyring
Example ceph.client.fsclient.keyring:
# cat /etc/ceph/ceph.client.fsclient.keyring
[client.fsclient]
key = AQDxJ5heTf20AhAA34vP0xErt2mFHQiuONWTSQ==
caps mds = "allow rw"
caps mon = "allow r"
caps osd = "allow rwx pool=cephfs-data"
7.2.3 Make sure the Ceph configuration file and keyring on the client machine have appropriate permissions, e.g. chmod 644.
To mount the Ceph file system as a user-space file system, use the ceph-fuse command, for example:
mkdir -pv /ceph_data
ceph-fuse -n client.fsclient /ceph_data
The command above can also be written out in full:
ceph-fuse --keyring /etc/ceph/ceph.client.fsclient.keyring --name client.fsclient -m ceph-host-01:6789,ceph-host-02:6789 /ceph_data
Mount at boot
echo "id=fsclient,keyring=/etc/ceph/ceph.client.fsclient.keyring /ceph_data fuse.ceph defaults 0 0" >> /etc/fstab
none /ceph_data fuse.ceph ceph.id=fsclient,ceph.conf=/etc/ceph/ceph.conf,_netdev,defaults 0 0
7.2.4 Unmounting
Unmount with: fusermount -u <mount_point>
Extension:
Using CephFS with nova
On every compute node, mount the CephFS volume onto /var/lib/nova/instances:
mount -t ceph <Ceph monitor IP>:6789:/ /var/lib/nova/instances -o name=admin,secret={ceph.client.admin.key}
chown -R nova:nova /var/lib/nova/instances
Example operation
Create the MDS and the corresponding pools
[root@ceph-host-01 ceph-cluster]# ceph-deploy mds create ceph-host-01
[root@ceph-host-01 ceph-cluster]# ceph mds stat
1 up:standby
[root@ceph-host-01 ceph-cluster]# ceph osd pool create nova-metadata 128
[root@ceph-host-01 ceph-cluster]# ceph osd pool create nova-data 128
[root@ceph-host-01 ceph-cluster]# ceph fs new nova nova-metadata nova-data
Note: the default replica count is 3 for better safety, which requires at least 3 OSDs; the pool replica count can be changed with the following commands
# ceph osd pool set nova-metadata size 2
# ceph osd pool set nova-data size 2
# ceph osd pool set nova-metadata min_size 2
# ceph osd pool set nova-data min_size 2
Mount on the compute nodes
Mount (takes effect immediately)
[root@node3 ~]# mount -t ceph 10.30.1.221:6789:/ /var/lib/nova/instances/ -o name=admin,secret=AQA8HzdeFQuPHxAAUfjHnOMSfFu7hHIoGv/x1A==
[root@node3 ~]# chown -R nova:nova /var/lib/nova/instances
Mount (persistent)
[root@node3 ~]# echo "10.30.1.221:6789:/ /var/lib/nova/instances ceph name=admin,secret=AQA8HzdeFQuPHxAAUfjHnOMSfFu7hHIoGv/x1A==,_netdev 0 0" >> /etc/fstab
Periodically check whether the mount has gone stale and remount it if necessary
[root@node3 ~]# echo '*/3 * * * * root if [ `mount | grep ceph | wc -l` -eq 0 ] ; then mount -t ceph 10.30.1.221:6789:/ /var/lib/nova/instances/ -o name=admin,secret=AQA8HzdeFQuPHxAAUfjHnOMSfFu7hHIoGv/x1A== ; fi' >>/etc/crontab
Note: how to look up the secret value
# cat /etc/ceph/ceph.client.admin.keyring
[client.admin]
key = AQA8HzdeFQuPHxAAUfjHnOMSfFu7hHIoGv/x1A==
Check usage after creating instances
[root@node3 ~]# df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/centos-root xfs 200G 3.4G 197G 2% /
devtmpfs devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs tmpfs 3.9G 17M 3.9G 1% /run
tmpfs tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/vda1 xfs 497M 140M 358M 29% /boot
tmpfs tmpfs 782M 0 782M 0% /run/user/0
10.30.1.221:6789:/ ceph 277G 2.2G 275G 1% /var/lib/nova/instances
[root@node3 ~]# tree /var/lib/nova/instances
/var/lib/nova/instances
├── 1878b03d-aa3e-4424-8325-ae3bafce0e6a
│ └── disk.info
├── 3b394c96-94a4-4b98-b55b-cac54ef31282
│ └── disk.info
├── 4dd899dc-df13-4853-b70f-2359db577b2d
│ └── disk.info
├── 52fce24f-c8bc-4bb2-8675-cc0cfe4d3678
│ └── disk.info
├── 5632d386-5cb2-4887-9f48-11bcb709ba5f
│ └── disk.info
├── 59cd7399-202c-44b8-918d-9e9acb0cc2e5
│ └── disk.info
├── 60599ade-f271-42ee-9edc-cfe59b4d2459
│ └── disk.info
├── 6937ed06-8cc0-47d0-8a36-59cbf9981337
│ └── disk.info
├── aa852ceb-700f-4e00-a338-faa137b6dbf6
│ └── disk.info
├── _base
│ ├── a36c45ee0cb50b3d5f57afcff5c9a552becfe68b.converted
│ └── a36c45ee0cb50b3d5f57afcff5c9a552becfe68b.part
├── c45a024c-d944-4135-82da-03251f694b72
│ └── disk.info
├── e4607eff-5d40-4238-ab79-903bba641dd8
│ └── disk.info
└── locks
└── nova-a36c45ee0cb50b3d5f57afcff5c9a552becfe68b
13 directories, 14 files
root@node1 ~]# openstack server list
+--------------------------------------+-----------------+--------+---------------------------------------------+------------+--------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+-----------------+--------+---------------------------------------------+------------+--------+
| 4924b0a7-aad6-447e-b340-a2116f56a4a6 | nova-create-vm9 | ACTIVE | vlan99=172.16.99.139; vlan809=192.168.9.219 | CentOS 7.5 | 1c1g |
| b60a7bd4-8515-4020-b635-00c656928dcc | nova-create-vm8 | ACTIVE | vlan99=172.16.99.138; vlan809=192.168.9.218 | CentOS 7.5 | 1c1g |
| a91c9082-72fe-4c4e-b864-6bdf4b5b3c65 | nova-create-vm7 | ACTIVE | vlan99=172.16.99.137; vlan809=192.168.9.217 | CentOS 7.5 | 1c1g |
| ce3a4dab-9e2d-4c66-8d8c-974dd30ca65a | nova-create-vm6 | ACTIVE | vlan99=172.16.99.136; vlan809=192.168.9.216 | CentOS 7.5 | 1c1g |
| 4c94d4d4-9074-405b-a570-768dc1c1b5a4 | nova-create-vm5 | ACTIVE | vlan99=172.16.99.135; vlan809=192.168.9.215 | CentOS 7.5 | 1c1g |
| a56a700e-f0e1-4845-9eb7-84d77fbf683d | nova-create-vm4 | ACTIVE | vlan99=172.16.99.134; vlan809=192.168.9.214 | CentOS 7.5 | 1c1g |
| c237cbb8-62a6-4bfd-be95-009aaa30c3bf | nova-create-vm3 | ACTIVE | vlan99=172.16.99.133; vlan809=192.168.9.213 | CentOS 7.5 | 1c1g |
| d89a137d-53c5-448e-8592-6b06eac00af7 | nova-create-vm2 | ACTIVE | vlan99=172.16.99.132; vlan809=192.168.9.212 | CentOS 7.5 | 1c1g |
| 38764d77-73ee-4030-9dc5-51effe6cfa95 | nova-create-vm1 | ACTIVE | vlan99=172.16.99.131; vlan809=192.168.9.211 | CentOS 7.5 | 1c1g |
+--------------------------------------+-----------------+--------+---------------------------------------------+------------+--------+
[root@node1 ~]# ceph df
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 1.2 TiB 1.1 TiB 5.7 GiB 21 GiB 1.75
TOTAL 1.2 TiB 1.1 TiB 5.7 GiB 21 GiB 1.75
POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL
nova-metadata 6 3.6 MiB 23 16 MiB 0 276 GiB
nova-data 7 1.2 GiB 372 5.0 GiB 0.45 276 GiB
[root@node1 ~]# ceph -s
cluster:
id: 272905d2-fd66-4ef6-a772-9cd73a274683
health: HEALTH_WARN
1 daemons have recently crashed
services:
mon: 3 daemons, quorum ceph-host-01,ceph-host-02,ceph-host-03 (age 15m)
mgr: ceph-host-01(active, since 38m), standbys: ceph-host-03, ceph-host-02
mds: nova:1 {0=ceph-host-01=up:active} 1 up:standby
osd: 15 osds: 15 up (since 13m), 15 in (since 107m)
data:
pools: 2 pools, 128 pgs
objects: 415 objects, 1.4 GiB
usage: 21 GiB used, 1.1 TiB / 1.2 TiB avail
pgs: 128 active+clean
io:
client: 3.2 MiB/s rd, 174 KiB/s wr, 123 op/s rd, 23 op/s wr
Cleaning up the environment
$ ceph-deploy purge ceph-host-01 ceph-host-02 ceph-host-03 ceph-host-04 // removes everything related to ceph
$ ceph-deploy purgedata ceph-host-01 ceph-host-02 ceph-host-03 ceph-host-04
$ ceph-deploy forgetkeys
Common errors:
Error 1:
[ceph-mon01][DEBUG ] --> Finished Dependency Resolution
[ceph-mon01][WARNIN] Error: Package: 2:librgw2-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN] Requires: liblttng-ust.so.0()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:ceph-common-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN] Requires: libbabeltrace.so.1()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:ceph-common-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN] Requires: libbabeltrace-ctf.so.1()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:ceph-mon-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN] Requires: libleveldb.so.1()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:librgw2-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN] Requires: liboath.so.0()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:ceph-osd-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN] Requires: libleveldb.so.1()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:ceph-common-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN] Requires: liboath.so.0()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:ceph-common-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN] Requires: libleveldb.so.1()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:ceph-common-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN] Requires: liboath.so.0(LIBOATH_1.10.0)(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:librbd1-14.2.9-0.el7.x86_64 (Ceph)
Solution:
yum install epel-release -y
Note: this step is very important; if you skip it and install Ceph directly, you will hit the dependency errors shown above.
Error 2:
health: HEALTH_WARN
clock skew detected on mon.ceph-host-02, mon.ceph-host-03
This is caused by unsynchronized clocks between the nodes.
# ansible ceph -a 'yum install ntpdate -y'
# ansible ceph -a 'systemctl stop ntpdate'
# ansible ceph -a 'ntpdate time.windows.com'
Set every Ceph node to synchronize time at boot and on a schedule:
[root@ceph-host-01 ~]# cat /etc/rc.d/rc.local
#!/bin/bash
# THIS FILE IS ADDED FOR COMPATIBILITY PURPOSES
#
# It is highly advisable to create own systemd services or udev rules
# to run scripts during boot instead of using this file.
#
# In contrast to previous versions due to parallel execution during boot
# this script will NOT be run after all other services.
#
# Please note that you must run 'chmod +x /etc/rc.d/rc.local' to ensure
# that this script will be executed during boot.
timedatectl set-timezone Asia/Shanghai && ntpdate time1.aliyun.com && hwclock -w >/dev/null 2>&1
touch /var/lock/subsys/local
[root@ceph-host-01 ~]# chmod +x /etc/rc.d/rc.local
[root@ceph-host-01 ~]# systemctl enable rc-local
[root@ceph-host-01 ~]# echo '*/5 * * * * root timedatectl set-timezone Asia/Shanghai && ntpdate time1.aliyun.com && hwclock -w >/dev/null 2>&1' >> /etc/crontab
Note: on CentOS 7, chrony is a better choice for time synchronization; the steps are as follows
yum install chrony -y
systemctl start chronyd
systemctl enable chronyd
# cat /etc/chrony.conf | grep -v '^#\|^$'
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
server 3.centos.pool.ntp.org iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
Error 3:
# ceph status
cluster:
id: 04d85079-c2ef-47c8-a8bb-c6cb13db3cc4
health: HEALTH_WARN
62 daemons have recently crashed
Solution:
# ceph crash archive-all
Author: Dexter_Wang. Position: senior cloud computing and storage engineer at an Internet company. Contact: 993852246@qq.com