Ceph High-Availability Distributed Storage Cluster 01 - Introduction to Ceph and Production Deployment


Ceph is a reliable, self-rebalancing, self-healing distributed storage system. By use case it can be divided into three services: object storage, block devices, and a file system; block storage is Ceph's strongest area.
Ceph's main advantage is its distributed design: the location of every piece of data is computed rather than looked up, data is spread as evenly as possible, there is no traditional single point of failure, and the cluster scales out horizontally.

Ceph Architecture

[Figure: Ceph architecture]
RADOS is itself a complete distributed object store: reliable, intelligent, and distributed. Ceph's high reliability, scalability, performance, and automation are all provided by this layer, and user data is ultimately stored through it; RADOS is the core component of Ceph.
The RADOS system consists of two main parts: OSDs and Monitors.
On top of RADOS sits LIBRADOS, a library that lets applications interact with RADOS directly; it supports several programming languages, such as C, C++ and Python.
Built on top of LIBRADOS are three more layers: RADOSGW, RBD and CEPH FS.
RADOSGW: a gateway based on the popular RESTful protocol, compatible with the S3 and Swift APIs.
RBD: a distributed block device exposed through the Linux kernel client and the QEMU/KVM driver.
CEPH FS: a POSIX-compatible file system exposed through the Linux kernel client and FUSE.

Ceph Core Component: RADOS

The RADOS system consists of two main parts: OSDs and Monitors.
Ceph OSD: OSD stands for Object Storage Device. Its main jobs are to store, replicate, rebalance and recover data, to exchange heartbeats with other OSDs, and to report state changes to the Ceph Monitors. Usually one physical disk maps to one OSD, which manages the storage on that disk, although a single partition can also back an OSD.
Ceph Monitor: as the name suggests, it monitors the Ceph cluster and maintains its health state, together with the various maps of the cluster: the OSD Map, Monitor Map, PG Map and CRUSH Map. Collectively these are called the Cluster Map, the key RADOS data structure that records all cluster members, their relationships and attributes, and how data is distributed. For example, when a client wants to store data in the cluster, it first obtains the latest maps from a Monitor and then computes the final storage location from the maps and the object id.
For high availability, a Ceph storage cluster should keep two or more replicas of each object (three by default). Ceph OSD daemons automatically create object replicas on other Ceph nodes to ensure data safety and availability.
Ceph Monitors maintain the master copy of the cluster map. For high availability the monitors themselves run as a cluster, so the failure of any single monitor does not bring the cluster down.
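Once the cluster described later in this article is up, these maps can be inspected at any time with standard ceph subcommands, for example:
# ceph mon dump             # monitor map
# ceph osd dump             # OSD map, pools and flags
# ceph osd crush tree       # CRUSH hierarchy (root / host / osd)
# ceph pg dump pgs_brief    # PG-to-OSD mappings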

Ceph Data Placement: the CRUSH Algorithm

Ceph is designed for large-scale distributed storage, so its placement algorithm must compute data locations quickly and correctly even in very large clusters, while keeping data migration as small as possible when hardware fails or is added. Ceph's CRUSH algorithm was designed precisely for these properties.
Before explaining how CRUSH works, a few concepts and how they relate to each other:
Object: when a user stores data in a Ceph cluster, the data is split into multiple objects. Each object has an object id and a configurable size (4 MB by default); an object is the smallest storage unit in Ceph.
PG: because there are so many objects, Ceph introduces placement groups (PGs) to manage them. Every object is mapped by CRUSH into some PG, and one PG contains many objects.
PG-to-OSD mapping: each PG is in turn mapped by CRUSH onto OSDs for storage. With two replicas, each PG maps to two OSDs, e.g. [OSD#1, OSD#2]; OSD#1 holds the primary copy of that PG and OSD#2 the replica, which provides data redundancy.
Mapping objects to placement groups creates a layer of indirection between OSDs and clients. The cluster must be able to grow, shrink and rebalance dynamically; if clients "knew" which OSD held which object, clients and OSDs would be tightly coupled. Instead, CRUSH maps objects to placement groups and then maps each placement group to one or more OSDs, and this indirection lets Ceph rebalance dynamically as OSD daemons and underlying devices come and go.
PG and PGP: PGs hold objects, while PGP is essentially the number of distinct OSD placement combinations the PGs can use. For example, with three OSDs (osd.1, osd.2, osd.3) and a replica count of 2: if pgp is 1, only one OSD combination is possible, say [osd.1, osd.2], so every PG puts its primary and replica on osd.1 and osd.2; if pgp is 2, two combinations are possible, e.g. [osd.1, osd.2] and [osd.1, osd.3]. It is the permutation-and-combination idea from school math. As a rule, pg_num and pgp_num should be set to the same value.
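For example (volumes is the pool created later in this article), both values of an existing pool can be checked and adjusted with standard commands:
# ceph osd pool get volumes pg_num
# ceph osd pool get volumes pgp_num
# ceph osd pool set volumes pg_num 256
# ceph osd pool set volumes pgp_num 256    # keep pgp_num equal to pg_num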
The relationship between object, PG, pool, OSD and physical disk
At its core, CRUSH computes the placement of data objects from the weights of the storage devices. Weights are usually set from disk capacity (and optionally read/write speed), e.g. a 1 TB disk gets weight 1 and a 2 TB disk weight 2. During the computation, CRUSH combines the Cluster Map, the placement policy and a pseudo-random value to decide where the data finally lands.
The Cluster Map describes the available storage resources and their physical hierarchy, for example how many racks the cluster has, how many servers per rack, and how many disks per server are used as OSDs.
The placement policy is configured by the Ceph administrator and describes placement constraints such as the failure domain. If the failure domain is host, the data must survive the loss of any single host; CRUSH achieves this by putting the primary and replica copies of each PG on OSDs of different hosts. Larger failure domains such as rack can be specified as well, and besides the failure domain you also choose the redundancy scheme: replication or erasure coding.
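For example (rule name replicated_host and pool name volumes are illustrative), a replicated CRUSH rule with host as the failure domain can be created and attached to a pool like this:
# ceph osd crush rule create-replicated replicated_host default host
# ceph osd pool set volumes crush_rule replicated_host
# ceph osd crush rule ls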
Ceph Network Configuration Reference
Network configuration is critical to building a high-performance Ceph storage cluster. The Ceph storage cluster does not route or dispatch requests on behalf of clients; instead, Ceph clients (block device, CephFS, REST gateway) talk to the OSDs directly, and the OSDs then replicate the data on the clients' behalf, which means replication and related traffic put extra load on the cluster network.
The quick-start configuration provides a minimal Ceph configuration file that only sets the monitor IP addresses and the hosts the daemons run on. If no cluster network is configured, Ceph assumes a single "public" network. Ceph works with one network, but on large clusters a separate "cluster" network improves performance significantly.
We recommend running a Ceph storage cluster with two networks: a public (front-side) network and a cluster (back-side) network. To do this, each node needs more than one NIC.
The main reasons to run two separate networks are:
1. Performance: OSDs handle data replication for clients, and with multiple replicas the OSD-to-OSD traffic competes with client-to-cluster traffic, adding latency and causing performance problems; recovery and rebalancing also add significant latency on the public network. See Scalability and High Availability for how Ceph replicates data, and Monitor/OSD interaction for heartbeat traffic.
2. Security: most people are well behaved, but a few enjoy denial-of-service (DoS) attacks. When OSD-to-OSD traffic is disrupted, placement groups can no longer reach the active + clean state and users cannot read or write data. A good way to defeat this kind of attack is to keep the cluster network completely separate and not reachable from the Internet; also consider message signing to prevent spoofing.
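In ceph.conf the split is expressed with two options; the subnets below are the ones used by the lab environment later in this article:
public_network = 10.30.1.0/24      # client and monitor traffic
cluster_network = 192.168.9.0/24   # OSD replication, recovery and heartbeats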
 
Deploying Ceph with the ceph-deploy tool
Official documentation (Chinese):  http://docs.ceph.org.cn/
Lab environment
10.30.1.221 192.168.9.211 ceph-host-01
10.30.1.222 192.168.9.212 ceph-host-02
10.30.1.223 192.168.9.213 ceph-host-03
10.30.1.224 192.168.9.214 ceph-host-04
 
OS: CentOS 7.6
Each host has 2 spare disks
 
The Ceph nodes run 64-bit CentOS 7.6. There are 4 Ceph nodes in total; each node runs 2 OSD daemons, one per physical disk.
 
For Ceph 10.x and later, a 4.x kernel is recommended; if you must stay on an older kernel, use FUSE on the client side instead of the kernel client
Upgrade the system kernel
cat  >>/etc/yum.repos.d/CentOS-altarch.repo<<EOF
# CentOS-Base.repo
#
# The mirror system uses the connecting IP address of the client and the
# update status of each mirror to pick mirrors that are updated to and
# geographically close to the client. You should use this for CentOS updates
# unless you are manually picking other mirrors.
#
# If the mirrorlist= does not work for you, as a fall back you can try the
# remarked out baseurl= line instead.
#
#
[kernel]
name=CentOS-$releasever - Kernel
baseurl=https://mirrors.tuna.tsinghua.edu.cn/centos-altarch/7/kernel/x86_64/
enabled=1
gpgcheck=0
EOF
 
yum clean all
 
yum install  kernel -y
Update the boot configuration
grub2-mkconfig -o /boot/grub2/grub.cfg
grub2-set-default 0
System tuning
echo '* - nofile 65535' >> /etc/security/limits.conf
ulimit -SHn 65535
 
cat > /etc/sysctl.conf  <<EOF
kernel.sysrq = 0
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
net.core.wmem_default = 8388608
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.somaxconn = 262144
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.ip_forward = 0
net.ipv4.ip_local_port_range = 5000 65000
net.ipv4.tcp_fin_timeout = 1
net.ipv4.tcp_keepalive_time = 30
net.ipv4.tcp_max_orphans = 3276800
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.tcp_max_tw_buckets = 6000
net.ipv4.tcp_mem = 94500000 915000000 927000000
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_sack = 1
net.ipv4.tcp_syn_retries = 1
net.ipv4.tcp_synack_retries = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_timestamps = 0
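# note: tcp_tw_recycle was removed in Linux 4.12+, so on the upgraded kernel sysctl -p may report the next key as unknown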
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_wmem = 4096 16384 16777216
fs.file-max=65536
fs.inotify.max_queued_events=99999999
fs.inotify.max_user_watches=99999999
fs.inotify.max_user_instances=65535
net.core.default_qdisc=fq
EOF
 
sysctl -p
 
Disable SELinux and the firewall
setenforce 0
sed -i 's/SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
systemctl stop firewalld
systemctl disable firewalld
systemctl disable NetworkManager
systemctl stop NetworkManager
 
Install a network time service
All nodes must keep their clocks synchronized: Ceph monitors report clock-skew warnings otherwise, and in an OpenStack integration unsynchronized clocks can cause VM creation to fail.
# yum install chrony -y
# vim /etc/chrony.conf    # edit the NTP configuration
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
server 3.centos.pool.ntp.org iburst
 
# systemctl enable chronyd.service    # enable the NTP service at boot
# systemctl start chronyd.service     # start the NTP service
# chronyc sources                     # verify time synchronization
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
^? ControllerNode                0   6     0     -     +0ns[   +0ns] +/-    0ns
 
Set the timezone
timedatectl set-timezone Asia/Shanghai
 
Install common utility packages
yum install -y vim net-tools wget lrzsz deltarpm tree screen lsof tcpdump nmap sysstat iftop
Switch to the Aliyun CentOS mirror
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
 
Install the EPEL repository in advance
yum install epel-release -y
Note: the Aliyun EPEL mirror makes the installation faster
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
 
1. Install ceph-deploy
 
1.1 Configure the hostnames and the hosts file; in this example ceph-deploy is installed on one of the cluster nodes.
[root@ceph-host-01 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.30.1.221 ceph-host-01
10.30.1.222 ceph-host-02
10.30.1.223 ceph-host-03
10.30.1.224 ceph-host-04
 
Note: each node's hostname must match its entry in /etc/hosts
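If a hostname has not been set yet, set it first so it matches the hosts file (shown here for the first node; repeat with the corresponding name on each node):
# hostnamectl set-hostname ceph-host-01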
 
1.2 Generate a key with ssh-keygen and copy it to every node with ssh-copy-id.
[root@ceph-host-01 ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:iVPfxuQVphRA8v2//XsM+PxzWjYrx5JnnHTbBdNYwTw root@ceph-host-01
The key's randomart image is:
+---[RSA 2048]----+
|        ..o.o.=..|
|         o o o E.|
|        . . + .+.|
|       o o = o+ .|
|      o S . =..o |
|       .   .. .oo|
|             o=+X|
|             +o%X|
|              B*X|
+----[SHA256]-----+
 
Example: copying the key to ceph-host-02
[root@ceph-host-01 ~]# ssh-copy-id ceph-host-02
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'ceph-host-02 (10.30.1.222)' can't be established.
ECDSA key fingerprint is SHA256:VsMfdmYFzxV1dxKZi2OSp8QluRVQ1m2lT98cJt4nAFU.
ECDSA key fingerprint is MD5:de:07:2f:5c:13:9b:ba:0b:e5:0e:c2:db:3e:b8:ab:bd.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@ceph-host-02's password:
 
Number of key(s) added: 1
 
Now try logging into the machine, with:   "ssh 'ceph-host-02'"
and check to make sure that only the key(s) you wanted were added.
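To push the key to the remaining nodes in one pass, a small shell loop does the same job (hostnames as defined in /etc/hosts above; each node's root password is prompted for in turn):
# for h in ceph-host-02 ceph-host-03 ceph-host-04; do ssh-copy-id root@$h; done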
 
1.3 Install ceph-deploy.
 
Before installing, configure the yum repository; this example uses the relatively recent Nautilus release
[root@ceph-host-01 ~]# cat /etc/yum.repos.d/ceph.repo
[Ceph]
name=Ceph packages for $basearch
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/$basearch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
 
[Ceph-noarch]
name=Ceph noarch packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
 
[ceph-source]
name=Ceph source packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/SRPMS
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
 
Note: with a Ceph repository configured (the Aliyun mirror above, or the official source at download.ceph.com), install ceph-deploy and its Python dependencies:
 
[root@ceph-host-01 ~]# yum install ceph-deploy  python-setuptools python2-subprocess32 -y
 
2. Create the Ceph monitor role
2.1 ceph-deploy generates a number of files during deployment, so create a working directory first, e.g. ceph-cluster
 
[root@ceph-host-01 ~]# mkdir -pv ceph-cluster
[root@ceph-host-01 ~]# cd ceph-cluster
 
2.2 Initialize the mon nodes and prepare to create the cluster:
[root@ceph-host-01 ceph-cluster]# ceph-deploy new  ceph-host-01 ceph-host-02 ceph-host-03
Edit the generated Ceph cluster configuration file
[root@ceph-host-01 ceph-cluster]# cat ceph.conf
[global]
fsid = a480fcef-1c4b-48cb-998d-0caed867b5eb
mon_initial_members = ceph-host-01, ceph-host-02, ceph-host-03
mon_host = 10.30.1.221,10.30.1.222,10.30.1.223
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
mon clock drift allowed = 2
mon clock drift warn backoff = 30
 
public_network = 10.30.1.0/24
cluster_network = 192.168.9.0/24
 
max_open_files = 131072
mon_pg_warn_max_per_osd = 1000
mon_max_pg_per_osd = 1000
osd pool default pg num = 256
osd pool default pgp num = 256
osd pool default size = 3
osd pool default min size = 1
 
mon_osd_full_ratio = .90
mon_osd_nearfull_ratio = .80
osd_deep_scrub_randomize_ratio = 0.01
 
[mon]
mon_allow_pool_delete = true
mon_osd_down_out_interval = 600
mon_osd_min_down_reporters = 3
[mgr]
mgr modules = dashboard
[mds]
mds cache memory limit = 10737418240
mds cache size = 250000
mds_max_export_size = 20971520
mds_bal_interval = 10
mds_bal_sample_interval = 3.000000
 
[osd]
osd_journal_size = 20480
osd_max_write_size = 1024
osd mkfs type = xfs
osd_recovery_op_priority = 1
osd_recovery_max_active = 1
osd_recovery_max_single_start = 1
osd_recovery_threads = 1
osd_recovery_max_chunk = 1048576
osd_max_backfills = 1
osd_scrub_begin_hour = 22
osd_scrub_end_hour = 7
osd_recovery_sleep = 0
 
[client]
rbd_cache = true
rbd_cache_writethrough_until_flush = true
rbd_concurrent_management_ops = 10
rbd_cache_size = 67108864
rbd_cache_max_dirty = 50331648
rbd_cache_target_dirty = 33554432
rbd_cache_max_dirty_age = 2
rbd_default_format = 2
 
Note: the settings above are a considered set of optimizations; review, trim or extend them and use them with care in production
 
2.3 Install the Ceph packages on all nodes
Use ceph-deploy to install the Ceph packages, or install ceph manually on each node; the Ceph version installed depends on the configured yum repository
[root@ceph-host-01 ceph-cluster]# ceph-deploy install  --no-adjust-repos ceph-host-01 ceph-host-02 ceph-host-03 ceph-host-04
# without --no-adjust-repos, ceph-deploy keeps replacing your repos with its own default upstream source, which is a common trap
 
Tip: to install the Ceph packages independently on each cluster node instead, run:
# yum install ceph ceph-radosgw -y
2.4 Configure the initial mon node(s) and gather all keys
[root@ceph-host-01 ceph-cluster]# ceph-deploy mon create-initial 
 
2.5 Check the running service
# ps -ef|grep ceph
ceph        1916       1  0 12:05 ?        00:00:03 /usr/bin/ceph-mon -f --cluster ceph --id ceph-host-01 --setuser ceph --setgroup ceph
 
2.6 From the admin node, copy the configuration file and admin key to the admin node and the Ceph nodes
 
[root@ceph-host-01 ceph-cluster]# ceph-deploy admin ceph-host-01 ceph-host-02 ceph-host-03 ceph-host-04
 
Make ceph.client.admin.keyring readable on every node
# chmod +r /etc/ceph/ceph.client.admin.keyring
 
Or use ansible to set the permission on all Ceph nodes at once
# ansible ceph -a 'chmod +r /etc/ceph/ceph.client.admin.keyring'
 
3. Create the Ceph OSD roles (OSD deployment)
 
Recent ceph-deploy releases use osd create directly; it combines the old prepare and activate steps and uses BlueStore by default
ceph-deploy osd create --data /dev/vdb ceph-host-01
ceph-deploy osd create --data /dev/vdb ceph-host-02
ceph-deploy osd create --data /dev/vdb ceph-host-03
ceph-deploy osd create --data /dev/vdb ceph-host-04
 
Note: if a disk already holds data it must be wiped first, for example:
ceph-deploy disk zap ceph-host-02 /dev/vdb
 
4. Create the mgr role
Since Ceph 12 (Luminous) a manager daemon is mandatory. Add one mgr for every machine that runs a monitor, otherwise the cluster stays in WARN state.
 
[root@ceph-host-01 ceph-cluster]# ceph-deploy mgr create ceph-host-01 ceph-host-02 ceph-host-03
 
Check the cluster health
[root@ceph-host-03 ~]# ceph health
HEALTH_OK
[root@ceph-host-03 ~]# ceph -s
  cluster:
    id:     02e63c58-5200-45c9-b592-07624f4893a5
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum ceph-host-01,ceph-host-02,ceph-host-03 (age 59m)
    mgr: ceph-host-01(active, since 4m), standbys: ceph-host-02, ceph-host-03
    osd: 4 osds: 4 up (since 87m), 4 in (since 87m)
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   5.0 GiB used, 90 GiB / 95 GiB avail
    pgs:     
 
Then add the second OSD on each node
ceph-deploy osd create --data /dev/vdc ceph-host-01
ceph-deploy osd create --data /dev/vdc ceph-host-02
ceph-deploy osd create --data /dev/vdc ceph-host-03
ceph-deploy osd create --data /dev/vdc ceph-host-04
 
Check the status
 
[root@ceph-host-01 ceph-cluster]# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME             STATUS REWEIGHT PRI-AFF
-1       0.18585 root default                                  
-3       0.03717     host ceph-host-01                         
  0   hdd 0.01859         osd.0             up  1.00000 1.00000
  4   hdd 0.01859         osd.4             up  1.00000 1.00000
-5       0.03717     host ceph-host-02                         
  1   hdd 0.01859         osd.1             up  1.00000 1.00000
  5   hdd 0.01859         osd.5             up  1.00000 1.00000
-7       0.03717     host ceph-host-03                         
  2   hdd 0.01859         osd.2             up  1.00000 1.00000
  6   hdd 0.01859         osd.6             up  1.00000 1.00000
-9       0.03717     host ceph-host-04                         
  3   hdd 0.01859         osd.3             up  1.00000 1.00000
  7   hdd 0.01859         osd.7             up  1.00000 1.00000
 
Note: check each OSD's weight and disk usage
[root@ceph-host-02 ceph-cluster]# ceph osd df  
ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META    AVAIL   %USE  VAR  PGS STATUS
0   hdd 1.81940  1.00000 1.8 TiB 732 GiB 731 GiB   8 KiB 1.2 GiB 1.1 TiB 39.30 1.18  86     up
1   hdd 1.81940  1.00000 1.8 TiB 956 GiB 955 GiB  40 KiB 1.5 GiB 907 GiB 51.33 1.54  85     up
2   hdd 1.81940  1.00000 1.8 TiB 826 GiB 825 GiB  48 KiB 1.5 GiB 1.0 TiB 44.36 1.33  74     up
3    hdd 5.45799  1.00000 5.5 TiB 1.0 GiB  12 MiB     0 B   1 GiB 5.5 TiB  0.02    0  90     up
4   hdd 1.81940  1.00000 1.8 TiB 939 GiB 938 GiB  39 KiB 1.5 GiB 924 GiB 50.42 1.51  89     up
5   hdd 1.81940  1.00000 1.8 TiB 1.0 TiB 1.0 TiB   3 KiB 1.9 GiB 834 GiB 55.24 1.66 109     up
6   hdd 1.81940  1.00000 1.8 TiB 808 GiB 806 GiB  52 KiB 1.4 GiB 1.0 TiB 43.36 1.30  90     up
7   hdd 1.81940  1.00000 1.8 TiB 919 GiB 917 GiB  48 KiB 1.5 GiB 945 GiB 49.30 1.48  88     up
                    TOTAL  18 TiB 6.1 TiB 6.1 TiB 240 KiB  11 GiB  12 TiB 33.34                 
MIN/MAX VAR: 0/1.66  STDDEV: 18.43
[root@ceph-host-02 ceph-cluster]# ceph osd status
+----+--------------+-------+-------+--------+---------+--------+---------+-----------+
| id |     host     |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
+----+--------------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | ceph-host-01 |  732G | 1130G |    1   |  4096k  |    0   |     0   | exists,up |
| 1  | ceph-host-01 |  956G |  906G |    4   |  19.2M  |    0   |     0   | exists,up |
| 2  | ceph-host-02 |  826G | 1036G |    0   |  3276k  |    0   |     0   | exists,up |
| 3  | ceph-host-02 | 1035M | 5588G |    0   |     0   |    0   |     0   | exists,up |
| 4  | ceph-host-03 |  939G |  923G |    3   |  14.4M  |    0   |     0   | exists,up |
| 5  | ceph-host-03 | 1029G |  833G |    0   |  3413k  |    0   |     0   | exists,up |
| 6  | ceph-host-04 |  808G | 1054G |    4   |  16.0M  |    0   |     0   | exists,up |
| 7  | ceph-host-04 |  918G |  944G |    2   |  10.4M  |    0   |     0   | exists,up |
+----+--------------+-------+-------+--------+---------+--------+---------+-----------+
 
Check the OSD mounts
[root@ceph-host-02 ~]# df -hT
Filesystem     Type      Size  Used Avail Use% Mounted on
/dev/vda1      xfs        20G  1.5G   19G   8% /
devtmpfs       devtmpfs  475M     0  475M   0% /dev
tmpfs          tmpfs     496M     0  496M   0% /dev/shm
tmpfs          tmpfs     496M   13M  483M   3% /run
tmpfs          tmpfs     496M     0  496M   0% /sys/fs/cgroup
tmpfs          tmpfs     100M     0  100M   0% /run/user/0
tmpfs          tmpfs     496M   52K  496M   1% /var/lib/ceph/osd/ceph-1
tmpfs          tmpfs     496M   52K  496M   1% /var/lib/ceph/osd/ceph-5
Note 1: one mon and one mgr would technically be enough, but for high availability multiple nodes are recommended; example commands to add more:
# ceph-deploy --overwrite-conf mon add  ceph-host-03
# ceph-deploy --overwrite-conf mgr create  ceph-host-03
 
Note 2: to remove a mon node from the cluster, for example:
ceph-deploy mon destroy ceph-host-02
 
Note 3: if a node cannot join the mon cluster, check ceph.conf on every mon node; the files must be identical and must contain the new node. The steps are as follows
ceph-deploy mon destroy ceph-host-02
 
Make sure ceph-cluster/ceph.conf and /etc/ceph/ceph.conf on the deploy node look like the following
[global]
fsid = a480fcef-1c4b-48cb-998d-0caed867b5eb
mon_initial_members = ceph-host-01, ceph-host-02, ceph-host-03
mon_host = 10.30.1.221,10.30.1.222,10.30.1.223
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
mon clock drift allowed = 2
mon clock drift warn backoff = 30
 
# network configuration
public_network = 10.30.1.0/24
cluster_network = 192.168.9.0/24
 
max_open_files = 131072
mon_pg_warn_max_per_osd = 1000
mon_max_pg_per_osd = 1000
osd pool default size = 3
osd pool default min size = 2
 
mon_osd_full_ratio = .90
mon_osd_nearfull_ratio = .80
osd_deep_scrub_randomize_ratio = 0.01
 
[mon]
mon_allow_pool_delete = true
[mgr]
mgr modules = dashboard
[mds]
mds cache memory limit = 10737418240
mds cache size = 250000
 
[osd]
osd_max_write_size = 1024
osd_recovery_op_priority = 1
osd_recovery_max_active = 1
osd_recovery_max_single_start = 1
osd_recovery_max_chunk = 1048576
osd_recovery_threads = 1
osd_max_backfills = 1
osd_scrub_begin_hour = 22
osd_scrub_end_hour = 7
osd_recovery_sleep = 0
osd_crush_update_on_start = false
ceph-deploy --overwrite-conf mon add ceph-host-02
 
Note 4: push the deploy node's ceph.conf to the other machines with the following command
# ceph-deploy --overwrite-conf config push ceph-host-01 ceph-host-02 ceph-host-04
Restart the services so that the changed parameters take effect:
systemctl restart ceph-mgr.target
systemctl restart ceph.target
5. Create and delete Ceph storage pools
5.1 Create
[root@ceph-host-01 ceph-cluster]# ceph osd pool create volumes 128
pool 'volumes' created
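Before putting data into a new pool it is worth confirming its replica count and PG count (volumes is the pool just created):
# ceph osd pool ls detail
# ceph osd pool get volumes size
# ceph osd pool get volumes pg_num
# ceph df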
5.2 Delete
[root@ceph-host-02 ~]# ceph osd pool rm volumes volumes --yes-i-really-really-mean-it
pool 'volumes' removed
 
6. Deploying CephFS
6.1 Overview
CephFS is Ceph's POSIX-compatible file system. Of the three interfaces (RBD, RGW, CephFS) it was the last to be declared production ready, and underneath it still stores all of its data in RADOS
 
[Figure: CephFS architecture - the kernel-client and ceph-fuse access paths down to RADOS]
 
There are two ways to use CephFS:
1. the cephfs kernel module
2. ceph-fuse
As the architecture above shows, ceph-fuse has a longer I/O path, so its performance is somewhat lower than the kernel module;
How a client accesses CephFS
 
1. the client contacts an MDS node to obtain metadata (the metadata itself is also stored on OSDs)
2. the client then reads and writes the data directly on the OSDs
 
6.2 Walkthrough
http://docs.ceph.com/docs/master/rados/operations/placement-groups/
Run the ceph-mds daemon on at least one node
[root@ceph-host-01 ~]# cd ceph-cluster/
[root@ceph-host-01 ceph-cluster]# ceph-deploy mds create ceph-host-01 ceph-host-02 
Create the pools
[root@ceph-host-01 ceph-cluster]# ceph osd pool create data 128
[root@ceph-host-01 ceph-cluster]# ceph osd pool create metadata 128
Create (activate) the file system
[root@ceph-host-01 ceph-cluster]# ceph fs new cephfs metadata data
Check the file system
[root@ceph-host-01 ceph-cluster]# ceph fs ls
name: cephfs, metadata pool: metadata, data pools: [data ]
[root@ceph-host-01 ceph-cluster]# ceph mds stat
cephfs:1 {0=ceph-host-02=up:active} 2 up:standby
[root@ceph-host-01 ceph-cluster]# ceph fs status cephfs
cephfs - 0 clients
======
+------+--------+--------------+---------------+-------+-------+
| Rank | State  |     MDS      |    Activity   |  dns  |  inos |
+------+--------+--------------+---------------+-------+-------+
|  0   | active | ceph-host-01 | Reqs:    0 /s |   10  |   13  |
+------+--------+--------------+---------------+-------+-------+
+----------+----------+-------+-------+
|   Pool   |   type   |  used | avail |
+----------+----------+-------+-------+
| metadata | metadata | 1024k |  580G |
|   data   |   data   |    0  |  580G |
+----------+----------+-------+-------+
+--------------+
| Standby MDS  |
+--------------+
| ceph-host-02 |
+--------------+
MDS version: ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)
Multiple active MDS daemons can run in parallel, but the official documentation recommends keeping one active MDS and leaving the others as standbys
 
Note 1: creating multiple CephFS file systems
 
[root@ceph-host-01 ~]# ceph osd pool create nova-data 128
pool 'nova-data' created
[root@ceph-host-01 ~]# ceph osd pool create nova-metadata 128
pool 'nova-metadata' created
Creating a second CephFS directly fails with the following error:
[root@ceph-host-01 ~]# ceph fs new nova nova-metadata nova-data
Error EINVAL: Creation of multiple filesystems is disabled.  To enable this experimental feature, use 'ceph fs flag set enable_multiple true'
Fix:
[root@ceph-host-01 ~]# ceph fs flag set enable_multiple true --yes-i-really-mean-it
[root@ceph-host-01 ~]# ceph fs new nova nova-metadata nova-data
new fs with metadata pool 22 and data pool 21
 
Note: a Ceph MDS is a standalone daemon that can serve only one CephFS file system; if that file system has several ranks, the daemon serves only one of them
 
Check the CephFS status
[root@ceph-host-01 ~]# ceph mds stat
cephfs:1 nova:1 {cephfs:0=ceph-host-01=up:active,nova:0=ceph-host-02=up:active}
[root@ceph-host-01 ~]# ceph fs status cephfs
cephfs - 1 clients
======
+------+--------+--------------+---------------+-------+-------+
| Rank | State  |     MDS      |    Activity   |  dns  |  inos |
+------+--------+--------------+---------------+-------+-------+
|  0   | active | ceph-host-01 | Reqs:    0 /s |   10  |   13  |
+------+--------+--------------+---------------+-------+-------+
+----------+----------+-------+-------+
|   Pool   |   type   |  used | avail |
+----------+----------+-------+-------+
| metadata | metadata | 1024k |  580G |
|   data   |   data   |    0  |  580G |
+----------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)
[root@ceph-host-01 ~]# ceph fs status nova
nova - 0 clients
====
+------+--------+--------------+---------------+-------+-------+
| Rank | State  |     MDS      |    Activity   |  dns  |  inos |
+------+--------+--------------+---------------+-------+-------+
|  0   | active | ceph-host-02 | Reqs:    0 /s |   10  |   13  |
+------+--------+--------------+---------------+-------+-------+
+---------------+----------+-------+-------+
|      Pool     |   type   |  used | avail |
+---------------+----------+-------+-------+
| nova-metadata | metadata | 1024k |  580G |
|   nova-data   |   data   |    0  |  580G |
+---------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)
[root@ceph-host-01 ~]# ceph fs ls
name: cephfs, metadata pool: metadata, data pools: [data ]
name: nova, metadata pool: nova-metadata, data pools: [nova-data ]
[root@ceph-host-01 ~]# ceph -s
  cluster:
    id:     272905d2-fd66-4ef6-a772-9cd73a274683
    health: HEALTH_WARN
            insufficient standby MDS daemons available
  services:
    mon: 3 daemons, quorum ceph-host-01,ceph-host-02,ceph-host-03 (age 2h)
    mgr: ceph-host-02(active, since 2h), standbys: ceph-host-03, ceph-host-01
    mds: cephfs:1 nova:1 {cephfs:0=ceph-host-01=up:active,nova:0=ceph-host-02=up:active}
    osd: 16 osds: 16 up (since 2h), 16 in (since 2w)
  data:
    pools:   7 pools, 896 pgs
    objects: 2.16k objects, 8.2 GiB
    usage:   34 GiB used, 1.2 TiB / 1.2 TiB avail
    pgs:     896 active+clean
 
Note 2: MDS failover
After adding another MDS daemon it stays in standby; if one of the first two MDS daemons fails, the standby takes over. The takeover rules are configurable, see: http://docs.ceph.com/docs/master/cephfs/standby/#configuring-standby-daemons
[root@ceph-host-01 ceph-cluster]#  ceph-deploy mds create ceph-host-03
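Optionally (supported in Nautilus), a standby can also be kept warm with standby-replay, which shortens failover time; it is a per-filesystem flag:
# ceph fs set cephfs allow_standby_replay true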
 
Status with three MDS daemons
[root@ceph-host-01 ~]# ceph mds stat
cephfs:1 nova:1 {cephfs:0=ceph-host-01=up:active,nova:0=ceph-host-02=up:active} 1 up:standby
[root@ceph-host-01 ~]# ceph -s
  cluster:
    id:     272905d2-fd66-4ef6-a772-9cd73a274683
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum ceph-host-01,ceph-host-02,ceph-host-03 (age 2h)
    mgr: ceph-host-02(active, since 2h), standbys: ceph-host-03, ceph-host-01
    mds: cephfs:1 nova:1 {cephfs:0=ceph-host-01=up:active,nova:0=ceph-host-02=up:active} 1 up:standby
    osd: 16 osds: 16 up (since 2h), 16 in (since 2w)
  data:
    pools:   7 pools, 896 pgs
    objects: 2.16k objects, 8.2 GiB
    usage:   34 GiB used, 1.2 TiB / 1.2 TiB avail
    pgs:     896 active+clean
  io:
    client:   4.2 KiB/s rd, 4 op/s rd, 0 op/s wr
 
[root@ceph-host-01 ~]# ceph fs status
cephfs - 0 clients
======
+------+--------+--------------+---------------+-------+-------+
| Rank | State  |     MDS      |    Activity   |  dns  |  inos |
+------+--------+--------------+---------------+-------+-------+
|  0   | active | ceph-host-01 | Reqs:    0 /s |   14  |   15  |
+------+--------+--------------+---------------+-------+-------+
+----------+----------+-------+-------+
|   Pool   |   type   |  used | avail |
+----------+----------+-------+-------+
| metadata | metadata | 1920k |  580G |
|   data   |   data   |    0  |  580G |
+----------+----------+-------+-------+
nova - 0 clients
====
+------+--------+--------------+---------------+-------+-------+
| Rank | State  |     MDS      |    Activity   |  dns  |  inos |
+------+--------+--------------+---------------+-------+-------+
|  0   | active | ceph-host-02 | Reqs:    0 /s |   12  |   15  |
+------+--------+--------------+---------------+-------+-------+
+---------------+----------+-------+-------+
|      Pool     |   type   |  used | avail |
+---------------+----------+-------+-------+
| nova-metadata | metadata | 1024k |  580G |
|   nova-data   |   data   |    0  |  580G |
+---------------+----------+-------+-------+
+--------------+
| Standby MDS  |
+--------------+
| ceph-host-03 |
+--------------+
MDS version: ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)
 
Stop the active MDS
[root@ceph-host-01 ~]# systemctl stop ceph-mds@ceph-host-01.service
Check whether the standby has taken over
[root@ceph-host-01 ~]# ceph fs status
cephfs - 0 clients
======
+------+--------+--------------+---------------+-------+-------+
| Rank | State  |     MDS      |    Activity   |  dns  |  inos |
+------+--------+--------------+---------------+-------+-------+
|  0   | active | ceph-host-03 | Reqs:    0 /s |   14  |   15  |
+------+--------+--------------+---------------+-------+-------+
+----------+----------+-------+-------+
|   Pool   |   type   |  used | avail |
+----------+----------+-------+-------+
| metadata | metadata | 1920k |  580G |
|   data   |   data   |    0  |  580G |
+----------+----------+-------+-------+
nova - 0 clients
====
+------+--------+--------------+---------------+-------+-------+
| Rank | State  |     MDS      |    Activity   |  dns  |  inos |
+------+--------+--------------+---------------+-------+-------+
|  0   | active | ceph-host-02 | Reqs:    0 /s |   12  |   15  |
+------+--------+--------------+---------------+-------+-------+
+---------------+----------+-------+-------+
|      Pool     |   type   |  used | avail |
+---------------+----------+-------+-------+
| nova-metadata | metadata | 1024k |  580G |
|   nova-data   |   data   |    0  |  580G |
+---------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)
 
Note 3: multiple active MDS
Also called Multi MDS or active-active MDS.
By default each CephFS file system has only one active MDS daemon. On large systems you can configure several active MDS daemons to scale metadata performance; they share the metadata load.
CephFS declared multiple metadata servers (Multi-MDS) and directory fragmentation (dirfrag) production ready in the Luminous release.
 
Advantages of multiple active MDS
* When the default single MDS becomes a metadata bottleneck, configuring several active MDS daemons improves cluster performance.
* Multiple active MDS daemons improve metadata throughput.
* Multiple active MDS daemons allow MDS load balancing.
* Multiple active MDS daemons allow resource isolation between tenants.
 
Characteristics of multiple active MDS (a directory-pinning example follows the scaling steps below)
* The file system tree is split into subtrees.
* Each subtree can be delegated to a specific MDS as its authority.
* Cluster metadata performance therefore scales roughly linearly with the number of metadata servers.
* Each subtree is created dynamically, based on how hot the metadata in a given directory tree is.
* Once a subtree is created, its metadata is migrated to a lightly loaded MDS.
* Subsequent client requests to the previously authoritative MDS are forwarded.
 
Scaling up the active MDS count
# ceph mds stat
cephfs:1 nova:1 {cephfs:0=ceph-host-01=up:active,nova:0=ceph-host-02=up:active} 1 up:standby
Set max_mds to 2
[root@ceph-host-01 ~]# ceph fs set cephfs max_mds 2
 
Check the multi-active MDS status
[root@ceph-host-01 ~]# ceph mds stat
cephfs:2 nova:1 {cephfs:0=ceph-host-01=up:active,cephfs:1=ceph-host-03=up:active,nova:0=ceph-host-02=up:active}
 
# ceph fs status
cephfs - 0 clients
======
+------+--------+--------------+---------------+-------+-------+
| Rank | State  |     MDS      |    Activity   |  dns  |  inos |
+------+--------+--------------+---------------+-------+-------+
|  0   | active | ceph-host-03 | Reqs:    0 /s |   14  |   15  |
|  1   | active | ceph-host-02 | Reqs:    0 /s |   10  |   13  |
+------+--------+--------------+---------------+-------+-------+
+----------+----------+-------+-------+
|   Pool   |   type   |  used | avail |
+----------+----------+-------+-------+
| metadata | metadata | 1920k |  580G |
|   data   |   data   |    0  |  580G |
+----------+----------+-------+-------+
nova - 0 clients
====
+------+--------+--------------+---------------+-------+-------+
| Rank | State  |     MDS      |    Activity   |  dns  |  inos |
+------+--------+--------------+---------------+-------+-------+
|  0   | active | ceph-host-01 | Reqs:    0 /s |   12  |   15  |
+------+--------+--------------+---------------+-------+-------+
+---------------+----------+-------+-------+
|      Pool     |   type   |  used | avail |
+---------------+----------+-------+-------+
| nova-metadata | metadata | 1024k |  580G |
|   nova-data   |   data   |    0  |  580G |
+---------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)
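With more than one active rank, a subtree can optionally be pinned to a specific rank from a client mount via an extended attribute (directory pinning); the path below is just an example directory under a CephFS mount (mounting is covered in section 7):
# setfattr -n ceph.dir.pin -v 1 /data/somedir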
 
Note 4: the default replica count is 3, which is safer but needs at least 3 OSDs; the replica count of a pool can be changed with the following commands
# ceph osd pool set metadata size 2
# ceph osd pool set data size 2
# ceph osd pool set metadata min_size 2
# ceph osd pool set data min_size 2
 
Create the corresponding (read-write) client credential
[root@ceph-host-01 ~]# ceph auth get-or-create client.fsclient mon 'allow r' mds 'allow rw' osd 'allow rwx pool=data' -o ceph.client.fsclient.keyring
[root@ceph-host-01 ~]# ceph auth get client.fsclient
exported keyring for client.fsclient
[client.fsclient]
    key = AQC9A4he42+qFBAA7zvVYCOsiLOJrSfjyFQcFg==
    caps mds = "allow rw"
    caps mon = "allow r"
    caps osd = "allow rwx pool=data"
[root@ceph-host-01 ~]# cat ceph.client.fsclient.keyring
[client.fsclient]
    key = AQC9A4he42+qFBAA7zvVYCOsiLOJrSfjyFQcFg==
 
Extensions:
1. A read-only CephFS credential for production
ceph auth get-or-create client.r_wk_data mon 'allow r' mds 'allow r' osd 'allow r pool=wk_data'
2. A read-write CephFS credential for production
ceph auth get-or-create client.wk_data mon 'allow r' mds 'allow rw' osd 'allow rwx pool=wk_data'
 
Note 5: delete the file system nova and its pools
 
# ceph fs rm nova  --yes-i-really-mean-it
# ceph osd pool rm nova-metadata nova-metadata  --yes-i-really-really-mean-it
# ceph osd pool rm nova-data nova-data  --yes-i-really-really-mean-it
 
Note 6: Ceph MDS states
 
MDS failover state transitions:
handle_mds_map state change up:boot --> up:replay
handle_mds_map state change up:replay --> up:reconnect
handle_mds_map state change up:reconnect --> up:rejoin
handle_mds_map state change up:rejoin --> up:active
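To watch these state transitions live while testing a failover (for example while stopping the active MDS as above), something like this is handy:
# watch -n 1 'ceph mds stat'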
 
 
7. Mounting the file system
 
7.1 Mounting CephFS with the kernel driver
Install the ceph-common package on the client beforehand; the kernel mount needs it
yum install ceph-common -y
Note 1: installing ceph-common or ceph-fuse on Ubuntu
1. Add the release key:
wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -
2. Add the Ceph package source, replacing {ceph-stable-release} with the name of a stable Ceph release (cuttlefish, dumpling, emperor, nautilus, and so on). For example:
echo deb http://download.ceph.com/debian-{ceph-stable-release}/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
Example: installing the Nautilus (14.2.9) release
echo deb http://download.ceph.com/debian-nautilus/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
3. Update your repositories and install
3.1 Install ceph-common
sudo apt-get update && sudo apt-get install ceph-common 
3.2 Install ceph-fuse
sudo apt-get update && sudo apt-get install ceph-fuse
Copy the key file to the client
[root@ceph-host-01 ~]# ceph auth print-key client.fsclient
AQC9A4he42+qFBAA7zvVYCOsiLOJrSfjyFQcFg==
[root@ceph-host-01 ~]# ceph auth print-key client.fsclient >fsclient.key
[root@ceph-host-01 ~]# scp fsclient.key root@node3:/etc/ceph/  # copy it to /etc/ceph on the client
 
Verify on the client that the ceph kernel module is available
[root@node3 ~]# modinfo ceph
filename:       /lib/modules/3.10.0-957.el7.x86_64/kernel/fs/ceph/ceph.ko.xz
license:        GPL
description:    Ceph filesystem for Linux
author:         Patience Warnick <patience@newdream.net>
author:         Yehuda Sadeh <yehuda@hq.newdream.net>
author:         Sage Weil <sage@newdream.net>
alias:          fs-ceph
retpoline:      Y
rhelversion:    7.6
srcversion:     43DA49DF11334B2A5652931
depends:        libceph
intree:         Y
vermagic:       3.10.0-957.el7.x86_64 SMP mod_unload modversions
signer:         CentOS Linux kernel signing key
sig_key:        B7:0D:CF:0D:F2:D9:B7:F2:91:59:24:82:49:FD:6F:E8:7B:78:14:27
sig_hashalgo:   sha256
 
Create the mount point on the client
[root@node3 ~]# mkdir -pv /data
 
Mount on the client
[root@node3 ~]# mount -t ceph ceph-host-01:6789,ceph-host-02:6789,ceph-host-03:6789:/ /data -o name=fsclient,secret=AQC9A4he42+qFBAA7zvVYCOsiLOJrSfjyFQcFg==
[root@node3 ~]# df -hT
Filesystem                                           Type      Size  Used Avail Use% Mounted on
/dev/mapper/centos-root                              xfs       360G   44G  317G  13% /
devtmpfs                                             devtmpfs  3.9G     0  3.9G   0% /dev
tmpfs                                                tmpfs     3.9G     0  3.9G   0% /dev/shm
tmpfs                                                tmpfs     3.9G   17M  3.9G   1% /run
tmpfs                                                tmpfs     3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/vda1                                            xfs       497M  140M  358M  29% /boot
tmpfs                                                tmpfs     782M     0  782M   0% /run/user/0
10.30.1.221:6789,10.30.1.222:6789,10.30.1.223:6789:/ ceph      581G     0  581G   0% /data
 
Note 1: mounting with the secretfile parameter is safer, because the secret does not end up in the shell history
[root@node3 ~]# mount -t ceph ceph-host-01:6789,ceph-host-02:6789,ceph-host-03:6789:/ /data -o name=fsclient,secretfile=/etc/ceph/fsclient.key
[root@node3 ~]# df -hT
Filesystem                                           Type      Size  Used Avail Use% Mounted on
/dev/mapper/centos-root                              xfs       360G   44G  317G  13% /
devtmpfs                                             devtmpfs  3.9G     0  3.9G   0% /dev
tmpfs                                                tmpfs     3.9G     0  3.9G   0% /dev/shm
tmpfs                                                tmpfs     3.9G   17M  3.9G   1% /run
tmpfs                                                tmpfs     3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/vda1                                            xfs       497M  140M  358M  29% /boot
tmpfs                                                tmpfs     782M     0  782M   0% /run/user/0
10.30.1.221:6789,10.30.1.222:6789,10.30.1.223:6789:/ ceph      581G     0  581G   0% /data
 
Note 2: mount at boot
[root@node3 ~]# echo "ceph-host-01:6789,ceph-host-02:6789,ceph-host-03:6789:/ /data ceph name=fsclient,secretfile=/etc/ceph/fsclient.key,_netdev 0 0" >> /etc/fstab
 
Note 3: mounting multiple CephFS file systems
Create the credential and copy the key to the client
 
[root@ceph-host-01 ~]# ceph auth get-or-create client.novafsclient mon 'allow r' mds 'allow rw' osd 'allow rwx pool=nova-data'
[root@ceph-host-01 ~]# ceph auth print-key client.novafsclient | ssh node3 tee /etc/ceph/novafsclient.key
Create the mount directory
[root@node3 ~]# mkdir -pv /nova
Unmount the original CephFS mount
[root@node3 ~]# umount  -t ceph /data
Remount both CephFS file systems on the client
[root@node3 ~]# mount -t ceph ceph-host-01:6789,ceph-host-02:6789,ceph-host-03:6789:/ /data -o mds_namespace=cephfs,name=fsclient,secretfile=/etc/ceph/fsclient.key
[root@node3 ~]#  mount -t ceph ceph-host-01:6789,ceph-host-02:6789,ceph-host-03:6789:/ /nova -o mds_namespace=nova,name=novafsclient,secretfile=/etc/ceph/novafsclient.key
 
Persistent mounts
[root@node3 ~]# cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Mon Dec 23 04:37:50 2019
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/centos-root /                       xfs     defaults        0 0
UUID=1262a46b-e4eb-4e25-9519-39c4f0c45c8e /boot                   xfs     defaults        0 0
ceph-host-01:6789,ceph-host-02:6789,ceph-host-03:6789:/ /data ceph mds_namespace=cephfs,name=fsclient,secretfile=/etc/ceph/fsclient.key,_netdev 0 0
ceph-host-01:6789,ceph-host-02:6789,ceph-host-03:6789:/ /nova ceph mds_namespace=nova,name=novafsclient,secretfile=/etc/ceph/novafsclient.key,_netdev 0 0
 
Verify the mounts
[root@node3 ~]# stat -f /data
  File: "/data"
    ID: 5995b80750841c7 Namelen: 255     Type: ceph
Block size: 4194304    Fundamental block size: 4194304
Blocks: Total: 148575     Free: 148575     Available: 148575
Inodes: Total: 0          Free: -1
[root@node3 ~]# stat -f /nova
  File: "/nova"
    ID: 5995b80750841c7 Namelen: 255     Type: ceph
Block size: 4194304    Fundamental block size: 4194304
Blocks: Total: 148575     Free: 148575     Available: 148575
Inodes: Total: 0          Free: -1
 
Check the CephFS state
[root@ceph-host-01 ~]# ceph fs get nova
Filesystem 'nova' (4)
fs_name    nova
epoch    1439
flags    12
created    2020-04-04 13:04:09.091835
modified    2020-04-04 13:04:11.057747
tableserver    0
root    0
session_timeout    60
session_autoclose    300
max_file_size    1099511627776
min_compat_client    -1 (unspecified)
last_failure    0
last_failure_osd_epoch    0
compat    compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds    1
in    0
up    {0=6754123}
failed    
damaged    
stopped    
data_pools    [21]
metadata_pool    22
inline_data    disabled
balancer    
standby_count_wanted    1
6754123:    [v2:10.30.1.222:6800/1567901637,v1:10.30.1.222:6801/1567901637] 'ceph-host-02' mds.0.1438 up:active seq 1641
 
 
7.2 Mounting CephFS from user space (FUSE)
Before mounting a Ceph file system from user space (FUSE), make sure the client host has a copy of the Ceph configuration file and a keyring that grants access to the Ceph metadata server.
Install the ceph-fuse package, which provides the user-space mount, on the client beforehand
yum install ceph-fuse -y
7.2.1 On the client host, copy the Ceph configuration file from a monitor host to /etc/ceph/.
mkdir -p /etc/ceph
scp root@ceph-host-01:/etc/ceph/ceph.conf /etc/ceph/ceph.conf
 
7.2.2 On the client host, copy the Ceph keyring from a monitor host to /etc/ceph.
ceph auth get client.fsclient >/tmp/ceph.client.fsclient.keyring
scp root@ceph-host-01:/tmp/ceph.client.fsclient.keyring /etc/ceph/ceph.client.fsclient.keyring
 
Example ceph.client.fsclient.keyring
#  cat /etc/ceph/ceph.client.fsclient.keyring
[client.fsclient]
        key = AQDxJ5heTf20AhAA34vP0xErt2mFHQiuONWTSQ==
        caps mds = "allow rw"
        caps mon = "allow r"
        caps osd = "allow rwx pool=cephfs-data"
7.2.3 Make sure the Ceph configuration file and keyring on the client have appropriate permissions, e.g. chmod 644.
To mount the Ceph file system as a user-space file system, use the ceph-fuse command, for example:
mkdir -pv /ceph_data
ceph-fuse -n client.fsclient /ceph_data
 
The same command fully spelled out:
ceph-fuse  --keyring /etc/ceph/ceph.client.fsclient.keyring --name client.fsclient -m ceph-host-01:6789,ceph-host-02:6789 /ceph_data
 
Mount at boot via /etc/fstab; either the old device-string format or the newer none ... fuse.ceph format works:
echo "id=fsclient,keyring=/etc/ceph/ceph.client.fsclient.keyring /ceph_data  fuse.ceph defaults 0 0" >> /etc/fstab
none /ceph_data fuse.ceph ceph.id=fsclient,ceph.conf=/etc/ceph/ceph.conf,_netdev,defaults 0 0
 
7.2.4 Unmounting
Unmount with: fusermount -u <mount_point>
 
Extension:
Integrating Nova with CephFS
On every compute node, mount the CephFS file system at /var/lib/nova/instances:
mount -t ceph <ceph-mon-ip>:6789:/ /var/lib/nova/instances    -o name=admin,secret={ceph.client.admin.key}
chown -R nova:nova /var/lib/nova/instances 
 
Walkthrough
Create the MDS and the corresponding pools
[root@ceph-host-01 ceph-cluster]# ceph-deploy mds create ceph-host-01
[root@ceph-host-01 ceph-cluster]# ceph mds stat
1 up:standby
[root@ceph-host-01 ceph-cluster]# ceph osd pool create nova-metadata 128
[root@ceph-host-01 ceph-cluster]# ceph osd pool create nova-data 128
[root@ceph-host-01 ceph-cluster]# ceph fs new nova nova-metadata nova-data
 
Note: the default replica count is 3, which is safer and needs at least 3 OSDs; the pool replica count can be changed as follows
# ceph osd pool set nova-metadata size 2
# ceph osd pool set nova-data size 2
# ceph osd pool set nova-metadata min_size 2
# ceph osd pool set nova-data min_size 2
 
Mount on the compute nodes
 
Mount (takes effect immediately)
[root@node3 ~]# mount -t ceph 10.30.1.221:6789:/ /var/lib/nova/instances/ -o name=admin,secret=AQA8HzdeFQuPHxAAUfjHnOMSfFu7hHIoGv/x1A==
[root@node3 ~]# chown -R nova:nova /var/lib/nova/instances 
Mount (persistent)
[root@node3 ~]# echo "10.30.1.221:6789:/ /var/lib/nova/instances ceph name=admin,secret=AQA8HzdeFQuPHxAAUfjHnOMSfFu7hHIoGv/x1A==,_netdev 0 0" >> /etc/fstab
 
Periodically check whether the mount is still alive and remount it if not
[root@node3 ~]# echo '*/3 * * * * root if [ `mount | grep ceph | wc -l` -eq 0 ] ; then mount -t ceph 10.30.1.221:6789:/ /var/lib/nova/instances/ -o name=admin,secret=AQA8HzdeFQuPHxAAUfjHnOMSfFu7hHIoGv/x1A== ; fi' >>/etc/crontab
 
Note: where to find the secret value
# cat /etc/ceph/ceph.client.admin.keyring
[client.admin]
    key = AQA8HzdeFQuPHxAAUfjHnOMSfFu7hHIoGv/x1A==
 
Check usage after creating VMs
[root@node3 ~]# df -hT
Filesystem              Type      Size  Used Avail Use% Mounted on
/dev/mapper/centos-root xfs       200G  3.4G  197G   2% /
devtmpfs                devtmpfs  3.9G     0  3.9G   0% /dev
tmpfs                   tmpfs     3.9G     0  3.9G   0% /dev/shm
tmpfs                   tmpfs     3.9G   17M  3.9G   1% /run
tmpfs                   tmpfs     3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/vda1               xfs       497M  140M  358M  29% /boot
tmpfs                   tmpfs     782M     0  782M   0% /run/user/0
10.30.1.221:6789:/      ceph      277G  2.2G  275G   1% /var/lib/nova/instances
 
[root@node3 ~]# tree /var/lib/nova/instances
/var/lib/nova/instances
├── 1878b03d-aa3e-4424-8325-ae3bafce0e6a
│   └── disk.info
├── 3b394c96-94a4-4b98-b55b-cac54ef31282
│   └── disk.info
├── 4dd899dc-df13-4853-b70f-2359db577b2d
│   └── disk.info
├── 52fce24f-c8bc-4bb2-8675-cc0cfe4d3678
│   └── disk.info
├── 5632d386-5cb2-4887-9f48-11bcb709ba5f
│   └── disk.info
├── 59cd7399-202c-44b8-918d-9e9acb0cc2e5
│   └── disk.info
├── 60599ade-f271-42ee-9edc-cfe59b4d2459
│   └── disk.info
├── 6937ed06-8cc0-47d0-8a36-59cbf9981337
│   └── disk.info
├── aa852ceb-700f-4e00-a338-faa137b6dbf6
│   └── disk.info
├── _base
│   ├── a36c45ee0cb50b3d5f57afcff5c9a552becfe68b.converted
│   └── a36c45ee0cb50b3d5f57afcff5c9a552becfe68b.part
├── c45a024c-d944-4135-82da-03251f694b72
│   └── disk.info
├── e4607eff-5d40-4238-ab79-903bba641dd8
│   └── disk.info
└── locks
    └── nova-a36c45ee0cb50b3d5f57afcff5c9a552becfe68b
 
13 directories, 14 files
 
[root@node1 ~]# openstack server list
+--------------------------------------+-----------------+--------+---------------------------------------------+------------+--------+
| ID                                   | Name            | Status | Networks                                    | Image      | Flavor |
+--------------------------------------+-----------------+--------+---------------------------------------------+------------+--------+
| 4924b0a7-aad6-447e-b340-a2116f56a4a6 | nova-create-vm9 | ACTIVE | vlan99=172.16.99.139; vlan809=192.168.9.219 | CentOS 7.5 | 1c1g   |
| b60a7bd4-8515-4020-b635-00c656928dcc | nova-create-vm8 | ACTIVE | vlan99=172.16.99.138; vlan809=192.168.9.218 | CentOS 7.5 | 1c1g   |
| a91c9082-72fe-4c4e-b864-6bdf4b5b3c65 | nova-create-vm7 | ACTIVE | vlan99=172.16.99.137; vlan809=192.168.9.217 | CentOS 7.5 | 1c1g   |
| ce3a4dab-9e2d-4c66-8d8c-974dd30ca65a | nova-create-vm6 | ACTIVE | vlan99=172.16.99.136; vlan809=192.168.9.216 | CentOS 7.5 | 1c1g   |
| 4c94d4d4-9074-405b-a570-768dc1c1b5a4 | nova-create-vm5 | ACTIVE | vlan99=172.16.99.135; vlan809=192.168.9.215 | CentOS 7.5 | 1c1g   |
| a56a700e-f0e1-4845-9eb7-84d77fbf683d | nova-create-vm4 | ACTIVE | vlan99=172.16.99.134; vlan809=192.168.9.214 | CentOS 7.5 | 1c1g   |
| c237cbb8-62a6-4bfd-be95-009aaa30c3bf | nova-create-vm3 | ACTIVE | vlan99=172.16.99.133; vlan809=192.168.9.213 | CentOS 7.5 | 1c1g   |
| d89a137d-53c5-448e-8592-6b06eac00af7 | nova-create-vm2 | ACTIVE | vlan99=172.16.99.132; vlan809=192.168.9.212 | CentOS 7.5 | 1c1g   |
| 38764d77-73ee-4030-9dc5-51effe6cfa95 | nova-create-vm1 | ACTIVE | vlan99=172.16.99.131; vlan809=192.168.9.211 | CentOS 7.5 | 1c1g   |
+--------------------------------------+-----------------+--------+---------------------------------------------+------------+--------+
[root@node1 ~]# ceph df
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd       1.2 TiB     1.1 TiB     5.7 GiB       21 GiB          1.75
    TOTAL     1.2 TiB     1.1 TiB     5.7 GiB       21 GiB          1.75
POOLS:
    POOL              ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    nova-metadata      6     3.6 MiB          23      16 MiB         0       276 GiB
    nova-data          7     1.2 GiB         372     5.0 GiB      0.45       276 GiB
 
[root@node1 ~]# ceph -s
  cluster:
    id:     272905d2-fd66-4ef6-a772-9cd73a274683
    health: HEALTH_WARN
            1 daemons have recently crashed
  services:
    mon: 3 daemons, quorum ceph-host-01,ceph-host-02,ceph-host-03 (age 15m)
    mgr: ceph-host-01(active, since 38m), standbys: ceph-host-03, ceph-host-02
    mds: nova:1 {0=ceph-host-01=up:active} 1 up:standby
    osd: 15 osds: 15 up (since 13m), 15 in (since 107m)
 
  data:
    pools:   2 pools, 128 pgs
    objects: 415 objects, 1.4 GiB
    usage:   21 GiB used, 1.1 TiB / 1.2 TiB avail
    pgs:     128 active+clean
  io:
    client:   3.2 MiB/s rd, 174 KiB/s wr, 123 op/s rd, 23 op/s wr
 
Cleaning up the environment
$ ceph-deploy purge ceph-host-01 ceph-host-02 ceph-host-03 ceph-host-04   // removes everything Ceph-related
$ ceph-deploy purgedata ceph-host-01 ceph-host-02 ceph-host-03 ceph-host-04 
$ ceph-deploy forgetkeys
 
Troubleshooting:
Error 1:
[ceph-mon01][DEBUG ] --> Finished Dependency Resolution
[ceph-mon01][WARNIN] Error: Package: 2:librgw2-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN]            Requires: liblttng-ust.so.0()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:ceph-common-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN]            Requires: libbabeltrace.so.1()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:ceph-common-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN]            Requires: libbabeltrace-ctf.so.1()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:ceph-mon-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN]            Requires: libleveldb.so.1()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:librgw2-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN]            Requires: liboath.so.0()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:ceph-osd-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN]            Requires: libleveldb.so.1()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:ceph-common-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN]            Requires: liboath.so.0()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:ceph-common-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN]            Requires: libleveldb.so.1()(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:ceph-common-14.2.9-0.el7.x86_64 (Ceph)
[ceph-mon01][WARNIN]            Requires: liboath.so.0(LIBOATH_1.10.0)(64bit)
[ceph-mon01][WARNIN] Error: Package: 2:librbd1-14.2.9-0.el7.x86_64 (Ceph)
 
Fix:
yum install epel-release -y
Note: this step is essential; if you skip it and install Ceph directly, you get the dependency errors shown above.
 
Error 2:
health: HEALTH_WARN
clock skew detected on mon.ceph-host-02, mon.ceph-host-03
This is caused by clock drift between the nodes
# ansible ceph -a 'yum install ntpdate -y'
# ansible ceph -a 'systemctl stop ntpdate'
# ansible ceph -a 'ntpdate time.windows.com'
 
Set time synchronization at boot and via cron on every Ceph node:
[root@ceph-host-01 ~]# cat /etc/rc.d/rc.local
#!/bin/bash
# THIS FILE IS ADDED FOR COMPATIBILITY PURPOSES
#
# It is highly advisable to create own systemd services or udev rules
# to run scripts during boot instead of using this file.
#
# In contrast to previous versions due to parallel execution during boot
# this script will NOT be run after all other services.
#
# Please note that you must run 'chmod +x /etc/rc.d/rc.local' to ensure
# that this script will be executed during boot.
timedatectl set-timezone Asia/Shanghai && ntpdate time1.aliyun.com && hwclock -w >/dev/null 2>&1
touch /var/lock/subsys/local
[root@ceph-host-01 ~]# chmod +x /etc/rc.d/rc.local
[root@ceph-host-01 ~]# systemctl enable rc-local
[root@ceph-host-01 ~]# echo '*/5 * * * * root timedatectl set-timezone Asia/Shanghai && ntpdate time1.aliyun.com && hwclock -w >/dev/null 2>&1' >> /etc/crontab
 
Note: on CentOS 7, chrony is the better choice for time synchronization; the steps are:
yum install chrony -y
systemctl start chronyd
systemctl enable chronyd
 
# cat /etc/chrony.conf | grep -v '^#\|^$'
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
server 3.centos.pool.ntp.org iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
 
Error 3:
# ceph status
  cluster:
    id:     04d85079-c2ef-47c8-a8bb-c6cb13db3cc4
    health: HEALTH_WARN
            62 daemons have recently crashed
 
Fix:
# ceph crash archive-all
 
Author: Dexter_Wang    Role: senior cloud computing and storage engineer at an Internet company    Email: 993852246@qq.com

