CentOS 7 MFS + DRBD + Keepalived


Environment:

centos7.3 + moosefs 3.0.97 + drbd84-utils-8.9.8-1 + keepalived-1.2.13-9

 

How it works:

 

 

Architecture diagram:

 

 

Node information:

Node      MFS role               Hostname      IP
node1     master & metalogger    node1         172.16.0.41
node2     master & metalogger    node2         172.16.0.42
node3     chunk server           node3         172.16.0.43
node4     chunk server           node4         172.16.0.44
node5     chunk server           node5         172.16.0.45
node6     client                 node6         172.16.0.11
node7     client                 node7         172.16.0.12
node8     client                 node8         172.16.0.13
vip       -                      mfsmaster     172.16.0.47

Notes:

1) DRBD is installed on the two MFS master servers to provide a replicated network disk; the mfs master's metadata files are stored on it.
2) keepalived is installed on both machines and a single VIP floats between them. keepalived runs a check script to monitor the server state; when one machine has a problem, the VIP automatically moves to the other.
3) Clients, chunk servers and the metalogger all connect through the VIP, so service is not interrupted when one of the two servers goes down.

 

Add the hosts entries on node1 and node2:

cat /etc/hosts

172.16.0.41    node1
172.16.0.42    node2
172.16.0.43    node3
172.16.0.44    node4
172.16.0.45    node5
172.16.0.11    node6
172.16.0.12    node7
172.16.0.13    node8
172.16.0.47    mfsmaster

 

On node3 - node8, add the hosts entry (cat /etc/hosts):

172.16.0.47    mfsmaster

 

Install the MooseFS yum repository (all nodes)

Import the public key:
curl "http://ppa.moosefs.com/RPM-GPG-KEY-MooseFS" > /etc/pki/rpm-gpg/RPM-GPG-KEY-MooseFS


To add the MooseFS repository for RHEL-7, SL-7 or CentOS-7:
curl "http://ppa.moosefs.com/MooseFS-3-el7.repo" > /etc/yum.repos.d/MooseFS.repo



To add the MooseFS repository for RHEL-6, SL-6 or CentOS-6:
curl "http://ppa.moosefs.com/MooseFS-3-el6.repo" > /etc/yum.repos.d/MooseFS.repo

 

Install the ELRepo repository (node1, node2)

Get started

Import the public key:
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org

Detailed info on the GPG key used by the ELRepo Project can be found on https://www.elrepo.org/tiki/key
If you have a system with Secure Boot enabled, please see the SecureBootKey page for more information.

To install ELRepo for RHEL-7, SL-7 or CentOS-7:
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm

To make use of our mirror system, please also install yum-plugin-fastestmirror.

To install ELRepo for RHEL-6, SL-6 or CentOS-6:
rpm -Uvh http://www.elrepo.org/elrepo-release-6-8.el6.elrepo.noarch.rpm

 

 

Create the partition (node1, node2)

Add a dedicated disk of the same size to node1 and node2 for DRBD.

The disk is put under LVM so it can be expanded easily later (an expansion sketch follows the LVM setup below).

[root@mfs-n1 /]# fdisk /dev/vdb 
Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): p

Disk /dev/vdb: 64.4 GB, 64424509440 bytes, 125829120 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0xdaf38769

   Device Boot      Start         End      Blocks   Id  System

Command (m for help): n
Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p): p
Partition number (1-4, default 1): 
First sector (2048-125829119, default 2048): 
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-125829119, default 125829119): 
Using default value 125829119
Partition 1 of type Linux and of size 60 GiB is set

Command (m for help): m
Command action
   a   toggle a bootable flag
   b   edit bsd disklabel
   c   toggle the dos compatibility flag
   d   delete a partition
   g   create a new empty GPT partition table
   G   create an IRIX (SGI) partition table
   l   list known partition types
   m   print this menu
   n   add a new partition
   o   create a new empty DOS partition table
   p   print the partition table
   q   quit without saving changes
   s   create a new empty Sun disklabel
   t   change a partition's system id
   u   change display/entry units
   v   verify the partition table
   w   write table to disk and exit
   x   extra functionality (experts only)

Command (m for help): t
Selected partition 1
Hex code (type L to list all codes): L

 0  Empty           24  NEC DOS         81  Minix / old Lin bf  Solaris        
 1  FAT12           27  Hidden NTFS Win 82  Linux swap / So c1  DRDOS/sec (FAT-
 2  XENIX root      39  Plan 9          83  Linux           c4  DRDOS/sec (FAT-
 3  XENIX usr       3c  PartitionMagic  84  OS/2 hidden C:  c6  DRDOS/sec (FAT-
 4  FAT16 <32M      40  Venix 80286     85  Linux extended  c7  Syrinx         
 5  Extended        41  PPC PReP Boot   86  NTFS volume set da  Non-FS data    
 6  FAT16           42  SFS             87  NTFS volume set db  CP/M / CTOS / .
 7  HPFS/NTFS/exFAT 4d  QNX4.x          88  Linux plaintext de  Dell Utility   
 8  AIX             4e  QNX4.x 2nd part 8e  Linux LVM       df  BootIt         
 9  AIX bootable    4f  QNX4.x 3rd part 93  Amoeba          e1  DOS access     
 a  OS/2 Boot Manag 50  OnTrack DM      94  Amoeba BBT      e3  DOS R/O        
 b  W95 FAT32       51  OnTrack DM6 Aux 9f  BSD/OS          e4  SpeedStor      
 c  W95 FAT32 (LBA) 52  CP/M            a0  IBM Thinkpad hi eb  BeOS fs        
 e  W95 FAT16 (LBA) 53  OnTrack DM6 Aux a5  FreeBSD         ee  GPT            
 f  W95 Ext'd (LBA) 54  OnTrackDM6      a6  OpenBSD         ef  EFI (FAT-12/16/
10  OPUS            55  EZ-Drive        a7  NeXTSTEP        f0  Linux/PA-RISC b
11  Hidden FAT12    56  Golden Bow      a8  Darwin UFS      f1  SpeedStor      
12  Compaq diagnost 5c  Priam Edisk     a9  NetBSD          f4  SpeedStor      
14  Hidden FAT16 <3 61  SpeedStor       ab  Darwin boot     f2  DOS secondary  
16  Hidden FAT16    63  GNU HURD or Sys af  HFS / HFS+      fb  VMware VMFS    
17  Hidden HPFS/NTF 64  Novell Netware  b7  BSDI fs         fc  VMware VMKCORE 
18  AST SmartSleep  65  Novell Netware  b8  BSDI swap       fd  Linux raid auto
1b  Hidden W95 FAT3 70  DiskSecure Mult bb  Boot Wizard hid fe  LANstep        
1c  Hidden W95 FAT3 75  PC/IX           be  Solaris boot    ff  BBT            
1e  Hidden W95 FAT1 80  Old Minix      
Hex code (type L to list all codes): 8e       
Changed type of partition 'Linux' to 'Linux LVM'

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

pvcreate /dev/vdb1             # create the physical volume

vgcreate vgdrbr /dev/vdb1      # create the volume group

vgdisplay                      # inspect the volume group
  --- Volume group ---
  VG Name               vgdrbr
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  1
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <60.00 GiB
  PE Size               4.00 MiB
  Total PE              15359
  Alloc PE / Size       0 / 0   
  Free  PE / Size       15359 / <60.00 GiB
  VG UUID               jgrCsU-CQmQ-l2yz-O375-qFQq-Rho9-r6eY54
   

lvcreate -l +15359 -n mfs vgdrbr        # create the logical volume using all free extents

lvdisplay 
  --- Logical volume ---
  LV Path                /dev/vgdrbr/mfs
  LV Name                mfs
  VG Name                vgdrbr
  LV UUID                3pb6ZJ-aMIu-PVbU-PAID-ozvB-XVpz-hdfK1c
  LV Write Access        read/write
  LV Creation host, time mfs-n1, 2017-11-08 10:14:00 +0800
  LV Status              available
  # open                 2
  LV Size                <60.00 GiB
  Current LE             15359
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:3

The logical volume is mapped as /dev/mapper/vgdrbr-mfs.
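Because the DRBD backing device is an LVM logical volume, the volume can be grown later without rebuilding the cluster. A minimal sketch of the expansion procedure, assuming a new disk /dev/vdc has been added to both nodes (the device name is an example, not part of the original setup):

# on both nodes: extend the volume group and the logical volume that backs DRBD
pvcreate /dev/vdc
vgextend vgdrbr /dev/vdc
lvextend -l +100%FREE /dev/vgdrbr/mfs

# on the current Primary only: let DRBD pick up the new size, then grow the XFS filesystem
drbdadm resize mfs
xfs_growfs /data/drbd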

 

DRBD

Install DRBD (node1, node2)

yum -y install drbd84-utils kmod-drbd84   # a newer version is also available: yum install -y drbd90 kmod-drbd90

 

Load the DRBD kernel module:

# modprobe drbd

Check that the DRBD module is loaded into the kernel:

# lsmod |grep drbd

 

Configure DRBD

node1:

cat /etc/drbd.conf 

# You can find an example in  /usr/share/doc/drbd.../drbd.conf.example

include "drbd.d/global_common.conf";
include "drbd.d/*.res";

 

cat /etc/drbd.d/global_common.conf

# DRBD is the result of over a decade of development by LINBIT.
# In case you need professional services for DRBD or have
# feature requests visit http://www.linbit.com

global {
    usage-count no;
    # minor-count dialog-refresh disable-ip-verification
    # cmd-timeout-short 5; cmd-timeout-medium 121; cmd-timeout-long 600;
}

common {
    handlers {
        # These are EXAMPLE handlers only.
        # They may have severe implications,
        # like hard resetting the node under certain circumstances.
        # Be careful when chosing your poison.

        # pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        # pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        # local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
        # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
        # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
    }

    startup {
        # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
        wfc-timeout 30;
        degr-wfc-timeout 30;
        outdated-wfc-timeout 30;
    }

    options {
        # cpu-mask on-no-data-accessible
    }

    disk {
        # size on-io-error fencing disk-barrier disk-flushes
        # disk-drain md-flushes resync-rate resync-after al-extents
        # c-plan-ahead c-delay-target c-fill-target c-max-rate
        # c-min-rate disk-timeout
        on-io-error detach;
    }

    net {
        # protocol timeout max-epoch-size max-buffers unplug-watermark
        # connect-int ping-int sndbuf-size rcvbuf-size ko-count
        # allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
        # after-sb-1pri after-sb-2pri always-asbp rr-conflict
        # ping-timeout data-integrity-alg tcp-cork on-congestion
        # congestion-fill congestion-extents csums-alg verify-alg
        # use-rle
        protocol C;
        cram-hmac-alg sha1;
        shared-secret "bqrPnf9";
    }

}

 

cat /etc/drbd.d/mfs.res

resource mfs{
        device /dev/drbd0;
        meta-disk internal;

        on node1{
            disk /dev/vgdrbr/mfs;
            address 172.16.0.41:9876;
        }

        on node2{
            disk /dev/vgdrbr/mfs;
            address 172.16.0.42:9876;
        }
}

 

Copy /etc/drbd.d/* from node1 to node2's /etc/drbd.d/ (a copy command sketch follows).
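A minimal way to do the copy, assuming root SSH access from node1 to node2 (the command itself is not from the original article):

scp /etc/drbd.d/global_common.conf /etc/drbd.d/mfs.res node2:/etc/drbd.d/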

 

Create the DRBD resource metadata (run on both node1 and node2)

drbdadm create-md mfs

initializing activity log
NOT initializing bitmap
Writing meta data...
New drbd meta data block successfully created.

 

Start the DRBD service (node1, node2)

systemctl enable drbd

systemctl start drbd

 

Check the initial DRBD state:

drbd-overview

 

Make node1 the primary node

drbdadm primary mfs

 

If that returns an error, run:

drbdadm primary --force mfs    # if it still fails, restart the drbd service and run it again

drbdadm --overwrite-data-of-peer primary all

 

At this point drbd-overview or cat /proc/drbd shows that data synchronization has started.

drbdsetup status mfs --verbose --statistics   # show detailed status

 

Format the drbd device on node1

mkfs -t xfs /dev/drbd0   (or: mkfs.xfs /dev/drbd0)

 

Test mounting the device on node1

mkdir -p /data/drbd

mount /dev/drbd0 /data/drbd

 

On node2, create the same mount point: mkdir -p /data/drbd

Node2 does not need to be formatted separately: once the initial synchronization finishes, the data on node2 is identical to node1's, so the filesystem created above is already there.

 

 

Install MFS master + metalogger (node1, node2)

yum -y install moosefs-master moosefs-cli moosefs-cgi moosefs-cgiserv moosefs-metalogger

 

Configure MFS master and metalogger

 /etc/mfs/mfsmaster.cfg

grep -v "^#" mfsmaster.cfg

WORKING_USER = mfs

WORKING_GROUP = mfs

SYSLOG_IDENT = mfsmaster

LOCK_MEMORY = 0

 

NICE_LEVEL = -19


DATA_PATH = /data/drbd/mfs

EXPORTS_FILENAME = /etc/mfs/mfsexports.cfg

TOPOLOGY_FILENAME = /etc/mfs/mfstopology.cfg

BACK_LOGS = 50

 

BACK_META_KEEP_PREVIOUS = 1

 

MATOML_LISTEN_HOST = *

MATOML_LISTEN_PORT = 9419

 

MATOCS_LISTEN_HOST = *

MATOCS_LISTEN_PORT = 9420

# authentication between chunkserver and master
AUTH_CODE = mfspassword

 

REPLICATIONS_DELAY_INIT = 300


CHUNKS_LOOP_MAX_CPS = 100000

CHUNKS_LOOP_MIN_TIME = 300

CHUNKS_SOFT_DEL_LIMIT = 10

CHUNKS_HARD_DEL_LIMIT = 25

CHUNKS_WRITE_REP_LIMIT = 2

CHUNKS_READ_REP_LIMIT = 10

 

MATOCL_LISTEN_HOST = *

MATOCL_LISTEN_PORT = 9421


SESSION_SUSTAIN_TIME = 86400

 

/etc/mfs/mfsmetalogger.cfg

grep -v "^#" mfsmetalogger.cfg

WORKING_USER = mfs

WORKING_GROUP = mfs

SYSLOG_IDENT = mfsmetalogger

LOCK_MEMORY = 0



NICE_LEVEL = -19


DATA_PATH = /var/lib/mfs

BACK_LOGS = 50

BACK_META_KEEP_PREVIOUS = 3

META_DOWNLOAD_FREQ = 24


MASTER_RECONNECTION_DELAY = 5


MASTER_HOST = mfsmaster

MASTER_PORT = 9419

MASTER_TIMEOUT = 10

 

/etc/mfs/mfsexports.cfg  (export permissions)

grep -v "^#" mfsexports.cfg

*            /    rw,alldirs,admin,maproot=0:0,password=9WpV9odJ
* . rw
The password value is the authentication password that MFS clients must supply when connecting.

 

Sync the mfsexports.cfg, mfsmaster.cfg, mfsmetalogger.cfg and mfstopology.cfg files from node1's /etc/mfs to node2's /etc/mfs (a copy command sketch follows).
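A minimal sketch of the sync, again assuming root SSH access from node1 to node2 (illustrative only):

scp /etc/mfs/mfsexports.cfg /etc/mfs/mfsmaster.cfg /etc/mfs/mfsmetalogger.cfg /etc/mfs/mfstopology.cfg node2:/etc/mfs/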

 

Create the metadata storage directory:

 

mkdir -p /data/drbd/mfs
cp /var/lib/mfs/metadata.mfs.empty /data/drbd/mfs/metadata.mfs
chown -R mfs.mfs /data/drbd/mfs

 

 

Start mfsmaster (do not start it on node2; when node1 fails, the keepalived script starts mfsmaster on node2)

mfsmaster start

 

Start the MFS monitoring (CGI) service

chmod 755 /usr/share/mfscgi/*.cgi                   # make sure the CGI scripts are executable on both node1 and node2

mfscgiserv start   (or: systemctl start moosefs-cgiserv)

Open http://172.16.0.41:9425 in a browser to view the monitoring page.

 

 

Install keepalived (node1, node2)

yum -y install keepalived

 

Configure keepalived

node1:

Add the helper scripts

Mail notification script:

cat /etc/keepalived/script/mail_notify.py

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import smtplib
from email.mime.text import MIMEText
from email.header import Header
import sys, time, subprocess, random



# third-party SMTP service
mail_host="smtp.qq.com"  # SMTP server

userinfo_list = [{'user':'user1@qq.com','pass':'pass1'}, {'user':'user2@qq.com','pass':'pass2'}, {'user':'user3@qq.com','pass':'pass3'}]

 
         

user_inst = userinfo_list[random.randint(0, len(userinfo_list)-1)]
mail_user=user_inst['user'] # username
mail_pass=user_inst['pass'] # password


sender = mail_user # mail sender


receivers = ['xx1@qq.com', 'xx2@163.com']  # recipients; set these to your own mailboxes


p = subprocess.Popen('hostname', shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
hostname = p.stdout.readline().split('\n')[0]

message_to = ''
for i in receivers:
    message_to += i + ';'

def print_help():
    note = '''python script.py role ip vip
    '''
    print(note)
    exit(1)

time_stamp = time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time()))

if len(sys.argv) != 4:
    print_help()
elif sys.argv[1] == 'master':
    message_content = '%s server: %s(%s) keepalived change to Master, vIP: %s' %(time_stamp, sys.argv[2], hostname, sys.argv[3])
    subject = '%s keepalived change to Master -- keepalived notify' %(sys.argv[2])
elif sys.argv[1] == 'backup':
    message_content = '%s server: %s(%s) keepalived change to Backup, vIP: %s' %(time_stamp, sys.argv[2], hostname, sys.argv[3])
    subject = '%s keepalived change to Backup -- keepalived notify' %(sys.argv[2])
elif sys.argv[1] == 'stop':
    message_content = '%s server: %s(%s) keepalived change to Stop, vIP: %s' %(time_stamp, sys.argv[2], hostname, sys.argv[3])
    subject = '%s keepalived change to Stop -- keepalived notify' %(sys.argv[2])
else:
    print_help()

message = MIMEText(message_content, 'plain', 'utf-8')
message['From'] = Header(sender, 'utf-8')
message['To'] =  Header(message_to, 'utf-8')

message['Subject'] = Header(subject, 'utf-8')

try:
    smtpObj = smtplib.SMTP()
    smtpObj.connect(mail_host, 25)    # 25 is the SMTP port
    smtpObj.login(mail_user,mail_pass)
    smtpObj.sendmail(sender, receivers, message.as_string())
    print("郵件發送成功")
except smtplib.SMTPException as e:
    print("Error: 無法發送郵件")
    print(e)

 

DRBD check script:

cat /etc/keepalived/script/check_drbd.sh

 

#!/bin/bash

# set basic parameter
drbd_res=mfs
drbd_mountpoint=/data/drbd   # the DRBD mount point used above


status="ok"
#ret=`ps -C mfsmaster --no-header |wc -l`
ret=`pidof mfsmaster |wc -l`

if [ $ret -eq 0 ]; then
   status="mfsmaster not running"
   umount $drbd_mountpoint
   drbdadm secondary $drbd_res
   mfscgiserv stop
   /bin/python /etc/keepalived/script/mail_notify.py stop 172.16.0.41 172.16.0.47
   systemctl stop keepalived
fi

echo $status

 

Script run when keepalived switches to master:

cat /etc/keepalived/script/master.sh

#!/bin/bash

drbdadm primary mfs
mount /dev/drbd0 /data/drbd
mfsmaster start
mfscgiserv start

 

 chmod +x /etc/keepalived/script/*.sh

Copy mail_notify.py from node1 to node2's /etc/keepalived/script (a copy sketch follows).
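A minimal sketch, assuming the script directory does not yet exist on node2 (illustrative only):

ssh node2 "mkdir -p /etc/keepalived/script"
scp /etc/keepalived/script/mail_notify.py node2:/etc/keepalived/script/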

 

cat /etc/keepalived/keepalived.conf

! Configuration File for keepalived

 
global_defs {

    notification_email {
        xx@xx.com
    }
    
    notification_email_from keepalived@xx.com
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    
    router_id node1_mfs_master      # string identifying this node, usually the hostname (but not necessarily); it is used in the failure notification mails
}

vrrp_script check_drbd {
    script "/etc/keepalived/script/check_drbd.sh"
    interval 3     # check every 3 seconds
#    weight -40      # if failed, decrease 40 of the priority
#    fall   2        # require 2 failures to mark as failed
#    rise   1        # require 1 success to mark as OK
}

# net.ipv4.ip_nonlocal_bind=1
vrrp_instance VI_MFS {

    state BACKUP
    interface ens160
    virtual_router_id 16
    #mcast_src_ip 172.16.0.41
    
    nopreempt    ## if node2's keepalived has been elected MASTER and node1's keepalived restarts, node1 will not take MASTER back even though its priority is higher; both nodes must use state BACKUP, and nopreempt is set only on the higher-priority node
    priority 100
    advert_int 1
    #debug

    authentication {
        auth_type PASS
        auth_pass O7F3CjHVXWP
    }

    virtual_ipaddress {
        172.16.0.47
    }

    track_script {
        check_drbd
    }

}

 

 systemctl start keepalived

 systemctl disable keepalived

systemctl enable moosefs-metalogger;  systemctl start moosefs-metalogger

 

node2:

Script run when keepalived switches to master:

cat /etc/keepalived/script/master.sh

#!/bin/bash

# set basic parameter
drbd_res=mfs
drbd_driver=/dev/drbd0
drbd_mountpoint=/data/drbd


drbdadm primary $drbd_res
mount $drbd_driver $drbd_mountpoint
mfsmaster start
mfscgiserv start


/bin/python /etc/keepalived/script/mail_notify.py master 172.16.0.42 172.16.0.47

 

 

Script run when keepalived switches to backup:

cat /etc/keepalived/script/backup.sh

#!/bin/bash

# set basic parameter
drbd_res=mfs
drbd_mountpoint=/data/drbd


mfsmaster stop
umount $drbd_mountpoint
drbdadm secondary $drbd_res
mfscgiserv stop


/bin/python /etc/keepalived/script/mail_notify.py backup 172.16.0.42 172.16.0.47

 

 

cat /etc/keepalived/keepalived.conf

! Configuration File for keepalived

 
global_defs {

    notification_email {
        xx@xx.com
    }
    
    notification_email_from keepalived@xx.com
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    
    router_id node2_mfs_backup
}


# net.ipv4.ip_nonlocal_bind=1
vrrp_instance mfs {

    state BACKUP
    interface ens160
    virtual_router_id 16
    #mcast_src_ip 172.16.0.42
    
    priority 80
    advert_int 1
    #debug

    authentication {
        auth_type PASS
        auth_pass O7F3CjHVXWP
    }

    virtual_ipaddress {
        172.16.0.47
    }

    notify_master "/etc/keepalived/script/master.sh"
    notify_backup "/etc/keepalived/script/backup.sh"
}

 systemctl start keepalived

 systemctl disable keepalived

systemctl enable moosefs-metalogger;  systemctl start moosefs-metalogger

 

Note: do not enable mfsmaster or keepalived to start at boot on node1 and node2.

Install MFS chunk servers

yum -y install moosefs-chunkserver

 

Configure MFS chunk servers (node3, node4, node5)

/etc/mfs/mfschunkserver.cfg

grep -v "^#"  /etc/mfs/mfschunkserver.cfg

WORKING_USER = mfs

WORKING_GROUP = mfs

SYSLOG_IDENT = mfschunkserver

LOCK_MEMORY = 0

 

NICE_LEVEL = -19

 

DATA_PATH = /var/lib/mfs

HDD_CONF_FILENAME = /etc/mfs/mfshdd.cfg

HDD_TEST_FREQ = 10

 

BIND_HOST = *

MASTER_HOST = mfsmaster

MASTER_PORT = 9420


MASTER_TIMEOUT = 60

MASTER_RECONNECTION_DELAY = 5

# authentication string (used only when master requires authorization)

AUTH_CODE = mfspassword


CSSERV_LISTEN_HOST = *

CSSERV_LISTEN_PORT = 9422

 

/etc/mfs/mfshdd.cfg  specifies the chunk server's data disks

mkdir -p /data/mfs; chown -R mfs:mfs /data/mfs

grep -v "^#" /etc/mfs/mfshdd.cfg

The path where the chunkserver stores its data; a dedicated LVM logical volume is recommended:

/data/mfs

 

systemctl enable moosefs-chunkserver;  systemctl start moosefs-chunkserver

To start it manually: mfschunkserver start

Use the MFS monitoring page to verify that the chunkserver has connected to the MFS master.

 

MFS client

Install the MFS client

yum -y install moosefs-client fuse

 

Mount MFS directories automatically when the client reboots

Shell> vi /etc/rc.local  
/sbin/modprobe fuse  
/usr/bin/mfsmount /mnt1 -H mfsmaster -S /backup/db  
/usr/bin/mfsmount /mnt2 -H mfsmaster -S /app/image

mfsmount -H mfsmaster /mnt3                        # mount the MFS root at /mnt3
mfsmount -H <host> -P <port> -p <mount point>      # -p prompts for the authentication password
mfsmount -H <host> -P <port> -o mfspassword=PASSWORD <mount point>

 

Via /etc/fstab (the recommended method)

If mfsmaster is given as a hostname, the client must be able to resolve it, for example via /etc/hosts.

Shell> vi /etc/fstab  
 
mfsmount /mnt fuse mfsmaster=MASTER_IP,mfsport=9421,_netdev 0 0          (mounts the MFS root after a reboot)
mfsmount /mnt2 fuse mfsmaster=MASTER_IP,mfsport=9421,mfssubfolder=/subdir,_netdev 0 0          (mounts an MFS subdirectory after a reboot)

mfsmount /data/upload    fuse    mfsmaster=mfsmaster,mfsport=9421,mfssubfolder=/pro1,mfspassword=9WpV9odJ,_netdev 0 0          (with password authentication)



## _netdev: mount only after the network is up, to avoid mount failures at boot

With the fstab method, the following command both tests that the entries are correct and mounts all mfsmount entries from fstab:

mount -a -t fuse 

To unmount (the current working directory must not be inside the mount point /mnt):

umount /mnt

 

Check the mounts:

df -h -T

 

 

Appendix:

MFS master failover test

node1:

Stop the mfsmaster service:

mfsmaster stop

Check whether node2 has taken over the VIP.

df -h -T          # check the disk mounts

cat /proc/drbd    # check the DRBD state

drbd-overview     # DRBD details

Check whether the alert mail was received.

 

If node2's keepalived is now MASTER and node1 should become master again, first make sure the DRBD state of node1 and node2 is fully synchronized (check with drbd-overview).

Once they are consistent, run /etc/keepalived/script/backup.sh on node2 and then stop its keepalived service;

run /etc/keepalived/script/master.sh on node1 and then start its keepalived service;

finally start the keepalived service on node2 again (a command sketch follows).
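A minimal sketch of this failback sequence, assuming the scripts above are in place and DRBD reports UpToDate/UpToDate on both nodes (illustrative only):

# on node2: step down
/etc/keepalived/script/backup.sh
systemctl stop keepalived

# on node1: take over
/etc/keepalived/script/master.sh
systemctl start keepalived

# on node2: rejoin as backup
systemctl start keepalived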

 

 

Recovering data with the metalogger

The metalogger node is not mandatory, but it is very useful for recovering the master when the master goes down.
If the master cannot start, try to repair it with mfsmetarestore -a; if that fails, copy the backup changelogs from the metalogger host to the master and recover from those.

 

Checking the DRBD state

cat /proc/drbd

version: 8.4.9-1 (api:1/proto:86-101)
GIT-hash: 9976da086367a2476503ef7f6b13d4567327a280 build by akemi@Build64R7, 2016-12-04 01:08:48
0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
ns:0 nr:4865024 dw:4865024 dr:0 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:58046556
[>...................] sync'ed: 7.8% (56684/61436)M
finish: 0:23:35 speed: 41,008 (33,784) want: 41,000 K/sec



Line 1: the drbd version.
Line 2: build information.
Line 3 is the important one:
"0:" refers to device /dev/drbd0 (a "1:" line would refer to the next configured device).
"cs (connection state):" the connection state; Connected means the peers are connected.
"ro (roles):" the roles; Primary/Secondary (one primary, one secondary) is the normal state.
"ds (disk states):" the disk states; UpToDate/UpToDate is the normal state.
"ns (network send):" volume of net data sent to the partner via the network connection.
"nr (network receive):" data received over the network.
"dw (disk write):" data written to the local disk.
"dr (disk read):" data read from the local disk.

 

MFS installation (reference)

Add the appropriate key to the package manager:

curl "http://ppa.moosefs.com/RPM-GPG-KEY-MooseFS" > /etc/pki/rpm-gpg/RPM-GPG-KEY-MooseFS

 

Next you need to add the repository entry (MooseFS 3.0):

  • For EL7 family:
    # curl "http://ppa.moosefs.com/MooseFS-3-el7.repo" > /etc/yum.repos.d/MooseFS.repo
  • For EL6 family:
    # curl "http://ppa.moosefs.com/MooseFS-3-el6.repo" > /etc/yum.repos.d/MooseFS.repo

For MooseFS 2.0, use:

  • For EL7 family:
    # curl "http://ppa.moosefs.com/MooseFS-2-el7.repo" > /etc/yum.repos.d/MooseFS.repo
  • For EL6 family:
    # curl "http://ppa.moosefs.com/MooseFS-2-el6.repo" > /etc/yum.repos.d/MooseFS.repo

After those operations it should be possible to install the packages with following commands:

  • For Master Server:
    # yum install moosefs-master moosefs-cli moosefs-cgi moosefs-cgiserv
  • For Chunkservers:
    # yum install moosefs-chunkserver
  • For Metaloggers:
    # yum install moosefs-metalogger
  • For Clients:
    # yum install moosefs-client

    If you want MooseFS to be mounted automatically when system starts, first of all install File System in Userspace (FUSE) utilities:
    # yum install fuse
    and then add one of the following entries to your /etc/fstab:

    "classic" entry (works with all MooseFS 3.0 and 2.0 verisons):
    mfsmount                /mnt/mfs    fuse       defaults    0 0

    or "NFS-like" entry (works with MooseFS 3.0.75+):
    mfsmaster.host.name:    /mnt/mfs    moosefs    defaults    0 0

Running the system

    • To start process manually:
      # mfsmaster start
      # mfschunkserver start
    • For systemd OS family - EL7:
      # systemctl start moosefs-master.service
      # systemctl start moosefs-chunkserver.service
    • For SysV OS family - EL6:
      # service moosefs-master start 
      # service moosefs-chunkserver start

 

Fixing moosefs-master.service

The default moosefs-master.service unit times out and fails to start the master; the fix is to comment out the PIDFile=/var/lib/mfs/.mfsmaster.lock line:

cat /usr/lib/systemd/system/moosefs-master.service

[Unit]
Description=MooseFS Master server
Wants=network-online.target
After=network.target network-online.target

[Service]
Type=forking
ExecStart=/usr/sbin/mfsmaster start
ExecStop=/usr/sbin/mfsmaster stop
ExecReload=/usr/sbin/mfsmaster reload
#PIDFile=/var/lib/mfs/.mfsmaster.lock
TimeoutStopSec=60
TimeoutStartSec=60
Restart=no

[Install]
WantedBy=multi-user.target

 

drbdadm create-md mfs reports: 'mfs' not defined in your config (for this host)

Cause: the machine's hostname does not match the name defined in /etc/drbd.d/mfs.res; make them match (a sketch of the check follows).
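A minimal sketch of the check and fix, assuming the resource file uses the names node1/node2 as shown earlier (illustrative only):

uname -n                               # the current hostname must match an "on <name>" block in mfs.res
hostnamectl set-hostname node1         # either rename the host ...
# ... or edit /etc/drbd.d/mfs.res so the "on <name>" entries match the real hostnames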

 

Recommendation when multiple applications share MFS

When several applications need to use MFS, first mount the MFS root (/) on one client, create a directory there for each application, and then have each application's client machines mount only their own subdirectory (a sketch follows).
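A minimal sketch, using the export and password from mfsexports.cfg above and two hypothetical application directories app1 and app2 (illustrative only):

# on an administration client: mount the MFS root and create per-application directories
mfsmount /mnt/mfsroot -H mfsmaster -o mfspassword=9WpV9odJ
mkdir /mnt/mfsroot/app1 /mnt/mfsroot/app2
umount /mnt/mfsroot

# on each application's clients: mount only that application's subdirectory
mfsmount /data/app1 -H mfsmaster -S /app1 -o mfspassword=9WpV9odJ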

 

 

Using the MFS file system

Clients manage the MFS file system with the tools shipped with MFS; they are introduced below.

/usr/local/mfs/bin/mfstools -h
mfs multi tool

usage:
        mfstools create - create symlinks (mfs<toolname> -> /usr/local/mfs/bin/mfstools)

tools:
        mfsgetgoal          // get the goal (number of copies)
        mfssetgoal          // set the goal
        mfsgettrashtime     // get the trash retention time
        mfssettrashtime     // set the trash retention time
        mfscheckfile        // check a file
        mfsfileinfo         // file information
        mfsappendchunks
        mfsdirinfo          // directory information
        mfsfilerepair       // repair a file
        mfsmakesnapshot     // make a snapshot
        mfsgeteattr         // get extra attributes
        mfsseteattr         // set extra attributes
        mfsdeleattr         // delete extra attributes

deprecated tools:           // recursive variants
        mfsrgetgoal      = mfsgetgoal -r
        mfsrsetgoal      = mfssetgoal -r
        mfsrgettrashtime = mfsgettrashtime -r
        mfsrsettrashtime = mfssettrashtime -r

Mounting the file system

A MooseFS file system is mounted with the following command:

mfsmount mountpoint [-d][-f] [-s][-m] [-n][-p] [-H MASTER][-P PORT] [-S PATH][-o OPT[,OPT...]]
-H MASTER: the IP address of the master server.
-P PORT: the master server's port; it must match the MATOCL_LISTEN_PORT variable in mfsmaster.cfg. If the master uses the default port it can be omitted.
-S PATH: the MFS subdirectory to mount; the default is /, i.e. the whole MFS tree.
mountpoint: the previously created directory on which MFS is mounted.

Starting mfsmount with the -m or -o mfsmeta option mounts an auxiliary file system called MFSMETA instead. Its purpose is to recover files that were accidentally deleted from the MooseFS volume (or moved away to free disk space) and are still within their trash retention period. For example:

/usr/local/mfs/bin/mfsmount -m /MFS_meta/ -H 172.16.18.137

Setting the number of copies (goal)

The goal is the number of copies kept of a file. After it has been set, it can be verified with mfsgetgoal and changed with mfssetgoal.

mfssetgoal 3 /MFS_data/test/
mfsgetgoal /MFS_data/test/

mfsgetgoal -r and mfssetgoal -r perform the same operations recursively on an entire directory tree; the latter is equivalent to the mfsrsetgoal command. The actual number of copies can be verified with mfscheckfile and mfsfileinfo.

Note the following special cases:

  • A zero-length file containing no data returns an empty result from mfscheckfile even if a non-zero goal is set; once the file is filled with content, copies are created according to its goal, and if the file is then emptied again, the copies remain as empty files.
  • If the goal of an existing file is changed, the number of its copies is increased or reduced accordingly, but with some delay. This can be verified with mfscheckfile.
  • A goal set on a directory is inherited by files and subdirectories created in it afterwards, but does not change the number of copies of files and directories that already exist.

mfsdirinfo shows a summary for an entire directory tree.

Trash (deleted-file retention)

The time a deleted file stays in the trash is its quarantine time; it can be read with mfsgettrashtime and set with mfssettrashtime.

mfssettrashtime 64800 /MFS_data/test/test1 
mfsgettrashtime /MFS_data/test/test1

The unit is seconds (useful values: 1 hour = 3600 s, 24 hours = 86400 s, 1 week = 604800 s). As with the goal, the trash time set on a directory is inherited by newly created files and subdirectories. A value of 0 means a deleted file is removed immediately and can no longer be recovered.

Deleted files can be reached through a separately mounted MFSMETA file system. In particular it contains the directory /trash (holding information about deleted files that can still be restored) and /trash/undel (used to restore them). Only the administrator (uid 0, normally root) may access MFSMETA.

/usr/local/mfs/bin/mfsmount -m /MFS_meta/ -H 172.16.18.137 

The name of a deleted file is still visible in the trash directory; it consists of the file's eight-digit hexadecimal i-node number and the deleted file's path, joined with '|' instead of '/'. If such a name exceeds the operating system limit (usually 255 characters), it is truncated. A deleted file can still be read and written under this full-path name from the mount point.

Moving that entry into the trash/undel subdirectory restores the original file to its proper place in the MooseFS file system (provided the path has not changed). The restore fails if a new file with the same name already exists at that path (a restore sketch follows).
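A minimal sketch of restoring a deleted file, assuming the meta file system is mounted at /MFS_meta and the trash entry is named 0000002A|mydir|file.txt (the i-node number and names are hypothetical):

mfsmount -m /MFS_meta -H mfsmaster                                      # mount the MFSMETA file system
ls /MFS_meta/trash                                                      # find the deleted file's entry
mv "/MFS_meta/trash/0000002A|mydir|file.txt" /MFS_meta/trash/undel/     # restore it to its original path
umount /MFS_meta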

Deleting a file from the trash frees the space it occupied (with some delay, since the data is removed asynchronously).

MFSMETA also contains a reserved directory that holds files which have been deleted but are still held open. Once the users close these open files, the entries in reserved are removed and the file data is deleted immediately. Files in reserved are named the same way as in trash, but no other operations are possible on them.

Snapshots

Another MooseFS feature is making snapshots of files or whole directory trees with the mfsmakesnapshot tool:

mfsmakesnapshot source ... destination

mfsmakesnapshot makes a copy of one or more files in a single atomic operation; afterwards, modifying the source files does not affect the snapshot, and modifying the snapshot does not affect the sources.

mfsappendchunks can also be used:

mfsappendchunks destination-file source-file ...

When there are several source files, their snapshots are appended to the same destination file (each source file's data is appended in whole chunks, so the destination is padded up to the chunk boundary).

MFS cluster maintenance

Starting the MFS cluster

The safe way to start a MooseFS cluster (avoiding read/write errors and similar problems) is in the following order:

  1. Start the mfsmaster process.
  2. Start all mfschunkserver processes.
  3. Start the mfsmetalogger process (if a metalogger is configured).
  4. Once all chunkservers have connected to the MooseFS master, any number of clients can mount the exported file system with mfsmount. (Check the master's log or the CGI monitor to see whether all chunkservers are connected.)

Stopping the MFS cluster

To stop a MooseFS cluster safely:

  1. Unmount the MooseFS file system on all clients (with umount or an equivalent command).
  2. Stop the chunkserver processes with mfschunkserver stop.
  3. Stop the metalogger process with mfsmetalogger stop.
  4. Stop the master process with mfsmaster stop.

Chunkserver maintenance

If every file has a goal of at least 2 and there are no under-goal files (check with mfsgetgoal -r and mfsdirinfo), a single chunkserver can be stopped or restarted at any time. Before stopping or restarting another chunkserver later, make sure the previous one has reconnected and that there are no under-goal chunks (a check sketch follows).
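A minimal sketch of such a pre-maintenance check from a client, assuming MFS is mounted at /mnt/mfs (illustrative only):

mfsgetgoal -r /mnt/mfs     # every reported goal should be >= 2
mfsdirinfo /mnt/mfs        # summary of files and chunks for the tree
# the CGI monitor (http://mfsmaster:9425) also shows undergoal chunk counts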

MFS metadata backup

The metadata normally consists of two parts:

  • the main metadata file metadata.mfs, which is renamed metadata.mfs.back while mfsmaster is running;
  • the metadata changelogs changelog.*.mfs, which store the file changes of the last N hours (N is set by the BACK_LOGS parameter in mfsmaster.cfg).

The main metadata file should be backed up regularly; the frequency depends on how many hours of changelogs are kept. The metadata changelogs are replicated automatically in real time; since version 1.6 this work is done by the metalogger (a backup sketch follows).
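A minimal sketch of a periodic metadata backup on the active master, using the DATA_PATH /data/drbd/mfs configured above and a hypothetical backup directory /backup/mfs-meta (illustrative only):

# run on the node that currently holds the VIP / DRBD primary (e.g. from cron)
mkdir -p /backup/mfs-meta
cd /data/drbd/mfs && tar -czf /backup/mfs-meta/meta-$(date +%Y%m%d%H%M).tar.gz metadata.mfs.back changelog.*.mfs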

MFS master recovery

If mfsmaster crashes (for example after a host or power failure), the last metadata changelog has to be merged into the main metadata file. This is done with the mfsmetarestore tool; the simplest form is:

mfsmetarestore -a

If the master data is stored somewhere other than the location compiled into MooseFS, specify it with -d, e.g.:

mfsmetarestore -a -d /opt/mfsmaster

Recovering the master from the metalogger

If mfsmetarestore -a cannot repair the metadata, recovery via the metalogger may also fail; that situation has not been encountered here and is not covered.

  1. Retrieve the metadata.mfs.back file, either from a backup or from the metalogger host (if the metalogger service was running), and put it into the master's data directory, normally {prefix}/var/mfs.
  2. Copy the latest metadata files from any server that was running the metalogger service before the master went down and put them into the mfsmaster data directory.
  3. Merge the metadata changelogs with mfsmetarestore, either in automatic mode (mfsmetarestore -a) or manually:
mfsmetarestore -m metadata.mfs.back -o metadata.mfs changelog_ml.*.mfs

Alternatively, force the master to start from metadata.mfs.back by copying it to metadata.mfs; the master will start, but how much data was lost cannot be determined.

Automated Failover

When running MooseFS in production, the master node must be highly available. Using ucarp is a fairly mature option, as is DRBD + heartbeat or keepalived. ucarp works much like keepalived: it detects the cluster state through health checks between the primary and backup servers and acts accordingly. The commercial MooseFS edition also supports a dual-master configuration, which removes the single point of failure.

 

 

Upgrading moosefs-chunkserver

systemctl stop moosefs-chunkserver

yum -y update  moosefs-chunkserver

systemctl start moosefs-chunkserver   # fails to start

 

mfschunkserver -u
open files limit has been set to: 16384
working directory: /var/lib/mfs
config: using default value for option 'FILE_UMASK' - '23'
lockfile created and locked
config: using default value for option 'LIMIT_GLIBC_MALLOC_ARENAS' - '4'
setting glibc malloc arena max to 4
setting glibc malloc arena test to 4
config: using default value for option 'DISABLE_OOM_KILLER' - '1'
initializing mfschunkserver modules ...
config: using default value for option 'HDD_LEAVE_SPACE_DEFAULT' - '256MiB'
hdd space manager: data folder '/data/mfs/' already locked (used by another process)
hdd space manager: no hdd space defined in /etc/mfs/mfshdd.cfg file
init: hdd space manager failed !!!
error occurred during initialization - exiting

 

 

ps -ef |grep mfs
mfs 51699 1 1 2017 ? 1-15:27:28 /usr/sbin/mfschunkserver start

 

Kill that leftover process.

Delete the lock file: rm -rf /var/lib/mfs/.mfschunkserver.lock, then start the chunkserver again (a sketch follows).
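A minimal sketch of the cleanup, using the PID from the ps output above (51699 is just that example value):

kill 51699                                       # or: pkill -f '/usr/sbin/mfschunkserver start'
rm -f /var/lib/mfs/.mfschunkserver.lock          # remove the stale lock file
systemctl start moosefs-chunkserver              # start the upgraded chunkserver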

