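In this section
Why build monitoring?
Discussion of common monitoring system designs
Monitoring system architecture design
Monitoring table structure design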
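Why build monitoring?
– Become familiar with the design principles of IT monitoring systems
– Develop a simplified Zabbix-like monitoring system
– Master the program design approach and the architectural decoupling principles of automation development projects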
Discussion of common monitoring system designs
Zabbix
Nagios
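Monitoring system requirements
1. Monitor common system services, applications, network devices, and so on
2. Monitor several different services on one host, each with its own monitoring interval
3. Allow the same service to have different monitoring intervals and alert thresholds on different hosts
4. Add, remove, or modify the monitored services for a batch of hosts in bulk
5. Alert levels:
- Services differ in business importance, so a failure in each can be assigned a different alert level
- Events for a specific service or alert level can be routed to specific users
- Alert escalation settings
6. Storage and optimization of historical data
- Store the most useful data in the least space
- How can 5 years of monitoring data for every service on a host be retrieved within 1 second?
7. Data visualization: how to build a clean, attractive user interface?
8. How can a single server support monitoring 5000+ machines?
9. Which communication mode: active or passive?
10. How can the monitoring server be scaled horizontally?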
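Requirement 6 leaves the storage scheme open. One common approach, sketched here as an assumption rather than as the design this project uses, is to keep raw samples only for a short period and roll older data up into coarser time buckets that store an average, minimum, and maximum. The function name `downsample` and the bucket size are illustrative.

```python
from collections import defaultdict


def downsample(points, bucket_seconds):
    """Roll raw (timestamp, value) points up into fixed-size time buckets.

    Returns a list of (bucket_start, avg, min, max) tuples sorted by time.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        bucket_start = ts - (ts % bucket_seconds)
        buckets[bucket_start].append(value)

    rolled = []
    for bucket_start in sorted(buckets):
        values = buckets[bucket_start]
        rolled.append(
            (bucket_start, sum(values) / len(values), min(values), max(values))
        )
    return rolled


# Six raw samples taken every 60 s, rolled up into 180 s buckets.
raw = [(0, 10), (60, 20), (120, 30), (180, 40), (240, 50), (300, 60)]
summary = downsample(raw, 180)
```

Each level of roll-up trades resolution for space, so a multi-year query can read a small number of coarse buckets instead of every raw sample.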
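Which architecture?
• MySQL
• Active communication? SNMP, wget…
• Passive communication? Agent: how does it communicate with the monitor server?
• Socket server -> Socket client
• Can an off-the-shelf client/server stack be used? RabbitMQ, Redis publish/subscribe, HTTP?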
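With an agent, each host has to decide for itself which services are due for collection, because every service can have its own interval. The sketch below shows one possible way to do that check; the configuration layout, the plugin names, and the function `due_services` are hypothetical and not taken from the project.

```python
def due_services(config, last_run, now):
    """Return the names of services whose interval has elapsed since their last run."""
    due = []
    for name, settings in config.items():
        # A service that has never run counts as last run at time 0.
        elapsed = now - last_run.get(name, 0)
        if elapsed >= settings["interval"]:
            due.append(name)
    return sorted(due)


# Hypothetical per-host configuration, as the agent might receive it from the server.
config = {
    "cpu": {"interval": 30, "plugin": "get_cpu"},
    "memory": {"interval": 60, "plugin": "get_memory"},
    "load": {"interval": 300, "plugin": "get_load"},
}
last_run = {"cpu": 100, "memory": 100, "load": 100}
```

An agent loop would call `due_services` about once per second, run the plugin for each returned name, send the result to the server, and then update `last_run`.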
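Advantages of HTTP
1. Simple interface design
2. Easy to scale horizontally into a distributed system
3. The underlying socket layer is stable and mature, which saves a lot of effort on maintaining communication code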
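To illustrate how small the interface can be over HTTP, here is a minimal sketch of an agent building a report request with the standard library. The URL, the JSON field names, and the function `build_report_request` are assumptions made for the example; the article does not define the actual API.

```python
import json
import urllib.request


def build_report_request(server_url, host_id, service_name, data):
    """Build (but do not send) an HTTP POST carrying one service's metrics as JSON."""
    payload = json.dumps({
        "host_id": host_id,
        "service": service_name,
        "data": data,
    }).encode("utf-8")
    return urllib.request.Request(
        server_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The URL below is a placeholder, not an endpoint defined by the article.
request = build_report_request(
    "http://monitor.example.com/api/report/", 1, "cpu", {"idle": 80, "iowait": 5}
)
# Sending it would be: urllib.request.urlopen(request, timeout=5)
```

Because each report is an independent request, any number of server instances behind a load balancer could accept it, which is what makes horizontal scaling straightforward.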
HTTP characteristics:
1. Short-lived connections
2. Stateless
3. Security authentication
4. Passive communication
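Since HTTP is stateless, every request from an agent has to prove its identity on its own. One possible scheme, shown as a sketch and not as the authentication this project implements, signs the host id and a timestamp with a shared secret using HMAC. The function names, the message format, and the 60-second freshness window are all assumptions.

```python
import hashlib
import hmac
import time


def sign_request(secret, host_id, timestamp):
    """Return a hex HMAC-SHA256 signature over the host id and timestamp."""
    message = "{}:{}".format(host_id, timestamp).encode("utf-8")
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


def verify_request(secret, host_id, timestamp, signature, now=None, max_age=60):
    """Accept the request only if the signature matches and the timestamp is fresh."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > max_age:
        return False  # too old (or too far in the future): possible replay
    expected = sign_request(secret, host_id, timestamp)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, signature)


token = sign_request("shared-secret", 1, 1000)
```

The agent would send `host_id`, `timestamp`, and `token` with each report, and the server would call `verify_request` before accepting the data.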
Monitoring system architecture design
Table structure design

# -*- coding: utf-8 -*-
from django.db import models

# Create your models here.
# on_delete is required from Django 2.0 onward; CASCADE matches the
# implicit default of earlier versions.


class Host(models.Model):
    name = models.CharField(max_length=64, unique=True)
    ip_addr = models.GenericIPAddressField(unique=True)
    host_groups = models.ManyToManyField('HostGroup', blank=True)  # A B C
    templates = models.ManyToManyField('Template', blank=True)  # A D E
    monitored_by_choices = (
        ('agent', 'Agent'),
        ('snmp', 'SNMP'),
        ('wget', 'WGET'),
    )
    monitored_by = models.CharField('Monitoring method', max_length=64, choices=monitored_by_choices)
    status_choices = (
        (1, 'Online'),
        (2, 'Down'),
        (3, 'Unreachable'),
        (4, 'Offline'),
    )
    status = models.IntegerField('Status', choices=status_choices, default=1)
    memo = models.TextField('Memo', blank=True, null=True)

    def __str__(self):
        return self.name


class HostGroup(models.Model):
    name = models.CharField(max_length=64, unique=True)
    templates = models.ManyToManyField('Template', blank=True)
    memo = models.TextField('Memo', blank=True, null=True)

    def __str__(self):
        return self.name


class ServiceIndex(models.Model):
    """A single metric (index) that a service reports."""
    name = models.CharField(max_length=64)
    key = models.CharField(max_length=64)
    data_type_choices = (
        ('int', 'int'),
        ('float', 'float'),
        ('str', 'string'),
    )
    data_type = models.CharField('Metric data type', max_length=32, choices=data_type_choices, default='int')
    memo = models.CharField('Memo', max_length=128, blank=True, null=True)

    def __str__(self):
        return '%s.%s' % (self.name, self.key)


class Service(models.Model):
    name = models.CharField('Service name', max_length=64, unique=True)
    interval = models.IntegerField('Monitoring interval', default=60)
    plugin_name = models.CharField('Plugin name', max_length=64, default='n/a')
    items = models.ManyToManyField('ServiceIndex', verbose_name='Metric list', blank=True)
    memo = models.CharField('Memo', max_length=128, blank=True, null=True)

    def __str__(self):
        return self.name

    # def get_service_items(obj):
    #     return ",".join([i.name for i in obj.items.all()])


class Template(models.Model):
    name = models.CharField('Template name', max_length=64, unique=True)
    services = models.ManyToManyField('Service', verbose_name='Service list')
    triggers = models.ManyToManyField('Trigger', verbose_name='Trigger list', blank=True)

    def __str__(self):
        return self.name


# Earlier draft of TriggerExpression, kept disabled for reference.
'''
class TriggerExpression(models.Model):
    name = models.CharField("Trigger expression name", max_length=64, blank=True, null=True)
    service = models.ForeignKey(Service, verbose_name="Related service")
    service_index = models.ForeignKey(ServiceIndex, verbose_name="Related service metric")
    logic_type_choices = (('or', 'OR'), ('and', 'AND'))
    logic_type = models.CharField("Logical relation", choices=logic_type_choices, max_length=32, blank=True, null=True)
    left_sibling = models.ForeignKey('self', verbose_name="Left condition", blank=True, null=True, related_name='left_sibling_condition')
    operator_type_choices = (('eq', '='), ('lt', '<'), ('gt', '>'))
    operator_type = models.CharField("Operator", choices=operator_type_choices, max_length=32)
    data_calc_type_choices = (
        ('avg', 'Average'),
        ('max', 'Max'),
        ('hit', 'Hit'),
        ('last', 'Last'),
    )
    data_calc_func = models.CharField("Data calculation function", choices=data_calc_type_choices, max_length=64)
    data_calc_args = models.CharField("Function arguments", help_text="Separate multiple arguments with commas; the first value is the time", max_length=64)
    threshold = models.IntegerField("Threshold")

    def __unicode__(self):
        return "%s %s(%s(%s))" % (self.service_index, self.operator_type, self.data_calc_func, self.data_calc_args)
'''


class TriggerExpression(models.Model):
    # name = models.CharField('Trigger expression name', max_length=64, blank=True, null=True)
    trigger = models.ForeignKey('Trigger', verbose_name='Owning trigger', on_delete=models.CASCADE)
    service = models.ForeignKey(Service, verbose_name='Related service', on_delete=models.CASCADE)
    service_index = models.ForeignKey(ServiceIndex, verbose_name='Related service metric', on_delete=models.CASCADE)
    specified_index_key = models.CharField(verbose_name='Only monitor the specified metric key', max_length=64, blank=True, null=True)
    operator_type_choices = (('eq', '='), ('lt', '<'), ('gt', '>'))
    operator_type = models.CharField('Operator', choices=operator_type_choices, max_length=32)
    data_calc_type_choices = (
        ('avg', 'Average'),
        ('max', 'Max'),
        ('hit', 'Hit'),
        ('last', 'Last'),
    )
    data_calc_func = models.CharField('Data calculation function', choices=data_calc_type_choices, max_length=64)
    data_calc_args = models.CharField('Function arguments', help_text='Separate multiple arguments with commas; the first value is the time', max_length=64)
    threshold = models.IntegerField('Threshold')

    logic_type_choices = (('or', 'OR'), ('and', 'AND'))
    logic_type = models.CharField('Logical relation to another condition', choices=logic_type_choices, max_length=32, blank=True, null=True)
    # next_condition = models.ForeignKey('self', verbose_name='Right condition', blank=True, null=True, related_name='right_sibling_condition')

    def __str__(self):
        return '%s %s(%s(%s))' % (self.service_index, self.operator_type, self.data_calc_func, self.data_calc_args)

    class Meta:
        pass  # unique_together = ('trigger_id', 'service')


class Trigger(models.Model):
    name = models.CharField('Trigger name', max_length=64)
    # expressions = models.TextField('Expression')
    severity_choices = (
        (1, 'Information'),
        (2, 'Warning'),
        (3, 'Average'),
        (4, 'High'),
        (5, 'Disaster'),
    )
    # expressions = models.ManyToManyField(TriggerExpression, verbose_name='Condition expressions')
    severity = models.IntegerField('Severity', choices=severity_choices)
    enabled = models.BooleanField(default=True)
    memo = models.TextField('Memo', blank=True, null=True)

    def __str__(self):
        return '<service:%s, severity:%s>' % (self.name, self.get_severity_display())


class Action(models.Model):
    name = models.CharField(max_length=64, unique=True)
    host_groups = models.ManyToManyField('HostGroup', blank=True)
    hosts = models.ManyToManyField('Host', blank=True)

    conditions = models.TextField('Alert conditions')
    interval = models.IntegerField('Alert interval (s)', default=300)
    operations = models.ManyToManyField('ActionOperation')

    recover_notice = models.BooleanField('Send a notification after recovery', default=True)
    recover_subject = models.CharField(max_length=128, blank=True, null=True)
    recover_message = models.TextField(blank=True, null=True)

    enabled = models.BooleanField(default=True)

    def __str__(self):
        return self.name


class ActionOperation(models.Model):
    name = models.CharField(max_length=64)
    step = models.SmallIntegerField('Nth alert', default=1)
    action_type_choices = (
        ('email', 'Email'),
        ('sms', 'SMS'),
        ('script', 'RunScript'),
    )
    action_type = models.CharField('Action type', choices=action_type_choices, default='email', max_length=64)
    # notifiers = models.ManyToManyField(host_models.UserProfile, verbose_name='Notifiers', blank=True)

    def __str__(self):
        return self.name


class Maintenance(models.Model):
    name = models.CharField(max_length=64, unique=True)
    hosts = models.ManyToManyField('Host', blank=True)
    host_groups = models.ManyToManyField('HostGroup', blank=True)
    content = models.TextField('Maintenance content')
    start_time = models.DateTimeField()
    end_time = models.DateTimeField()

    def __str__(self):
        return self.name


'''
CPU
    idle 80
    usage 90
    system 30
    user
    iowait 50

memory:
    usage
    free
    swap
    cache
    buffer

load:
    load1
    load5
    load15
'''
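The TriggerExpression fields (`data_calc_func`, `operator_type`, `threshold`, `logic_type`) suggest how a trigger might be evaluated against recent data. The sketch below is one possible reading and uses plain dicts instead of the Django models. Several things in it are assumptions: expressions are combined strictly left to right with no operator precedence, a missing `logic_type` between two expressions is treated as OR, and `hit` is omitted because the article does not define what it computes.

```python
import operator

# Maps the operator_type choices ('eq', 'lt', 'gt') to comparison functions.
OPERATORS = {"eq": operator.eq, "lt": operator.lt, "gt": operator.gt}

# Maps some data_calc_func choices to functions over a list of recent values.
# 'hit' is left out because its semantics are not defined in the article.
CALC_FUNCS = {
    "avg": lambda values: sum(values) / len(values),
    "max": max,
    "last": lambda values: values[-1],
}


def evaluate_expression(expr, values):
    """Evaluate one expression dict against a non-empty list of recent values."""
    calculated = CALC_FUNCS[expr["data_calc_func"]](values)
    return OPERATORS[expr["operator_type"]](calculated, expr["threshold"])


def evaluate_trigger(expressions, data):
    """Combine expressions left to right using each one's logic_type.

    `expressions` is an ordered list of dicts; `data` maps a metric key to
    its recent values. The logic_type of the last expression is ignored.
    """
    result = None
    pending_logic = None
    for expr in expressions:
        current = evaluate_expression(expr, data[expr["key"]])
        if result is None:
            result = current
        elif pending_logic == "and":
            result = result and current
        else:  # 'or', or unspecified
            result = result or current
        pending_logic = expr.get("logic_type")
    return bool(result)


# Fire when average iowait exceeds 50 AND the latest idle value is below 20.
expressions = [
    {"key": "iowait", "data_calc_func": "avg", "operator_type": "gt",
     "threshold": 50, "logic_type": "and"},
    {"key": "idle", "data_calc_func": "last", "operator_type": "lt",
     "threshold": 20, "logic_type": None},
]
data = {"iowait": [40, 60, 80], "idle": [30, 25, 10]}
fired = evaluate_trigger(expressions, data)
```

In a real deployment the values in `data` would come from the history store, limited to the time window given in `data_calc_args`.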