Preface
This post describes how to monitor Cassandra with jmx_exporter.
Configuring the Java agent
Apply the following configuration on every node in the Cassandra cluster.
-
Upload
Download jmx_prometheus_javaagent-0.12.0.jar and upload it to the $CASSANDRA_HOME/lib/ directory on each Cassandra node.
Download link: https://github.com/prometheus/jmx_exporter/blob/master/README.md
-
Configuration
1. Add the configuration file cassandra-jmx.yml to the conf/ directory on each Cassandra node:

lowercaseOutputName: true
lowercaseOutputLabelNames: true
whitelistObjectNames: [
  "org.apache.cassandra.metrics:type=ColumnFamily,name=RangeLatency,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=LiveSSTableCount,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=SSTablesPerReadHistogram,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=SpeculativeRetries,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableOnHeapSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableSwitchCount,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableLiveDataSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableColumnsCount,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableOffHeapSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterFalsePositives,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterFalseRatio,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterDiskSpaceUsed,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterOffHeapMemoryUsed,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=TotalDiskSpaceUsed,*",
  "org.apache.cassandra.metrics:type=CQL,name=RegularStatementsExecuted,*",
  "org.apache.cassandra.metrics:type=CQL,name=PreparedStatementsExecuted,*",
  "org.apache.cassandra.metrics:type=Compaction,name=PendingTasks,*",
  "org.apache.cassandra.metrics:type=Compaction,name=CompletedTasks,*",
  "org.apache.cassandra.metrics:type=Compaction,name=BytesCompacted,*",
  "org.apache.cassandra.metrics:type=Compaction,name=TotalCompactionsCompleted,*",
  "org.apache.cassandra.metrics:type=ClientRequest,name=Latency,*",
  "org.apache.cassandra.metrics:type=ClientRequest,name=Unavailables,*",
  "org.apache.cassandra.metrics:type=ClientRequest,name=Timeouts,*",
  "org.apache.cassandra.metrics:type=Storage,name=Exceptions,*",
  "org.apache.cassandra.metrics:type=Storage,name=TotalHints,*",
  "org.apache.cassandra.metrics:type=Storage,name=TotalHintsInProgress,*",
  "org.apache.cassandra.metrics:type=Storage,name=Load,*",
  "org.apache.cassandra.metrics:type=Connection,name=TotalTimeouts,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=CompletedTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=PendingTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=ActiveTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=TotalBlockedTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=CurrentlyBlockedTasks,*",
  "org.apache.cassandra.metrics:type=DroppedMessage,name=Dropped,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=HitRate,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Hits,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Requests,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Entries,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Size,*",
  #"org.apache.cassandra.metrics:type=Streaming,name=TotalIncomingBytes,*",
  #"org.apache.cassandra.metrics:type=Streaming,name=TotalOutgoingBytes,*",
  "org.apache.cassandra.metrics:type=Client,name=connectedNativeClients,*",
  "org.apache.cassandra.metrics:type=Client,name=connectedThriftClients,*",
  "org.apache.cassandra.metrics:type=Table,name=WriteLatency,*",
  "org.apache.cassandra.metrics:type=Table,name=ReadLatency,*",
  "org.apache.cassandra.net:type=FailureDetector,*",
]
#blacklistObjectNames: ["org.apache.cassandra.metrics:type=ColumnFamily,*"]
rules:
  - pattern: org.apache.cassandra.metrics<type=(Connection|Streaming), scope=(\S*), name=(\S*)><>(Count|Value)
    name: cassandra_$1_$3
    labels:
      address: "$2"
  - pattern: org.apache.cassandra.metrics<type=(ColumnFamily), name=(RangeLatency)><>(Mean)
    name: cassandra_$1_$2_$3
  - pattern: org.apache.cassandra.net<type=(FailureDetector)><>(DownEndpointCount)
    name: cassandra_$1_$2
  - pattern: org.apache.cassandra.metrics<type=(Keyspace), keyspace=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
    name: cassandra_$1_$3_$4
    labels:
      "$1": "$2"
  - pattern: org.apache.cassandra.metrics<type=(Table), keyspace=(\S*), scope=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
    name: cassandra_$1_$4_$5
    labels:
      "keyspace": "$2"
      "table": "$3"
  - pattern: org.apache.cassandra.metrics<type=(ClientRequest), scope=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
    name: cassandra_$1_$3_$4
    labels:
      "type": "$2"
  - pattern: org.apache.cassandra.metrics<type=(\S*)(?:, ((?!scope)\S*)=(\S*))?(?:, scope=(\S*))?, name=(\S*)><>(Count|Value)
    name: cassandra_$1_$5
    labels:
      "$1": "$4"
      "$2": "$3"
2. Edit the Cassandra configuration file conf/cassandra-env.sh and add the javaagent option:
JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/jamm-0.3.0.jar -javaagent:$CASSANDRA_HOME/lib/jmx_prometheus_javaagent-0.12.0.jar=7070:${CASSANDRA_HOME}/conf/cassandra-jmx.yml"
Note: 7070 is the port on which the agent exposes metrics for Prometheus to scrape.
-
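Start
Restart the Cassandra service. Once it has started successfully, open http://10.x.xx.100:7070/metrics/ (replace the IP and port with those of your environment) to view the scraped metrics.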
Prometheus configuration
-
Configuration
Edit Prometheus's prometheus.yml to add the Cassandra monitoring job:
vi /usr/local/prometheus-2.15.1/prometheus.yml
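The entry to add is not reproduced in this post, so the following is only a minimal sketch of what a scrape job for the agent could look like. The job name cassandra is a hypothetical choice, and the target reuses the placeholder address and port 7070 from the earlier steps. List one target per Cassandra node.

```yaml
# Sketch only: merge this job into the existing scrape_configs list in prometheus.yml.
scrape_configs:
  - job_name: "cassandra"          # hypothetical job name, pick your own
    metrics_path: /metrics         # Prometheus default, shown for clarity
    static_configs:
      - targets:
          - "10.x.xx.100:7070"     # placeholder from this post; replace with each node's IP:port
```

If prometheus.yml already has a scrape_configs key, append only the job entry to it rather than adding a second scrape_configs key.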
-
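Start and verify
Kill the running Prometheus process, restart it with the following commands, then check the targets page:
cd /usr/local/prometheus-2.15.1
nohup ./prometheus --config.file=prometheus.yml &
Note: State=UP means the target is being scraped successfully.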
Grafana configuration
-
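Import the dashboard template
Import the dashboard https://grafana.com/dashboards/5408, then adjust it to your own workload to get the final dashboard.
Note that the JSON of this Grafana Cassandra metrics dashboard (https://grafana.com/grafana/dashboards/5408) contains some errors and has to be corrected by hand.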
-
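Alert metrics
No. | Alert name | Alert rule | Description
1 | Memory alert | Alert when memory usage reaches the threshold (>80%) |
2 | GC time alert | Alert when GC time reaches the threshold (>0.3s) |
3 | GC count alert | Alert when the number of GCs per second reaches the threshold (>5) |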
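The thresholds in the table can be sketched as Prometheus alerting rules. This is an illustration under several assumptions, not the setup used in this post, which may define these alerts in Grafana instead.

- The Cassandra targets are scraped under a job named cassandra (a hypothetical name).
- "Memory" is taken to mean JVM heap.
- "GC time" is read as seconds spent in GC per second of wall time.
- The jvm_* metric names are those the 0.12.0 agent's default JVM collectors are expected to expose. Confirm them on the /metrics page before relying on them.
- The group name, alert names, 5m windows and `for` durations are illustrative.

```yaml
# Sketch only: save as a rule file and reference it from rule_files in prometheus.yml.
groups:
  - name: cassandra-jvm            # hypothetical group name
    rules:
      # 1. Memory alert: heap usage above 80% of the maximum heap.
      - alert: CassandraHeapUsageHigh
        expr: jvm_memory_bytes_used{area="heap", job="cassandra"} / jvm_memory_bytes_max{area="heap", job="cassandra"} > 0.8
        for: 5m
      # 2. GC time alert: more than 0.3 s of GC per second, averaged over 5 minutes.
      - alert: CassandraGcTimeHigh
        expr: rate(jvm_gc_collection_seconds_sum{job="cassandra"}[5m]) > 0.3
        for: 5m
      # 3. GC count alert: more than 5 collections per second, averaged over 5 minutes.
      - alert: CassandraGcCountHigh
        expr: rate(jvm_gc_collection_seconds_count{job="cassandra"}[5m]) > 5
        for: 5m
```

The same expressions can be used as queries when defining the alerts on Grafana panels.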