Background
This article gives a brief introduction to how Prometheus can monitor an Oracle database through exporters, and which metrics deserve attention.
oracledb_exporter
oracledb_exporter is an application that connects to an Oracle database and exposes Prometheus metrics.
Setup
This section shows how to install and configure oracledb_exporter so that Prometheus can monitor an Oracle database. The exporter is deployed in a Kubernetes cluster.
Deploy oracledb_exporter in Kubernetes with a Deployment, and add annotations so that Prometheus automatically discovers the exporter endpoint and scrapes its metrics:
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9161"
        prometheus.io/path: "/metrics"
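These annotations only take effect if the Prometheus server is configured to read them, which is common but not enabled by default. A minimal sketch of such a pod-discovery scrape job (the job name and this particular relabel layout are illustrative, adapt them to your own Prometheus configuration):

scrape_configs:
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # use prometheus.io/path as the metrics path, if set
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # use prometheus.io/port as the scrape port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: "$1:$2"
        target_label: __address__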
oracledb_exporter needs the Oracle connection information to access the database and generate metrics. This is passed to the exporter as an environment variable. Since the connection information contains the user and password used to access the database, we will store it in a Kubernetes Secret.
To create the Secret with the connection string to the Oracle database, you can use the following command:
kubectl create secret generic oracledb-exporter-secret \
--from-literal=datasource='YOUR_CONNECTION_STRING'
In the Deployment, configure the environment variable like this:
env:
  - name: DATA_SOURCE_NAME
    valueFrom:
      secretKeyRef:
        name: oracledb-exporter-secret
        key: datasource
Make sure the connection information is correct; it should follow this format:
system/password@//database_url:1521/database_name.your.domain.com
You can verify it with a sqlplus Docker image:
docker run --net='host' --rm --interactive guywithnose/sqlplus sqlplus system/password@//database_url:1521/database_name.my.domain.com
Next, let's add some custom metrics, including slow queries and big queries (queries that return many rows).
To use the custom metrics:
- In the Deployment, add another environment variable with the path to the file containing the new metrics.
- Mount this new file from a ConfigMap as a volume.
The complete configuration looks like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: oracledb-exporter
  namespace: database-namespace
spec:
  selector:
    matchLabels:
      app: oracledb-exporter
  replicas: 1
  template:
    metadata:
      labels:
        app: oracledb-exporter
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9161"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: oracledb-exporter
          ports:
            - containerPort: 9161
          image: iamseth/oracledb_exporter
          env:
            - name: DATA_SOURCE_NAME
              valueFrom:
                secretKeyRef:
                  name: oracledb-exporter-secret
                  key: datasource
            - name: CUSTOM_METRICS
              value: /tmp/custom-metrics.toml
          volumeMounts:
            - name: custom-metrics
              mountPath: /tmp/custom-metrics.toml
              subPath: custom-metrics.toml
      volumes:
        - name: custom-metrics
          configMap:
            defaultMode: 420
            name: custom-metrics
ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-metrics
  namespace: database-namespace
data:
  custom-metrics.toml: |
    [[metric]]
    context = "slow_queries"
    metricsdesc = { p95_time_usecs= "Gauge metric with percentile 95 of elapsed time.", p99_time_usecs= "Gauge metric with percentile 99 of elapsed time." }
    request = "select percentile_disc(0.95) within group (order by elapsed_time) as p95_time_usecs, percentile_disc(0.99) within group (order by elapsed_time) as p99_time_usecs from v$sql where last_active_time >= sysdate - 5/(24*60)"

    [[metric]]
    context = "big_queries"
    metricsdesc = { p95_rows= "Gauge metric with percentile 95 of returned rows.", p99_rows= "Gauge metric with percentile 99 of returned rows." }
    request = "select percentile_disc(0.95) within group (order by rownum) as p95_rows, percentile_disc(0.99) within group (order by rownum) as p99_rows from v$sql where last_active_time >= sysdate - 5/(24*60)"

    [[metric]]
    context = "size_user_segments_top100"
    metricsdesc = {table_bytes="Gauge metric with the size of the tables in user segments."}
    labels = ["segment_name"]
    request = "select * from (select segment_name,sum(bytes) as table_bytes from user_segments where segment_type='TABLE' group by segment_name) order by table_bytes DESC FETCH NEXT 100 ROWS ONLY"

    [[metric]]
    context = "size_user_segments_top100"
    metricsdesc = {table_partition_bytes="Gauge metric with the size of the table partition in user segments."}
    labels = ["segment_name"]
    request = "select * from (select segment_name,sum(bytes) as table_partition_bytes from user_segments where segment_type='TABLE PARTITION' group by segment_name) order by table_partition_bytes DESC FETCH NEXT 100 ROWS ONLY"

    [[metric]]
    context = "size_user_segments_top100"
    metricsdesc = {cluster_bytes="Gauge metric with the size of the cluster in user segments."}
    labels = ["segment_name"]
    request = "select * from (select segment_name,sum(bytes) as cluster_bytes from user_segments where segment_type='CLUSTER' group by segment_name) order by cluster_bytes DESC FETCH NEXT 100 ROWS ONLY"

    [[metric]]
    context = "size_dba_segments_top100"
    metricsdesc = {table_bytes="Gauge metric with the size of the tables in dba segments."}
    labels = ["segment_name"]
    request = "select * from (select segment_name,sum(bytes) as table_bytes from dba_segments where segment_type='TABLE' group by segment_name) order by table_bytes DESC FETCH NEXT 100 ROWS ONLY"

    [[metric]]
    context = "size_dba_segments_top100"
    metricsdesc = {table_partition_bytes="Gauge metric with the size of the table partition in dba segments."}
    labels = ["segment_name"]
    request = "select * from (select segment_name,sum(bytes) as table_partition_bytes from dba_segments where segment_type='TABLE PARTITION' group by segment_name) order by table_partition_bytes DESC FETCH NEXT 100 ROWS ONLY"

    [[metric]]
    context = "size_dba_segments_top100"
    metricsdesc = {cluster_bytes="Gauge metric with the size of the cluster in dba segments."}
    labels = ["segment_name"]
    request = "select * from (select segment_name,sum(bytes) as cluster_bytes from dba_segments where segment_type='CLUSTER' group by segment_name) order by cluster_bytes DESC FETCH NEXT 100 ROWS ONLY"
After creating the Secret and the ConfigMap, you can apply the Deployment and check that the exporter is exposing metrics from the Oracle database on port 9161.
If everything is working, Prometheus will automatically discover the annotated exporter pod and start scraping its metrics within a few minutes. You can check this in the Targets section of the Prometheus web UI, and look for any metrics starting with oracledb_.
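You can also check the exporter directly, without waiting for Prometheus; a quick manual check, using the Deployment name and namespace from the manifests above:

# forward the exporter port to your machine
kubectl -n database-namespace port-forward deploy/oracledb-exporter 9161:9161

# in another terminal, fetch the metrics and look for the oracledb_ prefix
curl -s http://localhost:9161/metrics | grep '^oracledb_'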
What to monitor
Performance metrics
Wait time: the exporter provides a set of metrics for the time the Oracle database spends waiting on different activities. They all start with the oracledb_wait_time_ prefix, and they help evaluate where the database is spending most of its time: I/O, network, commits, concurrency, and so on. This makes it possible to identify bottlenecks that may affect the overall performance of the Oracle database.
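As a sketch, a PromQL query like the one below surfaces the largest wait-time classes at a glance (the exact metric names under the oracledb_wait_time_ prefix depend on the exporter version):

topk(5, {__name__=~"oracledb_wait_time_.*"})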
Slow queries: some queries take longer than others to return results. If this time exceeds the response timeout configured in the application, the application will treat it as a timeout error from the database and retry the query. This behavior can overload the system and hurt overall performance.
In the configuration shown above, two custom metrics report the 95th and 99th percentile of response time for queries executed in the last 5 minutes (an example alert rule based on them follows the list). These metrics are:
- oracledb_slow_queries_p95_time_usecs
- oracledb_slow_queries_p99_time_usecs
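As a sketch of how these could be used, the alert rule below fires when the p99 query time stays above 5 seconds for 10 minutes; the threshold, duration, and rule group name are arbitrary examples, not values from the original setup:

groups:
  - name: oracledb-custom-alerts
    rules:
      - alert: OracleSlowQueriesP99High
        # 5e6 microseconds = 5 seconds; example threshold only
        expr: oracledb_slow_queries_p99_time_usecs > 5e6
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Oracle p99 query time above 5s on {{ $labels.instance }}"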
Active sessions: it is important to monitor the active sessions in the Oracle database. If the configured limit is exceeded, the database will reject new connections, causing application errors. The metric that provides this information is oracledb_sessions_value, and its status label gives more detail.
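A simple way to watch this is to break the session count down by the status label mentioned above; for example, in PromQL:

sum by (status) (oracledb_sessions_value)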
Activity: it is also important to monitor the operations performed by the database. For this we can rely on the following metrics (a sketch of how to query them follows the list):
- oracledb_activity_execute_count
- oracledb_activity_parse_count_total
- oracledb_activity_user_commits
- oracledb_activity_user_rollbacks
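These are cumulative counters, so the per-second rate is usually what is graphed or alerted on. A sketch in PromQL, with an arbitrary 5-minute window:

rate(oracledb_activity_execute_count[5m])
rate(oracledb_activity_user_commits[5m])
rate(oracledb_activity_user_rollbacks[5m])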