Hadoop Notes - Summary of Hadoop Monitoring Metrics


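System monitoring metrics

load_one            System load average over the last 1 minute

load_fifteen        System load average over the last 15 minutes

load_five           System load average over the last 5 minutes

boottime            System boot time, accurate to the second

bytes_in            Network receive rate, in bytes/sec

bytes_out           Network send rate, in bytes/sec

cpu_aidle           Percentage of CPU time spent idle since boot

cpu_idle            Percentage of CPU time spent idle

cpu_nice            Percentage of CPU time used by user-space processes whose priority has been changed (niced)

cpu_num             Total number of CPU threads

cpu_report          Summary report of CPU usage

cpu_speed           CPU speed (MHz)

cpu_system          Percentage of CPU time spent in kernel space

cpu_user            Percentage of CPU time spent in user space

cpu_wio             Percentage of CPU time spent idle while waiting for outstanding I/O requests

proc_total          Total number of processes

swap_free           Free swap space

swap_total          Total swap space (in KB)

disk_free           Free disk space

disk_total          Total disk size

ip_address          List of IP addresses

last_reported       Time of the last report

load_report         Summary report of system load

location            Location information (latitude and longitude)

machine_type        Machine architecture (x86 or 64-bit)

mem_buffers         Total memory used for kernel buffers

mem_cached          Amount of cached memory

mem_free            Amount of free memory

mem_report          Summary report of memory usage

mem_shared          Amount of shared memory

mem_total           Total physical memory (in KB)

os_name             Operating system name

os_release          Operating system release

pkts_in             Packets received per second

pkts_out            Packets sent per second

proc_run            Total number of running processes

packet_report       Summary report of packets

network_report      Summary report of network traffic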
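
As an illustration of how these gauges can be combined, the sketch below estimates the share of memory in use from mem_total, mem_free, mem_buffers and mem_cached. It assumes all four values are reported in the same unit (KB, as noted for mem_total) and treats buffers and cache as reclaimable. The function name memory_used_percent is hypothetical and not part of any monitoring tool.

```python
def memory_used_percent(mem_total, mem_free, mem_buffers, mem_cached):
    """Estimate the share of physical memory in use, as a percentage.

    Assumption: all arguments share one unit (e.g. KB), and buffers and
    cache count as reclaimable, so they are not treated as "used".
    """
    if mem_total <= 0:
        raise ValueError("mem_total must be positive")
    used = mem_total - mem_free - mem_buffers - mem_cached
    return 100.0 * used / mem_total


# Example with made-up sample values (KB).
print(memory_used_percent(8_000_000, 1_000_000, 500_000, 2_500_000))
```

Whether buffers and cache should count as free depends on the workload, so the subtraction is a design choice rather than a rule.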

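NameNode monitoring metrics

dfs.namenode.SafeModeTime    Time spent in safe mode

dfs.namenode.AddBlockOps    Number of block write (addBlock) operations

dfs.namenode.BlockReportAvgTime    Average time to process a block report

dfs.namenode.BlockReportNumOps    Number of block reports processed

dfs.namenode.CreateFileOps    Number of file creation operations

dfs.namenode.DeleteFileOps    Number of file deletion operations

dfs.namenode.FileInfoOps    Number of file info lookups

dfs.namenode.FilesCreated    Number of files created

dfs.namenode.FilesDeleted    Number of files deleted

dfs.namenode.FilesInGetListingOps    Number of files and directories returned by getListing operations

dfs.namenode.FilesRenamed    Number of files renamed

dfs.namenode.FsImageLoadTime    Time taken to load the fsimage

dfs.namenode.GetAdditionalDatanodeOps    Number of getAdditionalDatanode operations

dfs.namenode.GetBlockLocations    Number of getBlockLocations operations

dfs.namenode.GetListingOps    Number of getListing operations

dfs.namenode.SyncsAvgTime    Average time to sync operations to the edit log

dfs.namenode.SyncsNumOps    Number of syncs to the edit log

dfs.namenode.TransactionsAvgTime    Average transaction time

dfs.namenode.TransactionsBatchedInSync    Number of transactions that, at flush time, were found to have already been synced

dfs.namenode.TransactionsNumOps    Number of transactions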

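DataNode monitoring metrics

dfs.datanode.BlockReportsAvgTime    Average time to send a block report to the NameNode

dfs.datanode.BlockReportsNumOps    Number of block reports sent to the NameNode

dfs.datanode.BlocksRead    Number of blocks read from disk

dfs.datanode.BlocksRemoved    Number of blocks removed

dfs.datanode.BlocksReplicated    Number of blocks replicated

dfs.datanode.BlocksVerified    Number of blocks verified

dfs.datanode.BlocksWritten    Number of blocks written

dfs.datanode.BytesRead    Total bytes read

dfs.datanode.BytesWritten    Total bytes written

dfs.datanode.CopyBlockOpAvgTime    Average time of block copy operations

dfs.datanode.CopyBlockOpNumOps    Number of block copy operations

dfs.datanode.HeartbeatsAvgTime    Average time of heartbeats to the NameNode

dfs.datanode.HeartbeatsNumOps    Number of heartbeats sent to the NameNode

dfs.datanode.ReadBlockOpAvgTime    Average time of block read operations

dfs.datanode.ReadBlockOpNumOps    Number of block read operations

dfs.datanode.ReadsFromLocalClient    Number of reads from local clients

dfs.datanode.ReadsFromRemoteClient    Number of reads from remote clients

dfs.datanode.WriteBlockOpAvgTime    Average time of block write operations

dfs.datanode.WriteBlockOpNumOps    Number of block write operations

dfs.datanode.WritesFromLocalClient    Number of writes from local clients

dfs.datanode.WritesFromRemoteClient    Number of writes from remote clients

dfs.datanode.PacketAckRoundTripTimeNanosAvgTime    Average packet acknowledgement round-trip time (nanoseconds)

dfs.datanode.PacketAckRoundTripTimeNanosNumOps    Number of packet acknowledgements

dfs.datanode.FlushNanosAvgTime    Average file system flush time (nanoseconds)

dfs.datanode.FlushNanosNumOps    Number of file system flushes

dfs.datanode.ReplaceBlockOpAvgTime    Average time of block replace operations

dfs.datanode.ReplaceBlockOpNumOps    Number of block replace operations

dfs.datanode.SendDataPacketBlockedOnNetworkNanosAvgTime    Average time a data packet send was blocked on the network (nanoseconds)

dfs.datanode.SendDataPacketBlockedOnNetworkNanosNumOps    Number of data packet sends measured for network blocking

dfs.datanode.SendDataPacketTransferNanosAvgTime    Average time to transfer a data packet over the network (nanoseconds)

dfs.datanode.SendDataPacketTransferNanosNumOps    Number of data packets transferred over the network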
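
Most of the DataNode metrics above come in AvgTime/NumOps pairs. When such pairs are collected from several DataNodes, a plain mean of the averages over-weights idle nodes. The sketch below shows one way to combine them, weighting each average by its operation count. It assumes every pair covers the same reporting interval, and combined_avg_time is a hypothetical helper name.

```python
def combined_avg_time(samples):
    """Combine (avg_time, num_ops) pairs into one operation-weighted average.

    samples: iterable of (avg_time, num_ops) tuples, one per DataNode,
    all taken over the same reporting interval (an assumption).
    Returns None when no operations were recorded.
    """
    total_ops = 0
    total_time = 0.0
    for avg_time, num_ops in samples:
        total_ops += num_ops
        total_time += avg_time * num_ops
    if total_ops == 0:
        return None
    return total_time / total_ops


# Made-up values: (ReadBlockOpAvgTime, ReadBlockOpNumOps) from two nodes.
print(combined_avg_time([(10.0, 1), (20.0, 3)]))
```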

HDFS file system metrics

dfs.FSNamesystem.BlockCapacity    Total block capacity

dfs.FSNamesystem.BlocksTotal    Current number of blocks

dfs.FSNamesystem.CapacityRemainingGB    Remaining HDFS capacity (GB)

dfs.FSNamesystem.CapacityTotalGB    Total HDFS capacity (GB)

dfs.FSNamesystem.CapacityUsedGB    Used HDFS capacity (GB)

dfs.FSNamesystem.CorruptBlocks    Number of corrupt blocks

dfs.FSNamesystem.ExcessBlocks    Number of excess blocks

dfs.FSNamesystem.ExpiredHeartbeats    Number of expired heartbeats

dfs.FSNamesystem.FilesTotal    Total number of files

dfs.FSNamesystem.LastCheckpointTime    Time of the most recent checkpoint

dfs.FSNamesystem.LastWrittenTransactionId    ID of the most recently written transaction

dfs.FSNamesystem.MillisSinceLastLoadedEdits    Time since edits were last loaded (milliseconds)

dfs.FSNamesystem.MissingBlocks    Number of missing blocks

dfs.FSNamesystem.TotalFiles    Total number of files

dfs.FSNamesystem.UnderReplicatedBlocks    Number of blocks with too few replicas

dfs.FSNamesystem.PendingDataNodeMessageCount    Number of DataNode messages queued on the standby NameNode

dfs.FSNamesystem.PendingDeletionBlocks    Number of blocks pending deletion

dfs.FSNamesystem.PendingReplicationBlocks    Number of blocks waiting to be replicated

dfs.FSNamesystem.PostponedMisreplicatedBlocks    Number of mis-replicated blocks whose handling has been postponed

dfs.FSNamesystem.ScheduledReplicationBlocks    Number of blocks scheduled for replication

dfs.FSNamesystem.TotalLoad    Number of xceivers as seen by the NameNode

dfs.FSNamesystem.TransactionsSinceLastCheckpoint    Number of new transactions since the last checkpoint

dfs.FSNamesystem.TransactionsSinceLastLogRoll    Number of new transactions since the last edit log roll
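
A handful of the FSNamesystem values above are commonly watched together. The sketch below computes the used-capacity percentage from CapacityUsedGB and CapacityTotalGB and flags non-zero MissingBlocks, CorruptBlocks and UnderReplicatedBlocks. The dictionary layout (short names without the dfs.FSNamesystem. prefix), the function name hdfs_health and the 80% threshold are all assumptions made for illustration.

```python
def hdfs_health(metrics, max_used_percent=80.0):
    """Return (used_percent, warnings) for a dict of FSNamesystem values.

    Assumption: `metrics` maps short metric names (without the
    dfs.FSNamesystem. prefix) to numbers. The default threshold is arbitrary.
    """
    total = metrics.get("CapacityTotalGB", 0)
    used = metrics.get("CapacityUsedGB", 0)
    used_percent = 100.0 * used / total if total > 0 else 0.0

    warnings = []
    if used_percent > max_used_percent:
        warnings.append("CapacityUsedGB")
    # Any non-zero value for these counters is reported as a warning.
    for name in ("MissingBlocks", "CorruptBlocks", "UnderReplicatedBlocks"):
        if metrics.get(name, 0) > 0:
            warnings.append(name)
    return used_percent, warnings


# Made-up sample values.
sample = {"CapacityTotalGB": 1000, "CapacityUsedGB": 250, "CorruptBlocks": 2}
print(hdfs_health(sample))
```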

hbase.master metrics

hbase.master.cluster_requests    Total number of requests across the cluster

hbase.master.splitSize_avg_time    Average size of the logs split

hbase.master.splitSize_num_ops    Number of log splits

hbase.master.splitTime_avg_time    Average time taken to split logs

hbase.master.splitTime_num_ops    Number of log splits

HBase RegionServer monitoring metrics

hbase.regionserver.blockCacheCount    Number of blocks held in the block cache on the RegionServer

hbase.regionserver.blockCacheEvictedCount    Number of blocks evicted from the block cache

hbase.regionserver.blockCacheFree    Free memory in the block cache

hbase.regionserver.blockCacheHitCachingRatio    Block cache hit ratio counted only over reads that have block caching enabled

hbase.regionserver.blockCacheHitCount    Number of block cache hits

hbase.regionserver.blockCacheHitRatio    Block cache hit ratio

hbase.regionserver.blockCacheMissCount    Number of block cache misses

hbase.regionserver.blockCacheSize    Size of the block cache

hbase.regionserver.compactionQueueSize    Size of the compaction queue

hbase.regionserver.compactionSize_avg_time    Average amount of data processed per compaction

hbase.regionserver.compactionSize_num_ops    Number of compactions

hbase.regionserver.compactionTime_avg_time    Average time per compaction

hbase.regionserver.compactionTime_num_ops    Number of compactions

hbase.regionserver.deleteRequestLatency_75th_percentile    75th percentile of delete request latency

hbase.regionserver.deleteRequestLatency_95th_percentile    95th percentile of delete request latency

hbase.regionserver.deleteRequestLatency_99th_percentile    99th percentile of delete request latency

hbase.regionserver.deleteRequestLatency_max    Maximum delete request latency

hbase.regionserver.deleteRequestLatency_mean    Mean delete request latency

hbase.regionserver.deleteRequestLatency_median    Median delete request latency

hbase.regionserver.deleteRequestLatency_min    Minimum delete request latency

hbase.regionserver.deleteRequestLatency_num_ops    Number of delete requests

hbase.regionserver.deleteRequestLatency_std_dev    Standard deviation of delete request latency

hbase.regionserver.flushQueueSize    Size of the flush queue

hbase.regionserver.flushSize_avg_time    Average amount of data written per flush

hbase.regionserver.flushSize_num_ops    Number of flushes

hbase.regionserver.flushTime_avg_time    Average time per flush

hbase.regionserver.flushTime_num_ops    Number of flushes

hbase.regionserver.fsReadLatencyHistogram_75th_percentile    75th percentile of file system read latency

hbase.regionserver.fsReadLatencyHistogram_95th_percentile    95th percentile of file system read latency

hbase.regionserver.fsReadLatencyHistogram_99th_percentile    99th percentile of file system read latency

hbase.regionserver.fsReadLatencyHistogram_max    Maximum file system read latency

hbase.regionserver.fsReadLatencyHistogram_mean    Mean file system read latency

hbase.regionserver.fsReadLatencyHistogram_median    Median file system read latency

hbase.regionserver.fsReadLatencyHistogram_min    Minimum file system read latency

hbase.regionserver.fsReadLatencyHistogram_num_ops    Number of file system reads

hbase.regionserver.fsReadLatencyHistogram_std_dev    Standard deviation of file system read latency

hbase.regionserver.fsReadLatency_avg_time    Average file system read latency

hbase.regionserver.fsReadLatency_num_ops    Number of file system reads

hbase.regionserver.fsSyncLatency_avg_time    Average time to sync the HLog

hbase.regionserver.fsSyncLatency_num_ops    Number of HLog syncs

hbase.regionserver.fsWriteLatencyHistogram_75th_percentile    75th percentile of HLog write latency

hbase.regionserver.fsWriteLatencyHistogram_95th_percentile    95th percentile of HLog write latency

hbase.regionserver.fsWriteLatencyHistogram_99th_percentile    99th percentile of HLog write latency

hbase.regionserver.fsWriteLatencyHistogram_max    Maximum HLog write latency

hbase.regionserver.fsWriteLatencyHistogram_mean    Mean HLog write latency

hbase.regionserver.fsWriteLatencyHistogram_median    Median HLog write latency

hbase.regionserver.fsWriteLatencyHistogram_min    Minimum HLog write latency

hbase.regionserver.fsWriteLatencyHistogram_num_ops    Number of HLog writes

hbase.regionserver.fsWriteLatencyHistogram_std_dev    Standard deviation of HLog write latency

hbase.regionserver.fsWriteLatency_avg_time    Average latency of HLog write operations

hbase.regionserver.fsWriteLatency_num_ops    Number of HLog write operations

hbase.regionserver.getRequestLatency_75th_percentile    75th percentile of get request latency

hbase.regionserver.getRequestLatency_95th_percentile    95th percentile of get request latency

hbase.regionserver.getRequestLatency_99th_percentile    99th percentile of get request latency

hbase.regionserver.getRequestLatency_max    Maximum get request latency

hbase.regionserver.getRequestLatency_mean    Mean get request latency

hbase.regionserver.getRequestLatency_median    Median get request latency

hbase.regionserver.getRequestLatency_min    Minimum get request latency

hbase.regionserver.getRequestLatency_num_ops    Number of get requests

hbase.regionserver.getRequestLatency_std_dev    Standard deviation of get request latency

hbase.regionserver.hdfsBlocksLocalityIndex    Share of the RegionServer's data that is stored on HDFS blocks local to its own machine

hbase.regionserver.hlogFileCount    Number of HLog files

hbase.regionserver.mbInMemoryWithoutWAL    Memstore space occupied by data from Put operations that skip the WAL

hbase.regionserver.memstoreSizeMB    Sum of the memstore sizes of all regions on the RegionServer

hbase.regionserver.numPutsWithoutWAL    Number of Put operations on the RegionServer that skip the WAL (write-ahead log)

hbase.regionserver.putRequestLatency_75th_percentile    75th percentile of put request latency

hbase.regionserver.putRequestLatency_95th_percentile    95th percentile of put request latency

hbase.regionserver.putRequestLatency_99th_percentile    99th percentile of put request latency

hbase.regionserver.putRequestLatency_max    Maximum put request latency

hbase.regionserver.putRequestLatency_mean    Mean put request latency

hbase.regionserver.putRequestLatency_median    Median put request latency

hbase.regionserver.putRequestLatency_min    Minimum put request latency

hbase.regionserver.putRequestLatency_num_ops    Number of put requests

hbase.regionserver.putRequestLatency_std_dev    Standard deviation of put request latency

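hbase.regionserver.readRequestsCount    Number of read requests. readRequestCount is not equivalent to the number of rows the client reads, and in most cases it is far smaller, because a call such as next(1000) counts as a single request

hbase.regionserver.regionSplitFailureCount    Number of failed region splits

hbase.regionserver.regionSplitSuccessCount    Number of successful region splits

hbase.regionserver.regions    Number of regions

hbase.regionserver.requests    Number of requests

hbase.regionserver.rootIndexSizeKB    Size of the store file index; the same quantity as storefileIndexSizeMB

hbase.regionserver.storefileIndexSizeMB    Size of the store file index

hbase.regionserver.storefiles    Number of store files on the RegionServer

hbase.regionserver.stores    Number of stores on the RegionServer

hbase.regionserver.totalStaticBloomSizeKB    Sum of the Bloom filter sizes of all stores

hbase.regionserver.totalStaticIndexSizeKB    Index size of the HFiles on the RegionServer: the sum of all HFile block index data, uncompressed and without any other information

hbase.regionserver.writeRequestsCount    Number of write requests. writeRequestCount is not exactly equivalent to the number of client write operations: a batch write counts as a single request, so in most cases it is far smaller than the number of client writes, especially when batch writes are frequent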
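
The hit and miss counters above can be turned into a hit ratio for a specific time window. The sketch below does this from two samples of blockCacheHitCount and blockCacheMissCount. It assumes both counters are cumulative totals that only reset when the RegionServer restarts. The function name interval_hit_ratio is hypothetical.

```python
def interval_hit_ratio(prev, curr):
    """Block cache hit ratio between two samples of the cumulative counters.

    prev, curr: (hit_count, miss_count) tuples read from blockCacheHitCount
    and blockCacheMissCount at two points in time.
    Returns None if there were no lookups or a counter went backwards
    (for example after a RegionServer restart).
    """
    hits = curr[0] - prev[0]
    misses = curr[1] - prev[1]
    if hits < 0 or misses < 0 or hits + misses == 0:
        return None
    return hits / (hits + misses)


# Made-up samples taken one reporting interval apart.
print(interval_hit_ratio((100, 50), (400, 150)))
```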

MapReduce monitoring metrics

mapred.ShuffleMetrics.ShuffleConnections    Number of shuffle connections

mapred.ShuffleMetrics.ShuffleOutputBytes    Amount of shuffle output data, in bytes

mapred.ShuffleMetrics.ShuffleOutputsFailed    Number of failed shuffle outputs

mapred.ShuffleMetrics.ShuffleOutputsOK    Number of successful shuffle outputs

YARN (MapReduce v2) monitoring metrics

yarn.NodeManagerMetrics.AllocatedContainers    Number of containers currently allocated

yarn.NodeManagerMetrics.AllocatedGB    Memory currently allocated to containers (GB)

yarn.NodeManagerMetrics.AvailableGB    Memory currently free (GB)

yarn.NodeManagerMetrics.ContainersCompleted    Number of completed containers

yarn.NodeManagerMetrics.ContainersIniting    Number of containers being initialized

yarn.NodeManagerMetrics.ContainersKilled    Number of killed containers

yarn.NodeManagerMetrics.ContainersLaunched    Number of containers launched

yarn.NodeManagerMetrics.ContainersRunning    Number of running containers

YARN cluster metrics

yarn.ClusterMetrics.NumActiveNMs    Number of active NodeManagers

yarn.ClusterMetrics.NumLostNMs    Number of lost NodeManagers

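YARN queue metrics

yarn.QueueMetrics.ActiveApplications    Number of active applications

yarn.QueueMetrics.ActiveUsers    Number of active users

yarn.QueueMetrics.AggregateContainersAllocated    Total number of containers allocated

yarn.QueueMetrics.AggregateContainersReleased    Total number of containers released

yarn.QueueMetrics.AllocatedContainers    Number of containers currently allocated

yarn.QueueMetrics.AllocatedMB    Memory currently allocated (MB)

yarn.QueueMetrics.AppsCompleted    Number of completed applications

yarn.QueueMetrics.AppsPending    Number of pending applications

yarn.QueueMetrics.AppsRunning    Number of running applications

yarn.QueueMetrics.AppsSubmitted    Number of submitted applications

yarn.QueueMetrics.AvailableMB    Available memory (MB)

yarn.QueueMetrics.PendingContainers    Number of pending containers

yarn.QueueMetrics.PendingMB    Pending memory requests (MB)

yarn.QueueMetrics.running_0    Number of applications that have been running for 0 to 60 minutes

yarn.QueueMetrics.running_1440    Number of applications that have been running for more than 1440 minutes

yarn.QueueMetrics.running_300    Number of applications that have been running for 300 to 1440 minutes

yarn.QueueMetrics.running_60    Number of applications that have been running for 60 to 300 minutes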
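
The four running_N metrics partition running time into buckets whose lower bounds are 0, 60, 300 and 1440 minutes. The sketch below maps an elapsed time to the bucket it would be counted in. It assumes each bucket includes its lower bound and excludes its upper bound, which the list above does not specify. The names BUCKET_STARTS and running_bucket are hypothetical.

```python
from bisect import bisect_right

# Lower bounds (in minutes) of the four running_N buckets.
BUCKET_STARTS = [0, 60, 300, 1440]


def running_bucket(elapsed_minutes):
    """Return the running_N metric name that an elapsed time falls into.

    Assumption: each bucket includes its lower bound and excludes its
    upper bound; boundary handling is not stated in the metric list.
    """
    if elapsed_minutes < 0:
        raise ValueError("elapsed_minutes must be non-negative")
    # bisect_right finds how many lower bounds are <= elapsed_minutes.
    index = bisect_right(BUCKET_STARTS, elapsed_minutes) - 1
    return f"running_{BUCKET_STARTS[index]}"


for minutes in (30, 60, 299, 300, 5000):
    print(minutes, running_bucket(minutes))
```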

Hadoop RPC monitoring metrics

rpc.metrics.NumOpenConnections    Number of open RPC connections

rpc.metrics.ReceivedBytes    Number of bytes received over RPC

rpc.metrics.RpcProcessingTime_avg_time    Average time for RPC operations in the last interval

rpc.metrics.RpcProcessingTime_num_ops    Number of RPC operations in the last interval

rpc.metrics.RpcQueueTime_avg_time    Average time RPC calls spent waiting in the queue

rpc.metrics.RpcQueueTime_num_ops    Number of RPC operations that passed through the RPC queue

rpc.metrics.SentBytes    Number of bytes sent over RPC

rpc.metrics.callQueueLen    Length of the RPC queue

rpc.metrics.rpcAuthenticationFailures    Number of failed authentications

rpc.metrics.rpcAuthenticationSuccesses    Number of successful authentications

rpc.metrics.rpcAuthorizationFailures    Number of failed authorizations

rpc.metrics.rpcAuthorizationSuccesses    Number of successful authorizations

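rpc.detailed-metrics.canCommit_avg_time    Average time of RPC calls asking whether a task may commit

rpc.detailed-metrics.canCommit_num_ops    Number of RPC calls asking whether a task may commit

rpc.detailed-metrics.commitPending_avg_time    Average time of RPC calls reporting that a task has finished but its commit is still pending

rpc.detailed-metrics.commitPending_num_ops    Number of RPC calls reporting that a task has finished but its commit is still pending

rpc.detailed-metrics.done_avg_time    Average time of RPC calls reporting that a task completed successfully

rpc.detailed-metrics.done_num_ops    Number of RPC calls reporting that a task completed successfully

rpc.detailed-metrics.fatalError_avg_time    Average time of RPC calls reporting a fatal error in a task

rpc.detailed-metrics.fatalError_num_ops    Number of RPC calls reporting a fatal error in a task

rpc.detailed-metrics.getBlockInfo_avg_time    Average time to fetch block information from the specified DataNode

rpc.detailed-metrics.getBlockInfo_num_ops    Number of times block information was fetched from the specified DataNode

rpc.detailed-metrics.getMapCompletionEvents_avg_time    Average time for a reduce task to fetch the events giving the output locations of completed maps

rpc.detailed-metrics.getMapCompletionEvents_num_ops    Number of times a reduce task fetched the events giving the output locations of completed maps

rpc.detailed-metrics.getProtocolVersion_avg_time    Average time to fetch the RPC protocol version

rpc.detailed-metrics.getProtocolVersion_num_ops    Number of times the RPC protocol version was fetched

rpc.detailed-metrics.getTask_avg_time    Average time for a child process to fetch its JVM task after it starts

rpc.detailed-metrics.getTask_num_ops    Number of times a child process fetched its JVM task after it started

rpc.detailed-metrics.ping_avg_time    Average time of the periodic check by which a child process tests whether its parent is still alive

rpc.detailed-metrics.ping_num_ops    Number of periodic checks by which a child process tests whether its parent is still alive

rpc.detailed-metrics.recoverBlock_avg_time    Average time to start generation-stamp recovery for the specified block

rpc.detailed-metrics.recoverBlock_num_ops    Number of times generation-stamp recovery was started for the specified block

rpc.detailed-metrics.reportDiagnosticInfo_avg_time    Average time to report task error messages to the parent process. This operation should occur as rarely as possible; the messages are stored in the JobTracker

rpc.detailed-metrics.reportDiagnosticInfo_num_ops    Number of times task error messages were reported to the parent process

rpc.detailed-metrics.startBlockRecovery_avg_time    Average time to start block recovery

rpc.detailed-metrics.startBlockRecovery_num_ops    Number of times block recovery was started

rpc.detailed-metrics.statusUpdate_avg_time    Average time to report a child process's progress to its parent

rpc.detailed-metrics.statusUpdate_num_ops    Number of times a child process's progress was reported to its parent

rpc.detailed-metrics.updateBlock_avg_time    Average time to update a block to a new generation stamp and length

rpc.detailed-metrics.updateBlock_num_ops    Number of times a block was updated to a new generation stamp and length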

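JVM monitoring metrics

jvm.JvmMetrics.GcCount    Number of garbage collections performed by the JVM

jvm.JvmMetrics.GcTimeMillis    Time spent in garbage collection, in milliseconds

jvm.JvmMetrics.LogError    Number of ERROR messages written to the log

jvm.JvmMetrics.LogFatal    Number of FATAL messages written to the log

jvm.JvmMetrics.LogInfo    Number of INFO messages written to the log

jvm.JvmMetrics.LogWarn    Number of WARN messages written to the log

jvm.JvmMetrics.MemHeapCommittedM    Heap memory committed to the JVM (MB)

jvm.JvmMetrics.MemHeapUsedM    Heap memory used by the JVM (MB)

jvm.JvmMetrics.MemNonHeapCommittedM    Non-heap memory committed to the JVM (MB)

jvm.JvmMetrics.MemNonHeapUsedM    Non-heap memory used by the JVM (MB)

jvm.JvmMetrics.ThreadsBlocked    Number of threads in the BLOCKED state

jvm.JvmMetrics.ThreadsNew    Number of threads in the NEW state

jvm.JvmMetrics.ThreadsRunnable    Number of threads in the RUNNABLE state

jvm.JvmMetrics.ThreadsTerminated    Number of threads in the TERMINATED state

jvm.JvmMetrics.ThreadsTimedWaiting    Number of threads in the TIMED_WAITING state

jvm.JvmMetrics.ThreadsWaiting    Number of threads in the WAITING state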
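
A GC time total is easier to interpret as a fraction of wall-clock time. The sketch below derives that fraction from two readings of GcTimeMillis. It assumes the metric is a cumulative total in milliseconds, as its name suggests, and that the sampling interval is known. The function name gc_overhead is hypothetical.

```python
def gc_overhead(prev_gc_millis, curr_gc_millis, interval_seconds):
    """Fraction of wall-clock time spent in GC between two samples.

    Assumption: GcTimeMillis is a cumulative total in milliseconds, so the
    difference between two readings is the GC time within the interval.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_gc_millis - prev_gc_millis
    if delta < 0:  # counter went backwards, e.g. the JVM restarted
        return None
    return delta / (interval_seconds * 1000.0)


# Made-up readings taken 60 seconds apart.
print(gc_overhead(1000, 1600, 60))
```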

 

This article is reposted from https://www.cnblogs.com/xinfang520/p/10653335.html

