Testing cluster performance with Hadoop's built-in benchmark programs
We will use several widely adopted benchmarks: TestDFSIO, mrbench, nnbench, TeraSort, and sort.
The benchmark jars live in:
${HADOOP_HOME}/share/hadoop/mapreduce/
1. Listing the available tools
$ hadoop jar hadoop-mapreduce-client-jobclient-2.9.2-tests.jar
An example program must be given as the first argument.
Valid program names are:
DFSCIOTest: Distributed i/o benchmark of libhdfs.
DistributedFSCheck: Distributed checkup of the file system consistency.
JHLogAnalyzer: Job History Log analyzer.
MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
NNdataGenerator: Generate the data to be used by NNloadGenerator
NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR
NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job
NNstructureGenerator: Generate the structure to be used by NNdataGenerator
SliveTest: HDFS Stress Test and Live Data Verification.
TestDFSIO: Distributed i/o benchmark.
fail: a job that always fails
filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
gsleep: A sleep job whose mappers create 1MB buffer for every record.
largesorter: Large-Sort tester
loadgen: Generic map/reduce load generator
mapredtest: A map/reduce test check.
minicluster: Single process HDFS and MR cluster.
mrbench: A map/reduce benchmark that can create many small jobs
nnbench: A benchmark that stresses the namenode w/ MR.
nnbenchWithoutMR: A benchmark that stresses the namenode w/o MR.
sleep: A job that sleeps at each map and reduce task.
testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
testfilesystem: A test for FileSystem read/write.
testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
testsequencefile: A test for flat files of binary key value pairs.
testsequencefileinputformat: A test for sequence file input format.
testtextinputformat: A test for text input format.
threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
timelineperformance: A job that launches mappers to test timline service performance.
2. TestDFSIO
This program is mainly used to measure the cluster's I/O performance.
View the available options:
$ hadoop jar hadoop-mapreduce-client-jobclient-2.9.2-tests.jar TestDFSIO
20/05/27 14:11:42 INFO fs.TestDFSIO: TestDFSIO.1.8
Missing arguments.
Usage: TestDFSIO [genericOptions]
  -read [-random | -backward | -skip [-skipSize Size]] | -write | -append | -truncate | -clean
  [-compression codecClassName] [-nrFiles N] [-size Size[B|KB|MB|GB|TB]]
  [-resFile resultFileName] [-bufferSize Bytes] [-rootDir]
1) Testing HDFS write performance
Write 10 files of 128 MB each to the HDFS cluster:
$ hadoop jar hadoop-mapreduce-client-jobclient-2.9.2-tests.jar TestDFSIO -write -nrFiles 10 -size 128MB -resFile /home/hduser/TestDFSIO_result.txt
The last section of the output contains the final results; the same figures are also written to /home/hduser/TestDFSIO_result.txt on the local filesystem.

From the results above:
Files written: 10
Throughput: 67.88 MB/sec
Average IO rate: 94.41 MB/sec
IO rate std deviation: 43.3 MB/sec
Total execution time: 47.58 s
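Throughput and average IO rate can differ noticeably, as they do here: throughput divides the total data moved by the sum of all tasks' I/O time, while the average IO rate is the mean of each file's individual rate, so slow stragglers pull the two numbers apart. A minimal sketch of pulling the headline figure out of the result file; the report layout in the heredoc is reconstructed from this run's numbers and may vary slightly between Hadoop versions:

```shell
#!/bin/sh
# Sample TestDFSIO report (numbers taken from the run above); the real file is
# whatever path was passed to -resFile, e.g. /home/hduser/TestDFSIO_result.txt.
cat > /tmp/TestDFSIO_result.txt <<'EOF'
----- TestDFSIO ----- : write
           Date & time: Wed May 27 14:20:00 CST 2020
       Number of files: 10
Total MBytes processed: 1280
     Throughput mb/sec: 67.88
Average IO rate mb/sec: 94.41
 IO rate std deviation: 43.3
    Test exec time sec: 47.58
EOF

# Pull the throughput value out of the report.
awk -F': ' '/Throughput/ {print $2}' /tmp/TestDFSIO_result.txt
```

The same `awk` pattern works for the other metrics by matching on their labels.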
2) Testing HDFS read performance
Read the 10 files of 128 MB back from HDFS:
$ hadoop jar hadoop-mapreduce-client-jobclient-2.9.2-tests.jar TestDFSIO -read -nrFiles 10 -size 128MB -resFile /home/hduser/TestDFSIO_read_result.txt
The results are as follows:

3) Cleaning up the test data
$ hadoop jar hadoop-mapreduce-client-jobclient-2.9.2-tests.jar TestDFSIO -clean
By default, the test data is stored under /benchmarks on the cluster:

3. nnbench
nnbench is used to stress-test the NameNode: it generates a large number of HDFS-related requests, placing heavy load on the NameNode. The test can simulate creating, reading, renaming, and deleting files on HDFS.
Usage:
$ hadoop jar hadoop-mapreduce-client-jobclient-2.9.2-tests.jar nnbench --help
Usage: nnbench <options>
Options:
    -operation <Available operations are create_write open_read rename delete. This option is mandatory>
     * NOTE: The open_read, rename and delete operations assume that the files they operate on, are already available. The create_write operation must be run before running the other operations.
    -maps <number of maps. default is 1. This is not mandatory>
    -reduces <number of reduces. default is 1. This is not mandatory>
    -startTime <time to start, given in seconds from the epoch. Make sure this is far enough into the future, so all maps (operations) will start at the same time. default is launch time + 2 mins. This is not mandatory>
    -blockSize <Block size in bytes. default is 1. This is not mandatory>
    -bytesToWrite <Bytes to write. default is 0. This is not mandatory>
    -bytesPerChecksum <Bytes per checksum for the files. default is 1. This is not mandatory>
    -numberOfFiles <number of files to create. default is 1. This is not mandatory>
    -replicationFactorPerFile <Replication factor for the files. default is 1. This is not mandatory>
    -baseDir <base DFS path. default is /benchmarks/NNBench. This is not mandatory>
    -readFileAfterOpen <true or false. if true, it reads the file and reports the average time to read. This is valid with the open_read operation. default is false. This is not mandatory>
    -help: Display the help statement
Test: use 10 maps and 5 reduces to create 1,000 files, write data to them, then open and read them:
$ hadoop jar hadoop-mapreduce-client-jobclient-2.9.2-tests.jar nnbench -operation create_write -maps 10 -reduces 5 -numberOfFiles 1000 -readFileAfterOpen true
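One thing to keep in mind when sizing a run: the total load the NameNode sees is larger than -numberOfFiles if, as NNBench's implementation suggests, each map task works on its own batch of files. A rough estimate (this per-map assumption should be verified against the "Successful file operations" counter in the nnbench report):

```shell
#!/bin/sh
# Rough NameNode load estimate for the run above.
# Assumption: each of the 10 maps performs the operation on its own set of
# 1000 files, so total operations = maps * numberOfFiles.
maps=10
files_per_map=1000
echo $(( maps * files_per_map ))
```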
Test results:

4. mrbench
mrbench runs a small job repeatedly, checking whether small jobs run on the cluster repeatably and efficiently.
Usage:
$ hadoop jar hadoop-mapreduce-client-jobclient-2.9.2-tests.jar mrbench --help
MRBenchmark.0.0.2
Usage: mrbench
    [-baseDir <base DFS path for output/input, default is /benchmarks/MRBench>]
    [-jar <local path to job jar file containing Mapper and Reducer implementations, default is current jar file>]
    [-numRuns <number of times to run the job, default is 1>]
    [-maps <number of maps for each run, default is 2>]
    [-reduces <number of reduces for each run, default is 1>]
    [-inputLines <number of input lines to generate, default is 1>]
    [-inputType <type of input to generate, one of ascending (default), descending, random>]
    [-verbose]
Test:
$ hadoop jar hadoop-mapreduce-client-jobclient-2.9.2-tests.jar mrbench -numRuns 10 -maps 10 -reduces 5 -inputLines 10 -inputType descending
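The figure to watch in mrbench's output is the average job time across the -numRuns runs. The sketch below shows one way to pull it out of the final summary line; both the column layout and the 25492 ms value are hypothetical samples for illustration, not this run's actual output:

```shell
#!/bin/sh
# Hypothetical mrbench summary (layout assumed, value illustrative only).
cat > /tmp/mrbench_summary.txt <<'EOF'
DataLines Maps Reduces AvgTime (milliseconds)
10 10 5 25492
EOF

# Last field of the data row = average wall-clock time per small job, in ms.
awk 'NR==2 {print $NF}' /tmp/mrbench_summary.txt
```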
Results:

5. TeraSort
TeraSort is an effective sorting benchmark for Hadoop. By running the built-in TeraSort program with different numbers of map and reduce tasks, you can measure their impact on Hadoop's performance. The test data is generated by the accompanying teragen program, at sizes of 1 GB and 10 GB.
A TeraSort benchmark takes three steps:
1. TeraGen generates random data
2. TeraSort sorts the data
3. TeraValidate verifies that TeraSort's output is in order; if problems are detected, the out-of-order keys are written to its output directory
Usage:
$ hadoop jar hadoop-mapreduce-examples-2.9.2.jar --help
Unknown program '--help' chosen. Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files
Test:
1. TeraGen generates random data: create 1 GB of random data, stored in /user/hduser/test_data
$ hadoop jar hadoop-mapreduce-examples-2.9.2.jar teragen 10000000 test_data  # Note: the data size cannot be given in a format like "1g" or "1t"; when tried that way, the generated data had a size of 0
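The reason "1g" produces no data is that teragen's first argument is a row count, not a byte size: each TeraSort record is 100 bytes, so the number of rows for a target size is target bytes divided by 100. A quick sanity check:

```shell
#!/bin/sh
# Each teragen record is 100 bytes, so rows = target_bytes / 100.
target_gb=1
rows=$(( target_gb * 1000 * 1000 * 1000 / 100 ))
echo "$rows"   # matches the 10000000 used for the 1 GB run above
```

For the 10 GB data set, the same arithmetic gives 100000000 rows.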

2. TeraSort sorts the data, writing the result to /user/hduser/terasort-output
$ hadoop jar hadoop-mapreduce-examples-2.9.2.jar terasort test_data terasort-output
Check the data on HDFS:

3. Use teravalidate to check that the sort result is correct
$ hadoop jar hadoop-mapreduce-examples-2.9.2.jar teravalidate terasort-output terasort-validate
Inspect terasort-validate/part-r-00000 to see whether any errors occurred:
$ hadoop fs -cat terasort-validate/part-r-00000
checksum	4c49607ac53602
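Since a correct sort leaves only the checksum record in the validate output, while out-of-order keys show up as extra records (as noted in step 3), a quick scripted check might look like this; the sample file mimics this run's output:

```shell
#!/bin/sh
# Sample teravalidate output from a successful run: only a checksum record.
cat > /tmp/teravalidate_part.txt <<'EOF'
checksum	4c49607ac53602
EOF

# Any record other than the checksum line indicates a sort error.
if [ -z "$(grep -v '^checksum' /tmp/teravalidate_part.txt)" ]; then
  echo "sort OK"
else
  echo "sort ERRORS found"
fi
```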
This article draws on:
https://blog.csdn.net/lingeio/article/details/93869306
https://www.cnblogs.com/zhaohz/p/12117079.html
