Distributed I/O Benchmark of HDFS
TestDFSIO (DFSIO) is a built-in benchmark tool for testing HDFS I/O performance. Its jar file is located at /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1.jar.
Base Directory to Store Test Results
/benchmarks/TestDFSIO
If a test has produced output, use hadoop fs commands to view its contents, e.g.:
hadoop fs -cat /benchmarks/TestDFSIO/io_write/part*
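Run Write/Read
Run a write or a read test with one of the commands below. Run the write test first, because the read test reads the files that the write test creates.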
hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1.jar TestDFSIO -write -nrFiles 16 -fileSize 1GB -resFile /tmp/$USER-dfsio-write.txt
OR
hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1.jar TestDFSIO -read -nrFiles 16 -fileSize 1GB -resFile /tmp/$USER-dfsio-read.txt
Note
Vary the number of files (-nrFiles) and the file size (-fileSize) to find the combination that gives the best throughput.
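One way to explore the combinations mentioned in the note is a small parameter sweep. The sketch below only prints the commands for a few file counts and sizes; the helper names dfsio_cmd and sweep, the chosen values, and the result-file naming are illustrative assumptions, not part of Hadoop.

```shell
# Jar path as given on this page; adjust for your installation.
JAR=/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1.jar

# Print the TestDFSIO command for one mode / file count / file size.
# (dfsio_cmd is a hypothetical helper, not a Hadoop command.)
dfsio_cmd() {
    mode=$1; nr=$2; size=$3
    echo "hadoop jar $JAR TestDFSIO -$mode -nrFiles $nr -fileSize $size -resFile /tmp/${USER:-user}-dfsio-$mode-$nr-$size.txt"
}

# Print a write command followed by a read command for each combination,
# so that every read test finds the files of the preceding write test.
sweep() {
    for nr in 8 16 32; do
        for size in 512MB 1GB; do
            dfsio_cmd write "$nr" "$size"
            dfsio_cmd read "$nr" "$size"
        done
    done
}

sweep
```

Review the printed commands, then pipe the output to sh (for example by replacing the last line with `sweep | sh`) to actually run the tests. Each run writes its summary to a separate local result file so the throughput figures can be compared afterwards.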
Clean Up
Remember to clean up the test results when you are done; otherwise the benchmark output files will keep consuming available storage space. The following command deletes the files in the output directory (/benchmarks/TestDFSIO) on HDFS.
hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1.jar TestDFSIO -clean
Options
- -nrFiles: the number of files (equal to the number of map tasks)
- -fileSize: the size of each file to generate; the units B, KB, MB, GB, and TB are allowed
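Since a write test generates -nrFiles files of -fileSize each, it can help to estimate the total data volume before running it. The following is a minimal sketch, assuming binary units (1 GB = 1024 MB); total_mb is a hypothetical helper, and the figure is the logical size before HDFS replication is applied.

```shell
# total_mb NRFILES SIZE: print the total logical data volume in MB.
# (Hypothetical helper; handles the MB, GB, and TB units only.)
total_mb() {
    nr=$1; size=$2
    num=${size%%[A-Za-z]*}     # numeric part, e.g. "1" from "1GB"
    unit=${size#"$num"}        # unit part, e.g. "GB"
    case $unit in
        MB) mult=1 ;;
        GB) mult=1024 ;;
        TB) mult=1048576 ;;
        *)  echo "unsupported unit: $unit" >&2; return 1 ;;
    esac
    echo $(( nr * num * mult ))
}

total_mb 16 1GB   # prints 16384
```

For the commands on this page (16 files of 1GB), this gives 16384 MB of logical data; the space actually used on the cluster is larger by the replication factor configured for HDFS.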