Tool for gathering blocks and replicas meta data from HDFS. It also builds a heat map showing how replicas are distributed along disks and nodes.

Find detailed information at a CERN DB blog entry dedicated to this tool: [http://db-blog.web.cern.ch/blog/daniel-lanza-garcia/2016-04-tool-visualise-block-distribution-hadoop-hdfs-cluster]

Build project:

bin/compile

Usage (Hadoop generic options can be passed):

bin/hdfs-blkd [-Dhdfs.tool.blocks.dumping.path=<FILE_PATH>] <path> [<max_number_of_blocks_to_print>]

If hdfs.tool.blocks.dumping.path configuration parameter is set, block distribution will be dump to specified path in CSV format.

Example output:

[[email protected]]$ bin/hdfs-blkd /user/ 2

Showing metadata for: hdfs://machine-1.cern.ch/user/
    isDirectory: true
    isFile: false
    isSymlink: false
    encrypted: false
    length: 0
    replication: 0
    blocksize: 0
    modification_time: Tue Mar 15 13:48:08 CET 2016
    access_time: Thu Jan 01 01:00:00 CET 1970
    owner: user
    group: supergroup
    permission: rwxr-xr-x

Collecting block locations...
Collected 269 locations.

Data directories and disk ids
  DiskId: 0  Directory: FILE:/data01/hadoop/hdfs/data
  DiskId: 1  Directory: FILE:/data1/hadoop/hdfs/data
  DiskId: 2  Directory: FILE:/data10/hadoop/hdfs/data
  DiskId: 3  Directory: FILE:/data11/hadoop/hdfs/data
  DiskId: 4  Directory: FILE:/data12/hadoop/hdfs/data
  DiskId: 5  Directory: FILE:/data13/hadoop/hdfs/data
  DiskId: 6  Directory: FILE:/data14/hadoop/hdfs/data
  DiskId: 7  Directory: FILE:/data15/hadoop/hdfs/data
  DiskId: 8  Directory: FILE:/data16/hadoop/hdfs/data
  DiskId: 9  Directory: FILE:/data17/hadoop/hdfs/data
  DiskId: 10  Directory: FILE:/data18/hadoop/hdfs/data
  DiskId: 11  Directory: FILE:/data19/hadoop/hdfs/data
  DiskId: 12  Directory: FILE:/data2/hadoop/hdfs/data
  DiskId: 13  Directory: FILE:/data20/hadoop/hdfs/data
  DiskId: 14  Directory: FILE:/data21/hadoop/hdfs/data
  DiskId: 15  Directory: FILE:/data22/hadoop/hdfs/data
  DiskId: 16  Directory: FILE:/data23/hadoop/hdfs/data
  DiskId: 17  Directory: FILE:/data24/hadoop/hdfs/data
  DiskId: 18  Directory: FILE:/data25/hadoop/hdfs/data
  DiskId: 19  Directory: FILE:/data26/hadoop/hdfs/data
  DiskId: 20  Directory: FILE:/data27/hadoop/hdfs/data
  DiskId: 21  Directory: FILE:/data28/hadoop/hdfs/data
  DiskId: 22  Directory: FILE:/data29/hadoop/hdfs/data
  DiskId: 23  Directory: FILE:/data3/hadoop/hdfs/data
  DiskId: 24  Directory: FILE:/data30/hadoop/hdfs/data
  DiskId: 25  Directory: FILE:/data31/hadoop/hdfs/data
  DiskId: 26  Directory: FILE:/data32/hadoop/hdfs/data
  DiskId: 27  Directory: FILE:/data33/hadoop/hdfs/data
  DiskId: 28  Directory: FILE:/data34/hadoop/hdfs/data
  DiskId: 29  Directory: FILE:/data35/hadoop/hdfs/data
  DiskId: 30  Directory: FILE:/data36/hadoop/hdfs/data
  DiskId: 31  Directory: FILE:/data37/hadoop/hdfs/data
  DiskId: 32  Directory: FILE:/data38/hadoop/hdfs/data
  DiskId: 33  Directory: FILE:/data39/hadoop/hdfs/data
  DiskId: 34  Directory: FILE:/data4/hadoop/hdfs/data
  DiskId: 35  Directory: FILE:/data40/hadoop/hdfs/data
  DiskId: 36  Directory: FILE:/data41/hadoop/hdfs/data
  DiskId: 37  Directory: FILE:/data42/hadoop/hdfs/data
  DiskId: 38  Directory: FILE:/data43/hadoop/hdfs/data

16/03/17 17:41:57 WARN DistributedFileSystemMetadata: list of data nodes could not be got from API (requires higher privileges).
16/03/17 17:41:57 WARN DistributedFileSystemMetadata: getting data node list from configuration file (may contain data nodes which are not active).

 === Distribution across nodes and disks ===

DiskId                   0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 
                         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 Unknown   Count     Average
Host
machine-1.cern.ch        0 = 0 = 0 = = = = 0 0 = 0 = 0 = = 0 = = + = = = + + = = = = = 0 = = + = = = = 0         77        1
machine-2.cern.ch        0 = + = = = 0 = = 0 0 = 0 = = = 0 = 0 = = = = 0 = = = 0 = 0 = 0 + = = = = = = 0         69        1
machine-3.cern.ch        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0         0         0
machine-4.cern.ch        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0         0         0
machine-5.cern.ch        0 = = = = = 0 = = = 0 0 0 = 0 = = 0 0 = = 0 = 0 0 0 = 0 = 0 = 0 0 0 0 = 0 0 0 0         29        0
machine-6.cern.ch        0 = = 0 = 0 0 = 0 = 0 = 0 = 0 0 = = = 0 0 = 0 = = + + = = = = = = 0 0 = = = 0 0         39        0
machine-7.cern.ch        0 = = = 0 = = = = 0 = = = = = 0 + + = = = = = = 0 = = = = = = = + = = = = 0 = 0         106       2
machine-8.cern.ch        0 = = 0 = 0 0 = = 0 0 = 0 = = = = = 0 = = + = = + = = = = = = = 0 = 0 = = 0 = 0         66        1
machine-9.cern.ch        0 0 = 0 = = 0 = = 0 = = = 0 = 0 = 0 = = = = 0 0 = 0 = = = = = 0 0 = 0 = = 0 0 0         33        0
machine-10.cern.ch       0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0         0         0
machine-11.cern.ch       0 0 = = = = = = = 0 = = 0 = = = = 0 = 0 = = = 0 = = = 0 = = = = = 0 0 = 0 = 0 0         56        1
machine-12.cern.ch       0 = 0 0 = 0 0 0 = = = 0 0 = = 0 = = = 0 = = = 0 = 0 0 = = = 0 0 0 0 0 0 = = 0 0         37        0
machine-13.cern.ch       0 0 0 = 0 = 0 = = 0 0 0 = 0 = = = = = 0 = = = = 0 = = = = 0 = = = 0 = = = = 0 0         68        1
machine-14.cern.ch       0 0 = 0 0 0 0 0 = 0 = 0 0 = = 0 = 0 = = = 0 = 0 = = = 0 = 0 0 0 0 0 0 0 = 0 0 0         26        0
machine-15.cern.ch       0 = = = = = 0 0 0 = = 0 = = 0 0 = 0 + = = = = = = = = 0 = = = = = 0 0 = = = = 0         61        1
machine-16.cern.ch       0 = = = = = 0 = = = 0 0 = 0 = = 0 = = 0 0 = = 0 = + = = = = = 0 = = = = = = 0 0         64        1
machine-17.cern.ch       0 0 = = 0 0 + = = 0 0 = = 0 0 = 0 0 = = = 0 = 0 0 = = 0 = = = = 0 0 = = = 0 = 0         42        0

Legend
  0: no blocks in this disk
  +: #blocks is more than 20% of the average of blocks per disk of this host
  =: #blocks is aproximatelly the average of blocks per disk of this host
  -: #blocks is less than 20% of the average of blocks per disk of this host

 === Metadata of blocks and replicas ===

Block (0) info:
    Offset: 0
    Length: 268435456
    No cached hosts
    Hosts:
        Replica (0):
            Host: machine-11.cern.ch
            Location: FILE:/data30/hadoop/hdfs/data (DiskId: 24)
            Name: 12.12.20.26:1004
            TopologyPaths: /0513_R-0050/RL05/12.12.20.26:1004
        Replica (1):
            Host: machine-5.cern.ch
            Location: FILE:/data2/hadoop/hdfs/data (DiskId: 12)
            Name: 12.2.10.32:1004
            TopologyPaths: /0513_R-0050/RL17/12.2.10.32:1004
        Replica (2):
            Host: machine-7.cern.ch
            Location: FILE:/data34/hadoop/hdfs/data (DiskId: 28)
            Name: 12.12.21.2:1004
            TopologyPaths: /0513_R-0050/RL17/12.12.21.2:1004

Block (1) info:
    Offset: 268435456
    Length: 268435456
    No cached hosts
    Hosts:
        Replica (0):
            Host: machine-2.cern.ch
            Location: FILE:/data19/hadoop/hdfs/data (DiskId: 11)
            Name: 12.12.2.26:1004
            TopologyPaths: /0513_R-0050/RL05/12.12.2.26:1004
        Replica (1):
            Host: machine-8.cern.ch
            Location: FILE:/data7/hadoop/hdfs/data (DiskId: 46)
            Name: 12.14.10.2:1004
            TopologyPaths: /0513_R-0050/RL21/12.14.10.2:1004
        Replica (2):
            Host: machine-16.cern.ch
            Location: FILE:/data25/hadoop/hdfs/data (DiskId: 18)
            Name: 12.12.0.25:1004
            TopologyPaths: /0513_R-0050/RL21/12.12.0.25:1004

16/03/17 17:41:57 WARN DistributedFileSystemMetadata: Not showing more blocks because limit has been reached