Tools for Using Hadoop with OneFS
isilon_create_users creates identities needed by Hadoop distributions compatible with OneFS.
isilon_create_directories creates a directory structure with appropriate ownership and permissions in HDFS on OneFS.

Isilon Hadoop Tools (IHT) currently requires Python 3.5+ and supports OneFS 8+.
Use pipx to install IHT.
Use pip to install IHT in a virtual environment.
--help can be used with any IHT script to see extended usage information.

To use IHT, you will need the following:
$onefs, an IP address, hostname, or SmartConnect name associated with the OneFS System zone
$iht_user, a OneFS System zone user with the following privileges:
    ISI_PRIV_LOGIN_PAPI
    ISI_PRIV_AUTH
    ISI_PRIV_HDFS
    ISI_PRIV_IFS_BACKUP (only needed by isilon_create_directories)
    ISI_PRIV_IFS_RESTORE (only needed by isilon_create_directories)
$zone, the name of the access zone on OneFS that will host HDFS
$dist, the distribution of Hadoop that will be deployed with OneFS (e.g. CDH, HDP, etc.)
$cluster_name, the name of the Hadoop cluster

OneFS ships with a self-signed SSL/TLS certificate by default, and such a certificate will not be verifiable by any well-known certificate authority. If you encounter CERTIFICATE_VERIFY_FAILED errors while using IHT, it may be because OneFS is still using the default certificate. To remedy the issue, consider encouraging your OneFS administrator to install a verifiable certificate instead. Alternatively, you may choose to skip certificate verification by using the --no-verify option, but do so at your own risk!
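To find out ahead of time whether certificate verification is likely to fail, the certificate chain can be checked from the shell. The following is only a sketch and not part of IHT: the function name onefs_cert_verifies is hypothetical, it assumes the openssl command-line tool is installed, and it assumes the OneFS API is reachable on port 8080.

```sh
# Hypothetical helper (not part of IHT): succeeds only if the certificate
# chain presented by the server verifies against the local trust store.
# $1 = OneFS address, $2 = port (assumed default: 8080).
# Note: it also fails if the host is unreachable, and it checks the chain
# only, not whether the certificate matches the hostname.
onefs_cert_verifies() {
    openssl s_client -connect "$1:${2:-8080}" -verify_return_error \
        </dev/null >/dev/null 2>&1
}
```

For example, onefs_cert_verifies "$onefs" || echo "certificate not verifiable" reports the problem before any IHT script is run.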
Note: This is not meant to be a complete guide to setting up Hadoop with OneFS. If you stumbled upon this page or have not otherwise consulted the appropriate install guide for your distribution, please do so at https://community.emc.com/docs/DOC-61379.
There are 2 tools in IHT that are meant to assist with the setup of OneFS as HDFS for a Hadoop cluster:
isilon_create_users, which creates users and groups that must exist on all hosts in the Hadoop cluster, including OneFS
isilon_create_directories, which sets the correct ownership and permissions on directories in HDFS on OneFS

These tools must be used in order since a user/group must exist before it can own a directory.
isilon_create_users
Using the information from above, an invocation of isilon_create_users could look like this:
isilon_create_users --dry \
--onefs-user "$iht_user" \
--zone "$zone" \
--dist "$dist" \
--append-cluster-name "$cluster_name" \
"$onefs"
--dry causes the script to log without executing. Use it to ensure the script will do what you intend before actually doing it.

If anything goes wrong (e.g. the script stopped because you forgot to give $iht_user the ISI_PRIV_HDFS privilege), you can safely rerun with the same options. IHT should figure out that some of its job has been done already and work with what it finds.
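The dry-run-first habit described above can be wrapped in a small shell function. This is a minimal sketch, not something IHT provides: confirm_after_dry is a hypothetical name, and the function assumes only that the tool accepts --dry as shown in the invocation above.

```sh
# Hypothetical helper (not part of IHT): preview a command with --dry,
# then ask before running it for real.
# Usage: confirm_after_dry <tool> [options...]
confirm_after_dry() {
    tool=$1
    shift
    # Preview first; stop if the dry run itself fails.
    "$tool" --dry "$@" || return 1
    printf 'Run %s for real? [y/N] ' "$tool" >&2
    read -r answer
    case $answer in
        y|Y) "$tool" "$@" ;;
        *)   echo "Skipped." >&2 ;;
    esac
}
```

It could be called as confirm_after_dry isilon_create_users --onefs-user "$iht_user" --zone "$zone" --dist "$dist" --append-cluster-name "$cluster_name" "$onefs", and it works the same way for isilon_create_directories.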
After running isilon_create_users, you will find a new file in $PWD named like so:
$unix_timestamp-$zone-$dist-$cluster_name.sh
This script should be copied to and run on all the other hosts in the Hadoop cluster (excluding OneFS). It will create the same users/groups with the same UIDs/GIDs and memberships as on OneFS using LSB utilities such as groupadd, useradd, and usermod.
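Copying and running the generated script by hand gets tedious on larger clusters, so the step can be scripted. The sketch below is one possible approach under several assumptions: the function names are hypothetical, the hosts accept scp and ssh logins, /tmp is an acceptable staging directory, the generated script runs under sh, and sudo provides the root privileges that groupadd and useradd normally need. Picking the newest file relies on the timestamps in the file names having the same number of digits.

```sh
# Hypothetical helpers (not part of IHT).

# Print the most recent script generated by isilon_create_users in the
# current directory. $1 = zone, $2 = dist, $3 = cluster name.
latest_user_script() {
    latest=""
    for f in ./*-"$1"-"$2"-"$3".sh; do
        [ -e "$f" ] || continue   # the glob matched nothing
        latest=$f                 # globs expand sorted, so the last match is newest
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Copy the script to each host and run it there.
# $1 = script path; remaining arguments = host names.
# Set DRY_RUN=1 to print the commands instead of executing them.
distribute_user_script() {
    script=$1
    shift
    for host in "$@"; do
        ${DRY_RUN:+echo} scp "$script" "$host:/tmp/" || return 1
        ${DRY_RUN:+echo} ssh "$host" "sudo sh /tmp/$(basename "$script")" || return 1
    done
}
```

A run might look like distribute_user_script "$(latest_user_script "$zone" "$dist" "$cluster_name")" host1 host2, where host1 and host2 are placeholders for the Hadoop hosts.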
isilon_create_directories
Using the information from above, an invocation of isilon_create_directories could look like this:
isilon_create_directories --dry \
--onefs-user "$iht_user" \
--zone "$zone" \
--dist "$dist" \
--append-cluster-name "$cluster_name" \
"$onefs"
--dry causes the script to log without executing. Use it to ensure the script will do what you intend before actually doing it.

If anything goes wrong (e.g. the script stopped because you forgot to run isilon_create_users first), you can safely rerun with the same options. IHT should figure out that some of its job has been done already and work with what it finds.
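Because the two tools must run in order, it can help to chain them so that the second never starts if the first fails. The following is a minimal sketch rather than a feature of IHT: iht_setup is a hypothetical name, and it assumes $iht_user, $zone, $dist, $cluster_name, and $onefs are already set as described earlier and that dry runs of both tools have been reviewed.

```sh
# Hypothetical wrapper (not part of IHT): run both tools in the required
# order and stop at the first failure.
iht_setup() {
    # Users and groups first, since they must exist before owning directories.
    isilon_create_users \
        --onefs-user "$iht_user" \
        --zone "$zone" \
        --dist "$dist" \
        --append-cluster-name "$cluster_name" \
        "$onefs" || return 1
    # Directories second, with the same options.
    isilon_create_directories \
        --onefs-user "$iht_user" \
        --zone "$zone" \
        --dist "$dist" \
        --append-cluster-name "$cluster_name" \
        "$onefs"
}
```

Since both tools can be rerun safely with the same options, calling iht_setup again after fixing a failure should pick up where the previous attempt stopped.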
See the Contributing Guidelines for information on project development.