Introduction to the Big Data Cluster
In this post, we’ll introduce you to the hardware, software, and entry points of our big data cluster, “bigdata”.
The Hardware
The current bigdata cluster is a test cluster made up of commercial hardware. To conform to Cloudera's recommended configuration, three nodes are dedicated to managing the cluster:
- abd740
- abd741
- abd742
The computational guts of bigdata are the six datanodes, where MapReduce jobs actually run:
- abd743
- abd744
- abd745
- abd746
- abd747
- abd748
Each of these nodes has one 250 GB drive reserved mainly for the OS and software, plus 2 x 4 TB hard disk drives dedicated to the Hadoop Distributed File System (HDFS).
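As a rough check, the disk figures above give 6 x 2 x 4 TB = 48 TB of raw HDFS space. The sketch below turns that into an estimate of how much unique data the cluster can hold. It assumes the HDFS default replication factor of 3 (`dfs.replication`), because this post doesn't state bigdata's actual setting. It also ignores space reserved for non-HDFS use and filesystem overhead, so read the result as an upper bound.

```python
# Sketch: estimate bigdata's HDFS capacity from the hardware described above.
# ASSUMPTION: replication factor 3 (the HDFS default for dfs.replication);
# the cluster's real setting is not stated in the post.

DATANODES = ["abd743", "abd744", "abd745", "abd746", "abd747", "abd748"]
DISKS_PER_NODE = 2  # 2 x 4 TB drives per datanode for HDFS
TB_PER_DISK = 4


def raw_capacity_tb(nodes: list, disks: int, tb_per_disk: int) -> int:
    """Total raw disk space mounted for HDFS across all datanodes, in TB."""
    return len(nodes) * disks * tb_per_disk


def usable_capacity_tb(raw_tb: float, replication: int = 3) -> float:
    """Approximate room for unique data: every block is stored `replication` times."""
    return raw_tb / replication


if __name__ == "__main__":
    raw = raw_capacity_tb(DATANODES, DISKS_PER_NODE, TB_PER_DISK)
    print(f"raw: {raw} TB")                                   # raw: 48 TB
    print(f"usable (r=3): {usable_capacity_tb(raw):.0f} TB")  # usable (r=3): 16 TB
```

With three copies of every block, roughly a third of the raw space is available for distinct data.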
The Software
Cloudera Manager is deployed on the cluster to deploy, configure, and monitor its services. The services run on the management nodes as follows:
- abd740
  - Cloudera Manager
    - Alert Publisher
    - Event Server
    - Host Monitor
    - Service Monitor
  - HBase Master
  - Hive
    - HiveServer2
    - Hive Gateway
  - Hue Server
  - Oozie Server
  - Spark Gateway
  - YARN (MR2 Included)
    - Resource Manager
    - Job History Server
  - ZooKeeper Server
- abd741
  - HDFS NameNode
  - Hive
    - Hive Metastore Server
    - Hive Gateway
  - Spark Gateway
  - YARN (MR2 Included) Resource Manager
  - ZooKeeper Server
- abd742
  - HDFS SecondaryNameNode
  - Hive Gateway
  - Impala
    - Impala StateStore
    - Impala Catalog Server
  - Spark Gateway
  - ZooKeeper Server
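The role layout above can be read as a simple host-to-roles mapping. The sketch below encodes it and looks up where a given role runs. One consequence is that ZooKeeper Server appears on all three management nodes. Assuming those three servers form a single ensemble, ZooKeeper stays available as long as a majority is up, so it can lose one server. The dictionary and helper names are illustrative only, not part of Cloudera Manager's API.

```python
# Sketch: bigdata's management-node role layout as a plain dictionary.
# Role names are copied from the list above; the structure is illustrative.

ROLES = {
    "abd740": {
        "Cloudera Manager", "Alert Publisher", "Event Server", "Host Monitor",
        "Service Monitor", "HBase Master", "HiveServer2", "Hive Gateway",
        "Hue Server", "Oozie Server", "Spark Gateway", "Resource Manager",
        "Job History Server", "ZooKeeper Server",
    },
    "abd741": {
        "HDFS NameNode", "Hive Metastore Server", "Hive Gateway",
        "Spark Gateway", "Resource Manager", "ZooKeeper Server",
    },
    "abd742": {
        "HDFS SecondaryNameNode", "Hive Gateway", "Impala StateStore",
        "Impala Catalog Server", "Spark Gateway", "ZooKeeper Server",
    },
}


def hosts_running(role: str) -> list:
    """Return the sorted list of hosts that run `role`."""
    return sorted(host for host, roles in ROLES.items() if role in roles)


def zk_failures_tolerated(ensemble_size: int) -> int:
    """A ZooKeeper ensemble needs a strict majority of servers up,
    so it tolerates floor((n - 1) / 2) failures."""
    return (ensemble_size - 1) // 2


if __name__ == "__main__":
    zk = hosts_running("ZooKeeper Server")
    print(zk)                              # ['abd740', 'abd741', 'abd742']
    print(zk_failures_tolerated(len(zk)))  # 1
```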
Interacting with the Cluster
Obviously, there’s a lot going on in managing the cluster, but, luckily, Cloudera provides the Hadoop User Experience (Hue) web interface for constructing and executing jobs (not to mention browsing and managing HDFS data) at bigdata.accre.vanderbilt.edu:8888. Users need a valid Vanderbilt ID and password to log on. Once logged on, they should contact ACCRE about getting permission. Once approved, users will be able to connect to bigdata.accre.vanderbilt.edu with ssh.
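If Hue doesn't load in your browser, a first step is to check whether the port accepts connections from your machine at all. The sketch below makes a plain TCP check with the standard library and builds the ssh command line for the login host. The hostname and port 8888 come from this post. `vunetid` is a hypothetical placeholder for your own username, and the check says nothing about whether your account has been approved.

```python
# Sketch: check that a TCP port (e.g. Hue's web port) accepts connections,
# and build the ssh command line for bigdata's login host.
# ASSUMPTION: "vunetid" below is a hypothetical placeholder username.

import shlex
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def ssh_command(user: str, host: str = "bigdata.accre.vanderbilt.edu") -> list:
    """Argument list for an interactive ssh session (e.g. for subprocess.run)."""
    return ["ssh", f"{user}@{host}"]


if __name__ == "__main__":
    print("Hue port reachable:", port_open("bigdata.accre.vanderbilt.edu", 8888))
    print(shlex.join(ssh_command("vunetid")))  # ssh vunetid@bigdata.accre.vanderbilt.edu
```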
Getting started with Hue will be the subject of its own future post, but most people should find it fairly intuitive.