Hadoop – Part 4

I have been configuring Hadoop nodes at work using the tarball downloaded from Apache. I have even created a basic RPM from it so that it is easy to install with Puppet on our CentOS servers.

I simply install Hadoop in the /opt directory by untarring it and creating a symlink, /opt/hadoop, that points to /opt/hadoop-2.4.1, as sketched below.
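
Roughly, the install step looks like this (the tarball name is assumed from the version above):

tar -xzf hadoop-2.4.1.tar.gz -C /opt
ln -s /opt/hadoop-2.4.1 /opt/hadoop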

I added a few lines to the .bashrc of the user that runs Hadoop:

export JAVA_HOME=/usr/lib/jvm/jre
export HADOOP_INSTALL=/opt/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native

export PATH=$PATH:$HADOOP_INSTALL/sbin:$HADOOP_INSTALL/bin:$JAVA_HOME/bin
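
To pick these up in the current shell and sanity-check the result, something like this works:

source ~/.bashrc
hadoop version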

I have also edited a few of the Hadoop configuration files in /opt/hadoop/etc/hadoop:

mapred-site.xml

<?xml version="1.0"?>
<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>
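
Setting mapreduce.framework.name to yarn makes MapReduce jobs run on YARN instead of the default local, in-process mode.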

yarn-site.xml

<?xml version="1.0"?>
<configuration>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <property>
                <name>yarn.resourcemanager.address</name>
                <value>127.0.0.1:8032</value>
        </property>
        <property>
                <name>yarn.resourcemanager.scheduler.address</name>
                <value>127.0.0.1:8030</value>
        </property>
        <property>
                <name>yarn.resourcemanager.resource-tracker.address</name>
                <value>127.0.0.1:8031</value>
        </property>
</configuration>
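
The aux-services entry enables the shuffle handler that MapReduce needs on every NodeManager, and 8032, 8030 and 8031 are the default ResourceManager ports for clients, the scheduler and the resource tracker; binding them to 127.0.0.1 keeps this single-node cluster on the loopback interface.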

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://localhost:9000</value>
        </property>
</configuration>
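
fs.default.name is the pre-2.x name for this setting; Hadoop 2.x prefers fs.defaultFS but still accepts the old key.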

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
        <property>
                <name>dfs.name.dir</name>
                <value>file:///data/namenode</value>
        </property>
        <property>
                <name>dfs.data.dir</name>
                <value>file:///data/datanode</value>
        </property>
</configuration>
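
Similarly, dfs.name.dir and dfs.data.dir are the deprecated spellings of dfs.namenode.name.dir and dfs.datanode.data.dir, though Hadoop 2.4 still honours them. The directories themselves must exist and be writable by the user that runs the daemons before anything is started, for example (assuming that user is called hadoop):

mkdir -p /data/namenode /data/datanode /data/log
chown -R hadoop:hadoop /data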

In the hadoop-env.sh file I changed one line:
export HADOOP_LOG_DIR=/data/log
In the yarn-env.sh file I changed a similar line:
YARN_LOG_DIR="/dfs_data/log"
After all the configuration was in place I formatted the namenode:
hdfs namenode -format
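If the format succeeds, the output should end with a message saying the storage directory (here /data/namenode) has been successfully formatted; if not, the dfs.name.dir path is usually missing or not writable.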
Then I started the basic daemons:
start-dfs.sh
start-yarn.sh
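
A quick way to confirm everything came up is the JDK's jps tool, which on this single-node setup should list NameNode, DataNode and SecondaryNameNode from start-dfs.sh, plus ResourceManager and NodeManager from start-yarn.sh:

jps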
Next I will install Apache Spark and start creating jobs to manipulate the data in HDFS.