Hadoop

Finding simple, authoritative documentation for Hadoop is harder than I expected. With so many versions around, it is easy to end up reading documentation for the wrong version and miss what actually needs to be done.

Versions:

  • Hadoop 2.6.3
  • OpenJDK 1.8.0_66
  • Fedora 23

I installed the generic Hadoop package from Apache in /opt/hadoop on Fedora and configured it.

Configuration files (/opt/hadoop/etc/hadoop/) and their content:

mapred-site.xml

<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

yarn-site.xml

<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>127.0.0.1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>127.0.0.1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>127.0.0.1:8031</value>
  </property>
</configuration>

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///data/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///data/datanode</value>
  </property>
</configuration>
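Hadoop 2.x still accepts the older Hadoop 1.x property names used above (fs.default.name, dfs.name.dir, dfs.data.dir), but it logs them as deprecated. The current names are fs.defaultFS, dfs.namenode.name.dir and dfs.datanode.data.dir. The following is a minimal Python sketch that reads a *-site.xml file and flags those keys. The helper names load_site_xml and find_deprecated are hypothetical. The DEPRECATED table is only a hand-picked subset of Hadoop's full deprecation list.

```python
import sys
import xml.etree.ElementTree as ET

# Hand-picked subset of Hadoop 1.x names and their Hadoop 2.x replacements.
# This is NOT the complete deprecation table shipped with Hadoop.
DEPRECATED = {
    "fs.default.name": "fs.defaultFS",
    "dfs.name.dir": "dfs.namenode.name.dir",
    "dfs.data.dir": "dfs.datanode.data.dir",
}


def load_site_xml(path):
    """Return the <name>/<value> pairs of a Hadoop *-site.xml file as a dict."""
    root = ET.parse(path).getroot()  # processing instructions are skipped
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name:
            props[name] = value
    return props


def find_deprecated(props):
    """Return (old_name, new_name) pairs for deprecated keys present in props."""
    return [(old, new) for old, new in DEPRECATED.items() if old in props]


if __name__ == "__main__":
    # Hypothetical usage:
    #   python check_site.py /opt/hadoop/etc/hadoop/core-site.xml /opt/hadoop/etc/hadoop/hdfs-site.xml
    for path in sys.argv[1:]:
        for old, new in find_deprecated(load_site_xml(path)):
            print(f"{path}: {old} is deprecated, use {new}")
```

Run against the hdfs-site.xml above, it would report dfs.name.dir and dfs.data.dir. Renaming them is optional on 2.6.3, because the old names still work.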

In hadoop-env.sh I changed one line:

export HADOOP_LOG_DIR=/data/log

In yarn-env.sh I changed the corresponding line:

YARN_LOG_DIR="/data/log"

The first time, I had to format the NameNode:

hdfs namenode -format
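Formatting is meant to be done only once. Running hdfs namenode -format again creates a fresh, empty namespace in the name directory, and the existing HDFS metadata is lost. After a successful format, the name directory contains current/VERSION. The following is a minimal Python guard that checks for that file before formatting. It assumes the /data/namenode path from hdfs-site.xml above and assumes that the hdfs command is on PATH. The helpers namenode_is_formatted and format_once are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path

# Assumed to match dfs.name.dir in hdfs-site.xml (file:///data/namenode).
NAME_DIR = Path("/data/namenode")


def namenode_is_formatted(name_dir):
    """A formatted NameNode storage directory contains current/VERSION."""
    return (Path(name_dir) / "current" / "VERSION").is_file()


def format_once(name_dir=NAME_DIR):
    """Run 'hdfs namenode -format' only if name_dir has not been formatted yet."""
    if namenode_is_formatted(name_dir):
        print(f"{name_dir} is already formatted; skipping.")
        return False
    # Assumes /opt/hadoop/bin is on PATH so that 'hdfs' resolves.
    subprocess.run(["hdfs", "namenode", "-format"], check=True)
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python format_once.py /data/namenode
    format_once(Path(sys.argv[1]))
```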

After that, I could start the basic daemons:

  1. start-dfs.sh
  2. start-yarn.sh

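To confirm that the daemons actually came up, the JDK's jps tool lists running JVMs, one "<pid> <main class>" pair per line. On a single-node setup like this one, start-dfs.sh should leave NameNode, DataNode and SecondaryNameNode running, and start-yarn.sh should add ResourceManager and NodeManager. The following Python sketch parses jps output and reports which of those are missing. The EXPECTED set and the helper names are assumptions made for this single-node layout.

```python
import subprocess

# Daemons expected after start-dfs.sh and start-yarn.sh on a single node.
EXPECTED = {"NameNode", "DataNode", "SecondaryNameNode",
            "ResourceManager", "NodeManager"}


def running_java_classes(jps_output):
    """Parse 'jps' output ("<pid> <name>" per line) into a set of names."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output, expected=EXPECTED):
    """Return the expected daemons that do not appear in the jps output, sorted."""
    return sorted(expected - running_java_classes(jps_output))


if __name__ == "__main__":
    # Assumes jps is on PATH and is run as the user that started Hadoop.
    try:
        out = subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(f"could not run jps: {err}")
    else:
        missing = missing_daemons(out)
        print("all daemons running" if not missing else "missing: " + ", ".join(missing))
```

If a daemon is missing, its log file under /data/log (the HADOOP_LOG_DIR and YARN_LOG_DIR set above) is the first place to look.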