Hadoop – Part 4

I have been configuring Hadoop nodes at work with the tarball downloaded from Apache. I have even created a basic RPM from it so that it is easy to install with Puppet on our CentOS servers.

I simply install Hadoop in the /opt directory by untarring the archive and creating a symlink at /opt/hadoop that points to /opt/hadoop-2.4.1.
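
A minimal sketch of those install steps (the tarball location in /tmp is an assumption; adjust to wherever you downloaded it):

cd /opt
sudo tar -xzf /tmp/hadoop-2.4.1.tar.gz     # unpacks into /opt/hadoop-2.4.1
sudo ln -s /opt/hadoop-2.4.1 /opt/hadoop   # stable path; re-point on upgrades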

I added a few lines to the .bashrc of the user that runs Hadoop:

export JAVA_HOME=/usr/lib/jvm/jre
export HADOOP_INSTALL=/opt/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native

export PATH=$PATH:$HADOOP_INSTALL/sbin:$HADOOP_INSTALL/bin:$JAVA_HOME/bin
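
A quick check that the new environment is picked up (assuming the /opt/hadoop layout above):

source ~/.bashrc
which hadoop     # should print /opt/hadoop/bin/hadoop
hadoop version   # should report Hadoop 2.4.1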

I have also edited a few of the Hadoop configuration files in /opt/hadoop/etc/hadoop:

mapred-site.xml

<?xml version="1.0"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

yarn-site.xml

<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>127.0.0.1:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>127.0.0.1:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>127.0.0.1:8031</value>
    </property>
</configuration>

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>file:///data/namenode</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>file:///data/datanode</value>
    </property>
</configuration>
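
Before formatting, the local directories referenced above have to exist and be writable by the user running Hadoop (the hadoop user and group names below are assumptions):

sudo mkdir -p /data/namenode /data/datanode /data/log
sudo chown -R hadoop:hadoop /data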

In the hadoop-env.sh file I changed one line:

export HADOOP_LOG_DIR=/data/log

In the yarn-env.sh file I changed a similar line:

YARN_LOG_DIR="/dfs_data/log"

After all the configuration was in place I formatted the namenode:

hdfs namenode -format

Then I started the basic daemons:

start-dfs.sh
start-yarn.sh
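
A quick sanity check that everything came up, using jps from the JDK to list the running Java processes:

jps
# on a single-node setup this should list NameNode, DataNode,
# SecondaryNameNode, ResourceManager and NodeManager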
Next I will install Apache Spark and start creating jobs to manipulate the data in HDFS.

Starting with Hadoop – 3

I was having too many problems with SELinux while trying to run the generic Hadoop distribution on Fedora.
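
If you want to confirm that SELinux is the culprit before giving up, the standard diagnostics are (a sketch, not a fix):

getenforce                                   # current SELinux mode
sudo grep denied /var/log/audit/audit.log    # recent AVC denials
sudo setenforce 0                            # permissive until reboot; test only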

I decided to install the prepackaged one with yum.

  • sudo yum install hadoop-common hadoop-common-native hadoop-mapreduce

I was then able to start the different components with:

  • sudo start-dfs.sh
  • sudo start-yarn.sh

Now time to see if I can submit jobs to Hadoop.

To copy a file into HDFS:

hadoop fs -copyFromLocal ./sample.html hdfs://localhost:8020/
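
A few follow-up commands to verify the copy and exercise MapReduce (the examples jar path below is an assumption; it varies with the packaging):

hadoop fs -ls hdfs://localhost:8020/          # confirm the file landed
hadoop fs -cat hdfs://localhost:8020/sample.html | head
# submit a trivial job: estimate pi with 2 mappers and 5 samples each
hadoop jar /usr/share/hadoop/mapreduce/hadoop-mapreduce-examples.jar pi 2 5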

Installing the Oracle JDK on Fedora

I had to learn this one from this site.

In my use case I only want to compile and run Hadoop applications, so I have not completed all the steps for the browser setup.

Short version:

  1. Download the JDK of your choice; I picked 1.7.0_51
  2. sudo rpm -Uvh /tmp/jdk-7u51-linux-x64.rpm
  3. sudo alternatives --install /usr/bin/java java /usr/java/latest/bin/java 200000
  4. sudo alternatives --install /usr/bin/javac javac /usr/java/latest/bin/javac 200000
  5. sudo alternatives --install /usr/bin/jar jar /usr/java/latest/bin/jar 200000
  6. sudo alternatives --config java

The last step activates the newly added installation; I selected option 2.

As simple as that, and running java -version now shows me the Oracle JVM version.
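
For reference, the output should look something like this for 7u51 (the exact build numbers may differ slightly):

java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)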

No Scala for me

This presentation certainly convinced me not to learn or start any project in Scala:

Pacific Northwest Scala 2013

I am no expert in compilers, but the explanations Paul gives make it clear that some design decisions make it difficult for the Scala compiler to improve or perform well.

I like his ideas on how to design things better and avoid the insanity of trying to take care of everything. His suggestions are aimed at compilers, but I think they apply just as easily to how we design software.

QConSF 2013

This year’s QConSF was my first. I was looking forward to it and I loved the experience.

I had already watched a few presentations online on the YouTube channel. Many of them were eye-openers, informative about the things you need to think about.

As with any good conference, there are always too many sessions you want to attend, and QConSF is no exception. What I liked very much is that everything was recorded and the videos started appearing on the first day. They are impressively quick at making the material available, so you can watch what you missed right away.

The presenters are very good because they are practitioners. It makes a big difference in the quality of the presentations, but even more in how they answer questions from the audience. I felt that answering questions allowed them a freedom they did not allow themselves during the presentation. Most talks were easy to digest, but some answers had me taking notes on two or three things I would need to review to understand what they were talking about.

I also took the two days of training, where the torture of deciding what to attend was even worse. I could have done a week of training instead of the two days I was limited to. I was really interested in Infrastructure as Code with Jez Humble, but I was also tempted by the "No Lock Algo in Java" session from Martin Thompson as well as the Node.js intro.

I am already looking forward to next year to have the same difficult choices.

In the meantime I will run sessions at work to present some of the information to my co-workers, because there is a lot we can all learn from the presenters' experiences.

How to be an exceptional programmer

It is funny how sometimes I read an article about a totally different subject, but if I swap a few words it applies to my aspiration of being a programmer:

How to be an exceptional photographer

The article highlights that you need to focus on something you like and are good at; you keep working on it and you become better. That is the way to plan your future, as opposed to trying to be good at everything, which will fail you sooner rather than later.

I have read enough of these articles to know the simple principles. What the article also admits is that it is not always easy to figure out where you are and where you should go. It offers a few questions that you should ask yourself, and answer honestly, to find where you want to go.

The key to success is having a realistic plan and acting on it. Keep working at it; wishing for results is not enough.

Spring Roo bug for beginners

I did not have this issue on my MacBook Air, but on the iMac I faced a problem when trying to build and deploy my demo app.

I would get errors like this:

[WARNING] The POM for javax.servlet:jstl:jar:1.2 is missing, no dependency information available

It took a few Google searches to find an article that recommended replacing this section:

<dependency>
    <groupId>javax.servlet</groupId>
    <artifactId>jstl</artifactId>
    <version>1.2</version>
</dependency>

With this:

<dependency>
    <groupId>javax.servlet.jsp.jstl</groupId>
    <artifactId>jstl-api</artifactId>
    <version>1.2</version>
</dependency>
<dependency>
    <groupId>org.glassfish.web</groupId>
    <artifactId>jstl-impl</artifactId>
    <version>1.2</version>
</dependency>

As soon as I saved the modified pom.xml, Maven downloaded the dependencies and a new "perform package" produced a working app on the tc Server.
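
As far as I understand, "perform package" in the Roo shell delegates to Maven, so the same build can be run from a regular terminal:

mvn clean package   # the WAR ends up in target/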

Next: more advanced demo app and then something of my own.