Finding simple, authoritative documentation for Hadoop is harder than I expected. With so many versions out there, it is easy to land on documentation for the wrong version and never find what actually needs to be done.
Versions:
- Hadoop 2.2.0
- OpenJDK 1.7.0_51
- Fedora 19
I have set my environment variables in my .bash_profile:
export JAVA_HOME=/usr/lib/jvm/java
export HADOOP_HOME=/opt/hadoop-2.2.0
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
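A quick way to confirm that the two variables point at real installations is a small shell function like the one below. This is only a sketch: check_hadoop_env is a made-up helper name, and it assumes that each installation has its launcher under bin/, which is what the PATH setting above implies.

```shell
# Report whether JAVA_HOME and HADOOP_HOME point at usable installations.
# check_hadoop_env is a hypothetical helper name, not a Hadoop tool.
check_hadoop_env() {
    che_status=0
    # A JDK/JRE is assumed to have its launcher at $JAVA_HOME/bin/java.
    if [ ! -x "${JAVA_HOME:-}/bin/java" ]; then
        echo "JAVA_HOME: no executable at '${JAVA_HOME:-}/bin/java'" >&2
        che_status=1
    fi
    # The Hadoop tarball is assumed to have its launcher at $HADOOP_HOME/bin/hadoop.
    if [ ! -x "${HADOOP_HOME:-}/bin/hadoop" ]; then
        echo "HADOOP_HOME: no executable at '${HADOOP_HOME:-}/bin/hadoop'" >&2
        che_status=1
    fi
    return "$che_status"
}
```

After sourcing .bash_profile in a new shell, running check_hadoop_env prints nothing and returns 0 when both paths look right, and names the variable at fault otherwise.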
Configuration file:
$HADOOP_HOME/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://localhost:54311</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>
</configuration>
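Hand-edited XML like this is easy to break with a single missing closing tag, so a rough sanity check before starting any daemon can save some time. The sketch below only counts opening and closing tags of one element; tags_balanced is a made-up helper name, and the check is deliberately crude (it does not parse XML and ignores comments and attributes).

```shell
# Crude check: does a file contain as many <tag> as </tag> occurrences?
# tags_balanced is a hypothetical helper, not part of Hadoop.
# Usage: tags_balanced FILE TAG
tags_balanced() {
    tb_file=$1
    tb_tag=$2
    # grep -o prints every match on its own line, so wc -l counts matches.
    tb_open=$({ grep -o -F "<$tb_tag>" "$tb_file" || true; } | wc -l | tr -d ' ')
    tb_close=$({ grep -o -F "</$tb_tag>" "$tb_file" || true; } | wc -l | tr -d ' ')
    [ "$tb_open" -eq "$tb_close" ]
}
```

Calling tags_balanced "$HADOOP_HOME/etc/hadoop/core-site.xml" value (and likewise for configuration, property and name) returns non-zero when the counts differ. Where it is installed, a real parser such as xmllint --noout is a stricter check.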
First few Hadoop commands:
hadoop namenode -format
hadoop namenode
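Since hadoop namenode keeps running in the foreground, one way to confirm from a second terminal that it really came up is the JDK's jps tool, which lists running JVMs as one "pid main-class" pair per line. The sketch below assumes jps is installed alongside the JDK; has_jvm is a made-up helper name that reads jps-style output on standard input.

```shell
# Succeed if jps-style output on stdin lists a JVM whose main class is exactly $1.
# has_jvm is a hypothetical helper; each input line is expected as "<pid> <main class>".
has_jvm() {
    # Exit status 0 when a matching line was seen, 1 otherwise.
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}
```

A typical call would be: jps | has_jvm NameNode && echo "NameNode is running". The exact comparison means a SecondaryNameNode entry is not mistaken for the NameNode.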
Things to resolve:
$HADOOP_HOME/sbin/start-all.sh – does not work at all; throws a lot of errors
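One thing that may help narrow this down: in Hadoop 2.x, start-all.sh is marked deprecated and simply hands off to start-dfs.sh and start-yarn.sh, so running those two separately shows which half produces the errors. The wrapper below is only a sketch; start_in_steps is a made-up name, and it assumes both scripts live in $HADOOP_HOME/sbin.

```shell
# Start HDFS and YARN as two separate steps so a failure can be attributed.
# start_in_steps is a hypothetical wrapper; start-dfs.sh and start-yarn.sh
# are the scripts assumed to be present in $HADOOP_HOME/sbin.
start_in_steps() {
    for sis_script in start-dfs.sh start-yarn.sh; do
        echo "Running $sis_script"
        # Stop at the first script that reports a failure.
        if ! "$HADOOP_HOME/sbin/$sis_script"; then
            echo "$sis_script failed" >&2
            return 1
        fi
    done
}
```

The start scripts can return success even when a daemon dies shortly afterwards, so the exit status is only a first hint; the files under $HADOOP_HOME/logs (the default log location) are still the place to look for the actual error.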