Friday, December 26, 2014

Configure HADOOP in Pseudo Distributed Mode

In this post I will illustrate the steps to install and configure Hadoop in pseudo-distributed mode.

Step 01: Verify that Java is installed; if not, install and configure it.
You can verify the Java installation using the following command:
$ java -version

On executing this command, you should see output similar to the following:
java version "1.7.0_51"
OpenJDK Runtime Environment (IcedTea 2.4.6) (7u51-2.4.6-1ubuntu4)
OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)

If Java is not installed, use the command below to install it:
sudo apt-get install openjdk-7-jdk

This will install the full JDK under the /usr/lib/jvm/java-7-openjdk-amd64 directory (on 64-bit systems).
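
If you are unsure exactly where the JDK landed, one way to find out is to resolve the java symlink chain with standard coreutils; JAVA_HOME is then the directory above jre/bin:

$ readlink -f $(which java)
# e.g. /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java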

Configure JAVA_HOME
Hadoop needs to know where Java is installed, so we set the JAVA_HOME environment variable to point to the Java installation directory.

JAVA_HOME can be configured in ~/.bashrc file

Use the command below to set JAVA_HOME on Ubuntu (adjust the path to match your installation; the apt-get package above installs under /usr/lib/jvm/java-7-openjdk-amd64):
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
or add the following statement to the ~/.bashrc file:
# set to the root of your Java installation
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
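
After editing ~/.bashrc, you can reload it and verify the value in the same shell instead of opening a new terminal:

$ source ~/.bashrc
$ echo $JAVA_HOME
/usr/lib/jvm/java-7-openjdk-amd64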

Step 02: SSH Configuration (Hadoop uses SSH to log in to nodes, including localhost, and execute commands)

Install SSH using the command:
sudo apt-get install openssh-server

Check whether it is installed:
$ ssh localhost
If it is not installed, you will see an error like: ssh: connect to host localhost port 22: Connection refused.
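
Hadoop's start scripts use SSH to launch the daemons, even on localhost, so the login should work without a password prompt. A minimal passwordless key setup with standard OpenSSH commands looks like this:

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
$ ssh localhost

The last command should now log you in without asking for a password.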

Step 03: Download & Setup Hadoop

Download the latest stable release of Apache Hadoop from the Apache Download Mirrors, for example:
http://apache.mirrors.lucidnetworks.net/hadoop/common/

Un-tar the file to an appropriate location.
$ tar xzvf <tar-filename>.tar.gz

Use the following command to create an environment variable that points to the Hadoop installation directory (HADOOP_HOME); adjust the path to wherever you un-tarred the release:
export HADOOP_HOME=/home/user/Hadoop
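
To make HADOOP_HOME survive new shells and put the Hadoop scripts on your PATH, you can append it to ~/.bashrc as well (the path shown is the example above; use your own):

$ echo 'export HADOOP_HOME=/home/user/Hadoop' >> ~/.bashrc
$ echo 'export PATH=$PATH:$HADOOP_HOME/bin' >> ~/.bashrc
$ source ~/.bashrc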

Step 04: Verify the environment variables

Close the terminal and open a new one to check whether the JAVA_HOME and HADOOP_HOME paths are set.

JAVA_HOME can be verified with the command:
echo $JAVA_HOME

HADOOP_HOME can be verified with the command:
echo $HADOOP_HOME

Use this command to verify that Hadoop is installed:
hadoop version
The output should be similar to:
Hadoop 1.2.1

Step 05: Configuration

1. Edit the file $HADOOP_HOME/conf/hadoop-env.sh to set the Java home path (use your actual JDK installation directory):
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

2. Edit the file $HADOOP_HOME/conf/core-site.xml and add the following parameters:
1) fs.default.name - points to the default URI for all filesystem requests in Hadoop.

So if the user makes a request for a file by specifying only its path, Hadoop tries to find that path on the filesystem defined by fs.default.name. If fs.default.name is set to an HDFS URI like
hdfs://<hostname>:<port>, then Hadoop tries to find the path on HDFS whose namenode is running at <hostname>:<port>.

Note: You can find your hostname with the 'hostname' command.

2) hadoop.tmp.dir - used as the base for temporary directories, both on the local filesystem and in HDFS.

Specify an absolute path for the temporary directory.

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://<host-name>:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/satishkumar/HADOOP/SETUPS/temp</value>
  </property>
</configuration>
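
Hadoop will usually create this directory on demand, but creating it up front avoids permission surprises (the path is the example value from the config above; use your own):

$ mkdir -p /home/satishkumar/HADOOP/SETUPS/temp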
            
3. Edit the file $HADOOP_HOME/conf/hdfs-site.xml and add the following parameters:

dfs.replication - the replication factor; 1 is enough for a single-node setup.
dfs.name.dir - the path where the NameNode persists its metadata.
dfs.data.dir - the path where the DataNode stores its data blocks.

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/satishkumar/HADOOP/SETUPS/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/satishkumar/HADOOP/SETUPS/dfs/data</value>
  </property>
</configuration>
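
As with the temporary directory, you can create these paths in advance so the NameNode and DataNode do not fail on a missing or unwritable directory (example paths from the config above):

$ mkdir -p /home/satishkumar/HADOOP/SETUPS/dfs/name /home/satishkumar/HADOOP/SETUPS/dfs/data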

4. Edit the files $HADOOP_HOME/conf/masters and $HADOOP_HOME/conf/slaves with the hostname, as shown below.
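
For pseudo-distributed mode both files typically contain a single line with the local hostname. Assuming HADOOP_HOME is set as above, one way to write them:

$ echo localhost > $HADOOP_HOME/conf/masters
$ echo localhost > $HADOOP_HOME/conf/slaves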

5. Format Hadoop Name Node
After these changes, you will have to format the filesystem
$ bin/hadoop namenode -format

6. Start Hadoop daemons
Start the NameNode and DataNode daemons:
$ bin/start-dfs.sh

7. Check whether all the daemons are running:
$ jps
4720 NameNode
5160 SecondaryNameNode
4936 DataNode

Troubleshoot
Note: If your master server fails to start due to the dfs safe mode issue, execute this on the Hadoop command line:
hadoop dfsadmin -safemode leave
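
If you want to check the current state before forcing safe mode off, the same dfsadmin tool reports it:

hadoop dfsadmin -safemode get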

Also make sure to format the namenode again if you make changes to your configuration.

8. Browse the web interface for the NameNode; by default it is available at:
http://localhost:50070/
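
As a quick smoke test that HDFS is actually serving requests, copy a local file in and list it (standard hadoop fs commands; /test is an arbitrary example path):

$ bin/hadoop fs -mkdir /test
$ bin/hadoop fs -put /etc/hosts /test/hosts
$ bin/hadoop fs -ls /test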

Now you have successfully installed and configured Hadoop on a single node.

