Here I will show you how to install Hadoop 2.6 on Linux Mint.
I am using:
Tools - Hadoop 2.6.0, Java (JDK 1.7)
OS - Linux Mint 17 Qiana
System Update & Install Java
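First, bring the system up to date with the usual apt commands,
$ sudo apt-get update
$ sudo apt-get upgrade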
Then check whether Java is installed; to avoid problems later, make sure it is JDK 1.7 or newer, as Hadoop 2.6 and later require it.
To check Java version,
arul@aj-pc ~ $ java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) Server VM (build 24.80-b11, mixed mode)
If it is not installed, use the commands below to install Java,
$ sudo apt-add-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer
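If you prefer OpenJDK instead (the JAVA_HOME paths used later in this guide point to OpenJDK), this works as well,
$ sudo apt-get install openjdk-7-jdk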
Add Hadoop user,
arul@aj-pc ~ $ sudo addgroup hadoop
arul@aj-pc ~ $ sudo adduser --ingroup hadoop hduser
Adding user `hduser' ...
Adding new user `hduser' (1001) with group `hadoop' ...
Creating home directory `/home/hduser' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hduser
Enter the new value, or press ENTER for the default
Full Name []:
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] y
Install SSH,
arul@aj-pc ~ $ sudo apt-get install ssh
To check that it is properly installed,
arul@aj-pc ~ $ which ssh
/usr/bin/ssh
arul@aj-pc ~ $ which sshd
/usr/sbin/sshd
Setup SSH,
Hadoop requires SSH access. We are going to set up a single-node Hadoop cluster, so we need to configure SSH access to localhost.
To do this, we need to switch to our Hadoop user (i.e. hduser),
arul@aj-pc ~ $ su hduser
Password:
hduser@aj-pc /home/arul $
Now set up SSH by generating a key pair with an empty passphrase,
hduser@aj-pc /home/arul $ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
92:73:3d:fa:67:bb:2a:ee:11:00:f4:f1:92:e6:25:8b hduser@aj-pc
The key's randomart image is:
+--[ RSA 2048]----+
| .o . |
| o + |
| B o |
| + B . |
| E * S o |
| + o . |
| o |
| .o o |
| oo.o+oo |
+-----------------+
hduser@aj-pc /home/arul $ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
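If key-based login does not work and ssh keeps asking for a password, a common fix (with default OpenSSH settings) is to tighten the permissions on the key files,
hduser@aj-pc /home/arul $ chmod 700 $HOME/.ssh
hduser@aj-pc /home/arul $ chmod 600 $HOME/.ssh/authorized_keys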
We can check whether SSH is working by,
hduser@aj-pc /home/arul $ ssh localhost
hduser@localhost's password:
Welcome to Linux Mint 17 Qiana (GNU/Linux 3.13.0-24-generic i686)
Welcome to Linux Mint
 * Documentation: http://www.linuxmint.com
55 packages can be updated.
0 updates are security updates.
Last login: Tue Sep 22 14:35:43 2015 from localhost
hduser@aj-pc ~ $
Install Hadoop
First we need to add our Hadoop user (i.e. hduser) to the sudo group,
hduser@aj-pc ~ $ su arul
Password:
arul@aj-pc /home/hduser $ sudo adduser hduser sudo
[sudo] password for arul:
Adding user `hduser' to group `sudo' ...
Adding user hduser to group sudo
Done.
arul@aj-pc /home/hduser $
Then switch back to hduser,
arul@aj-pc /home/hduser $ su hduser
Password:
hduser@aj-pc ~ $
Now download and extract Hadoop,
hduser@aj-pc ~ $ wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
--2015-09-22 14:36:03-- http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
Resolving mirrors.sonic.net (mirrors.sonic.net)... 69.12.162.27
Connecting to mirrors.sonic.net (mirrors.sonic.net)|69.12.162.27|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 195257604 (186M) [application/x-gzip]
Saving to: ‘hadoop-2.6.0.tar.gz’
100%[========================================================================================================>] 19,52,57,604 17.6KB/s in 93m 7s
2015-09-22 16:09:11 (34.1 KB/s) - ‘hadoop-2.6.0.tar.gz’ saved [195257604/195257604]
hduser@aj-pc ~ $ tar xvzf hadoop-2.6.0.tar.gz
Create a folder named hadoop in /usr/local (done here from the arul account),
arul@aj-pc ~ $ sudo mkdir /usr/local/hadoop
[sudo] password for arul:
arul@aj-pc ~ $
Next, move the extracted files to /usr/local/hadoop and give ownership to hduser,
hduser@aj-pc ~/hadoop-2.6.0 $ sudo mv * /usr/local/hadoop
[sudo] password for hduser:
hduser@aj-pc ~/hadoop-2.6.0 $ sudo chown -R hduser:hadoop /usr/local/hadoop
hduser@aj-pc ~/hadoop-2.6.0 $
Configuration Files
We need to modify a few files to complete the Hadoop setup.
1. ~/.bashrc:
We need to append the following to the end of ~/.bashrc:
hduser@aj-pc ~ $ nano ~/.bashrc
add the following,
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386   # the Java path on your system
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
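After saving, reload the file so the new variables take effect in the current shell,
hduser@aj-pc ~ $ source ~/.bashrc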
2. hadoop-env.sh
hduser@aj-pc ~ $ nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
edit the JAVA_HOME line as follows,
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
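If you are unsure of the Java path on your machine, one way to find it (assuming java is on the PATH, as the apt packages set it up) is,
hduser@aj-pc ~ $ readlink -f $(which java)
This prints the full path to the java binary; JAVA_HOME is that path with the trailing /jre/bin/java (or /bin/java) removed.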
3. core-site.xml
hduser@aj-pc ~ $ nano /usr/local/hadoop/etc/hadoop/core-site.xml
initially the file looks like this,
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
</configuration>
add the following between <configuration></configuration>,
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
Save the edited file and exit.
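Note that the hadoop.tmp.dir directory set above does not exist yet. Create it and give hduser ownership (adjust the path if you used a different value),
hduser@aj-pc ~ $ sudo mkdir -p /app/hadoop/tmp
hduser@aj-pc ~ $ sudo chown -R hduser:hadoop /app/hadoop/tmp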
4. mapred-site.xml
By default, the hadoop folder contains mapred-site.xml.template, which needs to be copied as mapred-site.xml.
hduser@aj-pc ~ $ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
Next, we need to add the following between <configuration></configuration>,
hduser@aj-pc ~ $ nano /usr/local/hadoop/etc/hadoop/mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
5. hdfs-site.xml
hduser@aj-pc ~ $ nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
add the following between <configuration></configuration>,
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
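The namenode and datanode directories referenced above also need to exist and be owned by hduser (assuming the same paths as in the file),
hduser@aj-pc ~ $ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
hduser@aj-pc ~ $ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
hduser@aj-pc ~ $ sudo chown -R hduser:hadoop /usr/local/hadoop_store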
Format the New Hadoop Filesystem
Before we can use it, the new Hadoop file system needs to be formatted.
Do the following,
hduser@aj-pc ~ $ hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
15/10/06 16:02:08 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = aj-pc/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.6.0
STARTUP_MSG: classpath = /usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/******
************************************************************/
15/10/06 16:02:08 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/10/06 16:02:08 INFO namenode.NameNode: createNameNode [-format]
15/10/06 16:02:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Formatting using clusterid: CID-5b597abd-707c-4f45-974d-1e526533d599
15/10/06 16:02:12 INFO namenode.FSNamesystem: No KeyProvider found.
15/10/06 16:02:12 INFO namenode.FSNamesystem: fsLock is fair:true
15/10/06 16:02:12 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
15/10/06 16:02:12 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
15/10/06 16:02:12 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
15/10/06 16:02:12 INFO blockmanagement.BlockManager: The block deletion will start around 2015 Oct 06 16:02:12
15/10/06 16:02:12 INFO util.GSet: Computing capacity for map BlocksMap
15/10/06 16:02:12 INFO util.GSet: VM type = 32-bit
15/10/06 16:02:12 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
15/10/06 16:02:12 INFO util.GSet: capacity = 2^22 = 4194304 entries
15/10/06 16:02:12 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
15/10/06 16:02:12 INFO blockmanagement.BlockManager: defaultReplication = 1
15/10/06 16:02:12 INFO blockmanagement.BlockManager: maxReplication = 512
15/10/06 16:02:12 INFO blockmanagement.BlockManager: minReplication = 1
15/10/06 16:02:12 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
15/10/06 16:02:12 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks = false
15/10/06 16:02:12 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
15/10/06 16:02:12 INFO blockmanagement.BlockManager: encryptDataTransfer = false
15/10/06 16:02:12 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
15/10/06 16:02:12 INFO namenode.FSNamesystem: fsOwner = hduser (auth:SIMPLE)
15/10/06 16:02:12 INFO namenode.FSNamesystem: supergroup = supergroup
15/10/06 16:02:12 INFO namenode.FSNamesystem: isPermissionEnabled = true
15/10/06 16:02:12 INFO namenode.FSNamesystem: HA Enabled: false
15/10/06 16:02:12 INFO namenode.FSNamesystem: Append Enabled: true
15/10/06 16:02:13 INFO util.GSet: Computing capacity for map INodeMap
15/10/06 16:02:13 INFO util.GSet: VM type = 32-bit
15/10/06 16:02:13 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
15/10/06 16:02:13 INFO util.GSet: capacity = 2^21 = 2097152 entries
15/10/06 16:02:13 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/10/06 16:02:13 INFO util.GSet: Computing capacity for map cachedBlocks
15/10/06 16:02:13 INFO util.GSet: VM type = 32-bit
15/10/06 16:02:13 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
15/10/06 16:02:13 INFO util.GSet: capacity = 2^19 = 524288 entries
15/10/06 16:02:13 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
15/10/06 16:02:13 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
15/10/06 16:02:13 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
15/10/06 16:02:13 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
15/10/06 16:02:13 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
15/10/06 16:02:13 INFO util.GSet: Computing capacity for map NameNodeRetryCache
15/10/06 16:02:13 INFO util.GSet: VM type = 32-bit
15/10/06 16:02:13 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
15/10/06 16:02:13 INFO util.GSet: capacity = 2^16 = 65536 entries
15/10/06 16:02:13 INFO namenode.NNConf: ACLs enabled? false
15/10/06 16:02:13 INFO namenode.NNConf: XAttrs enabled? true
15/10/06 16:02:13 INFO namenode.NNConf: Maximum size of an xattr: 16384
15/10/06 16:02:13 INFO namenode.FSImage: Allocated new BlockPoolId: BP-601894741-127.0.1.1-1444127533230
15/10/06 16:02:13 INFO common.Storage: Storage directory /usr/local/hadoop_store/hdfs/namenode has been successfully formatted.
15/10/06 16:02:13 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/10/06 16:02:13 INFO util.ExitUtil: Exiting with status 0
15/10/06 16:02:13 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at aj-pc/127.0.1.1
************************************************************/
hduser@aj-pc ~ $
Note: the hadoop namenode -format command should be executed only once, before we start using Hadoop.
If executed again, it will destroy all the data on the Hadoop file system.
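As the deprecation warning in the log says, the same thing can be done with the hdfs command instead,
hduser@aj-pc ~ $ hdfs namenode -format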
Starting Hadoop:
hduser@aj-pc ~ $ cd /usr/local/hadoop/sbin/
hduser@aj-pc /usr/local/hadoop/sbin $ ls
distribute-exclude.sh start-all.cmd stop-balancer.sh
hadoop-daemon.sh start-all.sh stop-dfs.cmd
hadoop-daemons.sh start-balancer.sh stop-dfs.sh
hdfs-config.cmd start-dfs.cmd stop-secure-dns.sh
hdfs-config.sh start-dfs.sh stop-yarn.cmd
httpfs.sh start-secure-dns.sh stop-yarn.sh
kms.sh start-yarn.cmd yarn-daemon.sh
mr-jobhistory-daemon.sh start-yarn.sh yarn-daemons.sh
refresh-namenodes.sh stop-all.cmd
slaves.sh stop-all.sh
hduser@aj-pc /usr/local/hadoop/sbin $
Use start-all.sh to start.
hduser@aj-pc /usr/local/hadoop/sbin $ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
15/10/07 14:35:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
hduser@localhost's password:
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-aj-pc.out
hduser@localhost's password:
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-aj-pc.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is 20:0e:64:53:b7:20:34:e0:27:9d:0b:5a:20:28:eb:33.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
hduser@0.0.0.0's password:
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-aj-pc.out
15/10/07 14:36:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hduser-resourcemanager-aj-pc.out
hduser@localhost's password:
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-aj-pc.out
hduser@aj-pc /usr/local/hadoop/sbin $
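Since start-all.sh is deprecated (see the first line of its output), the daemons can equivalently be started in two steps,
hduser@aj-pc /usr/local/hadoop/sbin $ start-dfs.sh
hduser@aj-pc /usr/local/hadoop/sbin $ start-yarn.sh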
To check that all the daemons are running,
hduser@aj-pc /usr/local/hadoop/sbin $ jps
7714 NodeManager
7413 ResourceManager
7270 SecondaryNameNode
7095 DataNode
6958 NameNode
7950 Jps
hduser@aj-pc /usr/local/hadoop/sbin $
This means a functional instance of Hadoop is running on our machine.
Stopping Hadoop
Use stop-all.sh, also in sbin, to stop.
hduser@aj-pc /usr/local/hadoop/sbin $ stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
15/10/07 14:47:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping namenodes on [localhost]
hduser@localhost's password:
localhost: stopping namenode
hduser@localhost's password:
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
hduser@0.0.0.0's password:
0.0.0.0: stopping secondarynamenode
15/10/07 14:47:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
stopping yarn daemons
stopping resourcemanager
hduser@localhost's password:
localhost: stopping nodemanager
no proxyserver to stop
hduser@aj-pc /usr/local/hadoop/sbin $
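Likewise, the non-deprecated way to stop is,
hduser@aj-pc /usr/local/hadoop/sbin $ stop-yarn.sh
hduser@aj-pc /usr/local/hadoop/sbin $ stop-dfs.sh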
Hadoop Web Interfaces
Start Hadoop again (if you have stopped it),
hduser@aj-pc /usr/local/hadoop/sbin $ start-all.sh
Open http://localhost:50070/ in a browser for the NameNode web UI.
[Screenshots: the Datanode tab and the Startup Progress page of the NameNode web UI.]
Use http://localhost:50090/ for the SecondaryNameNode web UI, and http://localhost:8088/ for the YARN ResourceManager.
That's all for the Hadoop installation; the next blog will cover some sample tasks on Hadoop.
For any queries, please comment below.