Hadoop 3.3.1 Pseudo-Distributed Mode on Ubuntu 18.04
The operating system used in this tutorial is Ubuntu 18.04.6. The steps to install Ubuntu 18.04 are omitted.
1. Create user hadoop
Open the terminal and type the command below to create a new user:
sudo useradd -m hadoop -s /bin/bash
This command creates a login user named hadoop and uses /bin/bash as its shell.
Set a password for the user hadoop:
sudo passwd hadoop
Give sudo permission to user hadoop:
sudo adduser hadoop sudo
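To confirm that the user is now in the sudo group, you can list its groups (sudo should appear in the output):
groups hadoop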
Switch the Linux login user (via the Ubuntu UI) to hadoop before proceeding with the steps below.
update apt
sudo apt-get update
install vim
sudo apt-get install vim
install ssh and set up passwordless ssh login
sudo apt-get install openssh-server
login localhost
ssh localhost
exit localhost
exit
generate a key and authorize it
cd ~/.ssh/
ssh-keygen -t rsa    # press Enter at every prompt to accept the defaults
cat ./id_rsa.pub >> ./authorized_keys
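With the key authorized, ssh to localhost again; it should now log in without asking for a password:
ssh localhost    # should no longer prompt for a password
exit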
2. Install Java
sudo apt-get install openjdk-8-jre openjdk-8-jdk
set environment variables
cd ~
vim ~/.bashrc
add the lines below to it
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
save changes, exit vim and refresh .bashrc
source ~/.bashrc
check that Java was installed successfully
java -version
If the Java version information is shown on the screen, the installation is successful.
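You can also confirm that JAVA_HOME points at a valid JDK:
echo $JAVA_HOME                  # should print /usr/lib/jvm/java-8-openjdk-amd64
$JAVA_HOME/bin/java -version     # should print the same version information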
3. Install Hadoop 3.3.1
change directory
cd /usr/local
download Hadoop 3.3.1
sudo wget http://dlcdn.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
unzip the archive, rename the directory, and change its ownership
sudo tar -xzf hadoop-3.3.1.tar.gz
sudo mv hadoop-3.3.1 hadoop
sudo chown -R hadoop:hadoop ./hadoop
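To confirm that Hadoop was unpacked correctly, run it directly from the installation directory (the PATH entries are added in the next step):
cd /usr/local/hadoop
./bin/hadoop version    # should report Hadoop 3.3.1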
4. Configure pseudo distributed mode
cd ~
vim ~/.bashrc
add the lines below to .bashrc (the Java variables from step 2 are shown again for completeness)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:${HADOOP_HOME}/bin
export PATH=$PATH:${HADOOP_HOME}/sbin
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
refresh to activate new environment variables
source ~/.bashrc
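The hadoop command should now be available from any directory:
echo $HADOOP_HOME    # should print /usr/local/hadoop
hadoop version       # should work without a full path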
Configure the files below, all located in /usr/local/hadoop/etc/hadoop/
- hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
- core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
- hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/tmp/dfs/data</value>
  </property>
</configuration>
- mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
  </property>
</configuration>
- yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
format NameNode:
hdfs namenode -format
start hdfs
start-dfs.sh
start yarn
start-yarn.sh
to verify that HDFS and YARN started successfully:
jps
if the output looks like the example below (all of these processes should show up), the previous steps were successful
$ jps
2961 ResourceManager
2482 DataNode
3077 NodeManager
2366 NameNode
2686 SecondaryNameNode
3199 Jps
If the NameNode doesn't show up, you need to re-format the NameNode. Please make sure there are no important files in HDFS, because everything will be cleared.
cd /usr/local/hadoop
./sbin/stop-dfs.sh # stop HDFS
rm -r ./tmp # note: this will clear all files in HDFS
./bin/hdfs namenode -format # format namenode
./sbin/start-dfs.sh # restart
5. Job Test
try running an example jar file
cd /usr/local/hadoop/share/hadoop/mapreduce/
hadoop jar ./hadoop-mapreduce-examples-3.3.1.jar pi 5 10
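The pi example needs no HDFS input. As an extra check that HDFS itself works, here is a minimal sketch that copies the Hadoop configuration files into HDFS and runs the bundled wordcount example on them (the input/output directory names are arbitrary choices for this illustration; the output directory must not exist yet):
hdfs dfs -mkdir -p /user/hadoop                           # HDFS home directory for user hadoop
hdfs dfs -mkdir input                                     # relative paths resolve to /user/hadoop
hdfs dfs -put /usr/local/hadoop/etc/hadoop/*.xml input
hadoop jar ./hadoop-mapreduce-examples-3.3.1.jar wordcount input output
hdfs dfs -cat output/*                                    # view the word counts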
open a browser to see the web UI pages
urls:
localhost:9870 (HDFS NameNode web UI)
localhost:8088 (YARN ResourceManager web UI)
You can also access the web pages from another PC. First, find the IP address of the Ubuntu machine.
ifconfig -a
The output shows the IP address. You can then open the web pages in another PC's browser:
${IP ADDRESS}:9870
${IP ADDRESS}:8088
note: the JobHistory Server web UI port is 19888
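The JobHistory Server is not started by start-dfs.sh or start-yarn.sh. If you want to use port 19888, start it separately (in Hadoop 3 this can be done with the mapred command):
mapred --daemon start historyserver    # then open localhost:19888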
6. Shut down
use the commands below separately:
stop-dfs.sh
stop-yarn.sh
Or stop them together:
stop-all.sh
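After stopping, you can run jps again to confirm that the daemons are gone (if you started the JobHistory Server, stop it separately):
mapred --daemon stop historyserver    # only needed if it was started
jps                                   # only Jps itself should remain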