Hello Visitor,
First of all, thank you for visiting my blog.
I decided to learn Big Data in August 2013, and I'm using this blog to capture my notes and search results for my own reference.
Special thanks to Swamy Gurram for the Big Data knowledge sharing and motivation, as always.
First link to start with Hadoop: http://hadoop.apache.org
Single Node Setup: http://hadoop.apache.org/docs/stable/single_node_setup.html
HDFS layer of Hadoop Video: http://www.youtube.com/watch?v=ziqx2hJY8Hg
Daemon: a computer program that runs as a background process, similar to a service in Windows. More details: http://en.wikipedia.org/wiki/Daemon_(computing)
A simple definition of HDFS: HDFS is the primary distributed storage used by Hadoop applications. An HDFS cluster primarily consists of a NameNode that manages the file system metadata and DataNodes that store the actual data.
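To make the NameNode / DataNode split a little more concrete: HDFS stores a file as fixed-size blocks on the DataNodes, while the NameNode only records which blocks make up the file and where they live. The sketch below works out how many blocks a file needs. It assumes the 64 MB default block size of Hadoop 1.x (your cluster may be configured differently), and the path /user/hduser/sample.txt is hypothetical.

```shell
# Assumption: 64 MB block size, the Hadoop 1.x default (a cluster can override it).
BLOCK_SIZE=$((64 * 1024 * 1024))

# block_count SIZE_IN_BYTES -> number of HDFS blocks needed (ceiling division).
block_count() {
  echo $(( ($1 + BLOCK_SIZE - 1) / BLOCK_SIZE ))
}

# A 200 MB file needs 4 blocks: three full 64 MB blocks plus one 8 MB block.
block_count $((200 * 1024 * 1024))

# On a running cluster, fsck lists each block of a file and the DataNodes holding it.
# The path is hypothetical; the call is skipped when hadoop is not installed.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fsck /user/hduser/sample.txt -files -blocks -locations
fi
```

The block list printed by fsck is the metadata the NameNode keeps; the block contents themselves sit on the DataNodes.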
Install Hadoop on Ubuntu Linux (Single-Node Cluster):
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
http://www.youtube.com/watch?v=WN2tJk_oL6E
https://www.dropbox.com/s/05aurcp42asuktp/Chiu%20Hadoop%20Pig%20Install%20Instructions.docx
I was able to complete the installation within 4 hours (2-3 hours for Linux and 1 hour for Hadoop and Pig).
Installing Eclipse Juno 4.2 in Ubuntu 12.04 or in Ubuntu 12.10
http://akovid.blogspot.co.uk/2012/08/installing-eclipse-juno-42-in-ubuntu.html
Hadoop MapReduce Fundamentals (Highly technical)
1 of 5: http://www.youtube.com/watch?v=7FcMhTTG1Cs (30 mins)
2 of 5: http://www.youtube.com/watch?v=pDGLe4CsrhY (1 hr)
3 of 5: http://www.youtube.com/watch?v=9h_WLsmRfFM (1 hr) - Windows Azure based
4 of 5: http://www.youtube.com/watch?v=iiIDZTpdcuU (1 hr)
5 of 5: http://www.youtube.com/watch?v=1aen3JsxkuM (20 mins)
Data-Driven Documents (D3) - http://d3js.org/ - great for graphical representation of data.
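Q: What is the difference between the hadoop fs and hadoop dfs commands?
A: hadoop fs is the generic command: it works with any file system Hadoop supports, including the local OS file system and HDFS. hadoop dfs works only with HDFS; in Hadoop 2.x it is deprecated in favor of hdfs dfs.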
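A small sketch of the difference, under the assumption that Hadoop is installed and the default file system in its configuration is HDFS. The run helper is my own illustration (not part of Hadoop): it prints each command and only executes it when DRY_RUN=0, so the snippet is safe to paste on a machine without a cluster.

```shell
# run CMD...: print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Local OS file system, selected with an explicit file:// URI.
run hadoop fs -ls file:///tmp
# Default file system from the Hadoop configuration (normally HDFS).
run hadoop fs -ls /
# HDFS only.
run hadoop dfs -ls /
```

Set DRY_RUN=0 in the shell before calling run to actually execute the commands against your installation.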
Note: The simplest way to speed up MapReduce is usually to add more nodes; scaling out across machines is the whole idea behind Hadoop.
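Why adding nodes helps is easier to see with a toy example. The sketch below is not Hadoop code; it is a word count built from ordinary Unix tools that mirrors the three MapReduce phases. In real MapReduce the map step runs independently on each input split, which is what lets the work spread over more nodes.

```shell
# Toy word count using ordinary Unix tools to mirror the MapReduce phases.
# map:     tr splits the input into one word per line
# shuffle: sort brings identical words together
# reduce:  uniq -c counts each group; awk prints "word<TAB>count"
wordcount() {
  tr -s '[:space:]' '\n' | grep -v '^$' | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

printf 'big data big hadoop\nhadoop big\n' | wordcount
```

For the two sample lines this prints big 3, data 1 and hadoop 2, one tab-separated pair per line.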
Installations
VMware ESXi Virtualization:
http://www.youtube.com/watch?v=ba3qqJI6ML4 (ESXi, vSphere Client and Virtual OS with explanation) - 40 mins.
http://www.youtube.com/watch?v=ZBl1Tf2A4lA (Just ESXi and vSphere Client) - 10 mins.
Java JDK 7 Installation:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
java -version
Tomcat 7 Installation:
sudo apt-get update
sudo apt-get install tomcat7
sudo service tomcat7 stop
Set JAVA_HOME as below by editing the tomcat7 default start-up configuration file
sudo nano /etc/default/tomcat7
JAVA_HOME=/usr/lib/jvm/jdk1.7.0_09 (or the available Java JDK path in your Linux)
for example, JAVA_HOME=/usr/lib/jvm/java-7-oracle (in case of Oracle Java)
sudo service tomcat7 start
/usr/share/tomcat7/bin/version.sh
wget localhost:8080
Ref: http://hendrelouw73.wordpress.com/2012/11/14/how-to-install-apache-tomcat-7-0-30-on-ubuntu-12-10-linux/
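If you are not sure which path to put in JAVA_HOME, one way to find it is to follow the java binary back to its JDK directory. This is a minimal sketch, assuming a Linux system with GNU readlink; the helper name java_home_from_binary is my own.

```shell
# java_home_from_binary PATH_TO_JAVA -> JDK root directory.
# Strips a trailing /bin/java, then a trailing /jre if present.
java_home_from_binary() {
  dir=${1%/bin/java}
  echo "${dir%/jre}"
}

java_home_from_binary /usr/lib/jvm/java-7-oracle/jre/bin/java
# prints /usr/lib/jvm/java-7-oracle

# Resolve the real java binary on this machine (skipped when java is not installed).
if command -v java >/dev/null 2>&1; then
  java_home_from_binary "$(readlink -f "$(command -v java)")"
fi
```

The printed directory is the value to use for JAVA_HOME in /etc/default/tomcat7.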
Hadoop Multi-Node-Cluster Installation (Thanks to Michael G. Noll - You made it so easy)
Installation of hadoop-on-ubuntu-linux-multi-node-cluster
Eclipse Juno Installation:
http://akovid.blogspot.co.uk/2012/08/installing-eclipse-juno-42-in-ubuntu.html
Hive / HCatalog Installation:
$ java -version
$ hadoop version
$ wget http://www.gtlib.gatech.edu/pub/apache/hive/stable/hive-0.11.0-bin.tar.gz
$ jps
$ hadoop fs -mkdir /tmp
$ hadoop fs -mkdir /user/hive/warehouse
$ hadoop fs -chmod g+w /tmp
$ hadoop fs -chmod g+w /user/hive/warehouse
$ tar -xzvf hive-0.11.0-bin.tar.gz
$ mv hive-0.11.0-bin hive
$ cd hive
$ pwd
$ export HIVE_HOME=/home/hduser/hive
$ export PATH=$HIVE_HOME/bin:$PATH
$ hive
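The two export lines above only last for the current terminal session. A possible way to make them permanent is to append them to ~/.bashrc, taking care not to add them twice. The append_once helper below is my own sketch, and the demo writes to a scratch file rather than to the real ~/.bashrc.

```shell
# append_once FILE LINE: add LINE to FILE unless an identical line is already there.
append_once() {
  touch "$1"
  grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Demo against a scratch file; point RC_FILE at ~/.bashrc to apply it for real.
RC_FILE=$(mktemp)
append_once "$RC_FILE" 'export HIVE_HOME=/home/hduser/hive'
append_once "$RC_FILE" 'export PATH=$HIVE_HOME/bin:$PATH'
# Repeating the first call changes nothing, so the file still has two lines.
append_once "$RC_FILE" 'export HIVE_HOME=/home/hduser/hive'
cat "$RC_FILE"
```

After updating the real ~/.bashrc, open a new terminal (or run source ~/.bashrc) before starting hive.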
Problems during Hadoop learning
1. Hadoop: Cannot use the jps command
Press Alt + F2 and run gksudo gedit .bashrc, then add the following lines to the file:
# Add Java bin/ directory to PATH
export PATH=$PATH:$JAVA_HOME/bin
Then restart Ubuntu.
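A quick way to confirm the fix worked is to check whether the Java bin directory really is on PATH. The path_contains helper is my own sketch; it assumes JAVA_HOME is already set in your environment and falls back to a dummy directory when it is not.

```shell
# path_contains DIR: succeed if DIR is one of the colon-separated entries in PATH.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_contains "${JAVA_HOME:-/nonexistent}/bin"; then
  echo "Java bin directory is on PATH; jps should be found"
else
  echo "Java bin directory is missing from PATH"
fi
```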
2. WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock (FSNamesystem.java:1639)
Solution: The HDFS format did not complete. It may have aborted with a message such as "/app/hadoop/tmp/dfs/name cannot delete". When the format step asks for confirmation during installation, answer with an uppercase Y (not a lowercase y).
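"Replicated to 0 nodes" generally means the NameNode has no live DataNode to write to, so it is worth checking which daemons are actually running. The sketch below scans jps output for a daemon name. The has_daemon helper and the sample PIDs are my own illustration, not real output from my machine.

```shell
# has_daemon NAME: read jps output on stdin; succeed if a JVM named NAME is listed.
has_daemon() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Sample jps output with hypothetical PIDs; DataNode is missing here.
sample='4321 NameNode
4567 SecondaryNameNode
4890 Jps'

if printf '%s\n' "$sample" | has_daemon DataNode; then
  echo "DataNode is running"
else
  echo "DataNode is not running"
fi
```

On a real installation, pipe the live output instead: jps | has_daemon DataNode. In Hadoop 1.x, hadoop dfsadmin -report also shows how many DataNodes are available.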
3. Issues during Cloudera Hive / Impala execution: (Error: cause=Permission denied: user=root, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x).
Solution: Run the command as the hdfs superuser and change the ownership on HDFS as below
sudo -u hdfs hadoop fs -chown cloudera:root /user/root
Note: The hdfs user does not have a password (it is blank).
See also the link below:
http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201308.mbox/%3CCAORpBsgJG1d=FOoYpCxEvKo+Z+yWi5UPrjcbT5y50Q6Heffq9w@mail.gmail.com%3E
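The inode part of that error message already says why the write was refused: it lists the path, owner, group and permission bits. A small sketch that splits it apart (the parse_inode helper is my own, not a Hadoop tool):

```shell
# parse_inode 'inode="PATH":OWNER:GROUP:MODE' -> "owner=... group=... mode=..."
parse_inode() {
  printf '%s\n' "$1" |
    sed -n 's/^inode="[^"]*":\([^:]*\):\([^:]*\):\(.*\)$/owner=\1 group=\2 mode=\3/p'
}

parse_inode 'inode="/":hdfs:supergroup:drwxr-xr-x'
# prints: owner=hdfs group=supergroup mode=drwxr-xr-x
```

Here only the owner (hdfs) has the w bit. Assuming root is not a member of supergroup, root falls under "other" (r-x) and cannot write, which is why the fix has to be run as hdfs.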
Online Courses
1. Hadoop Fundamentals
My first baby step towards Big Data. I've finished Hadoop Fundamentals on the BigDataUniversity site with 13/20 marks :). It covers very basic topics on HDFS, Pig, Hive, JAQL, Hadoop Administration and Flume. The course is short and sweet, with lab practice videos.
http://bigdatauniversity.com/courses/course/view.php?id=516