Sunday, February 8, 2015

Lessons learned during POC

Issues and learnings during the SkyNet POC

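Learning 1: BZip2Codec configuration for the Hadoop cluster

In Ambari, open the core-site.xml settings and add org.apache.hadoop.io.compress.BZip2Codec to the comma-separated list in the io.compression.codecs property, then restart the services that Ambari flags as needing a restart.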
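
The io.compression.codecs value is a comma-separated list of codec class names, so adding BZip2Codec means appending one entry without duplicating it. Here is a minimal bash sketch of that step. It assumes you paste in the current value shown in Ambari. append_codec is a hypothetical helper, not part of Hadoop or Ambari:

```bash
#!/usr/bin/env bash
# append_codec is a hypothetical helper: it prints the io.compression.codecs
# value with a codec appended, leaving the list unchanged if already present.

BZIP2_CODEC="org.apache.hadoop.io.compress.BZip2Codec"

# Usage: append_codec CURRENT_LIST [CODEC]   (CODEC defaults to BZip2Codec)
append_codec() {
  local current="$1"
  local codec="${2:-$BZIP2_CODEC}"
  # Class names contain no whitespace, so drop any spaces around the commas.
  current="${current//[[:space:]]/}"
  if [ -z "$current" ]; then
    printf '%s\n' "$codec"
    return 0
  fi
  case ",$current," in
    *",$codec,"*) printf '%s\n' "$current" ;;            # already listed
    *)            printf '%s,%s\n' "$current" "$codec" ;;
  esac
}

# Example: paste the current value from Ambari, then paste the output back.
#   append_codec "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec"
```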

Learning 2: Enable Tez

Use the following steps to enable Tez for Hive queries:

Copy hive-exec-0.13.0.jar to HDFS at /apps/hive/install/hive-exec-0.13.0.jar:

    $ su - hive
    $ hadoop fs -mkdir /apps/hive/install
    $ hadoop fs -copyFromLocal /usr/lib/hive/lib/hive-exec-* /apps/hive/install/hive-exec-0.13.0.jar
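
The copyFromLocal step copies a glob (hive-exec-*) to a single target file name. If that glob matches more than one jar, the copy will not do what was intended, so it can help to resolve the glob first. The bash sketch below does that. It assumes the jar lives in /usr/lib/hive/lib as above. pick_single_jar is a hypothetical helper:

```bash
#!/usr/bin/env bash
# pick_single_jar is a hypothetical helper: it prints the one regular file in
# DIR that matches PATTERN, and fails if there are zero or several matches.

# Usage: pick_single_jar DIR PATTERN
pick_single_jar() {
  local dir="$1" pattern="$2"
  local -a matches=()
  local f
  for f in "$dir"/$pattern; do
    # With no match the glob stays literal, which the -f test filters out.
    if [ -f "$f" ]; then matches+=("$f"); fi
  done
  if [ "${#matches[@]}" -ne 1 ]; then
    echo "expected one match for $dir/$pattern, found ${#matches[@]}" >&2
    return 1
  fi
  printf '%s\n' "${matches[0]}"
}

# Example (run as the hive user, as in the steps above):
#   jar="$(pick_single_jar /usr/lib/hive/lib 'hive-exec-*.jar')" &&
#     hadoop fs -copyFromLocal "$jar" /apps/hive/install/hive-exec-0.13.0.jar
```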

Enable Hive to use Tez DAG APIs. On the Hive client machine, add the following to your Hive script or execute it in the Hive shell:

    set hive.execution.engine=tez;

To disable Tez for Hive queries, add the following to your Hive script on the Hive client machine, or execute it in the Hive shell:

    set hive.execution.engine=mr;
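
The only difference between running on Tez and running on MapReduce is the value of hive.execution.engine. One option is therefore to keep the HiveQL free of that setting and prepend it on each run. The following is a bash sketch of that approach. with_engine is a hypothetical helper, and query.hql is a placeholder file name:

```bash
#!/usr/bin/env bash
# with_engine is a hypothetical helper: it prints a HiveQL script (from a file
# or stdin) with "set hive.execution.engine=...;" prepended.

# Usage: with_engine tez|mr [SCRIPT_FILE]
with_engine() {
  local engine="$1"
  case "$engine" in
    tez|mr) ;;
    *) echo "unsupported engine: $engine (use tez or mr)" >&2; return 1 ;;
  esac
  printf 'set hive.execution.engine=%s;\n' "$engine"
  cat "${2:--}"   # "-" makes cat read stdin when no file is given
}

# Example (query.hql is a placeholder):
#   with_engine tez query.hql > /tmp/query_tez.hql && hive -f /tmp/query_tez.hql
# The Hive CLI can also set the property per run without editing the script:
#   hive --hiveconf hive.execution.engine=mr -f query.hql
```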

Learning 3: Running Flume directly from Ambari

In the latest Ambari release, Flume can be configured and started directly from Ambari by supplying the agent configuration file content.

Learning 4: flume-ng command to start an agent from the CLI in debug mode

    $ bin/flume-ng agent --conf ./conf/ -f conf/flumeSkyNet.conf -Dflume.root.logger=DEBUG,console -n agentADSB
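
The -n value has to match the agent name used as the property prefix in the configuration file (here agentADSB). Flume only picks up properties that start with that prefix, so a mismatch typically results in an agent that starts with no sources, channels, or sinks. Below is a bash sketch of a pre-flight check. It assumes the sources are declared on a line of the form <agent>.sources = .... agent_defined is a hypothetical helper:

```bash
#!/usr/bin/env bash
# agent_defined is a hypothetical helper: it succeeds if CONF_FILE declares
# "<AGENT>.sources = ..." for the given agent name.

# Usage: agent_defined CONF_FILE AGENT
agent_defined() {
  local conf="$1" agent="$2"
  # Agent names are assumed to be plain alphanumerics (no regex characters).
  grep -Eq "^[[:space:]]*${agent}\.sources[[:space:]]*=" "$conf"
}

# Example:
#   agent_defined conf/flumeSkyNet.conf agentADSB &&
#     bin/flume-ng agent --conf ./conf/ -f conf/flumeSkyNet.conf \
#       -Dflume.root.logger=DEBUG,console -n agentADSB
```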

Learning 5: Finding Hadoop ecosystem component versions from Ambari

In Ambari, go to Admin → Cluster. Alternatively, in Hue, go to About.

Issue 1: "Could not resolve org.apache.hcatalog.pig.HCatStorer" error in HDP 2.2

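1. In the Pig script editor, at the end of the "Pig script" section, find the text box labeled "Pig arguments".
2. Type -useHCatalog and press Enter. The argument should be highlighted in gray and a new empty text box should appear.
3. Use the new package name org.apache.hive.hcatalog.pig: for example, org.apache.hive.hcatalog.pig.HCatLoader() instead of org.apache.hcatalog.pig.HCatLoader(), and org.apache.hive.hcatalog.pig.HCatStorer() instead of org.apache.hcatalog.pig.HCatStorer().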
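
In HDP 2.2 the HCatalog Pig classes live under org.apache.hive.hcatalog.pig rather than org.apache.hcatalog.pig. Older scripts therefore need the package prefix updated for both HCatLoader and HCatStorer. The bash sketch below does that rewrite with sed and also shows the command-line equivalent of the -useHCatalog argument. migrate_hcat and load.pig are hypothetical names:

```bash
#!/usr/bin/env bash
# migrate_hcat is a hypothetical helper: it reads a Pig script on stdin and
# rewrites the old org.apache.hcatalog.pig. prefix to org.apache.hive.hcatalog.pig.
# (this covers both HCatLoader and HCatStorer). Already-migrated names do not
# match the pattern, so running it twice is harmless.

migrate_hcat() {
  sed 's/org\.apache\.hcatalog\.pig\./org.apache.hive.hcatalog.pig./g'
}

# Example (load.pig is a placeholder):
#   migrate_hcat < load.pig > load_hdp22.pig
#   pig -useHCatalog load_hdp22.pig    # CLI equivalent of the "Pig arguments" box
```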

Issue 2: cannot access /usr/lib/hive/lib/slf4j-api-*.jar: No such file or directory

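1. Locate the jar: find / -name 'slf4j-api-*.jar'
2. Copy the jar found in step 1 into Hive's lib directory: cp <path to the jar from step 1> /usr/lib/hive/lib/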
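
The two steps above can be wrapped in one idempotent function that skips the copy when Hive's lib directory already has the jar. The quotes around the find pattern stop the shell from expanding the wildcard before find sees it. Here is a bash sketch. ensure_jar is a hypothetical helper, and searching /usr/lib instead of / is an assumption made to keep the scan short:

```bash
#!/usr/bin/env bash
# ensure_jar is a hypothetical helper: if DEST_DIR has no jar matching PATTERN,
# it copies the first match found under SEARCH_ROOT into DEST_DIR. It prints
# the path of the jar in DEST_DIR.

# Usage: ensure_jar SEARCH_ROOT PATTERN DEST_DIR
ensure_jar() {
  local root="$1" pattern="$2" dest="$3" existing="" found=""
  # Nothing to do if DEST_DIR already contains a matching jar.
  existing="$(find "$dest" -maxdepth 1 -type f -name "$pattern" -print -quit 2>/dev/null)" || true
  if [ -n "$existing" ]; then
    printf '%s\n' "$existing"
    return 0
  fi
  # "$pattern" is quoted so find, not the shell, expands the wildcard.
  found="$(find "$root" -type f -name "$pattern" -print -quit 2>/dev/null)" || true
  if [ -z "$found" ]; then
    echo "no file matching $pattern under $root" >&2
    return 1
  fi
  cp "$found" "$dest/" && printf '%s\n' "$dest/$(basename "$found")"
}

# Example (searching /usr/lib is an assumption; widen to / if needed):
#   ensure_jar /usr/lib 'slf4j-api-*.jar' /usr/lib/hive/lib
```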

Issue 3: Pig 'bytearray' type in column 0(0-based) cannot map to HCat 'STRING'type.  Target filed must be of HCat type {BINARY}

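Declare the Pig field as chararray instead of bytearray. Pig gives fields loaded without an explicit type the bytearray type, so declare the type in the LOAD ... AS (...) schema.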
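
Any Pig field without a declared type becomes bytearray, which is what HCatStorer rejects here. The bash sketch below lists such fields in a flat AS (...) schema string before the job runs. bytearray_fields is a hypothetical helper, and it does not handle nested tuples or bags:

```bash
#!/usr/bin/env bash
# bytearray_fields is a hypothetical helper: given a flat Pig schema such as
# "icao:chararray, airGround, alt:bytearray", it prints the names of fields
# that will be bytearray (declared so, or left untyped).

# Usage: bytearray_fields "SCHEMA"
bytearray_fields() {
  local -a fields
  local field name type
  IFS=',' read -r -a fields <<< "$1"
  for field in "${fields[@]}"; do
    field="${field//[[:space:]]/}"          # drop spaces around names/types
    if [ -z "$field" ]; then continue; fi
    name="${field%%:*}"
    if [ "$name" = "$field" ]; then
      type="bytearray"                      # no ":type", so Pig uses bytearray
    else
      type="$(printf '%s' "${field#*:}" | tr '[:upper:]' '[:lower:]')"
    fi
    if [ "$type" = "bytearray" ]; then printf '%s\n' "$name"; fi
  done
}
```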

Issue 4: Column names should all be in lowercase. Invalid name found: airGround

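HCatalog accepts only lowercase column names, so rename the columns to lowercase in the Pig script (for example, airground instead of airGround).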
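
This bash sketch produces the lowercase version of a flat Pig AS (...) schema. It lowercases only the field names and leaves the types as written. lowercase_names is a hypothetical helper, and nested tuples or bags are not handled:

```bash
#!/usr/bin/env bash
# lowercase_names is a hypothetical helper: it lowercases the field names in a
# flat Pig schema string and keeps each ":type" suffix as written.

# Usage: lowercase_names "SCHEMA"
lowercase_names() {
  local -a fields
  local field name joined="" sep=""
  IFS=',' read -r -a fields <<< "$1"
  for field in "${fields[@]}"; do
    field="${field//[[:space:]]/}"          # drop spaces around names/types
    if [ -z "$field" ]; then continue; fi
    name="${field%%:*}"
    # Lowercase the name, then re-attach whatever followed it (":type" or "").
    joined+="$sep$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')${field#"$name"}"
    sep=", "
  done
  printf '%s\n' "$joined"
}

# Example:
#   lowercase_names "icao:chararray, airGround:chararray"
#   # prints: icao:chararray, airground:chararray
```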

Issue 5: Permission issues with Linux scripts and HDFS paths caused by mixing the root and hue user IDs

Make sure the user that runs the automated process has the required read, write, and execute permissions on both the Linux scripts and the HDFS paths.
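
You can catch these problems before the automated job fails by checking, as the same user that runs the job, that the local scripts and directories have the needed permissions. The following bash sketch covers the Linux side. check_local_perms is a hypothetical helper. For HDFS, ownership and modes can be inspected with hadoop fs -ls and changed with hadoop fs -chown or hadoop fs -chmod.

```bash
#!/usr/bin/env bash
# check_local_perms is a hypothetical helper: it reports every PATH that the
# current user cannot read (r), write (w), or execute/traverse (x) as asked.

# Usage: check_local_perms MODE PATH...   (MODE: any combination of r, w, x)
check_local_perms() {
  local mode="$1"; shift
  local user path status=0
  user="$(id -un)"
  for path in "$@"; do
    if [[ $mode == *r* && ! -r $path ]]; then echo "$user cannot read $path" >&2; status=1; fi
    if [[ $mode == *w* && ! -w $path ]]; then echo "$user cannot write $path" >&2; status=1; fi
    if [[ $mode == *x* && ! -x $path ]]; then echo "$user cannot execute $path" >&2; status=1; fi
  done
  return "$status"
}

# Example, called from the job itself so it checks as the job's user
# (the path is a placeholder):
#   check_local_perms rx /path/to/script.sh || exit 1
```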

Issue 6: Flume: “[ERROR - org.apache.flume.source.NetcatSource.start(] Unable to bind to socket.” during the source connectivity

Resolution in progress.
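
This issue is still open, so the following is not the resolution. It is only a sketch for ruling out two common causes of a netcat source bind failure. The first is that the port is already in use (for example, by a previous agent that is still running). The second is that the port is below 1024 and the agent is not running as root. netcat_port_precheck is a hypothetical bash helper; pass it the value of the source's port property:

```bash
#!/usr/bin/env bash
# netcat_port_precheck is a hypothetical helper. It returns 2 for an invalid
# port, 1 if a likely bind problem is found, and 0 otherwise.

# Usage: netcat_port_precheck PORT
netcat_port_precheck() {
  local port="$1" status=0
  if ! [[ $port =~ ^[0-9]+$ ]]; then
    echo "invalid port: $port" >&2
    return 2
  fi
  port=$((10#$port))                     # avoid octal parsing of leading zeros
  if (( port < 1 || port > 65535 )); then
    echo "port out of range: $port" >&2
    return 2
  fi
  # On Linux, binding below 1024 needs root (or CAP_NET_BIND_SERVICE).
  if (( port < 1024 )) && [ "$(id -u)" -ne 0 ]; then
    echo "port $port is privileged and $(id -un) is not root" >&2
    status=1
  fi
  # A successful TCP connect to localhost means something already listens
  # there. This does not detect listeners bound only to another interface.
  if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
    echo "port $port already accepts connections on 127.0.0.1" >&2
    status=1
  fi
  return "$status"
}
```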

