Tuesday, August 18, 2015

Aster Express installation on VirtualBox/Mac OS

The following steps install Aster Express on VirtualBox on a MacBook. Because the Aster Express images are built for VMware, the network configuration needs care when running them on VirtualBox: each VM must use a host-only adapter together with a hardcoded MAC address.

  1. In VirtualBox, open VirtualBox -> Preferences -> Network -> Host-only Networks.
  2. On the Adapter tab, enter 192.168.100.1 as the IPv4 Address.
  3. On the DHCP Server tab, do NOT enable the server (then close the Preferences window).
  4. Back in VirtualBox, click New to create a virtual machine.
  5. Set Name to "Aster Queen", Type to "Linux", and Version to "openSUSE (64-bit)".
  6. Give it 2 GB of RAM.
  7. On the next screen, choose "Use an existing virtual hard drive file" and select the downloaded Queen VMDK file.
  8. After creating the VM, open its Settings before starting it.
  9. Go to Network -> Adapter 1 and set "Attached to" to "Host-only Adapter".
  10. Under Advanced on the same tab, replace the MAC Address with 005056368D90 for the Queen node.
  11. Repeat steps 4 to 10 for the Aster Worker node, but use the MAC address 00505620E180.
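
The same steps can probably be scripted with VBoxManage. The following is only a sketch under assumptions that are not in the original notes: the host-only interface is called vboxnet0, the netmask is 255.255.255.0, the disk files are named queen.vmdk and worker.vmdk, the storage controller is named "SATA", and the second VM is named "Aster Worker". By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch of the GUI steps above using VBoxManage.
# Assumed (not from the original notes): vboxnet0, the netmask, the disk file
# names, the "SATA" controller name and the "Aster Worker" VM name.
# DRY_RUN=1 (default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"        # printed without shell quoting, for review only
  else
    "$@"
  fi
}

setup_node() {
  name="$1"; mac="$2"; disk="$3"
  run VBoxManage createvm --name "$name" --ostype OpenSUSE_64 --register
  run VBoxManage modifyvm "$name" --memory 2048
  run VBoxManage storagectl "$name" --name SATA --add sata
  run VBoxManage storageattach "$name" --storagectl SATA --port 0 --device 0 --type hdd --medium "$disk"
  run VBoxManage modifyvm "$name" --nic1 hostonly --hostonlyadapter1 vboxnet0 --macaddress1 "$mac"
}

# If vboxnet0 does not exist yet, create it first: VBoxManage hostonlyif create
run VBoxManage hostonlyif ipconfig vboxnet0 --ip 192.168.100.1 --netmask 255.255.255.0
run VBoxManage dhcpserver modify --ifname vboxnet0 --disable

setup_node "Aster Queen"  005056368D90 queen.vmdk
setup_node "Aster Worker" 00505620E180 worker.vmdk
```

Run it once as-is to inspect the printed commands, then again with DRY_RUN=0 after adjusting the assumed names to your setup. If no DHCP server has been defined for the interface, the dhcpserver command may report an error that can be ignored.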

You should be able to ping 192.168.100.100 and 192.168.100.150 from your MacBook.


The problem lies with the network adapter. The nodes are supposed to use static IP addresses on eth0, but because a VMware image is being run on VirtualBox, eth0 is not picked up during boot. Switching the VirtualBox adapter to the host-only network is not enough on its own: each VM must also be given the exact MAC address (005056368D90 for the Queen node and 00505620E180 for the Worker node) for eth0 to be configured. These MAC addresses come from the Aster 6.00.01 images. If you use a different Aster version, you need to look up the MAC addresses in those images.
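
One possible way to look up the MAC address for another Aster version is to read it from the VMware .vmx file that ships with the image. This is a sketch, assuming the download contains a .vmx file and that the adapter is recorded under the usual VMware keys ethernet0.address or ethernet0.generatedAddress; the function name vmx_mac is made up for illustration.

```shell
#!/bin/sh
# Print the ethernet0 MAC address from a VMware .vmx file in the form
# VirtualBox expects: 12 hex digits, no colons, upper case.
# Assumption: the MAC is stored as ethernet0.address or
# ethernet0.generatedAddress, e.g. ethernet0.address = "00:50:56:36:8D:90".
vmx_mac() {
  grep -iE '^ethernet0\.(address|generatedAddress)[[:space:]]*=' "$1" \
    | head -n 1 \
    | sed -E 's/^[^"]*"([^"]*)".*$/\1/' \
    | tr -d ':' \
    | tr '[:lower:]' '[:upper:]'
}

# Example (the file name is hypothetical):
#   vmx_mac AsterQueen.vmx
```

The printed value can be pasted directly into the MAC Address field in the VirtualBox network settings.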


For the remaining steps, follow the installation instructions document included with the Aster Express downloads.


Below is the link I found during the investigation. I hope it helps.

https://forums.virtualbox.org/viewtopic.php?f=7&t=9057 – This is the link which gave the solution.

"I've just had exactly this problem; it took me a good couple of hours to track down. The key is that Debian records the MAC address of the network adapter in its udev rules, so you can't change it externally without also changing it internally. This also means that if you copy the vdi to another host (like I did), you need to duplicate the MAC address setting in VirtualBox as well."


Friday, July 10, 2015

Presto Tutorial

Java 8 Installation: http://tecadmin.net/install-java-8-on-centos-rhel-and-fedora/

$ cd /opt

$ wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u45-b14/jdk-8u45-linux-x64.tar.gz"

$ tar xzf jdk-8u45-linux-x64.tar.gz

$ wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.109/presto-server-0.109.tar.gz

$ tar -xvf presto-server-0.109.tar.gz

$ mv presto-server-0.109 /usr/local/presto

$ cd /usr/local/presto

$ mkdir /usr/local/presto/etc

Configuring Presto:

Node Properties:

$ vi etc/node.properties

# The following is a minimal etc/node.properties

node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/var/presto/data

$ mkdir /var/presto

$ mkdir /var/presto/data
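
Since node.id should be unique per node, the file can also be generated rather than typed by hand. A minimal sketch, assuming uuidgen (or the Linux /proc UUID source as a fallback) is available; the function name write_node_properties is made up for illustration.

```shell
#!/bin/sh
# Write a minimal node.properties with a freshly generated node.id and
# create the data directory in one go.
# Usage: write_node_properties <etc-dir> <data-dir>
write_node_properties() {
  etc_dir="$1"
  data_dir="$2"
  # uuidgen exists on macOS and most Linux systems; fall back to /proc on Linux.
  node_id="$(uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid)"
  mkdir -p "$etc_dir" "$data_dir"
  cat > "$etc_dir/node.properties" <<EOF
node.environment=production
node.id=$node_id
node.data-dir=$data_dir
EOF
}

# Example matching the paths used in this post:
#   write_node_properties /usr/local/presto/etc /var/presto/data
```

Run it only once per node: the node.id is meant to stay the same across restarts and upgrades.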

JVM Config:

$ vi etc/jvm.config

# The following provides a good starting point for creating etc/jvm.config:

-server
-Xmx16G
-XX:+UseConcMarkSweepGC
-XX:+ExplicitGCInvokesConcurrent
-XX:+AggressiveOpts
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p

Config Properties:

$ vi etc/config.properties

# If you are setting up a single machine for testing that will function as both a coordinator and a worker, use this configuration:

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8084
task.max-memory=1GB
discovery-server.enabled=true
discovery.uri=http://localhost:8084

Log Levels:

$ vi etc/log.properties

com.facebook.presto=INFO

Catalog Properties:

$ mkdir etc/catalog

Hive Connector:

$ vi etc/catalog/hive.properties


# Apache Hadoop 1.x: hive-hadoop1
# Apache Hadoop 2.x: hive-hadoop2
# Cloudera CDH 4: hive-cdh4
# Cloudera CDH 5: hive-cdh5

connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083



Running Presto:

$ bin/launcher start   # run as a daemon (in the background)

$ bin/launcher run     # run in the foreground
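
To confirm the server actually came up, one option is to query its /v1/info REST endpoint on the port set in etc/config.properties. The helper below is a sketch; the function name presto_info_url is made up, and it assumes the server runs on localhost as in the configuration above.

```shell
#!/bin/sh
# Build the info URL from the port configured in config.properties.
# Usage: presto_info_url <path-to-config.properties>
presto_info_url() {
  port="$(sed -n 's/^http-server\.http\.port=//p' "$1" | head -n 1)"
  echo "http://localhost:${port}/v1/info"
}

# Example, run from /usr/local/presto once the server has started:
#   curl -s "$(presto_info_url etc/config.properties)"
#   bin/launcher status
```

If curl cannot connect, the log files under the node.data-dir directory (var/log) are the first place to look.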

$ cd /usr/local/presto/bin

$ wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.109/presto-cli-0.109-executable.jar

$ mv presto-cli-0.109-executable.jar presto

$ chmod +x presto

--------------
Hive Demo:
--------------

$ cd /usr/local/presto

$ bin/presto --server localhost:8084 --catalog hive --schema default

Hive Tutorial: http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-hive/

> SELECT year, max(runs) FROM batting GROUP BY year;

> SELECT a.year, a.player_id, a.runs FROM batting a JOIN (SELECT year, max(runs) runs FROM batting GROUP BY year) b ON (a.year = b.year AND a.runs = b.runs);

===================================

Create Kafka Catalog:

https://prestodb.io/docs/current/connector/kafka-tutorial.html

$ vi etc/catalog/kafka.properties

connector.name=kafka
kafka.nodes=localhost:6667
kafka.table-names=test,twitter,tweets
kafka.hide-internal-columns=false


$ bin/presto --server localhost:8084 --catalog kafka --schema default

Queries:

> SELECT count(*) FROM tweets;

> SELECT DISTINCT json_extract_scalar(_message, '$.created_at') AS raw_date FROM tweets LIMIT 5;

> SELECT created_at, raw_date FROM ( SELECT created_at, json_extract_scalar(_message, '$.created_at') AS raw_date FROM tweets) GROUP BY 1, 2 LIMIT 5;
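
The last query refers to a created_at column, which only exists if the tweets topic has a table description file; without one, only the internal columns such as _message are available. The sketch below writes a minimal description modelled on the linked Kafka tutorial. The function name and the single-field layout are assumptions, and the target directory is assumed to be etc/kafka under the Presto installation (the connector's default description directory, as far as I know).

```shell
#!/bin/sh
# Write a minimal table description for the "tweets" topic that maps the
# JSON field created_at to a TIMESTAMP column.
# Usage: write_tweets_description <description-dir>
write_tweets_description() {
  desc_dir="$1"
  mkdir -p "$desc_dir"
  cat > "$desc_dir/tweets.json" <<'EOF'
{
    "tableName": "tweets",
    "topicName": "tweets",
    "message": {
        "dataFormat": "json",
        "fields": [
            {
                "name": "created_at",
                "mapping": "created_at",
                "type": "TIMESTAMP",
                "dataFormat": "rfc2822"
            }
        ]
    }
}
EOF
}

# Example (restart Presto afterwards so the connector reloads the file):
#   write_tweets_description /usr/local/presto/etc/kafka
```

Further fields from the tweet JSON can be added to the fields list in the same way.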

Avro to JSON or JSON to Avro


http://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/

$ java -jar ~/avro-tools-1.7.4.jar
$ java -jar ~/avro-tools-1.7.4.jar fromjson
$ java -jar ~/avro-tools-1.7.4.jar fromjson --schema-file twitter.avsc twitter.json > twitter.avro
$ java -jar ~/avro-tools-1.7.4.jar tojson twitter.avro > twitter.json
$ java -jar ~/avro-tools-1.7.4.jar tojson twitter.snappy.avro > twitter.json
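
For trying the commands above without real data, a small schema and a matching JSON file are enough. The sketch below writes both. The schema (three fields, namespace example.avro) and the two records are purely illustrative and not taken from any real dataset; fromjson expects one JSON record per line.

```shell
#!/bin/sh
# Create an illustrative twitter.avsc and a matching twitter.json.
# Usage: write_avro_samples <dir>
write_avro_samples() {
  dir="$1"
  mkdir -p "$dir"
  cat > "$dir/twitter.avsc" <<'EOF'
{
  "type": "record",
  "name": "twitter_schema",
  "namespace": "example.avro",
  "fields": [
    {"name": "username",  "type": "string"},
    {"name": "tweet",     "type": "string"},
    {"name": "timestamp", "type": "long"}
  ]
}
EOF
  cat > "$dir/twitter.json" <<'EOF'
{"username": "user_a", "tweet": "first sample tweet", "timestamp": 1366150681}
{"username": "user_b", "tweet": "second sample tweet", "timestamp": 1366154481}
EOF
}

# Example round trip in the current directory:
#   write_avro_samples .
#   java -jar ~/avro-tools-1.7.4.jar fromjson --schema-file twitter.avsc twitter.json > twitter.avro
#   java -jar ~/avro-tools-1.7.4.jar getschema twitter.avro
```

The getschema subcommand prints the schema embedded in the Avro file, which is a quick way to check that the conversion used the schema you intended.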

Wednesday, July 1, 2015

Error: Failed connecting to Hive metastore: localhost:9083


Check whether port 9083 is reachable ($ telnet localhost 9083). If the connection fails, start the Hive Metastore service with the command below.

hive --service metastore
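
The check and the start can be combined in a small script. This is a sketch, assuming an nc build that supports the -z (scan only) and -w (timeout) options; the function names and the log path /tmp/hive-metastore.log are made up for illustration.

```shell
#!/bin/sh
# Returns 0 when something is listening on host:port (default localhost:9083).
metastore_up() {
  nc -z -w 2 "${1:-localhost}" "${2:-9083}" >/dev/null 2>&1
}

# Start the metastore in the background only if the port is closed.
ensure_metastore() {
  if metastore_up; then
    echo "metastore already listening on 9083"
  else
    nohup hive --service metastore > /tmp/hive-metastore.log 2>&1 &
    echo "started metastore, log: /tmp/hive-metastore.log"
  fi
}

# Example:
#   ensure_metastore
```

The metastore can take some time to open the port after starting, so wait a little before retrying the Presto query.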

Thursday, May 21, 2015

Netcat Example

On terminal 1: nc -l 4444 (then type some messages)

On terminal 2: nc localhost 4444 (to view the posted messages)


I guess nc may need to be replaced with netcat on other operating systems; I used nc on a Mac.

Please refer to the link below for more details.
https://www.digitalocean.com/community/tutorials/how-to-use-netcat-to-establish-and-test-tcp-and-udp-connections-on-a-vps

Tuesday, March 3, 2015

Enable Hive Beeline

Set the "hive.support.concurrency" property to true via Ambari, in the Advanced hive-site section of the Hive configuration.