Tuesday, November 28, 2017

Sublime installation on Ubuntu 16.04 reference

https://www.sublimetext.com/docs/3/linux_repositories.html#apt
wget -qO - https://download.sublimetext.com/sublimehq-pub.gpg | sudo apt-key add -
sudo apt-get install apt-transport-https
echo "deb https://download.sublimetext.com/ apt/stable/" | sudo tee /etc/apt/sources.list.d/sublime-text.list
sudo apt-get update
sudo apt-get install sublime-text

https://realpython.com/blog/python/setting-up-sublime-text-3-for-full-stack-python-development/

Sublime Anaconda config
https://github.com/DamnWidget/anaconda#anaconda-autocompletion

http://damnwidget.github.io/anaconda/IDE/

http://damnwidget.github.io/anaconda/anaconda_settings/ 

Monday, November 27, 2017

Thursday, November 16, 2017

Scala Conscript, giter8, and sbt

brew update && brew install giter8

or

http://www.foundweekends.org/conscript/setup.html

http://www.foundweekends.org/giter8/setup.html

https://github.com/foundweekends/giter8/wiki/giter8-templates

http://www.scala-sbt.org/0.13/docs/Installing-sbt-on-Linux.html

# set up a new Scala project from the seed template

sbt new scala/scala-seed.g8

# example sbt layout

https://github.com/kyrsideris/SparkUpdateCassandra
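
A typical sbt layout has build.sbt at the root plus project/, src/main/scala, and src/test/scala. As a rough sketch only, a minimal build.sbt for a Spark-plus-Cassandra project could look like the following; the name, versions, and dependencies are illustrative and not taken from the linked repo:

// build.sbt -- illustrative sketch; align the versions with your Spark and Scala install
name := "spark-cassandra-example"
version := "0.1.0"
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark"   %% "spark-core"                % "2.2.0" % "provided",
  "org.apache.spark"   %% "spark-sql"                 % "2.2.0" % "provided",
  "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.5"
)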


Wednesday, November 15, 2017

Spark: Read JSON references
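
A minimal sketch of reading JSON with Spark's built-in reader, assuming spark-shell 2.x where a SparkSession named spark is predefined; the file path is illustrative:

// read line-delimited JSON into a DataFrame and inspect it
val df = spark.read.json("/tmp/events.json")
df.printSchema()
df.show(5)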

Nginx log file processing in ELK - reference


https://logz.io/blog/nginx-log-analysis/

A sample NGINX access log entry:

109.65.122.142 - - [10/Nov/2015:07:06:59 +0000] "POST /kibana/elasticsearch/_msearch?timeout=30000&ignore_unavailable=true&preference=1447070343481 HTTP/1.1" 200 8352 "https://app.logz.io/kibana/index.html" "Mozilla/5.0 (X11; Linux armv7l) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/45.0.2454.101 Chrome/45.0.2454.101 Safari/537.36" 0.465 0.454

The Logstash configuration to parse that NGINX access log entry:

grok {
  match => [ "message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}" ]
  overwrite => [ "message" ]
}

mutate {
  convert => ["response", "integer"]
  convert => ["bytes", "integer"]
  convert => ["responsetime", "float"]
}

geoip {
  source => "clientip"
  target => "geoip"
  add_tag => [ "nginx-geoip" ]
}

date {
  match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
  remove_field => [ "timestamp" ]
}

useragent {
  source => "agent"
}

A sample NGINX error log:

2015/11/10 06:49:59 [warn] 10#0: *557119 an upstream response is buffered to a temporary file /var/lib/nginx/proxy/4/80/0000003804 while reading upstream, client: 66.249.88.173, server: 0.0.0.0, request: "GET /kibana/index.js?_b=1273 HTTP/1.1", upstream: "http://172.17.0.30:9000/kibana/index.js?_b=1273", host: "app.logz.io", referrer: "https://app.logz.io/kibana/index.html"

The Logstash configuration to parse that NGINX error log:

grok {
  match => [ "message" , "(?<timestamp>%{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY}[- ]%{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER}: %{GREEDYDATA:errormessage}(?:, client: (?<client>%{IP}|%{HOSTNAME}))(?:, server: %{IPORHOST:server})(?:, request: %{QS:request})?(?:, upstream: \"%{URI:upstream}\")?(?:, host: %{QS:host})?(?:, referrer: \"%{URI:referrer}\")"]
  overwrite => [ "message" ]
}

geoip {
  source => "client"
  target => "geoip"
  add_tag => [ "nginx-geoip" ]
}

date {
  match => [ "timestamp" , "YYYY/MM/dd HH:mm:ss" ]
  remove_field => [ "timestamp" ]
}

Thursday, November 9, 2017

Docker ELK Stack and the GeoIP plugin

https://docs.docker.com/compose/gettingstarted/#step-3-define-services-in-a-compose-file

http://elk-docker.readthedocs.io/#running-with-docker-compose

# https://elk-docker.readthedocs.io/#installation
sudo docker pull sebp/elk
docker images

# https://elk-docker.readthedocs.io/#usage
sudo docker run -p 5601:5601 -p 9200:9200 -p 5044:5044 -it --name elk sebp/elk

# or set up a yml file
# create an entry for the ELK Docker image by adding the following lines to
# your docker-compose.yml file:
 
elk:
  image: sebp/elk
  ports:
    - "5601:5601"
    - "9200:9200"
    - "5044:5044"

You can then start the ELK container like this:

$ sudo docker-compose up elk


# follow the image's usage instructions to inject a log message into Logstash

# inject the msg

# then view the injected msg in a browser:

http://192.168.1.155:9200/_search?pretty

http://192.168.1.155:5601/app/kibana#/management/kibana/index?_g=()

# use the container id from docker ps and stop the container

docker stop fce12628893c

# Let's now build an elk-docker image from a git clone

cd

git clone https://github.com/spujadas/elk-docker

http://elk-docker.readthedocs.io/#building-image

https://stackoverflow.com/questions/36617904/extending-local-dockerfile

# build the cloned docker image


~/elk-docker$ docker build -t elk-docker .

# now create the second Dockerfile, which will add the GeoIP plugin

A Dockerfile like the following extends the base image and installs the GeoIP processor plugin (which adds information about the geographical location of IP addresses):

FROM sebp/elk

ENV ES_HOME /opt/elasticsearch
WORKDIR ${ES_HOME}

RUN CONF_DIR=/etc/elasticsearch gosu elasticsearch bin/elasticsearch-plugin \
    install ingest-geoip

You can now build the new image (see the Building the image section above) and run the container in the same way as you did with the base image:

~$ mkdir elk-docker-geoip
~$ cd !$
cd elk-docker-geoip
~/elk-docker-geoip$ vi Dockerfile

FROM sebp/elk

ENV ES_HOME /opt/elasticsearch
WORKDIR ${ES_HOME}

RUN CONF_DIR=/etc/elasticsearch gosu elasticsearch bin/elasticsearch-plugin \
    install ingest-geoip

~/elk-docker-geoip$ docker build -t elk-docker .

Tuesday, November 7, 2017

Spark Scala Cassandra intro

# Assumes that you have previously installed Oracle Java 8
# Use the java install instructions at this page if you have not already installed Java 8



# Install Cassandra


echo "deb http://www.apache.org/dist/cassandra/debian 311x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
curl https://www.apache.org/dist/cassandra/KEYS | sudo apt-key add -
sudo apt-get update
sudo apt-get install cassandra

https://www.datastax.com/dev/blog/kindling-an-introduction-to-spark-with-cassandra-part-1

# build the spark-cassandra-connector assembly for Scala 2.11
$ sbt/sbt -Dscala-2.11=true assembly

# Assumes you have git cloned the Spark Cassandra connector code in $HOME, run the sbt assembly build above, and have spark-shell on the PATH.

# Start cassandra

# To start the Apache Cassandra service on your server, you can use the following command:

sudo systemctl start cassandra.service

# To stop the service, you can use the command below:

sudo systemctl stop cassandra.service

# If the service is not already enabled on system boot, you can enable it by using the command below:

sudo systemctl enable cassandra.service

# Add a keyspace and table for the tutorial in the Cassandra shell: cqlsh


$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.1 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.

cqlsh> CREATE KEYSPACE test_spark WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

cqlsh> CREATE TABLE test_spark.test (value int PRIMARY KEY);

cqlsh:test_spark> INSERT INTO test_spark.test (value) VALUES (1);

# In another shell start spark-shell

$ cd

$ spark-shell

scala> sc.parallelize( 1 to 50 ).sum()

res1: Double = 1275.0

# press Ctrl-D to exit the spark-shell

# restart with the Cassandra connector jar

$ spark-shell --jars ~/spark-cassandra-connector/spark-cassandra-connector/target/full/scala-2.11/spark-cassandra-connector-assembly-2.0.5-70-g2ee41fc.jar


scala> import com.datastax.spark.connector._, org.apache.spark.SparkContext, org.apache.spark.SparkContext._, org.apache.spark.SparkConf

scala> sc.stop()  // stop the shell's default context before creating one configured for Cassandra

scala> val conf = new SparkConf(true).set("spark.cassandra.connection.host","localhost")

scala> val sc = new SparkContext(conf)

scala> val test_spark_rdd = sc.cassandraTable("test_spark", "test")

test_spark_rdd: com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow] = CassandraTableScanRDD[4] at RDD at CassandraRDD.scala:16

scala> val data = sc.cassandraTable("my_keyspace", "my_table")

data: com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow] = CassandraTableScanRDD[5] at RDD at CassandraRDD.scala:16
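
# To pull the stored values back out of the CassandraRow objects, a minimal sketch (getInt and collect are standard connector/RDD calls; it should print the single row inserted above):

scala> test_spark_rdd.map(row => row.getInt("value")).collect().foreach(println)
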
#########
# Start the movie tutorial

# Make the keyspace and table for movies

cqlsh:test_spark> USE test_spark; 

cqlsh:test_spark> CREATE KEYSPACE spark_demo WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
cqlsh:test_spark> USE spark_demo;

cqlsh:spark_demo> CREATE TABLE spark_demo.movies (id int PRIMARY KEY, title text, genres text); 

cqlsh:spark_demo> describe table movies;


cqlsh:spark_demo> INSERT INTO spark_demo.movies (id, title, genres) VALUES (1, 'Bladerunner', 'Scifi');




cqlsh:spark_demo> INSERT INTO spark_demo.movies (id, title, genres) VALUES (2, 'The Big Short', 'Finance');


cqlsh:spark_demo> SELECT * FROM spark_demo.movies  ;

 id | genres  | title
----+---------+---------------
  1 |   Scifi |   Bladerunner
  2 | Finance | The Big Short

(2 rows)

# Spark code for movies
import com.datastax.spark.connector._, org.apache.spark.SparkContext, org.apache.spark.SparkContext._, org.apache.spark.SparkConf

val conf = new SparkConf(true).set("spark.cassandra.connection.host","localhost")


val data = sc.cassandraTable("sparc_demo", "movies")
case class Movie(Id: Int, Title: String, Genres: String)

val data = sc.cassandraTable[Movie]("spark_demo", "movies")

data.foreach(println)

# output
Movie(1,Bladerunner,Scifi)
Movie(2,The Big Short,Finance)
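
# Writing back to Cassandra is the mirror image, via the connector's saveToCassandra. A rough sketch; the extra Movie row is illustrative, and it assumes the same default column mapping that made the typed read above work:

sc.parallelize(Seq(Movie(3, "Arrival", "Scifi"))).saveToCassandra("spark_demo", "movies")

# then verify in cqlsh with: SELECT * FROM spark_demo.movies;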



Sunday, November 5, 2017

Scala schema code generation

SCHEMA CODE GENERATION

The Slick code generator is a convenient tool for working with an existing or evolving database schema. It can be run stand-alone or integrated into your sbt build to generate all the code Slick needs to work.


http://slick.lightbend.com/doc/3.0.0/code-generation.html
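
For the stand-alone route, the docs above show invoking the generator's main method directly; a minimal sketch, with the H2 driver, URL, output folder, and package below chosen purely as placeholders:

// run the Slick 3.0 code generator stand-alone; all five arguments are placeholders
slick.codegen.SourceCodeGenerator.main(
  Array(
    "slick.driver.H2Driver",   // Slick profile/driver
    "org.h2.Driver",           // JDBC driver class
    "jdbc:h2:mem:test",        // database URL
    "src/main/scala",          // output folder for the generated source
    "demo.generated"           // package for the generated Tables object
  )
)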