Tuesday, November 28, 2017

Sublime installation on Ubuntu 16.04 reference

https://www.sublimetext.com/docs/3/linux_repositories.html#apt
wget -qO - https://download.sublimetext.com/sublimehq-pub.gpg | sudo apt-key add -
sudo apt-get install apt-transport-https
echo "deb https://download.sublimetext.com/ apt/stable/" | sudo tee /etc/apt/sources.list.d/sublime-text.list
sudo apt-get update
sudo apt-get install sublime-text

https://realpython.com/blog/python/setting-up-sublime-text-3-for-full-stack-python-development/

Sublime Anaconda config
https://github.com/DamnWidget/anaconda#anaconda-autocompletion

http://damnwidget.github.io/anaconda/IDE/

http://damnwidget.github.io/anaconda/anaconda_settings/ 

Monday, November 27, 2017

Thursday, November 16, 2017

Scala Conscript, giter8, and sbt

brew update && brew install giter8

or

http://www.foundweekends.org/conscript/setup.html

http://www.foundweekends.org/giter8/setup.html

https://github.com/foundweekends/giter8/wiki/giter8-templates

http://www.scala-sbt.org/0.13/docs/Installing-sbt-on-Linux.html

# set up a new Scala project from the seed template

sbt new scala/scala-seed.g8

# example sbt layout

https://github.com/kyrsideris/SparkUpdateCassandra
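
A typical sbt layout has build.sbt at the root plus project/, src/main/scala, and src/test/scala. As a rough sketch only, a minimal build.sbt for a Spark-plus-Cassandra project could look like the following; the name, versions, and dependencies are illustrative and not taken from the linked repo:

// build.sbt -- illustrative sketch; align the versions with your Spark and Scala install
name := "spark-cassandra-example"
version := "0.1.0"
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark"   %% "spark-core"                % "2.2.0" % "provided",
  "org.apache.spark"   %% "spark-sql"                 % "2.2.0" % "provided",
  "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.5"
)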


Wednesday, November 15, 2017

Spark: Read JSON references
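
A minimal sketch of reading JSON with Spark's built-in reader, assuming spark-shell 2.x where a SparkSession named spark is predefined; the file path is illustrative:

// read line-delimited JSON into a DataFrame and inspect it
val df = spark.read.json("/tmp/events.json")
df.printSchema()
df.show(5)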

Nginx log file processing in ELK - reference


https://logz.io/blog/nginx-log-analysis/

A sample NGINX access log entry:

109.65.122.142 - - [10/Nov/2015:07:06:59 +0000] "POST /kibana/elasticsearch/_msearch?timeout=30000&ignore_unavailable=true&preference=1447070343481 HTTP/1.1" 200 8352 "https://app.logz.io/kibana/index.html" "Mozilla/5.0 (X11; Linux armv7l) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/45.0.2454.101 Chrome/45.0.2454.101 Safari/537.36" 0.465 0.454

The Logstash configuration to parse that NGINX access log entry:

grok {
  match => [ "message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}" ]
  overwrite => [ "message" ]
}

mutate {
  convert => ["response", "integer"]
  convert => ["bytes", "integer"]
  convert => ["responsetime", "float"]
}

geoip {
  source => "clientip"
  target => "geoip"
  add_tag => [ "nginx-geoip" ]
}

date {
  match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
  remove_field => [ "timestamp" ]
}

useragent {
  source => "agent"
}

A sample NGINX error log:

2015/11/10 06:49:59 [warn] 10#0: *557119 an upstream response is buffered to a temporary file /var/lib/nginx/proxy/4/80/0000003804 while reading upstream, client: 66.249.88.173, server: 0.0.0.0, request: "GET /kibana/index.js?_b=1273 HTTP/1.1", upstream: "http://172.17.0.30:9000/kibana/index.js?_b=1273", host: "app.logz.io", referrer: "https://app.logz.io/kibana/index.html"

The Logstash configuration to parse that NGINX error log:

grok {
  match => [ "message" , "(?<timestamp>%{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY}[- ]%{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER}: %{GREEDYDATA:errormessage}(?:, client: (?<client>%{IP}|%{HOSTNAME}))(?:, server: %{IPORHOST:server})(?:, request: %{QS:request})?(?:, upstream: \"%{URI:upstream}\")?(?:, host: %{QS:host})?(?:, referrer: \"%{URI:referrer}\")"]
  overwrite => [ "message" ]
}

geoip {
  source => "client"
  target => "geoip"
  add_tag => [ "nginx-geoip" ]
}

date {
  match => [ "timestamp" , "YYYY/MM/dd HH:mm:ss" ]
  remove_field => [ "timestamp" ]
}

Thursday, November 9, 2017

Docker ELK Stack and the GeoIP plugin

https://docs.docker.com/compose/gettingstarted/#step-3-define-services-in-a-compose-file

http://elk-docker.readthedocs.io/#running-with-docker-compose

# https://elk-docker.readthedocs.io/#installation
sudo docker pull sebp/elk
docker images

# https://elk-docker.readthedocs.io/#usage
sudo docker run -p 5601:5601 -p 9200:9200 -p 5044:5044 -it --name elk sebp/elk

# or set up a yml file
# create an entry for the ELK Docker image by adding the following lines to
# your docker-compose.yml file:
 
elk:
  image: sebp/elk
  ports:
    - "5601:5601"
    - "9200:9200"
    - "5044:5044"

You can then start the ELK container like this:

$ sudo docker-compose up elk


# follow the image's usage instructions to inject a log message into Logstash

# inject the msg

# then view the injected msg in a browser:

http://192.168.1.155:9200/_search?pretty

http://192.168.1.155:5601/app/kibana#/management/kibana/index?_g=()

# use the container id from docker ps and stop the container

docker stop fce12628893c

# Let's now build an elk-docker image from a git clone

cd

git clone https://github.com/spujadas/elk-docker

http://elk-docker.readthedocs.io/#building-image

https://stackoverflow.com/questions/36617904/extending-local-dockerfile

# build the cloned docker image


~/elk-docker$ docker build -t elk-docker .

# now create the second Dockerfile, which will add the GeoIP plugin

A Dockerfile like the following extends the base image and installs the GeoIP processor plugin (which adds information about the geographical location of IP addresses):

FROM sebp/elk

ENV ES_HOME /opt/elasticsearch
WORKDIR ${ES_HOME}

RUN CONF_DIR=/etc/elasticsearch gosu elasticsearch bin/elasticsearch-plugin \
    install ingest-geoip

You can now build the new image (see the Building the image section above) and run the container in the same way as you did with the base image:

~$ mkdir elk-docker-geoip
~$ cd !$
cd elk-docker-geoip
~/elk-docker-geoip$ vi Dockerfile

FROM sebp/elk

ENV ES_HOME /opt/elasticsearch
WORKDIR ${ES_HOME}

RUN CONF_DIR=/etc/elasticsearch gosu elasticsearch bin/elasticsearch-plugin \
    install ingest-geoip

~/elk-docker-geoip$ docker build -t elk-docker .

Tuesday, November 7, 2017

Spark Scala Cassandra intro

# Assumes that you have previously installed Oracle Java 8
# Use the java install instructions at this page if you have not already installed Java 8



# Install Cassandra


echo "deb http://www.apache.org/dist/cassandra/debian 311x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
curl https://www.apache.org/dist/cassandra/KEYS | sudo apt-key add -
sudo apt-get update
sudo apt-get install cassandra

https://www.datastax.com/dev/blog/kindling-an-introduction-to-spark-with-cassandra-part-1

# build the spark-cassandra-connector assembly for Scala 2.11
$ sbt/sbt -Dscala-2.11=true assembly

# Assumes you have git cloned the Spark Cassandra connector code in $HOME, run the sbt assembly build above, and have spark-shell on the PATH.

# Start cassandra

# To start the Apache Cassandra service on your server, you can use the following command:

sudo systemctl start cassandra.service

# To stop the service, you can use the command below:

sudo systemctl stop cassandra.service

# If the service is not already enabled on system boot, you can enable it by using the command below:

sudo systemctl enable cassandra.service

# Add a keyspace and table for the tutorial in the Cassandra shell: cqlsh


$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.1 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.

cqlsh> CREATE KEYSPACE test_spark WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

cqlsh> CREATE TABLE test_spark.test (value int PRIMARY KEY);

cqlsh:test_spark> INSERT INTO test_spark.test (value) VALUES (1);

# In another shell start spark-shell

$ cd

$ spark-shell

scala> sc.parallelize( 1 to 50 ).sum()

res1: Double = 1275.0

# press Ctrl-D to exit the spark-shell

# restart with the Cassandra connector jar

$ spark-shell --jars ~/spark-cassandra-connector/spark-cassandra-connector/target/full/scala-2.11/spark-cassandra-connector-assembly-2.0.5-70-g2ee41fc.jar


scala> import com.datastax.spark.connector._, org.apache.spark.SparkContext, org.apache.spark.SparkContext._, org.apache.spark.SparkConf

scala> sc.stop()  // stop the shell's default context before creating one configured for Cassandra

scala> val conf = new SparkConf(true).set("spark.cassandra.connection.host","localhost")

scala> val sc = new SparkContext(conf)

scala> val test_spark_rdd = sc.cassandraTable("test_spark", "test")

test_spark_rdd: com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow] = CassandraTableScanRDD[4] at RDD at CassandraRDD.scala:16

scala> val data = sc.cassandraTable("my_keyspace", "my_table")

data: com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow] = CassandraTableScanRDD[5] at RDD at CassandraRDD.scala:16
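
# To pull the stored values back out of the CassandraRow objects, a minimal sketch (getInt and collect are standard connector/RDD calls; it should print the single row inserted above):

scala> test_spark_rdd.map(row => row.getInt("value")).collect().foreach(println)
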
#########
# Start the movie tutorial

# Make the keyspace and table for movies

cqlsh:test_spark> USE test_spark; 

cqlsh:test_spark> CREATE KEYSPACE spark_demo WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
cqlsh:test_spark> USE spark_demo;

cqlsh:spark_demo> CREATE TABLE spark_demo.movies (id int PRIMARY KEY, title text, genres text); 

cqlsh:spark_demo> describe table movies;


cqlsh:spark_demo> INSERT INTO spark_demo.movies (id, title, genres) VALUES (1, 'Bladerunner', 'Scifi');




cqlsh:spark_demo> INSERT INTO spark_demo.movies (id, title, genres) VALUES (2, 'The Big Short', 'Finance');


cqlsh:spark_demo> SELECT * FROM spark_demo.movies  ;

 id | genres  | title
----+---------+---------------
  1 |   Scifi |   Bladerunner
  2 | Finance | The Big Short

(2 rows)

# Spark code for movies
import com.datastax.spark.connector._, org.apache.spark.SparkContext, org.apache.spark.SparkContext._, org.apache.spark.SparkConf

val conf = new SparkConf(true).set("spark.cassandra.connection.host","localhost")


val data = sc.cassandraTable("sparc_demo", "movies")
case class Movie(Id: Int, Title: String, Genres: String)

val data = sc.cassandraTable[Movie]("spark_demo", "movies")

data.foreach(println)

# output
Movie(1,Bladerunner,Scifi)
Movie(2,The Big Short,Finance)
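
# Writing back to Cassandra is the mirror image, via the connector's saveToCassandra. A rough sketch; the extra Movie row is illustrative, and it assumes the same default column mapping that made the typed read above work:

sc.parallelize(Seq(Movie(3, "Arrival", "Scifi"))).saveToCassandra("spark_demo", "movies")

# then verify in cqlsh with: SELECT * FROM spark_demo.movies;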



Sunday, November 5, 2017

Scala schema code generation

SCHEMA CODE GENERATION

The Slick code generator is a convenient tool for working with an existing or evolving database schema. It can be run stand-alone or integrated into your sbt build to generate all the code Slick needs to work.


http://slick.lightbend.com/doc/3.0.0/code-generation.html
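
For the stand-alone route, the docs above show invoking the generator's main method directly; a minimal sketch, with the H2 driver, URL, output folder, and package below chosen purely as placeholders:

// run the Slick 3.0 code generator stand-alone; all five arguments are placeholders
slick.codegen.SourceCodeGenerator.main(
  Array(
    "slick.driver.H2Driver",   // Slick profile/driver
    "org.h2.Driver",           // JDBC driver class
    "jdbc:h2:mem:test",        // database URL
    "src/main/scala",          // output folder for the generated source
    "demo.generated"           // package for the generated Tables object
  )
)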