Wednesday, December 28, 2016
Apache Spark gpg
Browse to http://www.apache.org/dist/spark/spark-2.1.0/ (or the version you want) and download the .tgz and the corresponding .asc signature file:
$ wget http://www.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz
$ wget http://www.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz.asc
$ wget http://www.apache.org/dist/spark/KEYS
$ gpg --import KEYS
$ gpg --verify spark-2.1.0-bin-hadoop2.7.tgz.asc spark-2.1.0-bin-hadoop2.7.tgz
gpg: Signature made Thu Dec 15 18:18:33 2016 PST using RSA key ID FC8ED089
gpg: Good signature from "Patrick Wendell " [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: EEDA BD1C 71C5 48D6 F006 61D3 7C6C 105F FC8E D089
$ gpg --keyserver pgpkeys.mit.edu --recv-key FC8ED089
gpg: requesting key FC8ED089 from hkp server pgpkeys.mit.edu
gpg: key FC8ED089: "Patrick Wendell " not changed
gpg: Total number processed: 1
gpg: unchanged: 1
$ gpg --verify spark-2.1.0-bin-hadoop2.7.tgz.asc spark-2.1.0-bin-hadoop2.7.tgz
Monday, December 19, 2016
Adding an nltk stop word filter
$ python
>>> from nltk.corpus import stopwords
>>> nltk_stopwords = stopwords.words('english')
  except LookupError:
    raise e
LookupError:
**********************************************************************
  Resource u'corpora/stopwords' not found.  Please use the NLTK
  Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
    - '/Users/depappas/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
Process finished with exit code 1
Install NLTK Data
This will take a few minutes if your Internet download speed is around 15 Mbps.
http://www.nltk.org/data.html
$ python
>>> import nltk
>>> nltk.download()
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
Click on the Corpora tab and select the stopwords package you want. Wait for it to download.
Now this should work...
$ python
Python 2.7.12 (default, Oct 11 2016, 05:20:59)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download()
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
True
>>> from nltk.corpus import stopwords
>>> nltk_stopwords = stopwords.words('english')
>>>
Done!
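The post stops once the stop word list loads; here is a minimal sketch of the actual filter (my own example, continuing the same session), dropping any token that appears in the NLTK English stop word list:
>>> nltk_stopwords = set(stopwords.words('english'))
>>> sentence = "this is a simple example of filtering out the stop words"
>>> [w for w in sentence.split() if w not in nltk_stopwords]
['simple', 'example', 'filtering', 'stop', 'words']
For real text you would tokenize and lowercase first, but the membership test against the set is the whole trick.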
Sunday, December 18, 2016
New Mac setup for deep learning, Golang, Scala, and Spark
http://brew.sh
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
Brew Install
brew install tree
brew install htop
brew install python
brew install wget
Apple command line tools (gcc)
https://developer.apple.com/download/more/
After the installation completes, run "gcc -v" in the terminal again; if everything is fine, the gcc version details will be displayed.
Dmg Downloads
brew install gnupg
gpg --keyserver pgpkeys.mit.edu --recv-key 83135D45
gpg --verify KeePassX-2.0-beta2.dmg.sig KeePassX-2.0-beta2.dmg
Verify downloads
All downloads and Git tags are signed with the key 164C70512F7929476764AB56FE22C6FD83135D45
Python Virtual Setup
http://www.marinamele.com/2014/05/install-python-virtualenv-virtualenvwrapper-mavericks.html
Languages
Python
http://www.marinamele.com/2014/05/install-python-virtualenv-virtualenvwrapper-mavericks.html
brew install python
pip install virtualenv
pip install virtualenvwrapper
# on OS X if you can’t install virtualenv with pip then use this workaround
pip install --index-url=http://pypi.python.org/simple/ --trusted-host pypi.python.org virtualenv
pip install --index-url=http://pypi.python.org/simple/ --trusted-host pypi.python.org virtualenvwrapper
# for the Homebrew installed path
~/.bash_profile : export PATH=/usr/local/share/python:$PATH
# Python virtualenv workon setup
export WORKON_HOME=~/.virtualenvs
source /usr/local/bin/virtualenvwrapper.sh
#Setup the certificates
http://programmingmatrix.blogspot.com/2016/10/pip-install-fails-with-connection-error.html
$ source ~/.bash_profile
$ cd
$ mkdir .virtualenvs
$ cd .virtualenvs
$ virtualenv test
$ workon test
# now you are using the pip in the .virtualenvs/test tree
# if pip install does not work upgrade pip
easy_install --upgrade pip
Deep Learning
http://programmingmatrix.blogspot.com/2016/12/osx-how-to-install-tensorflow-upgrade.html
https://www.tensorflow.org/get_started/os_setup
wget https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-0.12.0rc1-py2-none-any.whl
pip install --upgrade tensorflow-0.12.0rc1-py2-none-any.whl
pip install keras
pip install numpy
pip install scipy
pip install matplotlib
pip install gensim
pip install ioutils
pip install Cython
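A quick smoke test (my own suggestion, run inside the virtualenv you installed into) is to import each package and print its version; if any import fails, that install needs another look:
$ python
>>> import numpy, scipy, matplotlib, tensorflow, keras, gensim
>>> for m in (numpy, scipy, matplotlib, tensorflow, keras, gensim):
...     print(m.__name__ + " " + m.__version__)
...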
Java
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
https://docs.oracle.com/javase/8/docs/technotes/guides/install/mac_jdk.html
http://stackoverflow.com/questions/1348842/what-should-i-set-java-home-to-on-osx
~/.bash_profile : export JAVA_HOME="`/usr/libexec/java_home -v '1.8*'`"
Scala
http://scala-ide.org
https://www.scala-lang.org/download/
~/.bash_profile:
export SCALA_HOME=$HOME/scala-2.12.1
export SCALA_BIN=$SCALA_HOME/bin
export PATH=$PATH:$SCALA_BIN
Spark
http://spark.apache.org/downloads.html
http://programmingmatrix.blogspot.com/2016/10/set-up-apache-spark-keys-and-verifying.html
cd
mv Downloads/spark-2.0.2-bin-hadoop2.7.tar .
tar xvf spark-2.0.2-bin-hadoop2.7.tar
~/.bash_profile:
export SPARK_HOME=$HOME/spark
export SPARK_BIN=$SPARK_HOME/bin
export PATH=$PATH:$SPARK_BIN
ln -s spark-2.0.2-bin-hadoop2.7 spark
Golang
https://golang.org/dl/
Atom Editor
https://atom.io
http://programmingmatrix.blogspot.com/2016/03/atom-text-editor.html
Thursday, December 15, 2016
OSX: how to install Tensorflow: upgrade pip and use virtualenv
After you have set up and activated your virtualenv, if you run into this error on your Mac while trying to install TensorFlow, upgrade pip and retry the install.
Get the latest .whl file version from the virtualenv section of the following page:
https://www.tensorflow.org/get_started/os_setup#virtualenv_installation
pip install --upgrade tensorflow-0.12.0rc0-py2-none-any.whl
Unpacking ./tensorflow-0.12.0rc0-py2-none-any.whl
Downloading/unpacking protobuf==3.1.0 (from tensorflow==0.12.0rc0)
Could not find a version that satisfies the requirement protobuf==3.1.0 (from tensorflow==0.12.0rc0) (from versions: 3.0.0b4, 3.0.0, 3.0.0b2.post2, 3.0.0a2, 3.0.0b2, 2.6.1, 2.0.3, 2.0.0beta, 2.5.0, 2.4.1, 2.6.0, 3.0.0b2.post1, 3.0.0b3, 3.0.0b1.post2, 3.0.0b2.post2, 3.0.0b2.post1, 2.3.0, 3.0.0a3, 3.1.0.post1)
Cleaning up...
No distributions matching the version for protobuf==3.1.0 (from tensorflow==0.12.0rc0)
Now upgrade pip and reinstall tensorflow.
easy_install --upgrade pip
pip install --upgrade tensorflow
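Once the upgrade succeeds, a quick import check (my own suggestion, not part of the original post) confirms the wheel installed cleanly into the virtualenv:
$ python
>>> import tensorflow as tf
>>> print(tf.__version__)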
Monday, December 12, 2016
cuDNN not available: how to fix on Linux
Download cuDNN from NVIDIA
$ python
Python 2.7.12 (default, Jul 1 2016, 15:12:24)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import theano
Using gpu device 0: GeForce GTX 1060 6GB (CNMeM is enabled with initial size: 95.0% of memory, cuDNN not available)
Download the cuDNN for your OS
https://developer.nvidia.com/cudnn
http://deeplearning.net/software/theano/library/sandbox/cuda/dnn.html
Alternatively, on Linux, you can set the environment variables LD_LIBRARY_PATH, LIBRARY_PATH and CPATH to the directory extracted from the download. If needed, separate multiple directories with ":" as in the PATH environment variable. Example:
export LD_LIBRARY_PATH=/home/user/path_to_CUDNN_folder/lib64:$LD_LIBRARY_PATH
export CPATH=/home/user/path_to_CUDNN_folder/include:$CPATH
export LIBRARY_PATH=/home/user/path_to_CUDNN_folder/lib64:$LD_LIBRARY_PATH
sudo mkdir /usr/local/cudnn
sudo cp -r cuda/* /usr/local/cudnn
add the following to your ~/.bashrc
export CUDNN_ROOT=/usr/local/cudnn
export LD_LIBRARY_PATH=$CUDNN_ROOT/lib64:$LD_LIBRARY_PATH
export CPATH=$CUDNN_ROOT/include:$CPATH
export LIBRARY_PATH=$CUDNN_ROOT/lib64:$LIBRARY_PATH
Fixed:
$ python
Python 2.7.12 (default, Jul 1 2016, 15:12:24)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import theano
Using gpu device 0: GeForce GTX 1060 6GB (CNMeM is enabled with initial size: 95.0% of memory, cuDNN 5105)
>>>
Tuesday, November 15, 2016
Parallelizing HTML Page Downloaders
Threading a Python HTML downloader does not result in a significant performance improvement.
Use Python asyncio (a.k.a. Tulip) or Twisted instead.
https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html
http://krondo.com/an-introduction-to-asynchronous-programming-and-twisted/
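The aiohttp article linked above boils down to something like the following sketch (my own minimal version, assuming Python 3.5+ and aiohttp installed; the URLs are placeholders). All requests share one client session and run concurrently on the event loop instead of blocking per page:
import asyncio
import aiohttp

urls = ["http://example.com/page1", "http://example.com/page2"]  # placeholder URLs

async def fetch(session, url):
    # download one page without blocking the event loop
    async with session.get(url) as response:
        return await response.text()

async def fetch_all(urls):
    # one shared session; gather schedules all downloads concurrently
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, u) for u in urls))

pages = asyncio.get_event_loop().run_until_complete(fetch_all(urls))
print([len(p) for p in pages])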
Monday, November 14, 2016
Stochastic Gradient Descent: Momentum and Nesterov Momentum
http://www.christianherta.de/lehre/dataScience/machineLearning/neuralNetworks/nesterov-momentum.php
https://blogs.princeton.edu/imabandit/2013/02/05/orf523-advanced-optimization-introduction/
https://blogs.princeton.edu/imabandit/2015/06/30/revisiting-nesterovs-acceleration/
https://arxiv.org/pdf/1405.4980v2.pdf
https://en.wikipedia.org/wiki/Convex_hull
http://mathworld.wolfram.com/ConvexHull.html
https://en.wikipedia.org/wiki/Convex_combination
https://en.wikipedia.org/wiki/Tensor
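For reference, the two update rules the links above cover, written as a small numpy sketch (my own illustration; the learning rate, momentum value, and gradient function are placeholders):
import numpy as np

def momentum_step(w, v, grad, lr=0.01, mu=0.9):
    # classical momentum: accumulate a velocity, then step along it
    v = mu * v - lr * grad(w)
    return w + v, v

def nesterov_step(w, v, grad, lr=0.01, mu=0.9):
    # Nesterov momentum: evaluate the gradient at the look-ahead point w + mu*v
    v = mu * v - lr * grad(w + mu * v)
    return w + v, v

# toy example: minimize f(w) = ||w||^2, whose gradient is 2w
w, v = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(100):
    w, v = nesterov_step(w, v, lambda x: 2 * x)
print(w)  # approaches [0, 0]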
Sunday, November 13, 2016
Installing CUDA on Ubuntu 16.04
Before you start the installation process, make sure you have ssh working so that you can ssh to the machine from another machine if the graphics UI gets stuck in a loop or a black screen appears.
===============================================================
http://kislayabhi.github.io/Installing_CUDA_with_Ubuntu/
alternatively:
https://gist.github.com/wangruohui/df039f0dc434d6486f5d4d098aa52d07#check-the-installation
REMEMBER: DO NOT FOLLOW THE NVIDIA INSTALLATION INSTRUCTIONS
Note that as of Nov. 2016 the latest version of CUDA is 8.0...
Ignore the error when the Nvidia installer complains and shows the abort message.
Accept the sym link creation option
Set the LD_LIBRARY_PATH in .bashrc to use the sym link /usr/local/cuda and not the versioned directory
There may be some missing lib errors during the compilation of the /usr/local/cuda/samples directory
Add the missing libs.
===============================================================
Before reinstalling CUDA remove the old versions
cd ~
rm -fr cuda
If you used apt-get/rpm to install a previous version, find the corresponding removal instructions elsewhere.
Use the following command to uninstall a Toolkit runfile installation:
$ sudo /usr/local/cuda-X.Y/bin/uninstall_cuda_X.Y.pl
Use the following command to uninstall a Driver runfile installation:
$ sudo /usr/bin/nvidia-uninstall
sudo apt-get purge nvidia*
# Note this might remove your cuda installation as well
sudo apt-get autoremove
=======================================================================
# under construction
# download cuda and cudaNN
===============================================================
Read more at: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
Install cuDNN
https://developer.nvidia.com/rdp/cudnn-download
Download cuDNN v5.1 Library for Linux
Follow the installation instructions:
# from https://groups.google.com/forum/#!topic/theano-users/4qKbh5C_9e4
# First download the file. Example version: cudnn-7.5-linux-x64-v5.0-rc.tgz
Extract it to your home directory and set LD_LIBRARY_PATH to the extracted directory,
then follow the steps below, assuming that /usr/local/cuda is soft linked to the correct CUDA version:
sudo cp $HOME/cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp $HOME/cuda/lib64/libcudnn* /usr/local/cuda/lib64/
======================================================================================
sudo /usr/local/cuda-X.Y/bin/uninstall_cuda_X.Y.pl
sudo /usr/bin/nvidia-uninstall
cd ~
rm -fr cuda
sudo apt-get --purge remove nvidia-*
sudo apt-get purge nvidia*
sudo apt-get autoremove
sudo service lightdm stop
sudo apt-get install linux-headers-$(uname -r)
cd Downloads
./cuda_8.0.61_375.26_linux.run -extract=~/Downloads/nvidia_installers;
cd nvidia_installers
sudo ./NVIDIA-Linux-x86_64-367.48.run --no-opengl-files
sudo ./cuda-linux64-rel-8.0.44-21122537.run
sudo ./cuda-samples-linux-8.0.44-21122537.run
sudo update-initramfs -u
sudo reboot
sudo service lightdm start
========================================================================
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery
# you should see something like this
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1060 6GB"
CUDA Driver Version / Runtime Version 8.0 / 8.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 6072 MBytes (6367150080 bytes)
(10) Multiprocessors, (128) CUDA Cores/MP: 1280 CUDA Cores
GPU Max Clock rate: 1785 MHz (1.78 GHz)
Memory Clock rate: 4004 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1060 6GB
Result = PASS
=======================================================================
# Using Python virtual environment setup for deep learning
workon deep_learning
cd ~/Dropbox/cuda
THEANO_FLAGS='mode=FAST_RUN,device=gpu,floatX=float32,optimizer_including=cudnn' python gpu_test.py
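gpu_test.py itself is not shown in the post; a plausible stand-in is the standard GPU check from the Theano documentation, which times an elementwise exp() and reports whether the computation ran on the GPU (reproduced here as a sketch, not the author's exact file):
from theano import function, config, shared
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
# if any op in the compiled graph is a plain (CPU) Elemwise, the GPU was not used
if numpy.any([isinstance(node.op, T.Elemwise) for node in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')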
=======================================================================
There may be some missing lib errors during the compilation of the /usr/local/cuda/samples directory
Add the missing libs.
sudo apt-get install freeglut3-dev
/usr/local/cuda/samples$ sudo make
[...]
Finished building CUDA samples
/usr/local/cuda/samples$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.48 Sat Sep 3 18:21:08 PDT 2016
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)
/usr/local/cuda/samples$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44
/usr/local/cuda/samples$ cd 1_Utilities/deviceQuery
/usr/local/cuda/samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1060 6GB"
CUDA Driver Version / Runtime Version 8.0 / 8.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 6069 MBytes (6363873280 bytes)
(10) Multiprocessors, (128) CUDA Cores/MP: 1280 CUDA Cores
GPU Max Clock rate: 1785 MHz (1.78 GHz)
Memory Clock rate: 4004 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1060 6GB
Saturday, October 29, 2016
Spark Dataframe encoders
http://stackoverflow.com/questions/36648128/how-to-store-custom-objects-in-a-dataset
http://stackoverflow.com/questions/39433419/encoder-error-while-trying-to-map-dataframe-row-to-updated-row
Independent of version you can simply use the DataFrame API:
import org.apache.spark.sql.functions.{when, lower}
val df = Seq(
(2012, "Tesla", "S"), (1997, "Ford", "E350"),
(2015, "Chevy", "Volt")
).toDF("year", "make", "model")
df.withColumn("make", when(lower($"make") === "tesla", "S").otherwise($"make"))
If you really want to use map you should use a statically typed Dataset:
import spark.implicits._
case class Record(year: Int, make: String, model: String)
df.as[Record].map {
case tesla if tesla.make.toLowerCase == "tesla" => tesla.copy(make = "S")
case rec => rec
}
or at least return an object which will have an implicit encoder:
df.map {
case Row(year: Int, make: String, model: String) =>
(year, if(make.toLowerCase == "tesla") "S" else make, model)
}
Finally, if for some completely crazy reason you really want to map over Dataset[Row], you have to provide the required encoder:
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
// Yup, it would be possible to reuse df.schema here
val schema = StructType(Seq(
StructField("year", IntegerType),
StructField("make", StringType),
StructField("model", StringType)
))
val encoder = RowEncoder(schema)
df.map {
case Row(year, make: String, model) if make.toLowerCase == "tesla" =>
Row(year, "S", model)
case row => row
} (encoder)