Saturday, December 31, 2016

Pitch deck writing resources

Wednesday, December 28, 2016

Apache Spark gpg

Browse to http://www.apache.org/dist/spark/spark-2.1.0/ (or the directory for the version you want).

Download the .tgz archive and the corresponding .asc signature file:

wget http://www.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz
wget http://www.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz.asc

  
https://www.apache.org/info/verification.html
$ wget http://www.apache.org/dist/spark/KEYS

$ gpg --import KEYS

$ gpg --verify  spark-2.1.0-bin-hadoop2.7.tgz.asc spark-2.1.0-bin-hadoop2.7.tgz
gpg: Signature made Thu Dec 15 18:18:33 2016 PST using RSA key ID FC8ED089
gpg: Good signature from "Patrick Wendell " [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: EEDA BD1C 71C5 48D6 F006  61D3 7C6C 105F FC8E D089
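
The warning only means the signing key has not been certified in the local keyring; the signature itself is reported as good. One way to act on it is to compare the printed fingerprint with one obtained from a source you trust. A minimal Python sketch, assuming the human-readable output format shown above (the helper name summarize_gpg_verify is hypothetical, and scraping this text is fragile; gpg's --status-fd option is the machine-readable interface):

```python
import re

def summarize_gpg_verify(output):
    """Pull the signature verdict and key fingerprint out of `gpg --verify` text."""
    lines = [line.strip() for line in output.splitlines()]
    good = any(line.startswith("gpg: Good signature") for line in lines)
    match = re.search(r"Primary key fingerprint:\s*([0-9A-F ]+)", output)
    fingerprint = match.group(1).replace(" ", "") if match else None
    return {"good_signature": good, "fingerprint": fingerprint}

# Sample text copied from the session above.
sample = '''gpg: Signature made Thu Dec 15 18:18:33 2016 PST using RSA key ID FC8ED089
gpg: Good signature from "Patrick Wendell " [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
Primary key fingerprint: EEDA BD1C 71C5 48D6 F006  61D3 7C6C 105F FC8E D089
'''
result = summarize_gpg_verify(sample)
print(result["good_signature"], result["fingerprint"])
```

Note that gpg writes this text to stderr, so capture that stream if you run the command from a script.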


$ gpg --keyserver pgpkeys.mit.edu --recv-key FC8ED089
gpg: requesting key FC8ED089 from hkp server pgpkeys.mit.edu
gpg: key FC8ED089: "Patrick Wendell " not changed
gpg: Total number processed: 1
gpg:              unchanged: 1

$ gpg --verify  spark-2.1.0-bin-hadoop2.7.tgz.asc spark-2.1.0-bin-hadoop2.7.tgz


Monday, December 19, 2016

Adding an nltk stop word filter

$ python
>>> from nltk.corpus import stopwords
>>> nltk_stopwords = stopwords.words('english')

except LookupError: raise e
LookupError: 
**********************************************************************
  Resource u'corpora/stopwords' not found.  Please use the NLTK
  Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
    - '/Users/depappas/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************

Process finished with exit code 1

Install NLTK Data
This will take a few minutes if your Internet download speed is around 15 Mbps.

http://www.nltk.org/data.html
$ python
>>> import nltk
>>> nltk.download()

showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml

Click on the Corpora tab and select the stopwords package you want. Wait for it to download.



from nltk.corpus import stopwords



Now this should work...
 

$ python
Python 2.7.12 (default, Oct 11 2016, 05:20:59) 
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download()
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
True
>>> from nltk.corpus import stopwords
>>> nltk_stopwords = stopwords.words('english')
>>> 
Done!
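
With the corpus in place, the filter itself is short. A minimal sketch, assuming the tokens are already split and that matching should be case-insensitive (the helper name remove_stopwords and the tiny inline word list are hypothetical stand-ins; in real use pass stopwords.words('english') loaded as above). If you only need this one corpus, nltk.download('stopwords') fetches it without opening the downloader window.

```python
def remove_stopwords(tokens, stop_words):
    """Return the tokens that are not in stop_words (case-insensitive)."""
    stop_set = set(word.lower() for word in stop_words)
    return [token for token in tokens if token.lower() not in stop_set]

# Stand-in list so the sketch runs without the NLTK data installed;
# replace it with stopwords.words('english') in real use.
demo_stop_words = ["the", "is", "a", "of"]
tokens = "The size of the corpus is large".split()
print(remove_stopwords(tokens, demo_stop_words))  # ['size', 'corpus', 'large']
```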

Sunday, December 18, 2016

New Mac setup for deep learning, Golang, Scala, and SPARK

http://brew.sh
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Brew Install
brew install tree
brew install htop
brew install python
brew install wget

Apple command line tools (gcc)
https://developer.apple.com/download/more/
After the installation completes, run “gcc -v” in a terminal again. If everything is fine, it prints the compiler version and configuration.

Dmg Downloads
brew install gnupg
gpg --keyserver pgpkeys.mit.edu --recv-key 83135D45
gpg --verify KeePassX-2.0-beta2.dmg.sig KeePassX-2.0-beta2.dmg

Verify downloads
All downloads and Git tags are signed with the key 164C70512F7929476764AB56FE22C6FD83135D45
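
The short key ID passed to --recv-key is the last eight hex digits of the full fingerprint, so the two values above can be checked against each other. A small Python sketch (the helper name short_key_id is hypothetical):

```python
def short_key_id(fingerprint):
    """Return the 8-hex-digit short key ID for a 40-digit key fingerprint."""
    compact = fingerprint.replace(" ", "").upper()
    return compact[-8:]

keepassx_fingerprint = "164C70512F7929476764AB56FE22C6FD83135D45"
print(short_key_id(keepassx_fingerprint) == "83135D45")  # True
```

Short IDs can collide, so comparing the full fingerprint printed by gpg --verify is the stronger check.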

Python Virtual Setup
http://www.marinamele.com/2014/05/install-python-virtualenv-virtualenvwrapper-mavericks.html

Languages
Python
http://www.marinamele.com/2014/05/install-python-virtualenv-virtualenvwrapper-mavericks.html
brew install python
pip install virtualenv
pip install virtualenvwrapper
# on OS X if you can’t install virtualenv with pip then use this workaround
pip install --index-url=http://pypi.python.org/simple/ --trusted-host pypi.python.org  virtualenv
pip install --index-url=http://pypi.python.org/simple/ --trusted-host pypi.python.org  virtualenvwrapper

# for the Homebrew installed path
~/.bash_profile : export PATH=/usr/local/share/python:$PATH

# Python virtualenv workon setup
export WORKON_HOME=~/.virtualenvs
source /usr/local/bin/virtualenvwrapper.sh

#Setup the certificates
http://programmingmatrix.blogspot.com/2016/10/pip-install-fails-with-connection-error.html

$ source ~/.bash_profile
$ cd
$ mkdir .virtualenvs
$ cd .virtualenvs
$ virtualenv test
$ workon test

# now you are using the pip in the .virtualenvs/test tree
# if pip install does not work upgrade pip
easy_install --upgrade pip
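
To confirm that workon really switched interpreters, a quick check from inside Python can help. This is a best-effort sketch (the helper name in_virtualenv is hypothetical); it relies on the attributes that classic virtualenv and the Python 3 venv module set on sys:

```python
import sys

def in_virtualenv():
    """Best-effort check for an active virtualenv or venv."""
    # Classic virtualenv (as used above on Python 2) sets sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # Python 3 venv and newer virtualenv releases set sys.base_prefix instead.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix

print(sys.executable)   # should point into ~/.virtualenvs/test when active
print(in_virtualenv())
```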

Deep Learning
http://programmingmatrix.blogspot.com/2016/12/osx-how-to-install-tensorflow-upgrade.html
https://www.tensorflow.org/get_started/os_setup

wget https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-0.12.0rc1-py2-none-any.whl
pip install --upgrade tensorflow-0.12.0rc1-py2-none-any.whl

pip install keras
pip install numpy
pip install scipy
pip install matplotlib
pip install gensim
pip install ioutils
pip install Cython

Java
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

https://docs.oracle.com/javase/8/docs/technotes/guides/install/mac_jdk.html

http://stackoverflow.com/questions/1348842/what-should-i-set-java-home-to-on-osx

~/.bash_profile : export JAVA_HOME="`/usr/libexec/java_home -v '1.8*'`"

Scala
http://scala-ide.org

https://www.scala-lang.org/download/

~/.bash_profile:
export SCALA_HOME=$HOME/scala-2.12.1
export SCALA_BIN=$SCALA_HOME/bin
export PATH=$PATH:$SCALA_BIN

Spark
http://spark.apache.org/downloads.html

http://programmingmatrix.blogspot.com/2016/10/set-up-apache-spark-keys-and-verifying.html

cd
mv Downloads/spark-2.0.2-bin-hadoop2.7.tar .
tar xvf spark-2.0.2-bin-hadoop2.7.tar 

~/.bash_profile:
export SPARK_HOME=$HOME/spark
export SPARK_BIN=$SPARK_HOME/bin
export PATH=$PATH:$SPARK_BIN

ln -s spark-2.0.2-bin-hadoop2.7 spark
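
After sourcing ~/.bash_profile, it is worth checking that the three home variables point at real installs. A minimal Python sketch, assuming each home directory contains a bin subdirectory (which the SCALA_BIN and SPARK_BIN exports above imply; for JAVA_HOME this is an assumption about the JDK layout). The helper name check_home_dirs and the status strings are hypothetical:

```python
import os

def check_home_dirs(names, environ=None):
    """Map each variable name to 'unset', 'no_bin', or 'ok'."""
    environ = os.environ if environ is None else environ
    report = {}
    for name in names:
        path = environ.get(name)
        if not path:
            report[name] = "unset"
        elif os.path.isdir(os.path.join(path, "bin")):
            report[name] = "ok"
        else:
            report[name] = "no_bin"
    return report

homes = check_home_dirs(["JAVA_HOME", "SCALA_HOME", "SPARK_HOME"])
for name, status in sorted(homes.items()):
    print("{0}: {1}".format(name, status))
```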

Golang
https://golang.org/dl/

Atom Editor 
https://atom.io
http://programmingmatrix.blogspot.com/2016/03/atom-text-editor.html


Thursday, December 15, 2016

OSX: how to install Tensorflow: upgrade pip and use virtualenv

After you have set up and activated your virtualenv, if you run into the error below on your Mac while trying to install tensorflow, upgrade pip and retry the install.

Get the latest .whl file version from the virtualenv section of the following page:
https://www.tensorflow.org/get_started/os_setup#virtualenv_installation

pip install --upgrade tensorflow-0.12.0rc0-py2-none-any.whl
Unpacking ./tensorflow-0.12.0rc0-py2-none-any.whl
Downloading/unpacking protobuf==3.1.0 (from tensorflow==0.12.0rc0)
  Could not find a version that satisfies the requirement protobuf==3.1.0 (from tensorflow==0.12.0rc0) (from versions: 3.0.0b4, 3.0.0, 3.0.0b2.post2, 3.0.0a2, 3.0.0b2, 2.6.1, 2.0.3, 2.0.0beta, 2.5.0, 2.4.1, 2.6.0, 3.0.0b2.post1, 3.0.0b3, 3.0.0b1.post2, 3.0.0b2.post2, 3.0.0b2.post1, 2.3.0, 3.0.0a3, 3.1.0.post1)
Cleaning up...

No distributions matching the version for protobuf==3.1.0 (from tensorflow==0.12.0rc0)

Now upgrade pip and reinstall tensorflow.

easy_install --upgrade pip


pip install --upgrade tensorflow
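
The error message lists the versions that the old pip could see, and an exact 3.1.0 is not among them (only 3.1.0.post1). A quick Python sketch for checking such a list, assuming the "from versions:" format shown in the log above (the helper name versions_seen is hypothetical, and the sample line is shortened):

```python
import re

def versions_seen(error_line):
    """Extract the candidate versions from pip's '(from versions: ...)' text."""
    match = re.search(r"\(from versions: ([^)]*)\)", error_line)
    if not match:
        return []
    return [v.strip() for v in match.group(1).split(",") if v.strip()]

# Shortened copy of the error line from the log above.
line = ("Could not find a version that satisfies the requirement protobuf==3.1.0 "
        "(from tensorflow==0.12.0rc0) (from versions: 3.0.0b4, 3.0.0, 2.6.1, 3.1.0.post1)")
seen = versions_seen(line)
print("3.1.0" in seen, "3.1.0.post1" in seen)  # False True
```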


Monday, December 12, 2016

Python, Gensim, and word2vec

cuDNN not available: how to fix on Linux

Download cuDNN from NVIDIA

$ python
Python 2.7.12 (default, Jul  1 2016, 15:12:24)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import theano
Using gpu device 0: GeForce GTX 1060 6GB (CNMeM is enabled with initial size: 95.0% of memory, cuDNN not available)

Download cuDNN for your OS

https://developer.nvidia.com/cudnn

http://deeplearning.net/software/theano/library/sandbox/cuda/dnn.html
  • Alternatively, on Linux, you can set the environment variables LD_LIBRARY_PATH, LIBRARY_PATH and CPATH to the directory extracted from the download. If needed, separate multiple directories with : as in the PATH environment variable.
    example:
    export LD_LIBRARY_PATH=/home/user/path_to_CUDNN_folder/lib64:$LD_LIBRARY_PATH
    export CPATH=/home/user/path_to_CUDNN_folder/include:$CPATH
    export LIBRARY_PATH=/home/user/path_to_CUDNN_folder/lib64:$LIBRARY_PATH

sudo mkdir /usr/local/cudnn
sudo cp -r cuda/* /usr/local/cudnn

add the following to your ~/.bashrc
export CUDNN_ROOT=/usr/local/cudnn
export LD_LIBRARY_PATH=$CUDNN_ROOT/lib64:$LD_LIBRARY_PATH
export CPATH=$CUDNN_ROOT/include:$CPATH
export LIBRARY_PATH=$CUDNN_ROOT/lib64:$LIBRARY_PATH
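
Before retrying the Theano import, you can check that the files landed where the exports expect them. A minimal Python sketch, assuming the usual cuDNN archive layout of include/cudnn.h and lib64/libcudnn* under the root (the helper name find_cudnn is hypothetical):

```python
import glob
import os

def find_cudnn(root):
    """Report whether the cuDNN header and shared libraries are under root."""
    header = os.path.join(root, "include", "cudnn.h")
    libs = sorted(glob.glob(os.path.join(root, "lib64", "libcudnn*")))
    return {"header": os.path.isfile(header), "libraries": libs}

# Falls back to the directory used above when CUDNN_ROOT is not set.
root = os.environ.get("CUDNN_ROOT", "/usr/local/cudnn")
print(find_cudnn(root))
```

If the header is reported missing or the library list is empty, recheck the copy step before touching the environment variables.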

Fixed:
$ python
Python 2.7.12 (default, Jul  1 2016, 15:12:24)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import theano
Using gpu device 0: GeForce GTX 1060 6GB (CNMeM is enabled with initial size: 95.0% of memory, cuDNN 5105)
>>>