Thursday, April 20, 2017

SPARK RDD operations

How to print the contents of an RDD?

https://stackoverflow.com/questions/23173488/how-to-print-the-contents-of-rdd


If you want to view the contents of an RDD, one way is to use collect():
myRDD.collect().foreach(println)
That's not a good idea, though, when the RDD has billions of lines. Use take() to take just a few to print out:
myRDD.take(n).foreach(println)

How to save the contents of an RDD to a single file?


If you want to save to a single file, you can coalesce your RDD into one partition before calling saveAsTextFile, but this may cause issues since all of the data then has to pass through a single partition.
I think the best option is to write multiple files in HDFS, then use hdfs dfs -getmerge to merge the files – Oussama Jul 21 '15 at 16:10
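For example, a minimal PySpark sketch of the coalesce approach (the output path and SparkContext setup here are assumptions; the same pattern applies to the Scala API shown above):

from pyspark import SparkContext

sc = SparkContext(appName="save-single-file")  # assumes a local or cluster Spark setup
myRDD = sc.parallelize(range(100))

# coalesce(1) funnels everything through a single partition, so the whole
# dataset must fit through one task -- fine for small RDDs, risky for big ones.
myRDD.coalesce(1).saveAsTextFile("hdfs:///tmp/single_file_output")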


SPARK Shell


To load an external file from spark-shell simply do
:load PATH_TO_FILE

https://stackoverflow.com/questions/32808053/spark-shell-command-lines

scala> :help
All commands can be abbreviated, e.g. :he instead of :help.
Those marked with a * have more detailed help, e.g. :help imports.

:cp                  add a jar or directory to the classpath
:help [command]            print this summary or command-specific help
:history [num]             show the history (optional num is commands to show)
:h?                search the history
:imports [name name ...]   show import history, identifying sources of names
:implicits [-v]            show the implicits in scope
:javap <path|class>        disassemble a file or class name
:load                load and interpret a Scala file
:paste                     enter paste mode: all input up to ctrl-D compiled together
:quit                      exit the repl
:replay                    reset execution and replay all previous commands
:reset                     reset the repl to its initial state, forgetting all session entries
:sh <command line>         run a shell command (result is implicitly => List[String])
:silent                    disable/enable automatic printing of results
:fallback                  disable/enable advanced repl changes, these fix some issues but may introduce others. This mode will be removed once these fixes stablize
:type [-v]           display the type of an expression without evaluating it
:warnings                  show the suppressed warnings from the most recent line which had any
As you can see above, you can invoke shell commands using :sh. For example:

scala> :sh mkdir foobar
res0: scala.tools.nsc.interpreter.ProcessResult = `mkdir foobar` (0 lines, exit 0)

scala> :sh touch foobar/foo
res1: scala.tools.nsc.interpreter.ProcessResult = `touch foobar/foo` (0 lines, exit 0)

scala> :sh touch foobar/bar
res2: scala.tools.nsc.interpreter.ProcessResult = `touch foobar/bar` (0 lines, exit 0)

scala> :sh ls foobar
res3: scala.tools.nsc.interpreter.ProcessResult = `ls foobar` (2 lines, exit 0)

scala> res3.lines foreach println
bar
foo

Wednesday, April 19, 2017

Python Numpy Slices

>>> np.arange(12)
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
>>> np.arange(12).reshape(3,4)
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> a = np.arange(12).reshape(3,4)
>>> a[slice(None, 3, None)]
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> a[:3]
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> b = np.arange(24).reshape(3,4,2)
>>> b
array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7]],

       [[ 8,  9],
        [10, 11],
        [12, 13],
        [14, 15]],

       [[16, 17],
        [18, 19],
        [20, 21],
        [22, 23]]])
>>> b[2:]
array([[[16, 17],
        [18, 19],
        [20, 21],
        [22, 23]]])
>>> b[slice(2, None, None)]
array([[[16, 17],
        [18, 19],
        [20, 21],
        [22, 23]]])
>>> b[2:,3]
array([[22, 23]])
>>> b[2:,1:]
array([[[18, 19],
        [20, 21],
        [22, 23]]])
>>> b[2:,:1]
array([[[16, 17]]])
>>> b[slice(2, None, None), slice(None, 1, None)]
array([[[16, 17]]])
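
To make the pattern explicit: the bracket syntax start:stop:step is just shorthand for building slice objects, and numpy's np.s_ helper constructs the same tuples of slices shown above. A short sketch:

import numpy as np

b = np.arange(24).reshape(3, 4, 2)

# a[start:stop:step] is sugar for a[slice(start, stop, step)],
# so the step argument works the same way through slice():
print(np.array_equal(b[::2], b[slice(None, None, 2)]))      # True

# np.s_ builds the same slice tuples from the familiar syntax:
print(np.s_[2:, :1])                                        # (slice(2, None, None), slice(None, 1, None))
print(np.array_equal(b[2:, :1], b[np.s_[2:, :1]]))          # True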

Clean Ubuntu 16.04 Tensorflow 1.3 CUDA 8.0 cuDNN 6 Python 3.5 installation instructions - finally

Ignore this post if you can use NVidia Docker...


##############################################

This is under test by a few people.

When Tensorflow changes its Nvidia driver, cuDNN, and CUDA version requirements, this post will have to be updated.

Install Tensorflow 1.3
Ubuntu/Linux 64-bit, GPU enabled, Python 3.5
Requires CUDA toolkit 8.0 and cuDNN v6. 

https://www.tensorflow.org/install/install_linux#InstallingVirtualenv

Uses:
Ubuntu 16.04 (Nvidia cuda 8.0 supports 14.04 and 16.04)
Python 3.5
Tensorflow 1.3 (requires cuda 8.0 and cudnn 6.0)
Nvidia driver version 387
Cuda 8.0
cuDNN 6.0
Keras 2.0

Tensorflow has Nvidia driver and lib deps

https://www.tensorflow.org/install/install_linux

NVIDIA requirements to run TensorFlow with GPU support

If you are installing TensorFlow with GPU support using one of the mechanisms described in this guide, then the following NVIDIA software must be installed on your system:

  • CUDA® Toolkit 8.0. For details, see NVIDIA's documentation. Ensure that you append the relevant Cuda pathnames to the LD_LIBRARY_PATH environment variable as described in the NVIDIA documentation.
  • The NVIDIA drivers associated with CUDA Toolkit 8.0.
  • cuDNN v6. For details, see NVIDIA's documentation. Ensure that you create the CUDA_HOME environment variable as described in the NVIDIA documentation.
  • GPU card with CUDA Compute Capability 3.0 or higher. See NVIDIA documentation for a list of supported GPU cards.
  • The libcupti-dev library, which is the NVIDIA CUDA Profile Tools Interface. This library provides advanced profiling support. To install this library, issue the following command:
    $ sudo apt-get install libcupti-dev
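
Once the toolkit, cuDNN, and libcupti are in place, a quick sanity check that the runtime can actually find the libraries listed above might look like this (the sonames are assumptions for CUDA 8.0 / cuDNN 6; adjust to your versions):

import ctypes
import os

# Print the CUDA-related environment variables the TensorFlow docs ask for.
print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH"))
print("CUDA_HOME       =", os.environ.get("CUDA_HOME"))

# Try to dlopen the libraries; failures usually mean a path or version mismatch.
for lib in ("libcudart.so.8.0", "libcudnn.so.6", "libcupti.so.8.0"):
    try:
        ctypes.CDLL(lib)
        print(lib, "loaded OK")
    except OSError as err:
        print(lib, "NOT found:", err)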


# 1. Setup the virtual env
sudo apt-get -y install python-pip 
sudo apt-get -y install virtualenv
sudo apt-get -y install virtualenvwrapper
pip install virtualenv
pip install virtualenvwrapper

find / -name virtualenvwrapper.sh 2> /dev/null

# /usr/share/virtualenvwrapper/virtualenvwrapper.sh
# ls /usr/share/virtualenvwrapper/virtualenvwrapper.sh
# ls /home/depappas/.local/bin/virtualenvwrapper.sh


# 2. Install OpenCV
sudo apt-get -y install libopencv-dev
sudo apt-get -y install python-opencv

# 3. Set up the environment

# Keras/Tensorflow setup in ~/.bashrc

# Tensorflow setup
export CUDA_ROOT=/usr/local/cuda
export LD_LIBRARY_PATH=$CUDA_ROOT/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$CUDA_ROOT/lib64:$CUDA_ROOT/extras/CUPTI/lib64"
export CUDNN_ROOT=/usr/local/cuda
export CPATH=$CUDNN_ROOT/include:$CPATH
export KERAS_BACKEND=tensorflow


# Python VM
# OSX source /usr/local/bin/virtualenvwrapper.sh
source /home/depappas/.local/bin/virtualenvwrapper.sh # Ubuntu

export WORKON_HOME=~/python_virtual_env


# 4. Create and activate the Python virtual environment
source ~/.bashrc
cd ~
mkdir python_virtual_env
cd python_virtual_env
virtualenv --system-site-packages -p python3.5   deep_learning_3.5
workon deep_learning_3.5

# now you are using the pip in the ~/python_virtual_env/deep_learning_3.5 tree

# if pip install does not work upgrade pip

easy_install --upgrade pip


# 5. Install net-tools and configure ssh

sudo apt -y install net-tools

http://ubuntuhandbook.org/index.php/2016/04/enable-ssh-ubuntu-16-04-lts/
sudo apt-get -y install openssh-server
sudo service ssh status
sudo emacs /etc/ssh/sshd_config
# Uncomment  PermitRootLogin prohibit-password
sudo service ssh restart

# set up passwordless login
http://programmingmatrix.blogspot.com/2017/10/ssh-keyless-login-on-ubuntu.html

# 6. Check NVIDIA Compute Capability of your GPU card

# https://developer.nvidia.com/cuda-gpus



# if the machine locks up or you have an older driver installed
sudo apt-get purge nvidia* # or what ever old driver you are using

To add the Proprietary GPU Drivers PPA in Ubuntu or Linux Mint and update the software sources, use the following commands:
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update


sudo apt-get install nvidia-387


# alternatively

2. Install (and activate) the latest Nvidia graphics drivers

From System Settings or directly from the menu / Dash, open Software & Updates, click on the "Additional Drivers" tab, select the driver (I selected 387 which works on Ubuntu 16.04 and a 1060 GPU) you want to use, and click "Apply changes":





# 7. Install the Nvidia 387 driver, CUDA 8.0, and cuDNN 6.0


# 8. Install the libcupti-dev library, which is the NVIDIA CUDA Profile Tools Interface.
# This library provides advanced profiling support.
# To install this library, issue the following command:


sudo apt-get -y install libcupti-dev

# 9. Check that the GPU is detected
# https://github.com/tensorflow/tensorflow/issues/394

sudo reboot

nvidia-smi

Found 1 NVIDIA devices
Device ID: 0
Device name: GeForce GTX TITAN X (*PrimaryCard)
GPU internal ID: 0420115018258

Tue Dec 15 23:56:17 2015
+------------------------------------------------------+
| NVIDIA-SMI 352.63     Driver Version: 352.63         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:04:00.0      On |                  N/A |
| 22%   33C    P8    17W / 250W |    441MiB / 12287MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1270    G   /usr/bin/X                                     174MiB |
|    0      2183    G   compiz                                         112MiB |
|    0      2575    G   ...ves-passed-by-fd --v8-snapshot-passed-by-   127MiB |
+-----------------------------------------------------------------------------+


# 10. Misc. install
sudo apt-get -y install pandoc
sudo apt-get -y install graphviz

# 11. Install Python packages
pip3 install keras 
pip3 install h5py
pip3 install numpy
pip3 install matplotlib 
pip3 install gensim 
pip3 install ioutils 
pip3 install Cython
# https://github.com/jazzsaxmafia/video_to_sequence/issues/3
pip3 install opencv-python
pip3 install sklearn
pip3 install pypandoc
pip3 install pandoc
pip3 install keras_diagram
pip3 install tensorflow-gpu
pip3 install seaborn
pip3 install flake8
pip3 install pandas
pip3 install pydot
pip3 install pydot-ng

# 12. Install Tensorflow

# don't do this!
# pip3 install tensorflow
# pip3 install --upgrade tensorflow-gpu

https://stackoverflow.com/questions/39817645/cuda-cudnn-installed-but-tensorflow-cant-use-the-gpu


https://www.tensorflow.org/install/install_linux#InstallingVirtualenv

################################################################################

Don't do the following, this is just an example:
  1. (Optional) If Step 4 failed (typically because you invoked a pip version lower than 8.1), install TensorFlow in the active virtualenv environment by issuing a command of the following format:
    (tensorflow)$ pip install --upgrade tfBinaryURL   # Python 2.7
    (tensorflow)$ pip3 install --upgrade tfBinaryURL  # Python 3.n
    where tfBinaryURL identifies the URL of the TensorFlow Python package. The appropriate value of tfBinaryURL depends on the operating system, Python version, and GPU support. Find the appropriate value for tfBinaryURL for your system here. For example, if you are installing TensorFlow for Linux, Python 3.4, and CPU-only support, issue the following command to install TensorFlow in the active virtualenv environment:
    (tensorflow)$ pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.3.0-cp34-cp34m-linux_x86_64.whl

# select a url to install based on your version requirements
https://www.tensorflow.org/install/install_linux#the_url_of_the_tensorflow_python_package

################################################################################

Do this:

# Python 3.5 GPU support

export tfBinaryURL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.3.0-cp35-cp35m-linux_x86_64.whl

pip3 install --upgrade $tfBinaryURL

# 13. Test that Tensorflow links with libcudnn

workon deep_learning_3.5
python
>>> import tensorflow
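
If the import succeeds without complaining about missing libcudnn/libcuda, the GPU wheel found its libraries. A slightly fuller check using the TF 1.x test helpers (run inside the same virtualenv):

import tensorflow as tf

print(tf.__version__)
# True for the tensorflow-gpu wheel built against CUDA.
print(tf.test.is_built_with_cuda())
# Non-empty (e.g. '/gpu:0' or '/device:GPU:0') when a GPU is visible.
print(tf.test.gpu_device_name())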

# 14. Check the Tensorflow version

python -c "import tensorflow; print(tensorflow.__version__)"

# 15. Test that Tensorflow is using the GPU

python -c "import tensorflow as tf; sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))"

##### Expected Output #########

I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1060 6GB
major: 6 minor: 1 memoryClockRate (GHz) 1.7845
pciBusID 0000:01:00.0
Total memory: 5.93GiB
Free memory: 5.66GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0
I tensorflow/core/common_runtime/direct_session.cc:257] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0

# http://stackoverflow.com/questions/43335531/how-to-use-sse4-1-instructions-without-install-tensorflow-from-source
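
Another way to list the devices TensorFlow can see (same information as the log above, via the Python client API; device_lib lives under tensorflow.python, so treat this as a convenience rather than a stable public API):

from tensorflow.python.client import device_lib

# Prints one entry per device; a working GPU setup shows a device_type of 'GPU'.
for dev in device_lib.list_local_devices():
    print(dev.name, dev.device_type, dev.memory_limit)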

# 16. Setup Keras
For best performance, set `image_data_format="channels_last"` in your Keras config at ~/.keras/keras.json.

# Or use env vars

export KERAS_BACKEND=tensorflow
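
A small sanity check (using the public Keras backend API) to confirm the backend and data format Keras actually picked up:

from keras import backend as K

print(K.backend())            # expect 'tensorflow'
print(K.image_data_format())  # expect 'channels_last'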

# 17. Test Keras
git clone http://github.com/rcmalli/keras-squeezenet.git
cd keras-squeezenet
python test.py

Using TensorFlow backend.
Downloading data from https://github.com/rcmalli/keras-squeezenet/releases/download/v1.0/squeezenet_weights_tf_dim_ordering_tf_kernels.h5
4530176/5059384 [=========================>....] - ETA: 0s2017-10-18 20:57:06.845299: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-18 20:57:06.845317: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-18 20:57:06.845321: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-18 20:57:06.845324: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-18 20:57:06.845326: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-10-18 20:57:06.952926: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-10-18 20:57:06.953146: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: GeForce GTX 1060 6GB
major: 6 minor: 1 memoryClockRate (GHz) 1.7845
pciBusID 0000:01:00.0
Total memory: 5.93GiB
Free memory: 5.79GiB
2017-10-18 20:57:06.953159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-10-18 20:57:06.953162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-10-18 20:57:06.953167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0)

(deep_learning_3.5) depappas@berlin:~/keras-squeezenet


# 18. Install TFLearn
# http://tflearn.org/installation/
# TFLearn Installation
# To install TFLearn, the easiest way is to run one of the following options.
# For the bleeding edge version:

pip install git+https://github.com/tflearn/tflearn.git

# For the latest stable version:

pip install tflearn

# You can also install from source by running this command (from the source folder):


python setup.py install
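
# 19. Test TFLearn

A minimal smoke test (a sketch based on the TFLearn quickstart API) to confirm TFLearn imports and can build a graph on top of the TensorFlow install:

import tflearn

print(tflearn.__version__)

# Build a trivial two-layer network; success means TFLearn found TensorFlow.
net = tflearn.input_data(shape=[None, 4])
net = tflearn.fully_connected(net, 2, activation='softmax')
net = tflearn.regression(net)
model = tflearn.DNN(net)
print("TFLearn graph built OK")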