Wednesday, August 2, 2017

Converting from binary strings to numpy

# https://imageio.readthedocs.io/en/latest/examples.html

import imageio
import sys

video = sys.argv[1]

reader = imageio.get_reader(video)

im = reader.get_next_data()

print(im)
print(type(im))

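# a variation
# read a video frame using an imageio iterator,
# then view the frame's bytes with np.frombuffer

import imageio
import numpy as np
import sys

video = sys.argv[1]

reader = imageio.get_reader(video)

im = reader.get_next_data()

print(im)
print(type(im))

# frombuffer returns a flat (1-D) uint8 view of the frame's bytes
n = np.frombuffer(im, dtype="uint8")

print(n)
print(n.shape)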


################################################

$ python iter_video.py bcn_rm_test1/bcn_rm_2017.mp4
[[[ 15  28  34]
  [ 51  64  70]
  [ 55  68  74]
  ...,
 [[  6  38   6]
  [ 76 108  76]
  [ 66 102  26]
  ...,
  [ 65  99  21]
  [ 65  99  21]
  [ 46  80   2]]]
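
When a frame is available only as raw bytes (for example, read from a pipe) rather than as the array imageio returns, the shape has to be supplied separately, because a byte string carries no dimensions. A minimal sketch, assuming a small synthetic 2x3 RGB frame in place of a real video frame; the names frame, raw, flat, and restored are illustrative only:

```python
import numpy as np

# Synthetic stand-in for a video frame: height 2, width 3, 3 colour channels.
frame = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

# Serialise to a plain bytes object; shape and dtype information is lost here.
raw = frame.tobytes()

# frombuffer yields a flat 1-D array, so the shape must be restored by hand.
flat = np.frombuffer(raw, dtype=np.uint8)
restored = flat.reshape(frame.shape)

print(flat.shape)                       # (18,)
print(restored.shape)                   # (2, 3, 3)
print(np.array_equal(frame, restored))  # True
```

The dtype passed to frombuffer has to match the one used when the bytes were produced; otherwise the element count and values will differ.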


https://groups.google.com/forum/#!msg/imageio/dGKC0Iwrz78/880fyaFEAAAJ




https://stackoverflow.com/questions/22236749/numpy-what-is-the-difference-between-frombuffer-and-fromstring

The question, from a Python 2 session: np.frombuffer and np.fromstring appear to give the same result:
In [32]: s
Out[32]: '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

In [27]: np.frombuffer(s, dtype="int8")
Out[27]:
array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0, 21,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0], dtype=int8)

In [28]: np.fromstring(s, dtype="int8")
Out[28]:
array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0, 21,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0], dtype=int8)

In [33]: b = buffer(s)

In [34]: b
Out[34]: <read-only buffer for 0x035F8020, size -1, offset 0 at 0x036F13A0>

In [35]: np.fromstring(b, dtype="int8")
Out[35]:
array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0, 21,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0], dtype=int8)

In [36]: np.frombuffer(b, dtype="int8")
Out[36]:
array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0, 21,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0], dtype=int8)
 
From a practical standpoint, the difference is that
x = np.fromstring(s, dtype='int8')
makes a copy of the string in memory, while
x = np.frombuffer(s, dtype='int8')
or
x = np.frombuffer(buffer(s), dtype='int8')
uses the string's memory buffer directly and needs no additional memory*. frombuffer also returns a read-only array when the input to buffer is a string, because strings are immutable in Python.
(*Apart from the few bytes used by the additional ndarray object itself; the underlying memory for the data is shared.)
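
The copy-versus-share distinction can be checked directly. A minimal Python 3 sketch, using a bytes object where the Python 2 session above used a str. Here .copy() stands in for fromstring; that substitution is an assumption of this sketch, made because newer NumPy releases deprecate the binary mode of fromstring in favour of frombuffer:

```python
import numpy as np

# Same 64 bytes as the session above: all zeros except 0x15 (21) at index 20.
s = bytes(20) + b"\x15" + bytes(43)

shared = np.frombuffer(s, dtype="int8")  # view onto the bytes object's memory
copied = shared.copy()                   # independent copy of the same data

print(shared.flags.owndata, shared.flags.writeable)  # False False
print(copied.flags.owndata, copied.flags.writeable)  # True True

# Writing through the shared view is refused, because bytes are immutable.
try:
    shared[0] = 1
except ValueError as exc:
    print("write rejected:", exc)

copied[0] = 1  # allowed; the original bytes object is untouched
print(s[0])    # 0
```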

If you're not familiar with buffer objects (memoryview in Python 3.x), they're essentially a way for C-level libraries to expose a block of memory for use in Python: a Python interface for managed access to raw memory.
If you are working with something that exposes the buffer interface, you probably want frombuffer. (Python 2.x strings and Python 3.x bytes expose the buffer interface, but you'll get a read-only array, because they are immutable.)
Otherwise, use fromstring to create a numpy array from a string, unless you know what you're doing and want to tightly control memory use.
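
The read-only result comes from the source object rather than from frombuffer itself. A minimal sketch, assuming a mutable bytearray as the buffer provider (the names buf and arr are illustrative): the resulting array is writable, and writes on either side are visible on the other because the memory is shared.

```python
import numpy as np

buf = bytearray(4)                        # mutable, zero-filled buffer
arr = np.frombuffer(buf, dtype=np.uint8)  # shares memory with buf

print(arr.flags.writeable)  # True

arr[0] = 255                # write through the array ...
print(buf[0])               # 255 (... and the bytearray sees it)

buf[1] = 7                  # write through the bytearray ...
print(arr[1])               # 7 (... and the array sees it)
```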
 
