pyarrow.plasma.PlasmaClient

class pyarrow.plasma.PlasmaClient

Bases: object

The PlasmaClient is used to interface with a plasma store and manager.

The PlasmaClient can ask the PlasmaStore to allocate a new buffer, seal a buffer, and get a buffer. Buffers are referred to by object IDs, which are strings.

__init__()

Initialize self. See help(type(self)) for accurate signature.

Methods

contains(self, ObjectID object_id) Check if the object is present and sealed in the PlasmaStore.
create(self, ObjectID object_id, …) Create a new buffer in the PlasmaStore for a particular object ID.
create_and_seal(self, ObjectID object_id, …) Store a new object in the PlasmaStore for a particular object ID.
decode_notification(self, const uint8_t *buf) Get the notification from the buffer.
delete(self, object_ids) Delete the objects with the given IDs from the object store.
disconnect(self) Disconnect this client from the Plasma store.
evict(self, int64_t num_bytes) Evict objects to recover some bytes.
get(self, object_ids, int timeout_ms=-1[, …]) Get one or more Python values from the object store.
get_buffers(self, object_ids[, timeout_ms, …]) Returns data buffer from the PlasmaStore based on object ID.
get_metadata(self, object_ids[, timeout_ms]) Returns metadata buffer from the PlasmaStore based on object ID.
get_next_notification(self) Get the next notification from the notification socket.
get_notification_socket(self) Get the notification socket.
hash(self, ObjectID object_id) Compute the checksum of an object in the object store.
list(self) Experimental: List the objects in the store.
put(self, value, ObjectID object_id=None, …) Store a Python value into the object store.
put_raw_buffer(self, value, …) Store a Python buffer into the object store.
seal(self, ObjectID object_id) Seal the buffer in the PlasmaStore for a particular object ID.
store_capacity(self) Get the memory capacity of the store.
subscribe(self) Subscribe to notifications about sealed objects.
to_capsule(self)

Attributes

store_socket_name
contains(self, ObjectID object_id)

Check if the object is present and sealed in the PlasmaStore.

Parameters:object_id (ObjectID) – A string used to identify an object.
create(self, ObjectID object_id, int64_t data_size, string metadata=b'')

Create a new buffer in the PlasmaStore for a particular object ID.

The returned buffer is mutable until seal is called.

Parameters:
  • object_id (ObjectID) – The object ID used to identify an object.
  • data_size (int) – The size in bytes of the created buffer.
  • metadata (bytes) – An optional string of bytes encoding whatever metadata the user wishes to encode.
Raises:
  • PlasmaObjectExists – This exception is raised if the object could not be created because there already is an object with the same ID in the plasma store.
  • PlasmaStoreFull – This exception is raised if the object could not be created because the plasma store is unable to evict enough objects to create room for it.
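The create-then-seal flow described above can be sketched as a small helper. This is a minimal sketch rather than the library's own code: `write_object` is a hypothetical name, `client` is assumed to be an already connected PlasmaClient, and wrapping the returned buffer in `memoryview` assumes that it exposes a writable buffer interface.

```python
def write_object(client, object_id, payload, metadata=b""):
    """Allocate a buffer, copy `payload` into it, then seal it.

    `client` is assumed to be a connected PlasmaClient and `object_id`
    an ObjectID that is not yet present in the store.
    """
    # create() reserves len(payload) bytes; the returned buffer stays
    # mutable until seal() is called.
    buf = client.create(object_id, len(payload), metadata)
    # Assumption: the buffer supports the writable buffer protocol.
    # Casting to unsigned bytes lets a bytes payload be copied in one step.
    view = memoryview(buf).cast("B")
    view[:] = payload
    # Once sealed, the object is immutable and can be read by other clients.
    client.seal(object_id)
    return len(payload)
```

If the ID is already in use or the store cannot make room, `create` is documented to raise PlasmaObjectExists or PlasmaStoreFull; this sketch lets those exceptions propagate to the caller.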
create_and_seal(self, ObjectID object_id, string data, string metadata=b'')

Store a new object in the PlasmaStore for a particular object ID.

Parameters:
  • object_id (ObjectID) – The object ID used to identify an object.
  • data (bytes) – The object to store.
  • metadata (bytes) – An optional string of bytes encoding whatever metadata the user wishes to encode.
Raises:
  • PlasmaObjectExists – This exception is raised if the object could not be created because there already is an object with the same ID in the plasma store.
  • PlasmaStoreFull – This exception is raised if the object could not be created because the plasma store is unable to evict enough objects to create room for it.
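One way to avoid the PlasmaObjectExists error in the common case is to check `contains` before storing. The helper below is a sketch under assumptions: `store_if_absent` is a hypothetical name and `client` is assumed to be a connected PlasmaClient.

```python
def store_if_absent(client, object_id, data, metadata=b""):
    """Store `data` under `object_id` unless a sealed object already exists.

    Returns True if this call stored the object, False if it was skipped.
    """
    # contains() reports only objects that are both present and sealed.
    if client.contains(object_id):
        return False
    # Not atomic: another client could create the same ID between the
    # check above and this call, in which case the documented
    # PlasmaObjectExists exception would still be raised.
    client.create_and_seal(object_id, data, metadata)
    return True
```

Because the check and the store are two separate calls, callers that share a store with other writers should still be prepared to handle PlasmaObjectExists.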
decode_notification(self, const uint8_t *buf)

Get the notification from the buffer.

Returns:
  • ObjectID – The object ID of the object that was stored.
  • int – The data size of the object that was stored.
  • int – The metadata size of the object that was stored.
delete(self, object_ids)

Delete the objects with the given IDs from the object store.

Parameters:object_ids (list) – A list of strings used to identify the objects.
disconnect(self)

Disconnect this client from the Plasma store.

evict(self, int64_t num_bytes)

Evict objects to recover some bytes.

Recover at least num_bytes bytes if possible.

Parameters:num_bytes (int) – The number of bytes to attempt to recover.
get(self, object_ids, int timeout_ms=-1, serialization_context=None)

Get one or more Python values from the object store.

Parameters:
  • object_ids (list or ObjectID) – Object ID or list of object IDs associated to the values we get from the store.
  • timeout_ms (int, default -1) – The number of milliseconds that the get call should block before timing out and returning. Pass -1 if the call should block indefinitely and 0 if the call should return immediately.
  • serialization_context (pyarrow.SerializationContext, default None) – Custom serialization and deserialization context.
Returns:

list or object – Python value or list of Python values for the data associated with the object_ids, with ObjectNotAvailable in place of any object that was not available.
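A caller that passes a list of IDs with a finite timeout usually needs to separate the values that arrived from the placeholders. The sketch below assumes that the placeholder is a single sentinel object compared by identity and passes it in as `not_available`; in real use this is presumably `pyarrow.plasma.ObjectNotAvailable`, but that name and location are an assumption. `get_available` is a hypothetical helper name.

```python
def get_available(client, object_ids, not_available, timeout_ms=100):
    """Return {object_id: value} for the objects available within the timeout.

    `not_available` is the sentinel that get() returns for missing objects
    (assumed here to be pyarrow.plasma.ObjectNotAvailable).
    """
    ids = list(object_ids)
    # Passing a list makes get() return a list of the same length.
    values = client.get(ids, timeout_ms=timeout_ms)
    return {
        oid: value
        for oid, value in zip(ids, values)
        if value is not not_available
    }
```

Using the IDs as dictionary keys assumes that ObjectID instances are hashable, which is consistent with `list` returning a dictionary keyed by ObjectIDs.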

get_buffers(self, object_ids, timeout_ms=-1, with_meta=False)

Returns data buffer from the PlasmaStore based on object ID.

If the object has not been sealed yet, this call will block. The retrieved buffer is immutable.

Parameters:
  • object_ids (list) – A list of ObjectIDs used to identify some objects.
  • timeout_ms (int) – The number of milliseconds that the get call should block before timing out and returning. Pass -1 if the call should block indefinitely and 0 if the call should return immediately.
  • with_meta (bool) – If True, return the metadata bytes together with each data buffer.
Returns:

list – If with_meta=False, this is a list of PlasmaBuffers for the data associated with the object_ids and None if the object was not available. If with_meta=True, this is a list of tuples of PlasmaBuffer and metadata bytes.
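As an illustration of the default `with_meta=False` case, the following sketch copies each returned buffer into a `bytes` object and preserves the None placeholders. `read_payloads` is a hypothetical name, `client` is assumed to be a connected PlasmaClient, and the conversion assumes the returned buffers support the buffer protocol.

```python
def read_payloads(client, object_ids, timeout_ms=0):
    """Return the raw bytes of each object, or None where it was unavailable."""
    # With timeout_ms=0 the call returns immediately instead of blocking
    # on objects that have not been sealed yet.
    buffers = client.get_buffers(list(object_ids), timeout_ms=timeout_ms)
    # Assumption: each returned buffer supports the buffer protocol.
    # bytes(...) makes a copy, so the result no longer shares store memory.
    return [None if buf is None else bytes(memoryview(buf)) for buf in buffers]
```

Copying is convenient for small objects; for large ones, working on the returned buffers directly avoids the extra copy.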

get_metadata(self, object_ids, timeout_ms=-1)

Returns metadata buffer from the PlasmaStore based on object ID.

If the object has not been sealed yet, this call will block. The retrieved buffer is immutable.

Parameters:
  • object_ids (list) – A list of ObjectIDs used to identify some objects.
  • timeout_ms (int) – The number of milliseconds that the get call should block before timing out and returning. Pass -1 if the call should block indefinitely and 0 if the call should return immediately.
Returns:

list – List of PlasmaBuffers for the metadata associated with the object_ids and None if the object was not available.

get_next_notification(self)

Get the next notification from the notification socket.

Returns:
  • ObjectID – The object ID of the object that was stored.
  • int – The data size of the object that was stored.
  • int – The metadata size of the object that was stored.
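The notification methods are typically used together: subscribe once, then read notifications in a loop. The sketch below assumes that `subscribe` must be called before `get_next_notification` and that each call blocks until a notification arrives. `collect_notifications` is a hypothetical helper name.

```python
def collect_notifications(client, count):
    """Subscribe, then read `count` notifications from the store.

    Returns a list of (object_id, data_size, metadata_size) tuples.
    """
    # Assumption: subscribing is required before notifications can be read.
    client.subscribe()
    events = []
    for _ in range(count):
        # Each notification unpacks into the three documented fields.
        object_id, data_size, metadata_size = client.get_next_notification()
        events.append((object_id, data_size, metadata_size))
    return events
```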
get_notification_socket(self)

Get the notification socket.

hash(self, ObjectID object_id)

Compute the checksum of an object in the object store.

Parameters:object_id (ObjectID) – A string used to identify an object.
Returns:bytes – A digest string of the object’s hash. If the object isn’t in the object store, the string will have length zero.
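Since a missing object is signalled by an empty digest rather than an exception, callers may want to normalize that case. A minimal sketch, with `object_digest` as a hypothetical helper name and `client` assumed to be a connected PlasmaClient:

```python
def object_digest(client, object_id):
    """Return the object's digest as a hex string, or None if it is absent."""
    digest = client.hash(object_id)
    # A zero-length digest means the object is not in the store.
    if len(digest) == 0:
        return None
    return digest.hex()
```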
list(self)

Experimental: List the objects in the store.

Returns:dict – Dictionary from ObjectIDs to an “info” dictionary describing the object. The “info” dictionary has the following entries:
  • data_size – Size of the object in bytes.
  • metadata_size – Size of the object metadata in bytes.
  • ref_count – Number of clients referencing the object buffer.
  • create_time – Unix timestamp of the creation of the object.
  • construct_duration – Time the creation of the object took, in seconds.
  • state – “created” if the object is still being created and “sealed” if it is already sealed.
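The “info” entries can be aggregated into a quick overview of the store. The sketch below relies only on the keys documented above; `summarize_store` is a hypothetical helper name and `client` is assumed to be a connected PlasmaClient.

```python
def summarize_store(client):
    """Count objects by state and total the bytes held by sealed objects."""
    summary = {"created": 0, "sealed": 0, "sealed_bytes": 0}
    for info in client.list().values():
        state = info["state"]
        # Count objects per documented state ("created" or "sealed").
        summary[state] = summary.get(state, 0) + 1
        if state == "sealed":
            # Data and metadata sizes are reported separately.
            summary["sealed_bytes"] += info["data_size"] + info["metadata_size"]
    return summary
```

Because `list` is marked experimental, the set of keys may differ between versions; the sketch will raise KeyError if an expected entry is missing.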
put(self, value, ObjectID object_id=None, int memcopy_threads=6, serialization_context=None)

Store a Python value into the object store.

Parameters:
  • value (object) – A Python object to store.
  • object_id (ObjectID, default None) – If this is provided, the specified object ID will be used to refer to the object.
  • memcopy_threads (int, default 6) – The number of threads to use to write the serialized object into the object store for large objects.
  • serialization_context (pyarrow.SerializationContext, default None) – Custom serialization and deserialization context.
Returns:

The object ID associated to the Python object.
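A round trip through `put` and `get` can be sketched as follows. `round_trip` is a hypothetical helper name, `client` is assumed to be a connected PlasmaClient, and the value is assumed to be serializable by the default serialization context.

```python
def round_trip(client, value):
    """Store `value`, then read it back through the returned object ID."""
    # With no object_id given, put() returns the ID it stored the value under.
    object_id = client.put(value)
    # Passing a single ID (not a list) makes get() return a single value.
    restored = client.get(object_id)
    return object_id, restored
```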

put_raw_buffer(self, value, ObjectID object_id=None, string metadata=b'', int memcopy_threads=6)

Store a Python buffer into the object store.

Parameters:
  • value (Python object that implements the buffer protocol) – A Python buffer object to store.
  • object_id (ObjectID, default None) – If this is provided, the specified object ID will be used to refer to the object.
  • metadata (bytes) – An optional string of bytes encoding whatever metadata the user wishes to encode.
  • memcopy_threads (int, default 6) – The number of threads to use to write the serialized object into the object store for large objects.
Returns:

The object ID associated to the Python buffer object.

seal(self, ObjectID object_id)

Seal the buffer in the PlasmaStore for a particular object ID.

Once a buffer has been sealed, the buffer is immutable and can only be accessed through get.

Parameters:object_id (ObjectID) – A string used to identify an object.
store_capacity(self)

Get the memory capacity of the store.

Returns:int – The memory capacity of the store in bytes.
store_socket_name
subscribe(self)

Subscribe to notifications about sealed objects.

to_capsule(self)