CUDA support¶
CUDA Contexts¶
class CudaDeviceManager¶
Public Functions
Get the CUDA driver context for a particular device.
- Parameters
device_number
: the CUDA device
out
: cached context
Get the shared CUDA driver context for a particular device.
- Parameters
device_number
: the CUDA device
handle
: CUDA context handle created by another library
out
: shared context
Allocate host memory with fast access to given GPU device.
- Parameters
device_number
: the CUDA device
nbytes
: number of bytes
out
: the allocated buffer
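As a sketch of how these entry points fit together, the following obtains the process-wide device manager and a cached context for GPU 0. It assumes the Status-returning API documented here, the `arrow::cuda` namespace, and the `GetInstance` accessor and `ARROW_RETURN_NOT_OK` macro from Arrow's C++ sources; exact names may differ across Arrow versions.

```cpp
#include <arrow/gpu/cuda_api.h>

#include <memory>

// Sketch: fetch the singleton device manager, then a cached driver
// context for device 0. Requires a CUDA-capable GPU at runtime.
arrow::Status GetDefaultContext(
    std::shared_ptr<arrow::cuda::CudaContext>* ctx) {
  arrow::cuda::CudaDeviceManager* manager = nullptr;
  ARROW_RETURN_NOT_OK(arrow::cuda::CudaDeviceManager::GetInstance(&manager));
  // GetContext caches the CUDA driver context per device number,
  // so repeated calls for the same device are cheap.
  return manager->GetContext(/*device_number=*/0, ctx);
}
```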
class CudaContext : public std::enable_shared_from_this<CudaContext>¶
Friendlier interface to the CUDA driver API.
Public Functions
Allocate CUDA memory on GPU device for this context.
- Return
- Status
- Parameters
nbytes
: number of bytes
out
: the allocated buffer
Create a view of CUDA memory on GPU device of this context.
- Return
- Status
- Note
- The caller is responsible for allocating and freeing the memory as well as ensuring that the memory belongs to the CUDA context that this CudaContext instance holds.
- Parameters
data
: the starting device address
nbytes
: number of bytes
out
: the view buffer
Open existing CUDA IPC memory handle.
- Return
- Status
- Parameters
ipc_handle
: opaque pointer to CUipcMemHandle (driver API)
out
: a CudaBuffer referencing the IPC segment
Status CloseIpcBuffer(CudaBuffer *buffer)¶
Close memory mapped with an IPC buffer.
- Return
- Status
- Parameters
buffer
: the CudaBuffer referencing the IPC segment to close
void *handle() const¶
Expose CUDA context handle to other libraries.
int device_number() const¶
Return device number.
Status GetDeviceAddress(uint8_t *addr, uint8_t **devaddr)¶
Return the device address that is reachable from kernels running in the context.
The device address is defined as a memory address accessible by the device. While it is often a device memory address, it can also be a host memory address: for instance, when the memory is allocated as host memory (using cudaMallocHost or cudaHostAlloc), as managed memory (using cudaMallocManaged), or when the host memory is page-locked (using cudaHostRegister).
- Return
- Status
- Parameters
addr
: device or host memory address
devaddr
: the device address
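To illustrate the allocation path, here is a minimal sketch that allocates device memory from a context; it assumes the Status-returning `Allocate` overload documented above, the `arrow::cuda` namespace, and the `ARROW_RETURN_NOT_OK` helper macro (an assumption about the surrounding build, not part of this page).

```cpp
#include <arrow/gpu/cuda_api.h>

#include <memory>

// Sketch: allocate nbytes of GPU memory from an existing context.
// The resulting CudaBuffer belongs to ctx's device; device_number()
// tells you which GPU that is.
arrow::Status AllocateOnDevice(
    const std::shared_ptr<arrow::cuda::CudaContext>& ctx, int64_t nbytes,
    std::shared_ptr<arrow::cuda::CudaBuffer>* out) {
  ARROW_RETURN_NOT_OK(ctx->Allocate(nbytes, out));
  // The buffer's memory lives on this device until the CudaBuffer
  // is destroyed (unless it is later exported for IPC).
  const int device = ctx->device_number();
  (void)device;
  return arrow::Status::OK();
}
```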
Device and Host Buffers¶
class CudaBuffer : public arrow::Buffer¶
An Arrow buffer located on a GPU device.
Be careful using this in any Arrow code which may not be GPU-aware.
Public Functions
Status CopyToHost(const int64_t position, const int64_t nbytes, void *out) const¶
Copy memory from GPU device to CPU host.
- Return
- Status
- Parameters
position
: start position inside buffer to copy bytes from
nbytes
: number of bytes to copy
out
: start address of the host memory area to copy to
Status CopyFromHost(const int64_t position, const void *data, int64_t nbytes)¶
Copy memory to device at position.
- Return
- Status
- Parameters
position
: start position to copy bytes to
data
: the host data to copy
nbytes
: number of bytes to copy
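A short sketch of a host-to-device-to-host round trip using the two copy methods above; it assumes a CudaBuffer at least 4 bytes long and the `ARROW_RETURN_NOT_OK` macro from Arrow's C++ sources.

```cpp
#include <arrow/gpu/cuda_api.h>

#include <cstdint>
#include <memory>

// Sketch: write 4 bytes of host data into a device buffer at offset 0,
// then read them back into a separate host array.
arrow::Status RoundTrip(const std::shared_ptr<arrow::cuda::CudaBuffer>& buf) {
  const int64_t kBytes = 4;
  uint8_t host_in[kBytes] = {1, 2, 3, 4};
  uint8_t host_out[kBytes] = {0, 0, 0, 0};
  // Host -> device at position 0.
  ARROW_RETURN_NOT_OK(buf->CopyFromHost(/*position=*/0, host_in, kBytes));
  // Device -> host; after this, host_out mirrors host_in.
  return buf->CopyToHost(/*position=*/0, kBytes, host_out);
}
```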
Status CopyFromDevice(const int64_t position, const void *data, int64_t nbytes)¶
Copy memory from device to device at position.
- Return
- Status
- Note
- It is assumed that both source and destination device memories have been allocated within the same context.
- Parameters
position
: start position inside buffer to copy bytes to
data
: start address of the device memory area to copy from
nbytes
: number of bytes to copy
Copy memory from another device to device at position.
- Return
- Status
- Parameters
src_ctx
: context of the source device memory
position
: start position inside buffer to copy bytes to
data
: start address of the other device memory area to copy from
nbytes
: number of bytes to copy
Expose this device buffer as IPC memory which can be used in other processes.
- Return
- Status
- Note
- After calling this function, this device memory will not be freed when the CudaBuffer is destructed.
- Parameters
handle
: the exported IPC handle
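The export entry above pairs with CudaContext::OpenIpcBuffer for cross-process sharing. The sketch below assumes the method is named `ExportForIpc` and that `OpenIpcBuffer` takes a const CudaIpcMemHandle reference, as in Arrow's CUDA sources; verify against your Arrow version.

```cpp
#include <arrow/gpu/cuda_api.h>

#include <memory>

// Sketch, producer side: export a device buffer as an IPC handle that
// can be serialized and sent to another process. After this call the
// device memory outlives the exporting CudaBuffer.
arrow::Status ExportBuffer(
    const std::shared_ptr<arrow::cuda::CudaBuffer>& buf,
    std::shared_ptr<arrow::cuda::CudaIpcMemHandle>* handle) {
  return buf->ExportForIpc(handle);
}

// Sketch, consumer side: reopen the shared segment through a context
// on the same device, yielding a CudaBuffer view of the same memory.
arrow::Status ImportBuffer(
    const std::shared_ptr<arrow::cuda::CudaContext>& ctx,
    const arrow::cuda::CudaIpcMemHandle& handle,
    std::shared_ptr<arrow::cuda::CudaBuffer>* out) {
  return ctx->OpenIpcBuffer(handle, out);
}
```

When the consumer is done, CudaContext::CloseIpcBuffer releases the mapping.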
Public Static Functions
Convert back generic buffer into CudaBuffer.
- Return
- Status
- Note
- This function returns an error if the buffer isn’t backed by GPU memory
- Parameters
buffer
: buffer to convert
out
: conversion result
-
Status
Allocate CUDA-accessible memory on CPU host.
- Return
- Status
- Parameters
device_number
: device to expose host memory
size
: number of bytes
out
: the allocated buffer
class CudaHostBuffer : public arrow::MutableBuffer¶
Device-accessible CPU memory created using cudaHostAlloc.
Device Memory Input / Output¶
class CudaBufferReader : public arrow::io::BufferReader¶
File interface for zero-copy read from CUDA buffers.
Note: Reads return pointers to device memory. This means you must be careful using this interface with any Arrow code which may expect to be able to do anything other than pointer arithmetic on the returned buffers.
class CudaBufferWriter : public arrow::io::WritableFile¶
File interface for writing to CUDA buffers, with optional buffering.
Public Functions
Status SetBufferSize(const int64_t buffer_size)¶
Set the CPU buffer size to limit calls to cudaMemcpy.
By default, writes are unbuffered.
- Return
- Status
- Parameters
buffer_size
: the size of CPU buffer to allocate
int64_t buffer_size() const¶
Returns the size of the host (CPU) buffer, 0 for unbuffered.
int64_t num_bytes_buffered() const¶
Returns the number of bytes buffered on the host.
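A buffered-write sketch tying these members together. It assumes CudaBufferWriter is constructed from a CudaBuffer and inherits `Write`, `Flush`, and `Close` from the WritableFile interface; those inherited names are an assumption about Arrow's io layer, not documented on this page.

```cpp
#include <arrow/gpu/cuda_api.h>

#include <cstdint>
#include <memory>

// Sketch: stage small writes in a 1 MiB host buffer so that many
// Write calls are batched into few cudaMemcpy transfers.
arrow::Status WriteBuffered(
    const std::shared_ptr<arrow::cuda::CudaBuffer>& device_buf,
    const uint8_t* data, int64_t nbytes) {
  arrow::cuda::CudaBufferWriter writer(device_buf);
  // Without this call, every Write goes straight to the device.
  ARROW_RETURN_NOT_OK(writer.SetBufferSize(1 << 20));
  ARROW_RETURN_NOT_OK(writer.Write(data, nbytes));
  // Flush pushes any bytes still counted by num_bytes_buffered()
  // to the device before closing.
  ARROW_RETURN_NOT_OK(writer.Flush());
  return writer.Close();
}
```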
CUDA IPC¶
class CudaIpcMemHandle¶
Public Functions
Write CudaIpcMemHandle to a Buffer.
- Return
- Status
- Parameters
pool
: a MemoryPool to allocate memory from
out
: the serialized buffer
Public Static Functions
Create CudaIpcMemHandle from opaque buffer (e.g. from another process).
- Return
- Status
- Parameters
opaque_handle
: a CUipcMemHandle as a const void*
handle
: the CudaIpcMemHandle instance
Write record batch message to GPU device memory.
- Return
- Status
- Parameters
batch
: record batch to write
ctx
: CudaContext to allocate device memory from
out
: the returned device buffer which contains the record batch message
ReadRecordBatch specialized to handle metadata on CUDA device.
- Parameters
schema
: the Schema for the record batch
buffer
: a CudaBuffer containing the complete IPC message
pool
: a MemoryPool to use for allocating space for the metadata
out
: the reconstructed RecordBatch, with device pointers
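The two entries above form a round trip. The sketch below assumes the free functions are named `SerializeRecordBatch` and `ReadRecordBatch` in the `arrow::cuda` namespace, matching the descriptions here; check the header in your Arrow version for the exact signatures.

```cpp
#include <arrow/api.h>
#include <arrow/gpu/cuda_api.h>

#include <memory>

// Sketch: serialize a record batch into GPU device memory, then
// reconstruct it. Column buffers of *out point at device memory,
// so downstream code must be GPU-aware.
arrow::Status RoundTripBatch(
    const std::shared_ptr<arrow::RecordBatch>& batch,
    const std::shared_ptr<arrow::cuda::CudaContext>& ctx,
    std::shared_ptr<arrow::RecordBatch>* out) {
  std::shared_ptr<arrow::cuda::CudaBuffer> device_buf;
  ARROW_RETURN_NOT_OK(
      arrow::cuda::SerializeRecordBatch(*batch, ctx.get(), &device_buf));
  // The pool is only used for host-side metadata allocations.
  return arrow::cuda::ReadRecordBatch(batch->schema(), device_buf,
                                      arrow::default_memory_pool(), out);
}
```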