Out of memory in dask_cudf

Posted on 2025-02-10 07:43:01


I've been trying for quite some time to solve memory management issues with dask_cudf in my recent project, but it seems I'm missing something and I need your help. I am working on a Tesla T4 GPU with 15 GiB of memory. I have several ETL steps, but the GPU recently seems to be failing on most of them (most are just filtering or transformation steps, but a few involve shuffling). My data consists of around 20 parquet files of roughly 500 MB each. For this specific question I will provide the piece of code I use for filtering, which makes the GPU fail due to lack of memory.

I start by setting up a CUDA cluster:

import os

from dask.distributed import Client
from dask.utils import parse_bytes
from dask_cuda import LocalCUDACluster

CUDA_VISIBLE_DEVICES = os.environ.get("CUDA_VISIBLE_DEVICES", "0")

cluster = LocalCUDACluster(
    # rmm_pool_size=get_rmm_size(0.6 * device_mem_size()),
    CUDA_VISIBLE_DEVICES=CUDA_VISIBLE_DEVICES,
    local_directory=os.path.join(WORKING_DIR, "dask-space"),
    device_memory_limit=parse_bytes("12GB"),
)
client = Client(cluster)
client

Depending on whether I provide the rmm_pool_size parameter, the error differs. When the parameter is provided I get that the maximum pool limit is exceeded; otherwise I get the following error:
MemoryError: std::bad_alloc: CUDA error at: ../include/rmm/mr/device/cuda_memory_resource.hpp:70: cudaErrorMemoryAllocation out of memory
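
For reference, here is a minimal sketch of an alternative cluster setup with an explicit RMM pool and host spilling enabled. The sizes and the jit_unspill flag are assumptions on my part for a 15 GiB T4 (and depend on the dask_cuda version), not values I have validated:

from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Sketch only: the sizes below are guesses for a 15 GiB T4.
cluster = LocalCUDACluster(
    rmm_pool_size="10GB",          # explicit RMM pool, with headroom below 15 GiB
    device_memory_limit="8GB",     # start spilling device buffers to host early
    jit_unspill=True,              # unspill buffers lazily, only when accessed
    local_directory="dask-space",  # host-side spill directory (placeholder path)
)
client = Client(cluster)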

Next, I create a filtering operation I intend to perform on the data (which involves checking whether a value in a column appears in a set containing around 80,000 values):

def remove_invalid_values_filter_factory(valid_value_set_or_series):
    # Returns a filter that keeps only the rows whose 'col' value
    # appears in the given collection of valid values
    def f(df):
        mask = df['col'].isin(valid_value_set_or_series)
        return df.loc[mask]
    return f

import pandas as pd

# Load valid values from another file
valid_values_info_df = pd.read_csv(...)
# The series is around 1 MiB in size
keep_known_values_only = remove_invalid_values_filter_factory(valid_values_info_df['values'])
# Tried both and both cause the error
# keep_known_values_only = remove_invalid_values_filter_factory(set(valid_values_info_df['values']))
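
One variant I can think of (not shown in the runs above; the path below is a placeholder) is to move the valid values to the GPU once as a cudf.Series, so each partition's isin compares against device-resident data instead of a host pandas Series serialized into every task:

import cudf
import pandas as pd

# Sketch: put the ~80,000 valid values on the GPU once; the path and
# the column name 'values' are placeholders from the snippet above.
valid_values_info_df = pd.read_csv("valid_values.csv")
valid_values_gpu = cudf.Series(valid_values_info_df['values'].unique())

keep_known_values_only = remove_invalid_values_filter_factory(valid_values_gpu)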

Finally I apply this filter operation on the data and get the error:

%%time
# Error occurs during this processing step
keep_known_values_only(
    dask_cudf.read_parquet(...)
).to_parquet(...)
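
For completeness, the read step itself may matter too. Here is a sketch of the same step with smaller input partitions to lower the peak per-task device memory (whether read_parquet accepts blocksize, and its exact spelling, depends on the dask version, so treat this as an assumption):

import dask_cudf

# Placeholder paths; smaller partitions mean each task holds less
# decompressed data on the GPU at once.
ddf = dask_cudf.read_parquet("input/*.parquet", blocksize="128MiB")
keep_known_values_only(ddf).to_parquet("output/")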

I feel totally lost. Most sources I came across hit this error as a result of using cuDF without Dask, or without setting up a CUDA cluster, but I have both. Moreover, the filtering operation intuitively shouldn't be memory-expensive, so I have no clue what to do. I assume there is something wrong with how I set up the cluster, and that fixing it would hopefully make the rest of the more memory-expensive operations work as well.

I would be grateful for your help, thanks!


Comments (1)

桜花祭 2025-02-17 07:43:01


I'd use dask-sql for this, to take advantage of its ability to do out-of-core processing.
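
As a rough illustration (the table names, paths, and the quoted "values" column below are placeholders, not taken from your code), your filter could be expressed as a join in dask-sql:

from dask_sql import Context
import dask_cudf

# Placeholder names and paths; assumes the valid values are unique,
# otherwise the inner join would duplicate matching rows.
c = Context()
c.create_table("events", dask_cudf.read_parquet("input/*.parquet"))
c.create_table("valid", dask_cudf.read_csv("valid_values.csv"))

result = c.sql("""
    SELECT e.*
    FROM events AS e
    INNER JOIN valid AS v
        ON e.col = v."values"
""")
result.to_parquet("output/")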

As for the dask_cudf functions failing, please open an issue in the cuDF repo with a minimal reproducible example! We'd appreciate it! :)

You may not want to combine dask_cudf and RMM unless you really have to and know what you're doing (that's RAPIDS power-user territory, for when you need to truly maximize the GPU memory available to an algorithm). If your use case calls for it, it can really help, but it doesn't seem to here, since you're reading parquet files, which is why I'm not deep-diving into it.
