Handling large (object) datasets in PHP

Published 2024-08-26 14:55:20


I am currently working on a project that relies extensively on the EAV model. Both entities and their attributes are individually represented by a model, sometimes extending other models (or at least, base models).

This has worked quite well so far since most areas of the application only rely on filtered sets of entities, and not the entire dataset.

Now, however, I need to parse the entire dataset (i.e., all entities and all their attributes) in order to provide a sorting/filtering algorithm based on the attributes.

The application currently consists of approximately 2200 entities, each with approximately 100 attributes. Every entity is represented by a single model (for example Client_Model_Entity) and has a protected property called $_attributes, which is an array of Attribute objects.

Each entity object is about 500KB, which results in an incredible load on the server. With 2000 entities, this means a single task would take 1GB of RAM (and a lot of CPU time) in order to work, which is unacceptable.

Are there any patterns or common approaches to iterating over such large datasets? Paging is not really an option, since everything has to be taken into account in order to provide the sorting algorithm.

EDIT: a code example to hopefully make things clearer:

// code from the resource model
for ($i = 0, $n = count($rowset); $i < $n; ++$i)
{
    $clientEntity = new Client_Model_Entity($rowset[$i]);
    // getAttributes() fetches all possible attributes from the DB and creates models for them;
    // this is the big resource hog, as one client can have 100 attributes
    $clientEntity->getAttributes();
    $this->_rows[$i] = $clientEntity;
    // memory usage has now increased by roughly 500KB
    echo $i . ' : ' . memory_get_usage() . '<br />';
}


Comments (2)

2024-09-02 14:55:20


If there's a lot of commonality between the attributes, you could take a look at the Flyweight pattern: http://en.wikipedia.org/wiki/Flyweight_pattern. This might significantly reduce the number of objects required to represent your model.
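A minimal sketch of what a Flyweight might look like here, under the assumption that most of an Attribute object's weight is shared metadata (code, data type, label) rather than the per-entity value. The class and method names (`AttributeType`, `AttributeTypeFactory::get()`) are hypothetical, not part of the questioner's codebase:

```php
<?php
// Flyweight sketch (hypothetical names): the attribute *definition* is
// shared across all entities; each entity keeps only its own value plus
// a reference to the shared definition.

class AttributeType
{
    public function __construct(
        public readonly string $code,
        public readonly string $dataType
    ) {}
}

class AttributeTypeFactory
{
    /** @var array<string, AttributeType> */
    private static array $types = [];

    public static function get(string $code, string $dataType): AttributeType
    {
        // Reuse one shared instance per attribute code instead of
        // building ~100 definition objects for every single entity.
        return self::$types[$code] ??= new AttributeType($code, $dataType);
    }
}

// Per-entity state becomes a cheap [shared type, scalar value] pair.
$a = AttributeTypeFactory::get('color', 'varchar');
$b = AttributeTypeFactory::get('color', 'varchar');
var_dump($a === $b); // true: both entities point at the same definition
```

With ~100 attribute definitions shared across ~2200 entities, the object count for definitions drops from roughly 220,000 to 100; only the scalar values remain per-entity.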

反目相谮 2024-09-02 14:55:20


One solution could be to implement the Iterator interface and parse one object at a time.
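A sketch of that idea against the questioner's own loop, assuming `Client_Model_Entity` and `getAttributes()` behave as shown in the question; the `EntityIterator` class itself is hypothetical. The point is that only the current row is hydrated into a full entity, so peak memory stays near one entity (~500KB) instead of all 2200:

```php
<?php
// Hypothetical Iterator sketch: hydrate one entity per step and let it
// go out of scope before the next iteration, keeping memory flat.

class EntityIterator implements Iterator
{
    private int $position = 0;

    public function __construct(private array $rowset) {}

    public function current(): Client_Model_Entity
    {
        // Only the current row becomes a full-blown entity object.
        $entity = new Client_Model_Entity($this->rowset[$this->position]);
        $entity->getAttributes();
        return $entity;
    }

    public function key(): int { return $this->position; }
    public function next(): void { ++$this->position; }
    public function rewind(): void { $this->position = 0; }
    public function valid(): bool { return isset($this->rowset[$this->position]); }
}

// Usage: each $entity is eligible for garbage collection as soon as the
// loop advances, provided nothing (like $this->_rows) keeps a reference.
foreach (new EntityIterator($rowset) as $i => $entity) {
    // ...inspect $entity's attributes for the sorting/filtering pass...
}
```

Note this only helps if the sorting pass can work on a reduced projection (e.g., collecting just the sort keys per entity) rather than retaining every hydrated object.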
