Possible Duplicate:
Least memory intensive way to read a file in PHP
I have a problem with speed vs. memory usage.
I have a script which needs to be able to run very quickly. All it does is load multiple files of 1-100MB each, each consisting of a list of values, and check how many of those values exist in another list.
My preferred way of doing this is to load the values from the file into an array (explode), and then loop through this array and check whether each value exists using isset.
The problem I have is that there are too many values, it uses up >10GB of memory (I don't know why it uses so much). So I have resorted to loading the values from the file into memory a few at a time, instead of just exploding the whole file. This cuts the memory usage right down, but is VERY slow.
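For context on where that memory goes: each element of a PHP array carries substantial bookkeeping overhead (a hash bucket plus string header) on top of the raw string data, so exploding a 100MB file can easily cost many times that in RAM. A quick sketch to measure it:

```php
<?php
// Rough illustration of PHP's per-element array overhead, which is
// why exploding a large file consumes far more memory than the raw
// data size: each element needs a hash bucket and a string header
// in addition to the string bytes themselves.
$before = memory_get_usage();
$values = array();
for ($i = 0; $i < 1000000; $i++) {
    $values[] = 'value' . $i;   // ~10-byte strings
}
$after = memory_get_usage();
// Prints far more than the ~10MB of raw string data:
echo ($after - $before) . " bytes for 1M short strings\n";
```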
Is there a better method?
Code Example:
$check = array('lots', 'of', 'values', 'here');
$check = array_flip($check); // flip so the values become keys, for isset() lookups
$values = explode('|', file_get_contents('bigfile.txt'));
$matches = 0;
foreach ($values as $value) {
    if (isset($check[$value])) $matches++;
}
4 Answers
Maybe you could code your own C extension of PHP (see e.g. this question), or code a small utility program in C and have PHP run it (perhaps using popen)?
This seems like a classic use case for some form of key/value-oriented NoSQL datastore (MongoDB, CouchDB, Riak), or maybe even just a large memcache instance.
Assuming you can load the large data files into the datastore ahead of when you need to do the searching, and that you'll be using the data from the loaded files more than once, you should see some impressive gains (as long as your queries, mapreduce, etc. aren't awful). Judging by the size of your data, you may also want to look at a datastore that doesn't need to hold everything in memory to be quick.
There are plenty of PHP drivers (and tutorials) for each of the datastores I mentioned above.
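As a rough sketch of the memcache variant of this idea (the server address, key prefix, and the pecl/memcached extension are assumptions, not anything from the question):

```php
<?php
// Sketch only: assumes a memcached server on 127.0.0.1:11211 and the
// pecl/memcached extension. The big file is loaded once, with each
// value stored as a key; later checks become batched lookups.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

// One-time load step (done ahead of the searches):
foreach (explode('|', file_get_contents('bigfile.txt')) as $value) {
    $mc->set('v:' . $value, 1);   // 'v:' prefix is illustrative
}

// Search step: one getMulti() round-trip instead of one request per value.
$check = array('lots', 'of', 'values', 'here');
$keys  = array_map(function ($v) { return 'v:' . $v; }, $check);
$found = $mc->getMulti($keys);
$matches = count($found);         // keys that exist in the store
```

Whether this beats the in-process array depends on how often the same loaded file is queried; the load step only pays off when it is amortized over many searches.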
Open the files and read through them line by line. Maybe use MySQL: for the import (LOAD DATA INFILE), for the resulting data, or both.
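A minimal sketch of the streaming idea (countMatches is an illustrative name; it assumes the same '|'-delimited bigfile.txt format as in the question). Instead of exploding the whole file into one huge array, it pulls one value at a time off the stream, so memory stays flat regardless of file size:

```php
<?php
// Streaming variant of the question's check: read one '|'-delimited
// value at a time instead of holding the whole exploded file in memory.
function countMatches($file, array $check)
{
    $check = array_flip($check);   // O(1) membership tests via isset()
    $matches = 0;
    $fp = fopen($file, 'r');
    // stream_get_line() reads up to the next '|' delimiter, so only
    // one value is ever held in memory at a time.
    while (($value = stream_get_line($fp, 1048576, '|')) !== false) {
        if (isset($check[$value])) {
            $matches++;
        }
    }
    fclose($fp);
    return $matches;
}
```

The trade-off versus explode() is one function call per value instead of one big split, which is usually still far faster than the "a few at a time" chunking described in the question.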
It seems you need a proper search engine. The Sphinx search server can be used to search your values really fast.