PHP 中的大型 mysql 查询
我有一个大约 1400 万行的大表。每行都包含一个文本块。我还有另一个大约有 6000 行的表,每行都有一个单词,每个单词有六个数值。我需要从第一个表中获取每个文本块,并找到第二个表中每个单词出现的次数,然后计算每个文本块的六个值的平均值并将其存储。
我有一台 i7 和 8GB 内存的 Debian 机器,应该能够处理它。目前我正在使用 php substr_count() 函数。然而,PHP 感觉它不是解决这个问题的正确解决方案。除了解决超时和内存限制问题之外,还有谁有更好的方法来做到这一点?是否可以只使用 SQL?如果不是,在不使服务器超载的情况下执行 PHP 的最佳方法是什么?
I have a large table of about 14 million rows. Each row has contains a block of text. I also have another table with about 6000 rows and each row has a word and six numerical values for each word. I need to take each block of text from the first table and find the amount of times each word in the second table appears then calculate the mean of the six values for each block of text and store it.
I have a debian machine with an i7 and 8gb of memory which should be able to handle it. At the moment I am using the php substr_count() function. However PHP just doesn't feel like its the right solution for this problem. Other than working around time-out and memory limit problems does anyone have a better way of doing this? Is it possible to use just SQL? If not what would be the best way to execute my PHP without overloading the server?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
一次处理“大”表中的每一条记录。将单个文本“块”加载到您的程序(php 或其他程序)中,并进行搜索和计算,然后在您需要的地方保存适当的值。
将每条记录作为自己的事务处理,与其余记录隔离。如果您被打断,请使用保存的值来确定从哪里重新开始。
完成现有记录后,您只需在以后输入或更新记录时执行此操作,因此更加容易。您只需立即采取行动即可更新数据。
Do each record from the 'big' table one-at-a-time. Load that single 'block' of text into your program (php or what ever), and do the searching and calculation, then save the appropriate values where ever you need them.
Do each record as its own transaction, in isolation from the rest. If you are interrupted, use the saved values to determine where to start again.
Once you are done the existing records, you only need to do this in the future when you enter or update a record, so it's much easier. You just need to take your big bite right now to get the data updated.
你到底想做什么?如果您正在尝试创建类似具有加权函数的搜索引擎之类的东西,您也许应该放弃它,而使用 MySQL 全文搜索函数和索引。如果您仍然需要这个特定的解决方案,您当然可以完全使用 SQL 来完成此操作。您可以在一个查询中执行此操作,也可以使用每次插入或更新行后运行的触发器来执行此操作。如果不跳过很多麻烦,您将无法使用 PHP 正确完成此任务。
为了给您一个具体的答案,我们确实需要有关查询、数据结构以及您想要执行的操作的更多信息。
What are you trying to do exactly? If you are trying to create something like a search engine with a weighting function, you maybe should drop that and instead use the MySQL fulltext search functions and indices that are there. If you still need to have this specific solution, you can of course do this completely in SQL. You can do this in one query or with a trigger that is run each time after a row is inserted or updated. You wont be able to get this done properly with PHP without jumping through a lot of hoops.
To give you a specific answer, we indeed would need more information about the queries, data structures and what you are trying to do.
重新设计 IT()
如果磁盘上的大小不重要,只需将表连接到一张
表中,将 6000 个放入内存 [ 内存表 ] 并每隔一小时进行备份
INSERT IGNORE into back.table SELECT * FROM my.table;
在大表 eq 中创建“自己的”索引
将列“名称索引”添加到具有行 id 的大表中
--
需要有关查询的更多信息才能找到解决方案
Redesign IT()
If for size on disc is not !important just joints table into one
Table with 6000 put into memory [ memory table ] and make backup every one hour
INSERT IGNORE into back.table SELECT * FROM my.table;
Create "own" index in big table eq
Add column "name index" into big table with id of row
--
Need more info about query to find solution