问题:编写 MySQL 解析器来拆分 JOIN 并将它们作为单独的查询运行(动态非规范化查询)
我正在尝试找出一个脚本来获取 MySQL 查询并将其转换为单独的查询,即动态地非规范化查询。
作为测试,我构建了一个简单的文章系统,有 4 个表:
- articles
- article_id
- article_format_id
- 文章标题
- 文章正文
- 文章日期
- 文章_类别
- article_id
- category_id
-
- category_id
- category_title
- 格式
- 格式ID
- 格式标题
一篇文章可以属于多个类别,但只有一种格式。我觉得这是现实生活中的一个很好的例子。
在列出所有文章的类别页面上(也拉入 format_title ),这可以通过以下查询轻松实现:
SELECT articles.*, formats.format_title
FROM articles
INNER JOIN formats ON articles.article_format_id = formats.format_id
INNER JOIN article_categories ON articles.article_id = article_categories.article_id
WHERE article_categories.category_id = 2
ORDER BY articles.article_date DESC
但是,我尝试构建的脚本将接收此查询,解析它并单独运行查询。
因此,在此类别页面示例中,脚本将有效地运行此操作(动态计算):
// Select article_categories
$sql = "SELECT * FROM article_categories WHERE category_id = 2";
$query = mysql_query($sql);
while ($row_article_categories = mysql_fetch_array($query, MYSQL_ASSOC)) {
// Select articles
$sql2 = "SELECT * FROM articles WHERE article_id = " . $row_article_categories['article_id'];
$query2 = mysql_query($sql2);
while ($row_articles = mysql_fetch_array($query2, MYSQL_ASSOC)) {
// Select formats
$sql3 = "SELECT * FROM formats WHERE format_id = " . $row_articles['article_format_id'];
$query3 = mysql_query($sql3);
$row_formats = mysql_fetch_array($query3, MYSQL_ASSOC);
// Merge articles and formats
$row_articles = array_merge($row_articles, $row_formats);
// Add to array
$out[] = $row_articles;
}
}
// Sort articles by date
foreach ($out as $key => $row) {
$arr[$key] = $row['article_date'];
}
array_multisort($arr, SORT_DESC, $out);
// Output articles - this would not be part of the script obviously it should just return the $out array
foreach ($out as $row) {
echo '<p><a href="article.php?id='.$row['article_id'].'">'.$row['article_title'].'</a> <i>('.$row['format_title'].')</i><br />'.$row['article_body'].'<br /><span class="date">'.date("F jS Y", strtotime($row['article_date'])).'</span></p>';
}
这样做的挑战是以正确的顺序计算出正确的查询,因为您可以在查询中以任何顺序放置 SELECT 和 JOIN 的列名称(这就是 MySQL 和其他 SQL 数据库很好地翻译的内容)并在 PHP 中制定信息逻辑。
我目前正在使用 SQL_Parser 解析查询,它可以很好地将查询拆分为多维数组,但是计算出上面提到的东西是令人头疼的。
任何帮助或建议将不胜感激。
I am trying to figure out a script to take a MySQL query and turn it into individual queries, i.e. denormalizing the query dynamically.
As a test I have built a simple article system that has 4 tables:
- articles
- article_id
- article_format_id
- article_title
- article_body
- article_date
- article_categories
- article_id
- category_id
- categories
- category_id
- category_title
- formats
- format_id
- format_title
An article can be in more than one category but only have one format. I feel this is a good example of a real-life situation.
On the category page which lists all of the articles (pulling in the format_title as well) this could be easily achieved with the following query:
SELECT articles.*, formats.format_title
FROM articles
INNER JOIN formats ON articles.article_format_id = formats.format_id
INNER JOIN article_categories ON articles.article_id = article_categories.article_id
WHERE article_categories.category_id = 2
ORDER BY articles.article_date DESC
However the script I am trying to build would receive this query, parse it and run the queries individually.
So in this category page example the script would effectively run this (worked out dynamically):
// Select article_categories
$sql = "SELECT * FROM article_categories WHERE category_id = 2";
$query = mysql_query($sql);
while ($row_article_categories = mysql_fetch_array($query, MYSQL_ASSOC)) {
// Select articles
$sql2 = "SELECT * FROM articles WHERE article_id = " . $row_article_categories['article_id'];
$query2 = mysql_query($sql2);
while ($row_articles = mysql_fetch_array($query2, MYSQL_ASSOC)) {
// Select formats
$sql3 = "SELECT * FROM formats WHERE format_id = " . $row_articles['article_format_id'];
$query3 = mysql_query($sql3);
$row_formats = mysql_fetch_array($query3, MYSQL_ASSOC);
// Merge articles and formats
$row_articles = array_merge($row_articles, $row_formats);
// Add to array
$out[] = $row_articles;
}
}
// Sort articles by date
foreach ($out as $key => $row) {
$arr[$key] = $row['article_date'];
}
array_multisort($arr, SORT_DESC, $out);
// Output articles - this would not be part of the script obviously it should just return the $out array
foreach ($out as $row) {
echo '<p><a href="article.php?id='.$row['article_id'].'">'.$row['article_title'].'</a> <i>('.$row['format_title'].')</i><br />'.$row['article_body'].'<br /><span class="date">'.date("F jS Y", strtotime($row['article_date'])).'</span></p>';
}
The challenges of this are working out the correct queries in the right order, as you can put column names for SELECT and JOIN's in any order in the query (this is what MySQL and other SQL databases translate so well) and working out the information logic in PHP.
I am currently parsing the query using SQL_Parser which works well in splitting up the query into a multi-dimensional array, but working out the stuff mentioned above is the headache.
Any help or suggestions would be much appreciated.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
据我所知,您试图在无法修改的第三方论坛应用程序(也许是混淆的代码?)和 MySQL 之间放置一个层。该层将拦截查询,将它们重写为可单独执行,并生成 PHP 代码以针对数据库执行它们并返回聚合结果。 这是一个非常糟糕的主意。
您暗示不可能添加代码并同时建议生成要添加的代码,这似乎很奇怪。希望您不打算使用 funcall 之类的东西来注入代码。 这是一个非常糟糕的主意。
其他人呼吁避免您最初的方法并专注于数据库是非常合理的建议。我将把我的声音加入到这充满希望的合唱中。
我们将假设一些限制:
三种可能的解决方案:
From what I gather you're trying to put a layer between a 3rd-party forum application that you can't modify (obfuscated code perhaps?) and MySQL. This layer will intercept queries, re-write them to be executable individually, and generate PHP code to execute them against the database and return the aggregate result. This is a very bad idea.
It seems strange that you imply the impossibility of adding code and simultaneously suggest generating code to be added. Hopefully you're not planning on using something like funcall to inject code. This is a very bad idea.
The calls from others to avoid your initial approach and focus on the database is very sound advice. I'll add my voice to that hopefully growing chorus.
We'll assume some constraints:
Three possible solutions:
为了提高数据库性能,您通常会寻找以下方法:
...而您正在做完全相反的事情?故意的?
依据什么?
抱歉,您这样做完全错误,并且您在这条路上遇到的每个问题都将是第一个决定在数据库引擎之外实现数据库引擎的后果。在交付日期之前,您将被迫一直解决变通办法。 (如果你到达那里)。
另外,我们正在谈论论坛?我的意思是,来吧!即使在最“网络规模很棒”的论坛上,我们谈论的平均吞吐量也低于 100 tps?您可以在笔记本电脑上执行此操作!
我的建议是忘记这一切,并以最简单的方式实施事情。然后将聚合(最新的、流行的、统计数据等等)缓存在应用程序层中。论坛中的其他所有内容都已经是主键查找。
To improve database performance you typically look for ways to:
...and you are doing the exact opposite? Deliberately?
On what grounds?
I'm sorry, you are doing this entirely wrong, and every single problem you encounter down this road will all be consequences of that first decision to implement a database engine outside of the database engine. You will be forced to work around work-arounds all the way to delivery date. (if you get there).
Also, we are talking about a forum? I mean, come on! Even on the most "web-scale-awesome-sauce" forums we're talking about less than what, 100 tps on average? You could do that on your laptop!
My advice is to forget about all this and implement things the most simple possible way. Then cache the aggregates (most recent, popular, statistics, whatever) in the application layer. Everything else in a forum is already primary key lookups.
我同意这听起来是一个糟糕的选择,但我可以想到在某些情况下拆分查询可能很有用。
我会尝试类似的方法,严重依赖正则表达式来解析查询。它只能在非常有限的情况下起作用,但它的支持可以在需要时逐步扩大。
晚安,祝你好运!
I agree it sounds like a bad choice, but I can think of some situations where splitting a query could be useful.
I would try something similar to this, relying heavily on regular expressions for parsing the query. It would work in a very limited of cases, but it's support could be expanded progressively when needed.
Good night, and Good luck!
我认为您应该创建一个非规范化的文章表,并在每篇文章插入/删除/更新时更改它,而不是重写 SQL。它将变得更加简单和便宜。
创建并填充它:
现在针对它发出适当的文章插入/更新/删除,您将拥有一个随时可以查询的非规范化表。
In instead of the sql rewriting I think you should create a denormalized articles table and change it at each article insert/delete/update. It will be MUCH simpler and cheaper.
Do the create and populate it:
Now issue the appropriate article insert/update/delete against it and you will have a denormalized table always ready to be queried.