问题：编写 MySQL 解析器来拆分 JOIN 并将它们作为单独的查询运行（动态非规范化查询）

发布于 2024-10-17 20:23:44 字数 2601 浏览 2 评论 0原文

我正在尝试找出一个脚本来获取 MySQL 查询并将其转换为单独的查询，即动态地非规范化查询。

作为测试，我构建了一个简单的文章系统，有 4 个表：

articles
- article_id
- article_format_id
- 文章标题
- 文章正文
- 文章日期
文章_类别
- article_id
- category_id
- category_id
- category_title
格式
- 格式ID
- 格式标题

一篇文章可以属于多个类别，但只有一种格式。我觉得这是现实生活中的一个很好的例子。

在列出所有文章的类别页面上（也拉入 format_title ），这可以通过以下查询轻松实现：

SELECT articles.*, formats.format_title 
FROM articles 
INNER JOIN formats ON articles.article_format_id = formats.format_id 
INNER JOIN article_categories ON articles.article_id = article_categories.article_id 
WHERE article_categories.category_id = 2 
ORDER BY articles.article_date DESC

但是，我尝试构建的脚本将接收此查询，解析它并单独运行查询。

因此，在此类别页面示例中，脚本将有效地运行此操作（动态计算）：

// Select article_categories
$sql = "SELECT * FROM article_categories WHERE category_id = 2";
$query = mysql_query($sql);
while ($row_article_categories = mysql_fetch_array($query, MYSQL_ASSOC)) {

    // Select articles
    $sql2 = "SELECT * FROM articles WHERE article_id = " . $row_article_categories['article_id'];
    $query2 = mysql_query($sql2);
    while ($row_articles = mysql_fetch_array($query2, MYSQL_ASSOC)) {

        // Select formats
        $sql3 = "SELECT * FROM formats WHERE format_id = " . $row_articles['article_format_id'];
        $query3 = mysql_query($sql3);
        $row_formats = mysql_fetch_array($query3, MYSQL_ASSOC);

        // Merge articles and formats
        $row_articles = array_merge($row_articles, $row_formats);

        // Add to array
        $out[] = $row_articles;
    }
}

// Sort articles by date
foreach ($out as $key => $row) {
    $arr[$key] = $row['article_date'];
}

array_multisort($arr, SORT_DESC, $out);

// Output articles - this would not be part of the script obviously it should just return the $out array
foreach ($out as $row) {
    echo '<p><a href="article.php?id='.$row['article_id'].'">'.$row['article_title'].'</a> <i>('.$row['format_title'].')</i><br />'.$row['article_body'].'<br /><span class="date">'.date("F jS Y", strtotime($row['article_date'])).'</span></p>';
}

这样做的挑战是以正确的顺序计算出正确的查询，因为您可以在查询中以任何顺序放置 SELECT 和 JOIN 的列名称（这就是 MySQL 和其他 SQL 数据库很好地翻译的内容）并在 PHP 中制定信息逻辑。

我目前正在使用 SQL_Parser 解析查询，它可以很好地将查询拆分为多维数组，但是计算出上面提到的东西是令人头疼的。

任何帮助或建议将不胜感激。

原文

I am trying to figure out a script to take a MySQL query and turn it into individual queries, i.e. denormalizing the query dynamically.

As a test I have built a simple article system that has 4 tables:

articles
- article_id
- article_format_id
- article_title
- article_body
- article_date
article_categories
- article_id
- category_id
categories
- category_id
- category_title
formats
- format_id
- format_title

An article can be in more than one category but only have one format. I feel this is a good example of a real-life situation.

On the category page which lists all of the articles (pulling in the format_title as well) this could be easily achieved with the following query:

SELECT articles.*, formats.format_title 
FROM articles 
INNER JOIN formats ON articles.article_format_id = formats.format_id 
INNER JOIN article_categories ON articles.article_id = article_categories.article_id 
WHERE article_categories.category_id = 2 
ORDER BY articles.article_date DESC

However the script I am trying to build would receive this query, parse it and run the queries individually.

So in this category page example the script would effectively run this (worked out dynamically):

// Select article_categories
$sql = "SELECT * FROM article_categories WHERE category_id = 2";
$query = mysql_query($sql);
while ($row_article_categories = mysql_fetch_array($query, MYSQL_ASSOC)) {

    // Select articles
    $sql2 = "SELECT * FROM articles WHERE article_id = " . $row_article_categories['article_id'];
    $query2 = mysql_query($sql2);
    while ($row_articles = mysql_fetch_array($query2, MYSQL_ASSOC)) {

        // Select formats
        $sql3 = "SELECT * FROM formats WHERE format_id = " . $row_articles['article_format_id'];
        $query3 = mysql_query($sql3);
        $row_formats = mysql_fetch_array($query3, MYSQL_ASSOC);

        // Merge articles and formats
        $row_articles = array_merge($row_articles, $row_formats);

        // Add to array
        $out[] = $row_articles;
    }
}

// Sort articles by date
foreach ($out as $key => $row) {
    $arr[$key] = $row['article_date'];
}

array_multisort($arr, SORT_DESC, $out);

// Output articles - this would not be part of the script obviously it should just return the $out array
foreach ($out as $row) {
    echo '<p><a href="article.php?id='.$row['article_id'].'">'.$row['article_title'].'</a> <i>('.$row['format_title'].')</i><br />'.$row['article_body'].'<br /><span class="date">'.date("F jS Y", strtotime($row['article_date'])).'</span></p>';
}

The challenges of this are working out the correct queries in the right order, as you can put column names for SELECT and JOIN's in any order in the query (this is what MySQL and other SQL databases translate so well) and working out the information logic in PHP.

I am currently parsing the query using SQL_Parser which works well in splitting up the query into a multi-dimensional array, but working out the stuff mentioned above is the headache.

Any help or suggestions would be much appreciated.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

壹場煙雨 2024-10-24 20:23:45

据我所知，您试图在无法修改的第三方论坛应用程序（也许是混淆的代码？）和 MySQL 之间放置一个层。该层将拦截查询，将它们重写为可单独执行，并生成 PHP 代码以针对数据库执行它们并返回聚合结果。 这是一个非常糟糕的主意。

您暗示不可能添加代码并同时建议生成要添加的代码，这似乎很奇怪。希望您不打算使用 funcall 之类的东西来注入代码。 这是一个非常糟糕的主意。

其他人呼吁避免您最初的方法并专注于数据库是非常合理的建议。我将把我的声音加入到这充满希望的合唱中。

我们将假设一些限制：

您正在运行 MySQL 5.0 或更高版本。
查询不能更改。
数据库表无法更改。
您已经为表准备了适当的索引麻烦的查询正在引用。
您已经三重检查访问数据库的慢速查询（并运行 EXPLAIN），并尝试设置索引来帮助它们运行得更快。
内部联接对 MySQL 安装造成的负载是不可接受的。

三种可能的解决方案：

您可以通过将运行数据库的硬件升级到具有更多内核、更多（尽可能多）RAM 和更快磁盘的硬件，向当前数据库投入资金，从而轻松解决此问题。如果你有钱的话，强烈推荐Fusion-io 的产品来解决这类问题。这可能是我将提供的三个选项中较简单的一个，
设置第二个主 MySQL 数据库并将其与第一个数据库配对。确保您有能力强制 AUTO_INCRMENT id 交替（一个数据库使用偶数 id，另一个数据库使用奇数）。这不会永远扩展，但它确实为您提供了一些喘息空间，以硬件和机架空间的价格为代价。再次，强化硬件。您可能已经这样做了，但如果没有，则值得考虑。
使用类似 dbShards 的内容。您仍然需要投入更多的硬件，但您还有一个额外的好处，即能够扩展到两台机器以上，并且随着时间的推移，您可以购买成本更低的硬件。

From what I gather you're trying to put a layer between a 3rd-party forum application that you can't modify (obfuscated code perhaps?) and MySQL. This layer will intercept queries, re-write them to be executable individually, and generate PHP code to execute them against the database and return the aggregate result. This is a very bad idea.

It seems strange that you imply the impossibility of adding code and simultaneously suggest generating code to be added. Hopefully you're not planning on using something like funcall to inject code. This is a very bad idea.

The calls from others to avoid your initial approach and focus on the database is very sound advice. I'll add my voice to that hopefully growing chorus.

We'll assume some constraints:

You're running MySQL 5.0 or greater.
The queries cannot change.
The database tables cannot be changed.
You already have appropriate indexes in place for the tables the troublesome queries are referencing.
You have triple-checked the slow queries (and run EXPLAIN) hitting your DB and have attempted to setup indexes that would help them run faster.
The load the inner joins are placing on your MySQL install is unacceptable.

Three possible solutions:

You could deal with this problem easily by investing money into your current database by upgrading the hardware it runs on to something with more cores, more (as much as you can afford) RAM, and faster disks. If you've got the money Fusion-io's products come highly recommended for this sort of thing. This is probably the simpler of the three options I'll offer
Setup a second master MySQL database and pair it with the first. Make sure you have the ability to force AUTO_INCREMENT id alternation (one DB uses even id's, the other odd). This doesn't scale forever, but it does offer you some breathing room for the price of the hardware and rack space. Again, beef up the hardware. You may have already done this, but if not it's worth consideration.
Use something like dbShards. You still need to throw more hardware at this, but you have the added benefit of being able to scale beyond two machines and you can buy lower cost hardware over time.

回复收藏 0 原文

满地尘埃落定 2024-10-24 20:23:45

为了提高数据库性能，您通常会寻找以下方法：

减少数据库调用的数量
使每个数据库调用尽可能高效（通过良好的设计）
减少要传输的数据量

...而您正在做完全相反的事情？故意的？
依据什么？

抱歉，您这样做完全错误，并且您在这条路上遇到的每个问题都将是第一个决定在数据库引擎之外实现数据库引擎的后果。在交付日期之前，您将被迫一直解决变通办法。（如果你到达那里）。

另外，我们正在谈论论坛？我的意思是，来吧！即使在最“网络规模很棒”的论坛上，我们谈论的平均吞吐量也低于 100 tps？您可以在笔记本电脑上执行此操作！

我的建议是忘记这一切，并以最简单的方式实施事情。然后将聚合（最新的、流行的、统计数据等等）缓存在应用程序层中。论坛中的其他所有内容都已经是主键查找。

回复收藏 0 原文

烟沫凡尘 2024-10-24 20:23:45

我同意这听起来是一个糟糕的选择，但我可以想到在某些情况下拆分查询可能很有用。

我会尝试类似的方法，严重依赖正则表达式来解析查询。它只能在非常有限的情况下起作用，但它的支持可以在需要时逐步扩大。

<?php
/**
 * That's a weird problem, but an interesting challenge!
 * @link http://stackoverflow.com/questions/5019467/problem-writing-a-mysql-parser-to-split-joins-and-run-them-as-individual-query
 */

// Taken from the given example:
$sql = "SELECT articles.*, formats.format_title 
FROM articles 
INNER JOIN formats ON articles.article_format_id = formats.format_id 
INNER JOIN article_categories ON articles.article_id = article_categories.article_id 
WHERE article_categories.category_id = 2 
ORDER BY articles.article_date DESC";

// Parse query
// (Limited to the clauses that are present in the example...)
// Edit: Made WHERE optional
if(!preg_match('/^\s*'.
    'SELECT\s+(?P<select_rows>.*[^\s])'. 
    '\s+FROM\s+(?P<from>.*[^\s])'.
    '(?:\s+WHERE\s+(?P<where>.*[^\s]))?'.
    '(?:\s+ORDER\s+BY\s+(?P<order_by>.*[^\s]))?'.
    '(?:\s+(?P<desc>DESC))?'.
    '(.*)$/is',$sql,$query)
) {
    trigger_error('Error parsing SQL!',E_USER_ERROR);
    return false;
}

## Dump matches
#foreach($query as $key => $value) if(!is_int($key)) echo "\"$key\" => \"$value\"<br/>\n";

/* We get the following matches:
"select_rows" => "articles.*, formats.format_title"
"from" => "articles INNER JOIN formats ON articles.article_format_id = formats.format_id INNER JOIN article_categories ON articles.article_id = article_categories.article_id"
"where" => "article_categories.category_id = 2"
"order_by" => "articles.article_date"
"desc" => "DESC"
/**/

// Will only support WHERE conditions separated by AND that are to be
// tested on a single individual table.
if(@$query['where']) // Edit: Made WHERE optional
    $where_conditions = preg_split('/\s+AND\s+/is',$query['where']);

// Retrieve individual table information & data
$tables = array();
$from_conditions = array();
$from_tables = preg_split('/\s+INNER\s+JOIN\s+/is',$query['from']);

foreach($from_tables as $from_table) {

    if(!preg_match('/^(?P<table_name>[^\s]*)'.
        '(?P<on_clause>\s+ON\s+(?P<table_a>.*)\.(?P<column_a>.*)\s*'.
        '=\s*(?P<table_b>.*)\.(?P<column_b>.*))?$/im',$from_table,$matches)
    ) {
        trigger_error("Error parsing SQL! Unexpected format in FROM clause: $from_table", E_USER_ERROR);
        return false;
    }
    ## Dump matches
    #foreach($matches as $key => $value) if(!is_int($key)) echo "\"$key\" => \"$value\"<br/>\n";

    // Remember on_clause for later jointure
    // We do assume each INNER JOIN's ON clause compares left table to
    // right table. Forget about parsing more complex conditions in the
    // ON clause...
    if(@$matches['on_clause'])
        $from_conditions[$matches['table_name']] = array(
            'column_a' => $matches['column_a'],
            'column_b' => $matches['column_b']
        );

    // Match applicable WHERE conditions
    $where = array();
    if(@$query['where']) // Edit: Made WHERE optional
    foreach($where_conditions as $where_condition)
        if(preg_match("/^$matches[table_name]\.(.*)$/",$where_condition,$matched))
            $where[] = $matched[1];
    $where_clause = empty($where) ? null : implode(' AND ',$where);

    // We simply ignore $query[select_rows] and use '*' everywhere...
    $query = "SELECT * FROM $matches[table_name]".($where_clause? " WHERE $where_clause" : '');
    echo "$query<br/>\n";

    // Retrieve table's data
    // Fetching the entire table data right away avoids multiplying MySQL
    // queries exponentially...
    $table = array();
    if($results = mysql_query($table))
        while($row = mysql_fetch_array($results, MYSQL_ASSOC))
            $table[] = $row;

    // Sort table if applicable
    if(preg_match("/^$matches[table_name]\.(.*)$/",$query['order_by'],$matched)) {
        $sort_key = $matched[1];

        // @todo Do your bubble sort here!

        if(@$query['desc']) array_reverse($table);
    }

    $tables[$matches['table_name']] = $table;
}

// From here, all data is fetched.
// All left to do is the actual jointure.

/**
 * Equijoin/Theta-join.
 * Joins relation $R and $S where $a from $R compares to $b from $S.
 * @param array $R A relation (set of tuples).
 * @param array $S A relation (set of tuples).
 * @param string $a Attribute from $R to compare.
 * @param string $b Attribute from $S to compare.
 * @return array A relation resulting from the equijoin/theta-join.
 */
function equijoin($R,$S,$a,$b) {
    $T = array();
    if(empty($R) or empty($S)) return $T;
    foreach($R as $tupleR) foreach($S as $tupleS)
        if($tupleR[$a] == @$tupleS[$b])
            $T[] = array_merge($tupleR,$tupleS);
    return $T;
}

$jointure = array_shift($tables);
if(!empty($tables)) foreach($tables as $table_name => $table)
    $jointure = equijoin($jointure, $table,
        $from_conditions[$table_name]['column_a'],
        $from_conditions[$table_name]['column_b']);

return $jointure;

?>

晚安，祝你好运！

I agree it sounds like a bad choice, but I can think of some situations where splitting a query could be useful.

I would try something similar to this, relying heavily on regular expressions for parsing the query. It would work in a very limited of cases, but it's support could be expanded progressively when needed.

<?php
/**
 * That's a weird problem, but an interesting challenge!
 * @link http://stackoverflow.com/questions/5019467/problem-writing-a-mysql-parser-to-split-joins-and-run-them-as-individual-query
 */

// Taken from the given example:
$sql = "SELECT articles.*, formats.format_title 
FROM articles 
INNER JOIN formats ON articles.article_format_id = formats.format_id 
INNER JOIN article_categories ON articles.article_id = article_categories.article_id 
WHERE article_categories.category_id = 2 
ORDER BY articles.article_date DESC";

// Parse query
// (Limited to the clauses that are present in the example...)
// Edit: Made WHERE optional
if(!preg_match('/^\s*'.
    'SELECT\s+(?P<select_rows>.*[^\s])'. 
    '\s+FROM\s+(?P<from>.*[^\s])'.
    '(?:\s+WHERE\s+(?P<where>.*[^\s]))?'.
    '(?:\s+ORDER\s+BY\s+(?P<order_by>.*[^\s]))?'.
    '(?:\s+(?P<desc>DESC))?'.
    '(.*)$/is',$sql,$query)
) {
    trigger_error('Error parsing SQL!',E_USER_ERROR);
    return false;
}

## Dump matches
#foreach($query as $key => $value) if(!is_int($key)) echo "\"$key\" => \"$value\"<br/>\n";

/* We get the following matches:
"select_rows" => "articles.*, formats.format_title"
"from" => "articles INNER JOIN formats ON articles.article_format_id = formats.format_id INNER JOIN article_categories ON articles.article_id = article_categories.article_id"
"where" => "article_categories.category_id = 2"
"order_by" => "articles.article_date"
"desc" => "DESC"
/**/

// Will only support WHERE conditions separated by AND that are to be
// tested on a single individual table.
if(@$query['where']) // Edit: Made WHERE optional
    $where_conditions = preg_split('/\s+AND\s+/is',$query['where']);

// Retrieve individual table information & data
$tables = array();
$from_conditions = array();
$from_tables = preg_split('/\s+INNER\s+JOIN\s+/is',$query['from']);

foreach($from_tables as $from_table) {

    if(!preg_match('/^(?P<table_name>[^\s]*)'.
        '(?P<on_clause>\s+ON\s+(?P<table_a>.*)\.(?P<column_a>.*)\s*'.
        '=\s*(?P<table_b>.*)\.(?P<column_b>.*))?$/im',$from_table,$matches)
    ) {
        trigger_error("Error parsing SQL! Unexpected format in FROM clause: $from_table", E_USER_ERROR);
        return false;
    }
    ## Dump matches
    #foreach($matches as $key => $value) if(!is_int($key)) echo "\"$key\" => \"$value\"<br/>\n";

    // Remember on_clause for later jointure
    // We do assume each INNER JOIN's ON clause compares left table to
    // right table. Forget about parsing more complex conditions in the
    // ON clause...
    if(@$matches['on_clause'])
        $from_conditions[$matches['table_name']] = array(
            'column_a' => $matches['column_a'],
            'column_b' => $matches['column_b']
        );

    // Match applicable WHERE conditions
    $where = array();
    if(@$query['where']) // Edit: Made WHERE optional
    foreach($where_conditions as $where_condition)
        if(preg_match("/^$matches[table_name]\.(.*)$/",$where_condition,$matched))
            $where[] = $matched[1];
    $where_clause = empty($where) ? null : implode(' AND ',$where);

    // We simply ignore $query[select_rows] and use '*' everywhere...
    $query = "SELECT * FROM $matches[table_name]".($where_clause? " WHERE $where_clause" : '');
    echo "$query<br/>\n";

    // Retrieve table's data
    // Fetching the entire table data right away avoids multiplying MySQL
    // queries exponentially...
    $table = array();
    if($results = mysql_query($table))
        while($row = mysql_fetch_array($results, MYSQL_ASSOC))
            $table[] = $row;

    // Sort table if applicable
    if(preg_match("/^$matches[table_name]\.(.*)$/",$query['order_by'],$matched)) {
        $sort_key = $matched[1];

        // @todo Do your bubble sort here!

        if(@$query['desc']) array_reverse($table);
    }

    $tables[$matches['table_name']] = $table;
}

// From here, all data is fetched.
// All left to do is the actual jointure.

/**
 * Equijoin/Theta-join.
 * Joins relation $R and $S where $a from $R compares to $b from $S.
 * @param array $R A relation (set of tuples).
 * @param array $S A relation (set of tuples).
 * @param string $a Attribute from $R to compare.
 * @param string $b Attribute from $S to compare.
 * @return array A relation resulting from the equijoin/theta-join.
 */
function equijoin($R,$S,$a,$b) {
    $T = array();
    if(empty($R) or empty($S)) return $T;
    foreach($R as $tupleR) foreach($S as $tupleS)
        if($tupleR[$a] == @$tupleS[$b])
            $T[] = array_merge($tupleR,$tupleS);
    return $T;
}

$jointure = array_shift($tables);
if(!empty($tables)) foreach($tables as $table_name => $table)
    $jointure = equijoin($jointure, $table,
        $from_conditions[$table_name]['column_a'],
        $from_conditions[$table_name]['column_b']);

return $jointure;

?>

Good night, and Good luck!

回复收藏 0 原文

回忆躺在深渊里 2024-10-24 20:23:45

我认为您应该创建一个非规范化的文章表，并在每篇文章插入/删除/更新时更改它，而不是重写 SQL。它将变得更加简单和便宜。

创建并填充它：

create table articles_denormalized
...

insert into articles_denormalized 
    SELECT articles.*, formats.format_title 
    FROM articles 
    INNER JOIN formats ON articles.article_format_id = formats.format_id 
    INNER JOIN article_categories ON articles.article_id = article_categories.article_id

现在针对它发出适当的文章插入/更新/删除，您将拥有一个随时可以查询的非规范化表。

In instead of the sql rewriting I think you should create a denormalized articles table and change it at each article insert/delete/update. It will be MUCH simpler and cheaper.

Do the create and populate it:

create table articles_denormalized
...

insert into articles_denormalized 
    SELECT articles.*, formats.format_title 
    FROM articles 
    INNER JOIN formats ON articles.article_format_id = formats.format_id 
    INNER JOIN article_categories ON articles.article_id = article_categories.article_id

Now issue the appropriate article insert/update/delete against it and you will have a denormalized table always ready to be queried.

回复收藏 0 原文

~没有更多了~