MySQL: insert a new row when a value changes

Posted on 2024-12-01 00:09:46

For a personal project I'm working on right now, I want to make a line graph of game prices on Steam, Impulse, EA Origins, and several other sites over time. At the moment I've modified a script used by SteamCalculator.com to record the current price (the sale price, if applicable) of every game for every possible country code on each of these sites. I also have a column for the date on which the price was stored. My current table looks something like this:

THIS STRUCTURE IS NO LONGER VALID. SEE BELOW
+----------+------+------+------+------+------+------+------------+
| steam_id |  us  |  at  |  au  |  de  |  no  |  uk  |    date    |
+----------+------+------+------+------+------+------+------------+
|  112233  |  999 |  899 |  999 | NULL |  899 |  699 |  2011-8-21 |
|  123456  | 1999 |  999 | 1999 |  999 |  999 |  999 |  2011-8-20 |
|    ...   |  ... |  ... |  ... |  ... |  ... |  ... |     ...    |
+----------+------+------+------+------+------+------+------------+

At the moment each country is updated separately (a for loop goes through the countries). If it would simplify things, the script could be modified to store the new prices in an array temporarily and then update an entire row at a time. I'll likely do this eventually anyway, for performance reasons.

Now my issue is determining how to best update this table if one of the prices changes. For instance, let's suppose that on 8/22/2011 the game 112233 goes on sale in America for $4.99, Austria for 3.99€, and the other prices remain the same. I would need the table to look like so:

THIS STRUCTURE IS NO LONGER VALID. SEE BELOW
+----------+------+------+------+------+------+------+------------+
| steam_id |  us  |  at  |  au  |  de  |  no  |  uk  |    date    |
+----------+------+------+------+------+------+------+------------+
|  112233  |  999 |  899 |  999 | NULL |  899 |  699 |  2011-8-21 |
|  123456  | 1999 |  999 | 1999 |  999 |  999 |  999 |  2011-8-20 |
|    ...   |  ... |  ... |  ... |  ... |  ... |  ... |     ...    |
|  112233  |  499 |  399 |  999 | NULL |  899 |  699 |  2011-8-22 |
+----------+------+------+------+------+------+------+------------+

I don't want to create a new row EVERY time the price is checked, otherwise I'll end up having millions of rows of repeated prices day after day. I also don't want to create a new row per changed price like so:

THIS STRUCTURE IS NO LONGER VALID. SEE BELOW
+----------+------+------+------+------+------+------+------------+
| steam_id |  us  |  at  |  au  |  de  |  no  |  uk  |    date    |
+----------+------+------+------+------+------+------+------------+
|  112233  |  999 |  899 |  999 | NULL |  899 |  699 |  2011-8-21 |
|  123456  | 1999 |  999 | 1999 |  999 |  999 |  999 |  2011-8-20 |
|    ...   |  ... |  ... |  ... |  ... |  ... |  ... |     ...    |
|  112233  |  499 |  899 |  999 | NULL |  899 |  699 |  2011-8-22 |
|  112233  |  499 |  399 |  999 | NULL |  899 |  699 |  2011-8-22 |
+----------+------+------+------+------+------+------+------------+

I can prevent the first problem but not the second by making each (steam_id, <country>) a unique index then adding ON DUPLICATE KEY UPDATE to every database query. This will only add a row if the price is different, however it will add a new row for each country which changes. It also does not allow the same price for a single game for two different days (for instance, suppose game 112233 goes off sale later and returns to $9.99) so this is clearly an awful option.

I can prevent the second problem but not the first by making (steam_id, date) a unique index then adding ON DUPLICATE KEY UPDATE to every query. Every single day when the script is run the date has changed, so it will create a new row. This method ends up with hundreds of lines of the same prices from day to day.

How can I tell MySQL to create a new row if (and only if) any of the prices has changed since the latest date?

UPDATE -

At the recommendation of people in this thread I have changed the schema of my database to facilitate adding new country codes in the future and avoid the issue of needing to update entire rows at a time. The new schema looks something like:

+----------+------+---------+------------+
| steam_id |  cc  |  price  |    date    |
+----------+------+---------+------------+
|  112233  |  us  |   999   |  2011-8-21 |
|  123456  |  uk  |   699   |  2011-8-20 |
|    ...   |  ... |   ...   |     ...    |
+----------+------+---------+------------+

On top of this new schema I have discovered that I can use the following SQL query to grab the price from the most recent update:

SELECT `price` FROM `steam_prices` WHERE `steam_id` = 112233 AND `cc`='us' ORDER BY `date` DESC LIMIT 1
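
As a quick check of that lookup, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs self-contained). The helper name `latest_price` and the sample rows are hypothetical. Note that fetching the newest row needs a descending sort, since an ascending sort with LIMIT 1 returns the oldest row. The dates are also written in zero-padded ISO form so that they sort correctly as text.

```python
import sqlite3


def latest_price(conn, steam_id, cc):
    """Return the most recently recorded price for (steam_id, cc), or None."""
    row = conn.execute(
        "SELECT price FROM steam_prices "
        "WHERE steam_id = ? AND cc = ? "
        "ORDER BY date DESC LIMIT 1",  # newest row first
        (steam_id, cc),
    ).fetchone()
    return row[0] if row else None


# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE steam_prices (steam_id INTEGER, cc TEXT, price INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO steam_prices VALUES (?, ?, ?, ?)",
    [
        (112233, "us", 999, "2011-08-20"),
        (112233, "us", 499, "2011-08-21"),
        (112233, "uk", 699, "2011-08-21"),
    ],
)

print(latest_price(conn, 112233, "us"))  # the 2011-08-21 price for 'us'
print(latest_price(conn, 112233, "de"))  # no rows for 'de', so None
```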

At this point my question boils down to this:

Is it possible to (using only SQL rather than application logic) insert a row only if a condition is true? For instance:

INSERT INTO `steam_prices` (...) VALUES (...) IF price<>(SELECT `price` FROM `steam_prices` WHERE `steam_id` = 112233 AND `cc`='us' ORDER BY `date` DESC LIMIT 1)

I cannot find any way to do this in the MySQL manual. I have only found that you can ignore or update a row if a unique index matches. However, if I made the price a unique index (allowing me to update the date if the price was the same), I would not be able to recognize when a game went on sale and then returned to its original price. For instance:

+----------+------+---------+------------+
| steam_id |  cc  |  price  |    date    |
+----------+------+---------+------------+
|  112233  |  us  |   999   |  2011-8-20 |
|  112233  |  us  |   499   |  2011-8-21 |
|  112233  |  us  |   999   |  2011-8-22 |
|    ...   |  ... |   ...   |     ...    |
+----------+------+---------+------------+

Also, after just finding and reading MySQL Conditional INSERT, I created and tried the following query:

INSERT INTO `steam_prices`(
    `steam_id`,
    `cc`,
    `update`,
    `price`
)
SELECT '7870', 'us', NOW(), 999
FROM `steam_prices`
WHERE
    `price`<>999
    AND `update` IN (
        SELECT `update`
        FROM `steam_prices`
        ORDER BY `update`
        DESC LIMIT 1
    )

The idea was to insert the row '7870', 'us', NOW(), 999 if (and only if) the price of the most recent update wasn't 999. When I ran this I got the following error:

1235 - This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'

Any ideas?

燕归巢 2024-12-08 00:09:47

After experimentation, and with some help from MySQL Conditional INSERT and http://www.artfulsoftware.com/infotree/queries.php#101, I found a query that worked:

INSERT INTO `steam_prices`(
    `steam_id`,
    `cc`,
    `price`,
    `update`
)
SELECT 7870, 'us', 999, NOW()
FROM `steam_prices` AS p1
-- p2 matches any later row for the same game and country
LEFT JOIN `steam_prices` AS p2
    ON p1.`steam_id`=p2.`steam_id`
    AND p1.`cc`=p2.`cc`
    AND p1.`update` < p2.`update`
WHERE
    p2.`steam_id` IS NULL    -- no later row exists, so p1 is the latest
    AND p1.`steam_id`=7870
    AND p1.`cc`='us'
    AND p1.`price`<>999

The answer is to first return all rows for which no later timestamp exists. This is done with a within-group aggregate: you join the table to itself, matching each row to any row in the same group that has a later timestamp. If a row fails to join (no later row exists), you know it holds the latest timestamp. These rows will have a NULL id in the joined table (failed to join).

Once you have the rows with the latest timestamp, keep only those where the steam_id is the one you're looking for and where the price differs from the new price you're entering. If no such row exists, the price has not changed since the last update, so the SELECT returns an empty set and nothing is inserted. If the SELECT does return a row (a different price was found), it yields 7870, 'us', 999, NOW(), which is inserted into the table.
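
The self-join step can be checked in isolation. The sketch below runs it through Python's sqlite3 module rather than MySQL (an assumption, made so the example is self-contained). The timestamp column is named `updated` here, a hypothetical rename that avoids quoting the reserved word `update`, and the sample rows are invented. This sketch also matches on `cc` in the join so that each country's history is compared separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE steam_prices (steam_id INTEGER, cc TEXT, price INTEGER, updated TEXT)"
)
conn.executemany(
    "INSERT INTO steam_prices VALUES (?, ?, ?, ?)",
    [
        (7870, "us", 999, "2011-08-20"),
        (7870, "us", 499, "2011-08-21"),
        (7870, "uk", 699, "2011-08-22"),
        (12345, "us", 1999, "2011-08-20"),
    ],
)

# Anti-join: p2 is any later row in the same (steam_id, cc) group.
# Rows of p1 that find no such p2 are the latest in their group.
LATEST_ROWS = """
SELECT p1.steam_id, p1.cc, p1.price, p1.updated
FROM steam_prices AS p1
LEFT JOIN steam_prices AS p2
       ON p1.steam_id = p2.steam_id
      AND p1.cc = p2.cc
      AND p1.updated < p2.updated
WHERE p2.steam_id IS NULL
ORDER BY p1.steam_id, p1.cc
"""

latest = conn.execute(LATEST_ROWS).fetchall()
for row in latest:
    print(row)
```

Only one row per game and country survives the filter; the older 999 row for (7870, 'us') is dropped because a later row exists for it.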

EDIT - I actually found a mistake with the above query a little while later and I have since revised it. The query above will insert a new row if the price has changed since the last update, but it will not insert a row if there are currently no prices in the database for that item.

To resolve this I had to take advantage of the DUAL table (which always contains one row), then use an OR in the WHERE clause to test for either a different price or an empty set:

INSERT INTO `steam_prices`(
    `steam_id`,
    `cc`,
    `price`,
    `update`
)
SELECT 12345, 'us', 999, NOW()
FROM DUAL
WHERE
    -- case 1: no price has been recorded yet for this game and country
    NOT EXISTS (
        SELECT `steam_id`
        FROM `steam_prices`
        WHERE `steam_id`=12345
            AND `cc`='us'
    )
    OR
    -- case 2: the latest recorded price differs from the new one
    EXISTS (
        SELECT p1.`steam_id`
        FROM `steam_prices` AS p1
        LEFT JOIN `steam_prices` AS p2
            ON p1.`steam_id`=p2.`steam_id`
            AND p1.`cc`=p2.`cc`
            AND p1.`update` < p2.`update`
        WHERE
            p2.`steam_id` IS NULL
            AND p1.`steam_id`=12345
            AND p1.`cc`='us'
            AND p1.`price`<>999
    )

It's very long, it's very ugly, and it's very complicated. But it works exactly as advertised. If there is no price in the database for a certain steam_id then it inserts a new row. If there is already a price then it checks the price with the most recent update and, if different, inserts a new row.
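
For anyone who wants to try the behaviour end to end, here is a sketch of the same conditional insert run through Python's sqlite3 module instead of MySQL (an assumption, so that it runs without a server). SQLite accepts a SELECT with no FROM clause, so DUAL is not needed there. The function name `record_price`, the column name `updated`, and the sample values are hypothetical, and both the existence check and the join are scoped to the game and country together.

```python
import sqlite3

INSERT_IF_CHANGED = """
INSERT INTO steam_prices (steam_id, cc, price, updated)
SELECT :steam_id, :cc, :price, :updated
WHERE NOT EXISTS (            -- nothing recorded yet for this game/country
        SELECT 1 FROM steam_prices
        WHERE steam_id = :steam_id AND cc = :cc
    )
   OR EXISTS (                -- latest recorded price differs from the new one
        SELECT 1
        FROM steam_prices AS p1
        LEFT JOIN steam_prices AS p2
               ON p1.steam_id = p2.steam_id
              AND p1.cc = p2.cc
              AND p1.updated < p2.updated
        WHERE p2.steam_id IS NULL
          AND p1.steam_id = :steam_id
          AND p1.cc = :cc
          AND p1.price <> :price
    )
"""


def record_price(conn, steam_id, cc, price, updated):
    """Insert a row only if the price is new or changed; return True if inserted."""
    before = conn.total_changes
    conn.execute(
        INSERT_IF_CHANGED,
        {"steam_id": steam_id, "cc": cc, "price": price, "updated": updated},
    )
    conn.commit()
    return conn.total_changes > before


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE steam_prices (steam_id INTEGER, cc TEXT, price INTEGER, updated TEXT)"
)

results = [
    record_price(conn, 7870, "us", 999, "2011-08-20"),  # first price for us
    record_price(conn, 7870, "us", 999, "2011-08-21"),  # unchanged
    record_price(conn, 7870, "us", 499, "2011-08-22"),  # goes on sale
    record_price(conn, 7870, "us", 999, "2011-08-23"),  # back to original price
    record_price(conn, 7870, "uk", 999, "2011-08-23"),  # first price for uk
]
print(results)
```

The unchanged price is skipped, while the return to the original price is still recorded, which is the case a unique index on the price could not handle.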

慈悲佛祖 2024-12-08 00:09:46

You will probably find this easier if you simply change your schema to something like:

steam_id      integer
country       varchar(2)
date          date
price         float
primary key   (steam_id,country,date)

(with other appropriate indexes) and then worry about each country in turn.

In other words, your for loop has a unique ID/country combo so it can simply query the latest-date record for that combo and add a new row if it's different.
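
A minimal sketch of that loop body, assuming Python with the built-in sqlite3 module standing in for MySQL. The table name `prices` and the function name `record_if_changed` are hypothetical, and the price is stored as integer cents (as in the question's tables) rather than a float so that the equality comparison is exact.

```python
import sqlite3


def record_if_changed(conn, steam_id, country, date, price):
    """Add a row for (steam_id, country) only if the latest stored price differs."""
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "SELECT price FROM prices "
            "WHERE steam_id = ? AND country = ? "
            "ORDER BY date DESC LIMIT 1",
            (steam_id, country),
        ).fetchone()
        if row is not None and row[0] == price:
            return False  # latest price is unchanged, nothing to store
        conn.execute(
            "INSERT INTO prices (steam_id, country, date, price) VALUES (?, ?, ?, ?)",
            (steam_id, country, date, price),
        )
        return True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices ("
    " steam_id INTEGER, country TEXT, date TEXT, price INTEGER,"
    " PRIMARY KEY (steam_id, country, date))"
)

outcomes = [
    record_if_changed(conn, 112233, "us", "2011-08-20", 999),  # first price
    record_if_changed(conn, 112233, "us", "2011-08-21", 999),  # unchanged
    record_if_changed(conn, 112233, "us", "2011-08-22", 499),  # changed
    record_if_changed(conn, 112233, "us", "2011-08-23", 999),  # changed back
]
print(outcomes)
```

With more than one writer, the check and the insert would need to run under a lock or a suitably strict isolation level; this sketch assumes a single script doing the updates.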

That will make your selections a little more complicated but I believe it's a better solution, especially if there's any chance at all that more countries may be added in future (it won't break the schema in that case).

流云如水 2024-12-08 00:09:46

First, I suggest you store your data in a form that is less hard-coded per country:

+----------+--------------+------------+-------+
| steam_id | country_code | date       | price |
+----------+--------------+------------+-------+
|   112233 | us           | 2011-08-20 | 12.45 |
|   112233 | uk           | 2011-08-20 | 12.46 |
|   112233 | de           | 2011-08-20 | 12.47 |
|   112233 | at           | 2011-08-20 | 12.48 |
|   112233 | us           | 2011-08-21 | 12.49 |
|   ...... | ..           | .......... | ..... |
+----------+--------------+------------+-------+

From here, you place a primary key on the first three columns...

Now for your question about not creating extra rows... That is what a simple transaction + application logic is great at.