Calculating price-reduced or newly free applications - App Store

Posted on 2024-12-22 09:35:17

I am working on a website that displays all the apps from the App Store. I get the App Store data from its EPF data feeds through the EPF Importer. That database contains the pricing of each app for every storefront. There are dozens of rows in that data set, and the table structure is as follows:

application_price

The retail price of an application.

Name           Key  Description

export_date         The date this application was exported, in milliseconds since the UNIX Epoch.
application_id  Y   Foreign key to the application table.
retail_price        Retail price of the application, or null if the application is not available.
currency_code       The ISO3A currency code.
storefront_id   Y   Foreign key to the storefront table.

This is the table I have. My problem is that I cannot find a way to calculate price reductions and newly free apps from this particular data set. Does anyone have an idea how I can calculate this?
Any idea or answer will be highly appreciated.

I tried storing the previous data alongside the current data and then matching the two. The problem is that the table itself is too large, and the comparison requires a JOIN that pushes the query execution time to more than an hour, which I cannot afford. There are approximately 60,000,000 rows in the table.

Comments (3)

With these fields you can't directly determine price drops or new applications. You'll have to insert them into your own database and determine the differences from there. In a relational database like MySQL this isn't too complex:

To determine which applications are new, you can add your own column "first_seen", and then query your database for all rows whose first_seen value is no more than a day old.

To calculate price drops you'll have to calculate the difference between the retail_price of the current import and that of the previous import.
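
As a rough illustration of the two queries described here, the following is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table names price_previous and price_current, the sample rows, and the storefront ID are assumptions made up for the example; first_seen is the extra column suggested in this answer. Prices are stored as floats only to keep the sketch short.

```python
import sqlite3

def build_demo():
    """Create two hypothetical snapshot tables with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE price_previous (
            application_id INTEGER, storefront_id INTEGER,
            retail_price REAL, currency_code TEXT,
            PRIMARY KEY (application_id, storefront_id));
        CREATE TABLE price_current (
            application_id INTEGER, storefront_id INTEGER,
            retail_price REAL, currency_code TEXT,
            first_seen TEXT,  -- our own column, not part of the EPF feed
            PRIMARY KEY (application_id, storefront_id));
    """)
    conn.executemany("INSERT INTO price_previous VALUES (?, ?, ?, ?)", [
        (1, 143441, 2.99, "USD"),
        (2, 143441, 0.99, "USD"),
        (3, 143441, 4.99, "USD"),
    ])
    conn.executemany("INSERT INTO price_current VALUES (?, ?, ?, ?, ?)", [
        (1, 143441, 1.99, "USD", "2024-01-01"),  # price dropped
        (2, 143441, 0.0, "USD", "2024-01-01"),   # became free
        (3, 143441, 4.99, "USD", "2024-01-01"),  # unchanged
        (4, 143441, 0.0, "USD", "2024-01-10"),   # new application
    ])
    return conn

def price_drops(conn):
    """Rows whose current price is lower than in the previous import."""
    return conn.execute("""
        SELECT c.application_id, c.storefront_id,
               p.retail_price, c.retail_price
        FROM price_current AS c
        JOIN price_previous AS p
          ON p.application_id = c.application_id
         AND p.storefront_id = c.storefront_id
        WHERE c.retail_price < p.retail_price
        ORDER BY c.application_id
    """).fetchall()

def new_apps(conn, today):
    """Rows first seen no more than one day before `today` (YYYY-MM-DD)."""
    return conn.execute("""
        SELECT application_id, storefront_id
        FROM price_current
        WHERE first_seen >= date(?, '-1 day')
        ORDER BY application_id
    """, (today,)).fetchall()
```

Newly free apps are then the subset of price_drops() whose current price is 0. Because both tables share the primary key (application_id, storefront_id), the join can be resolved through that key rather than by scanning every pair of rows.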


Since you've edited your question, here is my edited answer:

It seems like you're having storage/performance issues, and you know what you want to achieve. To solve this you'll have to start measuring and debugging: with datasets this large you'll have to make sure you have the correct indexes. Profiling your queries should help you find out whether you do.
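
One way to check whether a lookup actually uses an index is to ask the database for its query plan. The sketch below does this with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module, purely as a stand-in; MySQL has its own EXPLAIN statement with different output. The index name idx_price_app_store and the helper plan_for_lookup are hypothetical.

```python
import sqlite3

def plan_for_lookup(with_index):
    """Return SQLite's plan text for a single-row price lookup."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE application_price (
            export_date INTEGER,
            application_id INTEGER,
            retail_price REAL,
            currency_code TEXT,
            storefront_id INTEGER)
    """)
    if with_index:
        # Composite index on the two key columns used for matching rows.
        conn.execute("""
            CREATE INDEX idx_price_app_store
            ON application_price (application_id, storefront_id)
        """)
    rows = conn.execute("""
        EXPLAIN QUERY PLAN
        SELECT retail_price FROM application_price
        WHERE application_id = ? AND storefront_id = ?
    """, (1, 143441)).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " ".join(row[-1] for row in rows)

if __name__ == "__main__":
    print(plan_for_lookup(with_index=False))
    print(plan_for_lookup(with_index=True))
```

Without the index the plan reports a full table scan; with it, the plan names the index. The same kind of check applies to the JOIN between the previous and current imports.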

Your environment is probably "write once a day" and "read many times a minute" (I'm guessing you're creating a website). So you could speed up the frontend by processing the differences (price drops and new applications) on import, rather than when displaying them on the website.

If you are still unable to solve this, I suggest you open a more specific question detailing your DBMS, queries, etc., so the real database administrators will be able to help you. 60 million rows is a lot, but with the correct indexes it should be no real trouble for a normal database system.

向日葵 2024-12-29 09:35:17

Compare the table with one you've downloaded the previous day, and note the differences.

Added:
For only 60 million items, and on a contemporary PC, you should be able to store a sorted array of the store id numbers and previous prices in memory, and do an array lookup faster than the data is arriving from the network feed. Mark any differences found and double-check them against the DB in post-processing.
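
A minimal sketch of that in-memory lookup, assuming rows arrive as (application_id, storefront_id, retail_price) tuples. The function names build_index and find_changes are made up for the example. Plain Python lists are used for clarity; at 60 million rows a more compact representation (for instance typed arrays) would likely be needed to fit comfortably in memory.

```python
from bisect import bisect_left

def build_index(previous_rows):
    """Sort the previous import by (application_id, storefront_id)."""
    ordered = sorted(previous_rows, key=lambda r: (r[0], r[1]))
    keys = [(app, store) for app, store, _ in ordered]
    prices = [price for _, _, price in ordered]
    return keys, prices

def find_changes(keys, prices, incoming_rows):
    """Yield (app, store, old_price, new_price) for new or changed rows."""
    for app, store, price in incoming_rows:
        key = (app, store)
        i = bisect_left(keys, key)  # binary search in the sorted keys
        if i == len(keys) or keys[i] != key:
            yield (app, store, None, price)       # not in previous import
        elif prices[i] != price:
            yield (app, store, prices[i], price)  # price differs

if __name__ == "__main__":
    keys, prices = build_index([(2, 143441, 0.99), (1, 143441, 2.99)])
    for change in find_changes(keys, prices, [(1, 143441, 1.99)]):
        print(change)
```

Because find_changes is a generator, rows can be checked as they stream in from the feed, and only the flagged differences need to be verified against the database afterwards.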

孤寂小茶 2024-12-29 09:35:17

Actually, I am also trying to work with this data, and I think the best approach for you is based on the data Apple provides.

You have two types of data: full and incremental (data updated daily). So with the new data from the incremental feed (which is not nearly as big as the full one), you can compare only the records that were updated and insert them into another table to determine which prices have changed.

That gives you a list of records (app, song, video...) updated daily whose prices have changed; just read from the new table you created instead of comparing or joining various tables.

Cheers
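
The incremental approach above could look roughly like the following sketch, again using Python's sqlite3 module as a stand-in database. The price_change table, the create_schema and apply_incremental helpers, and the simplified three-column application_price table are all assumptions for illustration, not part of the EPF schema. Records that were not present before are logged with an old price of NULL.

```python
import sqlite3

def create_schema(conn):
    """Hypothetical tables: current prices plus a log of detected changes."""
    conn.executescript("""
        CREATE TABLE application_price (
            application_id INTEGER, storefront_id INTEGER,
            retail_price REAL,
            PRIMARY KEY (application_id, storefront_id));
        CREATE TABLE price_change (
            export_date INTEGER,
            application_id INTEGER, storefront_id INTEGER,
            old_price REAL, new_price REAL);
    """)

def apply_incremental(conn, incremental_rows, export_date):
    """Log price changes from an incremental feed, then update prices.

    incremental_rows: iterable of (application_id, storefront_id, price).
    Returns the number of rows written to price_change.
    """
    changed = 0
    for app, store, price in incremental_rows:
        row = conn.execute(
            "SELECT retail_price FROM application_price "
            "WHERE application_id = ? AND storefront_id = ?",
            (app, store)).fetchone()
        old = row[0] if row else None
        if row is None or old != price:
            conn.execute(
                "INSERT INTO price_change VALUES (?, ?, ?, ?, ?)",
                (export_date, app, store, old, price))
            changed += 1
        # Keep the main table in step with the latest feed.
        conn.execute(
            "INSERT OR REPLACE INTO application_price VALUES (?, ?, ?)",
            (app, store, price))
    conn.commit()
    return changed
```

The website can then read price drops and newly free apps directly from price_change (for example, rows where new_price < old_price, or new_price = 0) without joining the large tables at request time.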
