Theoretical high-volume problem: sorting in .NET without a database

Published 2024-08-21 05:31:29


Excuse the title of this post, but I can't really think of a more creative title.

I am calling a 3rd party web service where the authors are ordering transaction results from most recent. The total transaction count is greater than 100,000. To make matters more interesting, the web service sends down complex objects representing each transaction, so if I ask for all 100,000 at once, a timeout will occur. So calls to this web service need to be batched to return only 1000 records at a time. This means 100 individual calls to this web service.

So far all is good, except the transactions need to be processed from oldest to newest, so I need a place to temporarily hold JUST the IDs of these transactions, so that later I can recall the IDs in the correct order (oldest to newest) after I have sorted them.

What I am missing in this solution is an RDBMS, I am thinking of using a text file to store the values.

Excuse the long intro, if you're still awake here are the considerations:


  1. If I just store the values in a text file, I'll end up with over 100,000 lines in the text file in the wrong order, meaning I have to implement a way to read the file from bottom to top.
  2. I am not sure, but it might be possible to append to the beginning of an existing text file without any performance penalty; that way, once the file is created, I could use the built-in .NET APIs to read the file from top to bottom.
  3. I could hook up a text odbc driver and perhaps use some SQL order by clause, but I've never done this before, and I don't want to add any more deployment steps to my app.
  4. Perhaps using a text file is not the way to go, maybe there is a better solution out there for this problem I am not aware of.

This is an architecture / logistics question, any assistance would be appreciated, thanks
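For reference, the batched retrieval described above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: `GetTransactionBatch` is a hypothetical stand-in for the third-party web service call (its real name, parameters, and payload are unknown), and the `(Id, Timestamp)` shape of each record is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class BatchedFetchSketch
{
    // Assumed shape of one transaction summary; the real service sends
    // richer objects, but only the ID and timestamp matter for ordering.
    public record TxSummary(string Id, DateTime Timestamp);

    // Hypothetical stand-in for the 3rd-party call: returns up to
    // pageSize records for the given page, newest first.
    public static IEnumerable<TxSummary> GetTransactionBatch(int page, int pageSize)
    {
        // Placeholder: the real implementation would call the web service here.
        yield break;
    }

    public static List<string> CollectIdsOldestFirst()
    {
        var all = new List<TxSummary>();
        for (int page = 0; ; page++)
        {
            var batch = GetTransactionBatch(page, 1000).ToList();
            if (batch.Count == 0) break;   // no more pages
            all.AddRange(batch);
        }
        // The service returns newest-first; re-sort oldest-first for processing.
        return all.OrderBy(t => t.Timestamp).Select(t => t.Id).ToList();
    }
}
```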


Comments (3)

晨光如昨 2024-08-28 05:31:29


If you're running on a typical PC/server class machine, the memory to store 100,000 IDs and associated timestamps is not considered large volume. Consider using an in-memory sorted list.

If you really want to write to a file, you could use File.ReadAllLines and iterate through the resulting string array backwards.
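The file-based fallback this answer mentions might look like the sketch below. It assumes one ID per line, written in the order the service returns them (newest first); the file path and delegate are illustrative.

```csharp
using System;
using System.IO;

public class ReverseFileRead
{
    // Read the whole file into memory with File.ReadAllLines, then walk
    // the array bottom -> top so newest-first lines come out oldest-first.
    public static void ProcessOldestFirst(string path, Action<string> process)
    {
        string[] lines = File.ReadAllLines(path);   // loads every line into memory
        for (int i = lines.Length - 1; i >= 0; i--) // iterate backwards
            process(lines[i]);
    }
}
```

Note that `File.ReadAllLines` pulls the entire file into memory, which is exactly the answer's point: at this volume that is cheap, so there is no need for clever reverse streaming.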

夜巴黎 2024-08-28 05:31:29


If they're just IDs, do you definitely need to use a file in the first place?

Suppose they're 32-byte IDs... 100,000 of them is still only just over 3MB. Are you really that pushed for memory?

I would definitely try for an in-memory solution to start with - make sure it's going to be okay in the worst conceivable case (e.g. double your expected volume) but then go for it.

The basic moral is not to be too scared of numbers which sound big: 100,000 items may be a lot in human terms, but unless there's quite a lot of data per item, it's peanuts for a modern computer.
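The "just over 3MB" figure above is easy to verify with a back-of-envelope calculation, sketched here with an assumed 32 bytes per ID:

```csharp
using System;

public class MemoryEstimate
{
    // Raw payload size in megabytes for `count` IDs of `bytesPerId` bytes each.
    // Ignores per-object CLR overhead, which adds some headroom on top.
    public static double EstimateMegabytes(int count, int bytesPerId) =>
        (double)count * bytesPerId / (1024 * 1024);
}

// 100,000 IDs x 32 bytes = 3,200,000 bytes, i.e. about 3.05 MB
```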

熊抱啵儿 2024-08-28 05:31:29


You might try storing the information in a DataSet / DataTable combination, and using a DataView attached to the DataSet to change the sort order when you get your data out of it.

Depending on the structure of the XML you are getting back from the Web service, you might be able to read it directly into the DataSet and let it parse it into the DataTables for you (if that works, I'd go for it for the simplicity factor).

This method would involve the least code - but you would have to evaluate the performance of the DataSet with the 100,000 items in it.

I should note that I'm suggesting you store the entire transaction this way (including the ID); then you will have all the data you need to process, and you can loop through it in any sorted order you specify.

I get the impression that you were originally going to just store the IDs, sort them - then re-query the Web service for each id in your sorted order but that would mean hitting the service twice for the same data. I'd avoid that if possible.
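The DataView approach this answer describes might be sketched as follows. The `Timestamp` column name is illustrative; the real schema depends on the XML the web service returns.

```csharp
using System;
using System.Data;

public class DataViewSortSketch
{
    // Wrap the table in a DataView sorted ascending on the timestamp
    // column, so enumeration yields rows oldest-first without copying
    // or re-ordering the underlying DataTable.
    public static DataView BuildSortedView(DataTable transactions)
    {
        return new DataView(transactions, null, "Timestamp ASC",
                            DataViewRowState.CurrentRows);
    }
}
```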
