What is a good .NET data structure for finding unique items?

Posted 2024-08-18 04:30:05

I have a large collection of custom objects that I have retrieved from a query in my system. Let's say these objects all have 5 different properties - FirstName, LastName, Gender, ZipCode and Birthday. For each of the different properties I would like to be able to get a list of all of the unique values and their counts and sort them in descending order. It is sort of a faceted navigation system. So if I have like 5000 results in my initial query then I would like to be able to display the top 10 FirstNames from most popular to least popular with the count next to it. And then the same with the other properties.

Currently I have a routine that goes through each item one at a time, examines the different properties, and keeps a bunch of different hashtables with the information. It works but it is super slow. I think that going through each item one at a time is not very efficient. Is there some other type of C# structure I could use that would make getting this type of information easier? I know that SQL Server does a great job of this type of thing, but I don't think that is really a possibility here. I'm getting my list of custom objects from the API of a different system, so I would have to take that list of objects and put them into a temp table somehow, and that sort of defeats the purpose, I think. Plus, I think SQL Server temp tables are connection-specific, and my app would re-use connections.

EDIT: What I am trying to avoid is having to iterate through the list and process each individual item. I was wondering if there was some data structure that would allow me to sort of query the whole list at once (like a database) and get the information. The problem is that our front end web server is just getting hammered because we have a lot of traffic on the server and people are hitting these faceted nav pages and I am looking for a more efficient way of doing it.

Any ideas?

Thanks,
Corey

Comments (3)

若无相欠,怎会相见 2024-08-25 04:30:05

Unfortunately, I'm pretty sure the answer to your question is, "No." If the only way you have of getting your data is an unindexed List<MyObject>, then something is going to have to go through those items one-by-one and analyze them for Top-N or create indices. Even if you pass that on to another tool (a temp database or third party data structure), you're just putting the processing somewhere else and your CPU will crank just as much. The solution you outline in your original question seems like the most reasonable thing to do.

A few suggestions:

  • Are these Top-N lists the same for all users, or could they be broken into a small number of distinct use cases? You could compute them once and store them in the web cache. Maybe set a background process to update them every M minutes to keep them somewhat up-to-date.
  • Is it just a UI perception problem? Could you calculate and display the most important results first and then calculate the others in the background and deliver to the page asynchronously?
  • Beg the API provider for a more robust way to get results?? :)
  • Throw more hardware at it?? :)

Sorry for the non-answer, but I don't think there's a magic bullet here.
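
The caching suggestion above could look roughly like the following. This is only a minimal sketch: `TimedCache<T>` is a hypothetical helper written for illustration, not an ASP.NET API, and it refreshes lazily on the first request after the value expires rather than from a background process. The injectable clock is there only to make the behavior easy to check.

```csharp
using System;

// Hypothetical helper (not a framework type): caches one computed value
// and recomputes it once it is older than maxAge.
public sealed class TimedCache<T>
{
    private readonly Func<T> _compute;
    private readonly TimeSpan _maxAge;
    private readonly Func<DateTime> _clock;
    private readonly object _gate = new object();
    private T _value;
    private DateTime _computedAtUtc;
    private bool _hasValue;

    public TimedCache(Func<T> compute, TimeSpan maxAge)
        : this(compute, maxAge, () => DateTime.UtcNow)
    {
    }

    // The clock parameter is an assumption added for testing.
    public TimedCache(Func<T> compute, TimeSpan maxAge, Func<DateTime> clock)
    {
        if (compute == null) throw new ArgumentNullException("compute");
        if (clock == null) throw new ArgumentNullException("clock");
        _compute = compute;
        _maxAge = maxAge;
        _clock = clock;
    }

    public T Get()
    {
        // The lock means only one caller recomputes; others wait for the result.
        lock (_gate)
        {
            DateTime now = _clock();
            if (!_hasValue || now - _computedAtUtc >= _maxAge)
            {
                _value = _compute();
                _computedAtUtc = now;
                _hasValue = true;
            }
            return _value;
        }
    }
}
```

With something like this, the expensive Top-N computation runs at most once per `maxAge` window instead of once per page hit. Holding the lock during recomputation keeps the sketch simple, at the cost of making concurrent requests wait while the value is refreshed.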

蓝咒 2024-08-25 04:30:05

i4o (Indexed LINQ), http://www.codeplex.com/i4o, lets you put indexes on objects.

It basically provides RDBMS-style indexing for CLR objects.

Are you using a DBMS for your initial query? In this case the answer would be:
Why not just design specific SQL queries?

鸢与 2024-08-25 04:30:05

Keeping one dictionary per property should work fine. How slow is it? Can you show us the code you're using? 5000 items should be processed in the blink of an eye.
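
A minimal sketch of the one-dictionary-per-property approach, assuming a hypothetical `Person` class with the five properties from the question. `FacetCounter`, `CountAll`, and `Top` are illustrative names, not part of any library. It still visits every item, but only once for all five facets.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical stand-in for the custom objects described in the question.
public sealed class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Gender { get; set; }
    public string ZipCode { get; set; }
    public DateTime Birthday { get; set; }
}

public static class FacetCounter
{
    // Adds one to the running count for key; null is counted as "".
    private static void Increment(Dictionary<string, int> counts, string key)
    {
        key = key ?? "";
        int current;
        counts.TryGetValue(key, out current); // current stays 0 if the key is absent
        counts[key] = current + 1;
    }

    // Single pass over the list, updating one dictionary per property.
    public static Dictionary<string, Dictionary<string, int>> CountAll(IEnumerable<Person> people)
    {
        var facets = new Dictionary<string, Dictionary<string, int>>();
        foreach (var name in new[] { "FirstName", "LastName", "Gender", "ZipCode", "Birthday" })
        {
            facets[name] = new Dictionary<string, int>();
        }

        foreach (var p in people)
        {
            Increment(facets["FirstName"], p.FirstName);
            Increment(facets["LastName"], p.LastName);
            Increment(facets["Gender"], p.Gender);
            Increment(facets["ZipCode"], p.ZipCode);
            Increment(facets["Birthday"], p.Birthday.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
        }
        return facets;
    }

    // Returns the n most frequent values, highest count first; ties are broken by key.
    public static List<KeyValuePair<string, int>> Top(Dictionary<string, int> counts, int n)
    {
        return counts
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .Take(n)
            .ToList();
    }
}
```

For example, `FacetCounter.Top(FacetCounter.CountAll(people)["FirstName"], 10)` would give the ten most common first names with their counts.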

Are you using .NET 3.5? If so, LINQ could help you with a lot of this - in particular, using ToLookup with each property in turn would work pretty well.
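
A rough sketch of the `ToLookup` idea, again assuming a hypothetical `Person` type; `FacetQueries.TopValues` is an illustrative name. Note that this still walks the list once per property, so it makes the code shorter rather than avoiding the iteration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the custom objects described in the question.
public sealed class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Gender { get; set; }
    public string ZipCode { get; set; }
    public DateTime Birthday { get; set; }
}

public static class FacetQueries
{
    // Groups items by the selected property and returns the n most common
    // values with their counts, highest count first.
    public static List<KeyValuePair<TKey, int>> TopValues<T, TKey>(
        IEnumerable<T> items, Func<T, TKey> selector, int n)
    {
        ILookup<TKey, T> lookup = items.ToLookup(selector);
        return lookup
            .Select(g => new KeyValuePair<TKey, int>(g.Key, g.Count()))
            .OrderByDescending(kv => kv.Value) // stable sort: ties keep first-seen order
            .Take(n)
            .ToList();
    }
}
```

Usage would be one call per facet, for example `FacetQueries.TopValues(people, p => p.FirstName, 10)` and `FacetQueries.TopValues(people, p => p.ZipCode, 10)`.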
