I need a data structure of some sort to do the following:
- Each "set" is composed of values of several types, such as string, integer, DateTime and double.
- Many sets are added dynamically.
- The sets are retrieved dynamically whenever information needs to be pulled from them.
Now the obvious solution is to use a DataTable: define the DataTable structure, add a new row each time you need to add a new set, and pull data from the DataTable when you need it.
Actually, I have already implemented it using a DataTable, but it is extremely slow for some reason. Since this is done thousands to millions of times, performance becomes a problem.
Is there an alternative, DataTable-like data structure with better performance that I can use, or should I build my own class using List<>?
Comments (3)
Depending on your use case, I would recommend using `List<object[]>` (since you mentioned a dynamic schema) as the central data structure, but you will need to maintain the schema information yourself if you need it later on. If you need to bind the UI to the data, this approach adds a lot of extra manual work; it is better suited to background processing of large amounts of data.
We have used this approach in the past and were able to save 2/3 of memory and 80% of execution time when bulk handling data compared to data tables.
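A minimal sketch of what this could look like, assuming a hypothetical `RowStore` wrapper (the name and its members are illustrative, not part of any library). It keeps the rows as plain `object[]` arrays and tracks the schema separately as a column-name-to-index map:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: rows are plain object[] arrays, and the schema
// (column name -> array index) is maintained separately by this class.
public sealed class RowStore
{
    private readonly Dictionary<string, int> _ordinals = new Dictionary<string, int>();
    private readonly List<object[]> _rows = new List<object[]>();

    public RowStore(params string[] columnNames)
    {
        for (int i = 0; i < columnNames.Length; i++)
        {
            _ordinals.Add(columnNames[i], i);
        }
    }

    public int Count => _rows.Count;

    // Adds one "set"; the values must be given in schema order.
    public void Add(params object[] values)
    {
        if (values.Length != _ordinals.Count)
        {
            throw new ArgumentException("Value count does not match the schema.");
        }
        _rows.Add(values);
    }

    // Reads a single value; throws InvalidCastException if T does not
    // match the type that was actually stored.
    public T Get<T>(int rowIndex, string columnName)
    {
        return (T)_rows[rowIndex][_ordinals[columnName]];
    }
}
```

Usage would be along the lines of `store.Add("widget", 3, DateTime.Now, 9.5)` followed by `store.Get<double>(0, "Price")`. Value types are boxed in the array, so this trades some type safety for less per-row overhead than a DataRow.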
One alternative way of approaching problems like this: use an in-memory SQLite database.
It sounds like a weird thing to do at first, but you can put quite complex structures into tables, and you get the whole power of SQL to work on your data. SQLite is a tiny library, so it won't bloat your code. Integrating the database into your code might feel strange at first, but performance should hold up on huge data sets (since that's what databases are made for). And if you ever need to save that data to disk, you are already done.
Depending on the details of your problem, it might even be a good idea to move to a bigger database back end (e.g. PostgreSQL), but that is hard to tell from here. Just don't dismiss this idea too easily.
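As a rough sketch of the in-memory idea, assuming the `Microsoft.Data.Sqlite` NuGet package is referenced (the answer does not say which SQLite binding to use). The table layout, the column names and the `InMemorySets` class are made up for illustration:

```csharp
using System;
using Microsoft.Data.Sqlite; // assumption: Microsoft.Data.Sqlite NuGet package

public static class InMemorySets
{
    // Inserts a few sample sets and returns the total amount for one name.
    public static double TotalAmountFor(string name)
    {
        // ":memory:" keeps the database in RAM for the lifetime of the connection.
        using (var connection = new SqliteConnection("Data Source=:memory:"))
        {
            connection.Open();

            using (var create = connection.CreateCommand())
            {
                create.CommandText =
                    "CREATE TABLE sets (name TEXT, qty INTEGER, created TEXT, amount REAL)";
                create.ExecuteNonQuery();
            }

            var samples = new (string Name, int Qty, double Amount)[]
            {
                ("a", 1, 1.5), ("b", 2, 10.0), ("a", 3, 2.5)
            };

            // One transaction around all inserts; committing per row is usually much slower.
            using (var transaction = connection.BeginTransaction())
            using (var insert = connection.CreateCommand())
            {
                insert.Transaction = transaction;
                insert.CommandText =
                    "INSERT INTO sets VALUES ($name, $qty, $created, $amount)";
                foreach (var s in samples)
                {
                    insert.Parameters.Clear();
                    insert.Parameters.AddWithValue("$name", s.Name);
                    insert.Parameters.AddWithValue("$qty", s.Qty);
                    insert.Parameters.AddWithValue("$created", DateTime.UtcNow.ToString("o"));
                    insert.Parameters.AddWithValue("$amount", s.Amount);
                    insert.ExecuteNonQuery();
                }
                transaction.Commit();
            }

            using (var query = connection.CreateCommand())
            {
                // TOTAL() returns 0.0 instead of NULL when no row matches.
                query.CommandText = "SELECT TOTAL(amount) FROM sets WHERE name = $name";
                query.Parameters.AddWithValue("$name", name);
                return Convert.ToDouble(query.ExecuteScalar());
            }
        }
    }
}
```

Changing the connection string to a file path is all it would take to persist the same data to disk.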
There are several similar questions on Stack Overflow, but none provides a good answer. A generic alternative should not be `List<YourObject>`, because `YourObject` is not generic. The beauty of DataTable is that it does not have a data model.
A DataTable is a collection of rows, and each row is a collection of cells. A cell could be a string or a number, so we can define a `Cell` type that holds either one.
Then a row would be a `Dictionary<string, Cell>`, where the string key is the column name, and a DataTable alternative is simply a `List<Dictionary<string, Cell>>`. Let's say you define Rows as `public List<Dictionary<string, Cell>> Rows;`. Now you can easily query Rows by column name.
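A minimal sketch of this structure, under the assumption that a cell stores either a string or a double. The `Cell` members, the `Table` class and the `WhereNumberGreaterThan` method are hypothetical names chosen for illustration, and the query uses LINQ:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Cell: holds either a string or a number.
public sealed class Cell
{
    public string Text { get; }
    public double? Number { get; }

    public Cell(string text) { Text = text; }
    public Cell(double number) { Number = number; }

    public bool IsNumber => Number.HasValue;
}

public sealed class Table
{
    // Each row maps a column name to its cell.
    public List<Dictionary<string, Cell>> Rows = new List<Dictionary<string, Cell>>();

    // Returns every row whose numeric cell in the given column exceeds the threshold.
    // Rows that lack the column, or hold a string there, are skipped.
    public List<Dictionary<string, Cell>> WhereNumberGreaterThan(string column, double threshold)
    {
        return Rows
            .Where(row => row.TryGetValue(column, out var cell)
                          && cell.IsNumber
                          && cell.Number.Value > threshold)
            .ToList();
    }
}
```

A row can then be added with `table.Rows.Add(new Dictionary<string, Cell> { ["Name"] = new Cell("b"), ["Price"] = new Cell(20.0) })`. Note that a dictionary per row costs more memory than an `object[]` per row, so this favors flexibility over raw speed.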