Best collection for holding a million items?
I would like to ask a question that interests me.
Which collection performs best when it contains a lot of items (more than 1 million)?
For example, I create a simple List(10000000) collection and try to add about 500,000 different items. The first 30,000 items are added within 10 seconds of starting, but the collection contains only 60,000 items after 1 minute and 150,000 items after 5 minutes.
As I understand it, the cost of adding a new item depends non-linearly on the collection's memory usage (because creating each item takes roughly the same amount of time). But I may be mistaken.
Edit:
You are right, it is not clear enough without a sample.
I am trying to build a tree stored as a list of linked nodes.
The sample code is below.
using System.Collections.Generic;
using System.Linq;

// A 3x3 grid node; each node records its parent's id and its depth in the tree.
public class Matrix
{
    public int Id { get; private set; }
    public byte[,] Items { get; private set; }
    public int ParentId { get; private set; }
    public int Lvl { get; private set; }
    public int HorizontalCounts
    {
        get { return 3; }
    }
    public int VerticalCounts
    {
        get { return 3; }
    }
    public Matrix(int id) : this(id, null, 0, 1)
    {
    }
    public Matrix(int id, byte[,] items, int parentId, int lvl)
    {
        Id = id;
        Items = (items ?? (new byte[HorizontalCounts, VerticalCounts]));
        ParentId = parentId;
        Lvl = lvl;
    }
    public bool IsEmpty(int hCounter, int vCounter)
    {
        return (Items[hCounter, vCounter] == 0);
    }
    // Copies this node's grid into a new node one level deeper.
    public Matrix CreateChild(int id)
    {
        return (new Matrix(id, (byte[,])Items.Clone(), Id, (Lvl + 1)));
    }
}
public class Program
{
    public static void Main(string[] args)
    {
        Matrix node = new Matrix(1);
        const int capacity = 10000000;
        List<Matrix> tree = new List<Matrix>(capacity) { node };
        FillTree(ref tree, ref node);
        int l1 = tree.Where(n => (n.Lvl == 1)).Count();
        int l2 = tree.Where(n => (n.Lvl == 2)).Count();
        int l3 = tree.Where(n => (n.Lvl == 3)).Count();
        int l4 = tree.Where(n => (n.Lvl == 4)).Count();
        int l5 = tree.Where(n => (n.Lvl == 5)).Count();
    }
    // Recursively adds one child per empty cell of the node's grid.
    private static void FillTree(ref List<Matrix> tree, ref Matrix node)
    {
        for (int hCounter = 0; hCounter < node.HorizontalCounts; hCounter++)
        {
            for (int vCounter = 0; vCounter < node.VerticalCounts; vCounter++)
            {
                if (!node.IsEmpty(hCounter, vCounter))
                {
                    continue;
                }
                // Enumerates the entire list on every insert.
                int childId = (tree.Select(n => n.Id).Max() + 1);
                Matrix childNode = node.CreateChild(childId);
                childNode.Items[hCounter, vCounter] = 1;
                tree.Add(childNode);
                FillTree(ref tree, ref childNode);
            }
        }
    }
}
Latest edit: I am very sorry, the problem was not the number of items in the collection. The performance problem was in this line:
int childId = (tree.Select(n => n.Id).Max() + 1);
Thank you very much for your answers and comments.
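For anyone landing here with the same symptom: that line walks the whole list on every insert, so filling n items costs on the order of n² steps. Below is a minimal sketch of one possible replacement, assuming (as in the sample above) that ids start at 1 and grow by 1 with each added item. The class and method names are hypothetical, and a plain List<int> of ids stands in for the tree.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NextIdDemo
{
    // Approach from the question: scans every element on each call.
    public static int NextIdByScan(List<int> ids)
    {
        return ids.Max() + 1;
    }

    // Alternative (assumption: ids are dense and start at 1):
    // the next id is simply Count + 1, which is a constant-time read.
    public static int NextIdByCount(List<int> ids)
    {
        return ids.Count + 1;
    }

    public static void Main()
    {
        var ids = new List<int>(1000) { 1 };
        for (int i = 0; i < 999; i++)
        {
            ids.Add(NextIdByCount(ids));
        }
        // Both approaches agree while the assumption holds.
        Console.WriteLine(NextIdByScan(ids) == NextIdByCount(ids)); // True
    }
}
```

If items can ever be removed, Count + 1 no longer works and a separate running counter would be the safer choice.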
Comments (5)
The answer to this is: it depends. Are you going to be doing many inserts with no sorting? Linked list.
Are you going to be doing a lot of lookups? HashMap/Dictionary.
Are you just going to have an unordered group of things? List and/or array.
Do you not want duplicates? Set.
Do you not want duplicates, but want fast lookup? HashSet.
Do you have an ordered list that is to be sorted by keys? TreeMap.
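Some of the names above are Java's. Since the question is in C#, here is a rough sketch of what are probably the nearest .NET counterparts: LinkedList<T>, Dictionary<TKey, TValue> for HashMap, List<T>, HashSet<T>, and SortedDictionary<TKey, TValue> for TreeMap. The mapping is an assumption added here, not something the answer spells out, and the helper names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class CollectionChoices
{
    // No duplicates with fast lookup: HashSet<T>. Add returns false for a repeat.
    public static bool RejectsDuplicate()
    {
        var set = new HashSet<int>();
        set.Add(7);
        return !set.Add(7);
    }

    // Ordered by key: SortedDictionary<TKey, TValue> enumerates keys ascending.
    public static List<int> SortedKeys()
    {
        var sorted = new SortedDictionary<int, string> { { 2, "b" }, { 1, "a" } };
        return new List<int>(sorted.Keys);
    }

    public static void Main()
    {
        // Many inserts, no sorting: LinkedList<T> inserts at either end in O(1).
        var linked = new LinkedList<int>();
        linked.AddLast(1);
        linked.AddFirst(0);

        // Many lookups by key: Dictionary<TKey, TValue>.
        var byId = new Dictionary<int, string> { { 1, "root" } };

        // Unordered group of things: List<T> (or a plain array).
        var bag = new List<int> { 3, 1, 2 };

        Console.WriteLine(linked.First.Value);              // 0
        Console.WriteLine(byId[1]);                         // root
        Console.WriteLine(bag.Count);                       // 3
        Console.WriteLine(RejectsDuplicate());              // True
        Console.WriteLine(string.Join(",", SortedKeys()));  // 1,2
    }
}
```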
If you want to add a million items, create the list with its capacity specified up front.
Storing 1.5 million references (or small structs) isn't expensive; letting List's adaptive growth algorithm allocate the space will be.
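A minimal sketch of that presizing, assuming a List<int> and the 1.5 million figure mentioned above. The class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class PresizedListDemo
{
    public static List<int> Build(int count)
    {
        // Passing the capacity to the constructor allocates the backing
        // array once, so Add never has to grow and copy it.
        var items = new List<int>(count);
        for (int i = 0; i < count; i++)
        {
            items.Add(i);
        }
        return items;
    }

    public static void Main()
    {
        List<int> items = Build(1500000);
        Console.WriteLine(items.Count);    // 1500000
        Console.WriteLine(items.Capacity); // 1500000, unchanged since construction
    }
}
```

Note that the sample in the question already does this with a capacity of 10,000,000, which is one hint that list growth was not the bottleneck there.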
Unless the array is going to be created once and exist for the life of the application, I would be inclined to suggest some type of nested array, where the size of each array is kept below 8,000 bytes if it contains any double-precision floating-point numbers, or 85,000 bytes if it does not. Objects at or above those sizes get placed on the Large Object Heap. Unlike the ordinary heap, which can efficiently handle the creation and abandonment of many objects, the Large Object Heap handles it poorly under .NET 2.0-3.5, and only somewhat better under 4.0.
If you will not be doing insertions or deletions, I would suggest that it may be easiest to use an array of 1024 arrays of 1024 elements each. Accessing an element by index would be a simple matter of shifting the index right by ten bits, using the result to select an array, and then using the bottom 10 bits to find the item within the array.
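The shift-and-mask indexing described above could be sketched roughly like this. ChunkedArray is a hypothetical name, and the fixed 1024 x 1024 shape follows the numbers in the paragraph.

```csharp
using System;

public sealed class ChunkedArray<T>
{
    private const int Shift = 10;
    private const int ChunkSize = 1 << Shift;  // 1024 elements per inner array
    private const int Mask = ChunkSize - 1;    // selects the bottom 10 bits

    private readonly T[][] chunks = new T[ChunkSize][];

    public ChunkedArray()
    {
        for (int i = 0; i < chunks.Length; i++)
        {
            chunks[i] = new T[ChunkSize];
        }
    }

    public int Length
    {
        get { return ChunkSize * ChunkSize; }
    }

    public T this[int index]
    {
        // High bits pick the inner array, low 10 bits pick the slot in it.
        get { return chunks[index >> Shift][index & Mask]; }
        set { chunks[index >> Shift][index & Mask] = value; }
    }
}

public static class ChunkedArrayDemo
{
    public static void Main()
    {
        var data = new ChunkedArray<int>();
        data[1025] = 42;                  // lands in chunk 1, slot 1
        Console.WriteLine(data[1025]);    // 42
        Console.WriteLine(data.Length);   // 1048576
    }
}
```

With 1024 references or ints per inner array, each inner array stays well under the 85,000-byte threshold mentioned above.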
If insertions and deletions will be required, I would suggest using a jagged array along with some sort of data structure to keep track of the logical length of each sub-array, and to help convert indices into array locations. Doing that would avoid the need to copy large amounts of data when performing an insert or delete, at the cost of more expensive subscripting operations.
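One way this could look, as a rough sketch rather than a tuned implementation: each sub-array is a List<T> capped at 1024 items (its Count is the logical length), and a linear walk over the chunks converts an index into a chunk and an offset. ChunkedList, MaxChunk and Locate are hypothetical names, and the split-in-half policy is an assumption.

```csharp
using System;
using System.Collections.Generic;

public sealed class ChunkedList<T>
{
    private const int MaxChunk = 1024;
    private readonly List<List<T>> chunks = new List<List<T>>();

    public int Count { get; private set; }

    // Translates a global index into (chunk number, offset within chunk).
    // This walk is the "more expensive subscripting" part.
    private int Locate(int index, out int offset)
    {
        for (int c = 0; c < chunks.Count; c++)
        {
            if (index < chunks[c].Count) { offset = index; return c; }
            index -= chunks[c].Count;
        }
        throw new ArgumentOutOfRangeException("index");
    }

    public T this[int index]
    {
        get { int off; int c = Locate(index, out off); return chunks[c][off]; }
    }

    public void Insert(int index, T item)
    {
        if (index < 0 || index > Count) throw new ArgumentOutOfRangeException("index");
        if (chunks.Count == 0) chunks.Add(new List<T>(MaxChunk));
        int c, off;
        if (index == Count) { c = chunks.Count - 1; off = chunks[c].Count; }
        else { c = Locate(index, out off); }
        List<T> chunk = chunks[c];
        if (chunk.Count == MaxChunk)
        {
            // Split a full chunk so that at most one chunk's data is moved.
            int half = MaxChunk / 2;
            var tail = new List<T>(MaxChunk);
            tail.AddRange(chunk.GetRange(half, chunk.Count - half));
            chunk.RemoveRange(half, chunk.Count - half);
            chunks.Insert(c + 1, tail);
            if (off > half) { chunk = tail; off -= half; }
        }
        chunk.Insert(off, item);
        Count++;
    }

    public void RemoveAt(int index)
    {
        int off;
        int c = Locate(index, out off);
        chunks[c].RemoveAt(off);
        if (chunks[c].Count == 0) chunks.RemoveAt(c);
        Count--;
    }
}

public static class ChunkedListDemo
{
    public static void Main()
    {
        var list = new ChunkedList<int>();
        for (int i = 0; i < 3000; i++) list.Insert(list.Count, i);
        list.Insert(0, -1);               // shifts only the first chunk
        Console.WriteLine(list[0]);       // -1
        Console.WriteLine(list[3000]);    // 2999
        Console.WriteLine(list.Count);    // 3001
    }
}
```

A per-chunk running total, or a tree over the chunk lengths, would speed up Locate; the linear walk is kept here only to keep the sketch short.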
You want an array if you know exactly how many items there will be beforehand. If you can allocate once and then simply fill it up, a simple array is perfect. No wasted memory, fastest to fill, fastest to remove from.
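As a small illustration of the allocate-once-then-fill pattern, assuming the item count is known in advance (the names and the square values are made up for the example):

```csharp
using System;

public static class FixedArrayDemo
{
    public static long[] Squares(int count)
    {
        // One allocation with exactly count slots; no growth, no spare capacity.
        var values = new long[count];
        for (int i = 0; i < values.Length; i++)
        {
            values[i] = (long)i * i;  // cast first so the product cannot overflow int
        }
        return values;
    }

    public static void Main()
    {
        long[] values = Squares(1000000);
        Console.WriteLine(values.Length);  // 1000000
    }
}
```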
When you're dealing with millions (or more) of items, it's best to use an array. Even if you waste a few thousand slots by making your array larger than absolutely necessary, the time efficiency gained may make up for the loss of space efficiency.
Of course, if you're dealing with an amount of data that's too large to store entirely in memory, a disk-based data structure is advisable.