Best collection for holding a million items?

Posted 2024-09-17 09:06:19

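I would like to ask a question that interests me.

Which collection performs best when it has to hold a large number of items (more than 1 million)?

For example, I create a simple List(10000000) collection and try to add about 500000 different items. The first 30000 items are added within 10 seconds of starting, but the collection contains only 60000 items after 1 minute and 150000 items after 5 minutes.

As I understand it, the cost of adding a new item depends non-linearly on the memory the collection already uses (each item takes roughly the same time to create). But I may be mistaken.

Edit: You are right, it is not clear enough without a sample. I am trying to fill a tree stored as a list of linked nodes. You can find the sample code below.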

using System.Collections.Generic;
using System.Linq;

public class Matrix
{
    public int Id { get; private set; }
    public byte[,] Items { get; private set; }
    public int ParentId { get; private set; }
    public int Lvl { get; private set; }
    public int HorizontalCounts
    {
        get { return 3; }
    }

    public int VerticalCounts
    {
        get { return 3; }
    }

    public Matrix(int id) : this(id, null, 0, 1)
    {
    }

    public Matrix(int id, byte[,] items, int parentId, int lvl)
    {
        Id = id;
        Items = (items ?? (new byte[HorizontalCounts, VerticalCounts]));
        ParentId = parentId;
        Lvl = lvl;
    }

    public bool IsEmpty(int hCounter, int vCounter)
    {
        return (Items[hCounter, vCounter] == 0);
    }

    public Matrix CreateChild(int id)
    {
        return (new Matrix(id, (byte[,])Items.Clone(), Id, (Lvl + 1)));
    }
}

public class Program
{
    public static void Main(string[] args)
    {
        Matrix node = new Matrix(1);
        const int capacity = 10000000;
        List<Matrix> tree = new List<Matrix>(capacity) { node };

        FillTree(ref tree, ref node);

        int l1 = tree.Where(n => (n.Lvl == 1)).Count();
        int l2 = tree.Where(n => (n.Lvl == 2)).Count();
        int l3 = tree.Where(n => (n.Lvl == 3)).Count();
        int l4 = tree.Where(n => (n.Lvl == 4)).Count();
        int l5 = tree.Where(n => (n.Lvl == 5)).Count();
    }

    private static void FillTree(ref List<Matrix> tree, ref Matrix node)
    {
        for (int hCounter = 0; hCounter < node.HorizontalCounts; hCounter++)
        {
            for (int vCounter = 0; vCounter < node.VerticalCounts; vCounter++)
            {
                if (!node.IsEmpty(hCounter, vCounter))
                {
                    continue;
                }

                int childId = (tree.Select(n => n.Id).Max() + 1);
                Matrix childNode = node.CreateChild(childId);
                childNode.Items[hCounter, vCounter] = 1;

                tree.Add(childNode);

                FillTree(ref tree, ref childNode);
            }
        }
    }
}

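Latest edit: I am very sorry, the problem was not the number of items in the collection. The performance problem was in this line: int childId = (tree.Select(n => n.Id).Max() + 1); Thank you very much for your answers and comments.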
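The line singled out in the final edit rescans the whole list every time a node is added, so building n nodes costs on the order of n² comparisons. Here is a minimal sketch of one way around it, assuming ids only need to be unique and increasing. `IdGenerator` and `IdGeneratorDemo` are hypothetical names introduced for illustration and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: hands out increasing ids in O(1) per call.
public sealed class IdGenerator
{
    private int last;

    // lastUsedId is the highest id already present (1 for the root node).
    public IdGenerator(int lastUsedId)
    {
        last = lastUsedId;
    }

    public int Next()
    {
        last++;
        return last;
    }
}

public static class IdGeneratorDemo
{
    public static void Main()
    {
        var ids = new List<int> { 1 };
        var generator = new IdGenerator(1);

        // Each Next() call is constant time, unlike ids.Max() + 1,
        // which walks the whole list on every call.
        for (int i = 0; i < 500000; i++)
        {
            ids.Add(generator.Next());
        }

        Console.WriteLine(ids.Count);   // 500001
        Console.WriteLine(ids.Max());   // 500001
    }
}
```

In the question's FillTree, ids start at 1 and nodes are never removed, so `tree.Count + 1` would produce the same values without any extra state.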



一口甜 2024-09-24 09:06:19

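The answer to this question is: it depends.
Are you going to do a lot of inserting with no sorting? Linked list.
Are you going to do a lot of lookups? HashMap/Dictionary.
Are you just going to have an unordered group of things? List and/or array.
Do you not want duplicates? Set.
Do you not want duplicates, but want fast lookup? HashSet.
Do you have an ordered list that is sorted by keys? TreeMap.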
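The answer uses generic and Java-style names. In .NET, the platform of the question, rough counterparts would be `LinkedList<T>`, `Dictionary<TKey, TValue>`, `List<T>`, `HashSet<T>`, and `SortedDictionary<TKey, TValue>`. That mapping is an assumption on top of the answer. The sketch below touches the four that differ from a plain list. `CollectionChoices` and `Summary` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class CollectionChoices
{
    public static string[] Summary()
    {
        // Many insertions, no sorting: linked list, O(1) insert at either end.
        var linked = new LinkedList<int>();
        linked.AddLast(1);
        linked.AddFirst(0);

        // Many lookups by key: hash-based dictionary.
        var byId = new Dictionary<int, string> { { 1, "root" } };
        string name;
        bool found = byId.TryGetValue(1, out name);

        // No duplicates plus fast membership tests: hash set.
        var seen = new HashSet<int>();
        bool addedFirst = seen.Add(42);   // true: 42 was not present yet
        bool addedAgain = seen.Add(42);   // false: the duplicate is rejected

        // Entries kept ordered by key: tree-based sorted dictionary.
        var ordered = new SortedDictionary<int, string>
        {
            { 3, "c" }, { 1, "a" }, { 2, "b" }
        };

        return new[]
        {
            string.Join(",", linked),             // "0,1"
            found + " " + name,                   // "True root"
            addedFirst + " " + addedAgain,        // "True False"
            string.Join(",", ordered.Keys)        // "1,2,3"
        };
    }

    public static void Main()
    {
        foreach (string line in Summary())
        {
            Console.WriteLine(line);
        }
    }
}
```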


携余温的黄昏 2024-09-24 09:06:19

If you want to add a million items, create the list like this:

var myList = new List<MyItem>(1500000);

Storing 1.5 million references (or small structs) isn't expensive; letting List's adaptive growth algorithm allocate the space is what will be expensive.
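To see what preallocation saves, the sketch below counts how often the list's backing array is replaced while items are added, by watching the `Capacity` property. `CapacityDemo` and `CountReallocations` are hypothetical names, and this is an illustration only, not a benchmark.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityDemo
{
    // Adds `count` items and reports how many times Capacity changed,
    // i.e. how often the backing array was reallocated and copied.
    public static int CountReallocations(List<int> list, int count)
    {
        int reallocations = 0;
        int capacity = list.Capacity;
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != capacity)
            {
                reallocations++;
                capacity = list.Capacity;
            }
        }
        return reallocations;
    }

    public static void Main()
    {
        const int count = 1000000;

        // Default constructor: the list grows its array as it fills up.
        Console.WriteLine(CountReallocations(new List<int>(), count));

        // Preallocated: the capacity already covers every item, so this prints 0.
        Console.WriteLine(CountReallocations(new List<int>(1500000), count));
    }
}
```

The question's code already passes a capacity to the constructor, which fits the asker's later finding that the list itself was not the bottleneck.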


π浅易 2024-09-24 09:06:19

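Unless the array will be created once and exist for the lifetime of the application, I would be inclined to suggest some sort of nested array, where each inner array is kept below 8,000 bytes if it holds double-precision floating-point numbers, or below 85,000 bytes if it does not. Objects at or above those sizes are placed on the Large Object Heap. Unlike the normal heap, which can efficiently handle the creation and abandonment of many objects, the Large Object Heap is handled poorly under .NET 2.0-3.5, and only somewhat better under 4.0.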

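If you will not be doing insertions or deletions, I would suggest that the easiest approach may be an array of 1024 arrays of 1024 elements each. Accessing an element by index is then simple: shift the index right by ten bits, use the result to select an array, then use the bottom 10 bits to find the item within that array.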
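The shift-and-mask indexing described here can be sketched as follows. `ChunkedArray<T>` is a hypothetical name, and the fixed size of 1024 x 1024 elements follows the numbers in the answer.

```csharp
using System;

// Hypothetical fixed-size store: 1024 chunks of 1024 elements each.
public sealed class ChunkedArray<T>
{
    private const int Shift = 10;              // 2^10 = 1024 elements per chunk
    private const int ChunkSize = 1 << Shift;
    private const int Mask = ChunkSize - 1;    // keeps the bottom 10 bits

    private readonly T[][] chunks = new T[ChunkSize][];

    public ChunkedArray()
    {
        // Allocate every chunk up front; each stays a small object.
        for (int i = 0; i < chunks.Length; i++)
        {
            chunks[i] = new T[ChunkSize];
        }
    }

    public int Length
    {
        get { return ChunkSize * ChunkSize; }
    }

    public T this[int index]
    {
        // High bits select the chunk, low 10 bits select the slot in it.
        get { return chunks[index >> Shift][index & Mask]; }
        set { chunks[index >> Shift][index & Mask] = value; }
    }
}

public static class ChunkedArrayDemo
{
    public static void Main()
    {
        var data = new ChunkedArray<int>();
        data[1500] = 42;                  // chunk 1, slot 476
        data[data.Length - 1] = 9;        // last slot of the last chunk

        Console.WriteLine(data[1500]);    // 42
        Console.WriteLine(data.Length);   // 1048576
    }
}
```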

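If insertions and deletions are needed, I would suggest using a jagged array along with some sort of data structure that tracks the logical length of each sub-array and helps convert indices to array locations. Doing so avoids copying large amounts of data when performing an insert or delete, at the cost of more expensive subscript operations.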
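One possible shape for this idea is sketched below, with several simplifying assumptions that are not in the answer. Each sub-array is a `List<T>`, which tracks its own logical length. Full chunks are split in half. The chunk holding a given index is found by a linear scan, where a real implementation would likely use an index structure. Deletion is left out. `ChunkedList<T>` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: a list made of small chunks so that an insert
// only shifts elements inside one chunk of at most 1024 items.
public sealed class ChunkedList<T>
{
    private const int MaxChunk = 1024;
    private readonly List<List<T>> chunks = new List<List<T>>();

    public int Count { get; private set; }

    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= Count)
                throw new ArgumentOutOfRangeException("index");
            int offset;
            int c = Locate(index, out offset);
            return chunks[c][offset];
        }
    }

    public void Insert(int index, T item)
    {
        if (index < 0 || index > Count)
            throw new ArgumentOutOfRangeException("index");
        if (chunks.Count == 0)
            chunks.Add(new List<T>(MaxChunk));

        int offset;
        int c = Locate(index, out offset);
        List<T> chunk = chunks[c];

        if (chunk.Count == MaxChunk)
        {
            // Split the full chunk in half instead of shifting everything.
            int half = MaxChunk / 2;
            var tail = new List<T>(MaxChunk);
            tail.AddRange(chunk.GetRange(half, MaxChunk - half));
            chunk.RemoveRange(half, MaxChunk - half);
            chunks.Insert(c + 1, tail);
            if (offset > half)
            {
                chunk = tail;
                offset -= half;
            }
        }

        chunk.Insert(offset, item);
        Count++;
    }

    // Linear scan over chunk lengths: the "more expensive subscript".
    private int Locate(int index, out int offset)
    {
        int c = 0;
        while (c < chunks.Count - 1 && index >= chunks[c].Count)
        {
            index -= chunks[c].Count;
            c++;
        }
        offset = index;
        return c;
    }
}

public static class ChunkedListDemo
{
    public static void Main()
    {
        var list = new ChunkedList<int>();
        for (int i = 0; i < 3000; i++)
        {
            list.Insert(list.Count, i);   // append 0..2999
        }
        list.Insert(0, -1);               // insert at the front

        Console.WriteLine(list.Count);    // 3001
        Console.WriteLine(list[0]);       // -1
        Console.WriteLine(list[3000]);    // 2999
    }
}
```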
