C# 匹配内存数据中的固定属性

发布于 2024-11-06 23:17:16 字数 1124 浏览 1 评论 0原文

伙计们,这让我绞尽脑汁

,我在 Linq 中有一个缓慢的实现,我需要加快速度。

我需要尽可能最快的方法来比较内存(200,000)行中的大量只读数据。

每行都有一个名称和 10 个表示名称属性的整数。

Brake   8 7 3 2 1 0 4 3 2
Skull   8 7 3 2 1 0 4 3 2
Napkin  3 0 5 3 2 1 3 1 0

每个项目的任何单个属性都不会超过 8 个。所以每个值都可以很好地适合双

精度:

Brake:873,210,432

现在我的问题是我需要将这些值与每个字段的最大数量进行比较

,即:

500,000,000 将返回前 2 个,

000,001,000 将仅返回 Napkin,因为其他 2 个没有至少有 1 个处于第 4 位置。

我愿意接受任何尽可能的解决方案。会有很多比较,所以速度是我在这里唯一担心的事情。

我当前的实现是这样的:

Items.Where(c => c.A <= 5).Where(c => c.B <= 0).Where(c => c.C <= 0)
.Where(c => c.D <= 0).Where(c => c.E <= 0).Where(c => c.F <= 0)
.Where(c => c.H <= 0).Where(c => c.I <= 0).Where(c => c.J <= 0)

或者(它们的运行速度大致相同)

Items.Where(c => c.A <= 5 && c.B <= 0 && c.C <= 0
 && c.D <= 0 && c.E <= 0 && c.F <= 0 && c.H <= 0
&& c.I <= 0 && c.J <= 0)

Guys this is racking my brain

I have a slow implementation in Linq that I need to speed up.

I need the fastest possible way to compare a large set of readonly data in memory (200,000) rows.

Each row has a name and 10 integers representing properties on the name

Brake   8 7 3 2 1 0 4 3 2
Skull   8 7 3 2 1 0 4 3 2
Napkin  3 0 5 3 2 1 3 1 0

Each item will never have more then 8 in any single property. So each value can fit nicely into a double

ex:

Brake:873,210,432

Now my problem is I need to compare these values to a maximum number for each field

ie:

500,000,000 would return the first 2 and

000,001,000 would only return Napkin because the other 2 don't have at least 1 in the 4th position.

I'm open to any soltuion that is as fast as possible. There will be a lot of comparisons so speed is the only thing I'm worried about here.

My current implementation would be something like this:

Items.Where(c => c.A <= 5).Where(c => c.B <= 0).Where(c => c.C <= 0)
.Where(c => c.D <= 0).Where(c => c.E <= 0).Where(c => c.F <= 0)
.Where(c => c.H <= 0).Where(c => c.I <= 0).Where(c => c.J <= 0)

Or (they both run about the same speed)

Items.Where(c => c.A <= 5 && c.B <= 0 && c.C <= 0
 && c.D <= 0 && c.E <= 0 && c.F <= 0 && c.H <= 0
&& c.I <= 0 && c.J <= 0)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

也只是曾经 2024-11-13 23:17:16

实际上,有一个比我原来的答案更好的方法,但它需要每个项目 5 位,而不是每个项目 4 位。我将用 16 位值中的三个项目来说明它。您可以使用长整数的前 50 位将其扩展到 10 个项目。

想象一下,您将三个 4 位计数器保存在一个 16 位整数中。每个计数器都有一个用于计算的附加位(见下文)。因此,带有前三个计数器的 Brake 项目将是:

       3     7     8
Brake 00011 00111 01000

现在,您需要与“504”匹配的内容,即第一列中至少有五个项目,第二列中至少有五个项目,第三列中至少有四个项目。该掩码将是:

         4     0     5
Search 00100 00000 00101

从逻辑上讲,我们想要从 Brake 中减去搜索,并知道所有必需字段是否大于或等于零。这就是额外位的用武之地。如果我们在每个字段上设置该额外位,我们将得到:

Brake 10011 10111 11000

现在,如果我们在这种情况下进行减法,我们最终会得到:

Brake - Search = 01111 10111 10011

请注意,结果最左侧字段中的高位为 0。这表明制动值该字段中的数字小于我们要查找的值。

现在,如何在代码中实现此功能。在我的示例中,使用 short,我会:

short Brake = (short)((3 << 10) | (7 << 5) | 8);
short Search = (short)((4 << 10) | (0 << 5) | 5);

short CarryMask = 0x4210;  // sets the high bit in each field.
                           // corresponds to 0100001000010000

// This would have to be done for each value that you want to compare.
short MaskedValue = (short)(Brake | CarryMask);
short diff = (short)(MaskedValue - Search);
short rslt = (short)(CarryMask & diff);

// rslt should equal CarryMask
if (rslt == CarryMask)
{
    // All fields match
}
else
{
    // At least one field doesn't match
}

您可以通过查看 rslt 中的位来确定哪些字段不匹配。每第五位(即位 4、9 和 14)应为 1。如果该位为 0,则该字段不满足最小值。

所以要比较单个项目,你必须进行三个操作和一个比较。这将比我之前的回答快得多。

实现

下面的实现假设您在字节数组中拥有每个名称的 10 个值。第一个方法 CreateValue 构建一个代表这 10 个值的长整数。

long CreateValue(byte[] values)
{
    // probably should check here to ensure that values is 10 bytes long.
    long val = 0;
    foreach (var b in values)
    {
        // do error check. If b > 15, then this is going to fail.
        val = (val << 5) | b;
    }
    return val;
}

我假设您拥有某种格式的数据,您可以方便地将每个项目的字段值放入 10 个字节的数组中。您需要添加一个 CombinedValue 属性来保存组合值。或者也许您有一个并行数据结构来保存组合值。无论如何,您必须在程序启动时执行此循环一次,以创建将用于比较的数据(或者如果可以更新各个字段,则可能更新该值)。

foreach (var item in DataItems)
{
    byte[] values = GetValuesFromItem(item);
    // if you can't store it in the item, put it in a parallel array or list.
    item.CombinedValue = CreateValue(values);
}

我还假设,当需要搜索时,您可以将要查找的值放入字节数组中,然后调用 CreateValue 来获取您要搜索的组合值。现在您所需要的只是进行比较的方法。

// Carry mask has every 5th bit set.
// This is actually the mask for 12 values.
// That's okay, since nothing will affect those higher bits.
const long CarryMask = 0x842108421084210L;

bool ItemMatches(long itemValue, long searchFor)
{
    long maskedValue = itemValue | CarryMask;
    long diff = maskedValue - searchFor;
    long rslt = diff & CarryMask;
    return (rslt == CarryMask);
}

那么,搜索您的列表就变得非常简单:

long searchFor = CreateValue(array_of_values);
foreach (var item in items)
{
    if (ItemMatches(item.CombinedValue, searchFor)
    {
        // The item matches the criteria you specified
    }
}

Actually, there's a better way than my original answer, but it requires 5 bits per item rather than 4 bits per item. I'll illustrate it with three items in a 16-bit value. You can extend it to your 10 items using the first 50 bits of a long integer.

Imagine that you're saving three 4-bit counters in a 16-bit integer. Each counter has an additional bit that's used for calculation (see below). So your Brake item with the first three counters would be:

       3     7     8
Brake 00011 00111 01000

Now, you want things that match "504", that is at least five items in the first column, any number in the second column, and at least four in the third column. That mask would be:

         4     0     5
Search 00100 00000 00101

Logically, we want to subtract the Search from the Brake, and know if all the required fields are greater than or equal to zero. That's where the extra bit comes in. If we set that extra bit on each of the fields, we have:

Brake 10011 10111 11000

Now, if we do a subtraction in this case we end up with:

Brake - Search = 01111 10111 10011

Note that the high bit in the leftmost field of the result is 0. This indicates that the number that was in that field of the Brake value was smaller than the value we were looking for.

Now, how to make this work in code. In my example, using short, I'd have:

short Brake = (short)((3 << 10) | (7 << 5) | 8);
short Search = (short)((4 << 10) | (0 << 5) | 5);

short CarryMask = 0x4210;  // sets the high bit in each field.
                           // corresponds to 0100001000010000

// This would have to be done for each value that you want to compare.
short MaskedValue = (short)(Brake | CarryMask);
short diff = (short)(MaskedValue - Search);
short rslt = (short)(CarryMask & diff);

// rslt should equal CarryMask
if (rslt == CarryMask)
{
    // All fields match
}
else
{
    // At least one field doesn't match
}

You can determine which fields don't match by looking at the bits in the rslt. Every fifth bit (i.e. bits 4, 9, and 14) should be 1. If the bit is 0, then that field didn't meet the minimum.

So to compare a single item, you have to do three operations and a comparison. That's going to be a whole lot faster than my earlier answer.

Implementation

The implementation below assumes that you have the 10 values for each name in an array of bytes. The first method, CreateValue builds a long integer that represents those 10 values.

long CreateValue(byte[] values)
{
    // probably should check here to ensure that values is 10 bytes long.
    long val = 0;
    foreach (var b in values)
    {
        // do error check. If b > 15, then this is going to fail.
        val = (val << 5) | b;
    }
    return val;
}

I'll assume that you have your data in some format where you can conveniently put the field values for each item into an array of 10 bytes. You'll want to add a CombinedValue property that holds the combined values. Or perhaps you have a parallel data structure to hold the combined values. Anyway, you have to do this loop once at program startup to create the data that you'll use for comparison (or perhaps update the value if you can update the individual fields).

foreach (var item in DataItems)
{
    byte[] values = GetValuesFromItem(item);
    // if you can't store it in the item, put it in a parallel array or list.
    item.CombinedValue = CreateValue(values);
}

I'll assume, too, that when it comes time to search, you can put the values you're looking for into an array of bytes and call CreateValue to get the combined value you're searching for. Now all you need is the method that will do the comparison.

// Carry mask has every 5th bit set.
// This is actually the mask for 12 values.
// That's okay, since nothing will affect those higher bits.
const long CarryMask = 0x842108421084210L;

bool ItemMatches(long itemValue, long searchFor)
{
    long maskedValue = itemValue | CarryMask;
    long diff = maskedValue - searchFor;
    long rslt = diff & CarryMask;
    return (rslt == CarryMask);
}

Searching your list, then, becomes very simple:

long searchFor = CreateValue(array_of_values);
foreach (var item in items)
{
    if (ItemMatches(item.CombinedValue, searchFor)
    {
        // The item matches the criteria you specified
    }
}
⊕婉儿 2024-11-13 23:17:16

您的数据由类似于以下的结构表示:

public abstract class Widget
{
  public string Name   { get ; private set ; }
  public byte[] Values { get ; private set ; }
}

无论是字节数组还是从中提取的位字段,我怀疑与性能相当无关。

一种方法是构建 8 个并行数组,其中包含对小部件的引用,每个此类数组都在不同的值列上排序。然后,查找包括对所需值的二分搜索。

另一种方法是读取源数据一次,并填充高度平衡的二叉搜索树数组,其键是所需列的值,其值是共享该特定列值的 Widget 列表。每列都需要一个这样的搜索树。显然,这种方法会消耗内存来提高速度——高度平衡二叉树中的查找是一个 O(log N) 操作。使用树意味着可以在集合中添加和删除项目,而无需施加大量开销。

虽然您可以编写自己的树实现,但我不愿意。这是一个使用 C5 Collections Library 的实现(无论如何,你的工具箱中应该有这个东西)因为我讨厌发明轮子:

using System;
using System.Collections.Generic;

using C5;

namespace ConsoleApplication13
{

    class Program
    {

        static void Main( string[] args )
        {
            IEnumerable<Widget>  sourceData  = ReadWidgets();
            WidgetSearcher lookup = new WidgetSearcher( sourceData );

            // find all the Widgets where column 2 is >= 5 ;
            Widget[] results1 = lookup.Search( 2 , 5 ).ToArray();

            // find all the Widgets where column 0 is >= 3 ;
            Widget[] results2 = lookup.Search( 0 , 3 ).ToArray();

            return ;

        }

        private static IEnumerable<Widget> ReadWidgets()
        {
            //TODO: we need source data from somewhere. It gets provided here.
            throw new NotImplementedException();
        }

    }

    public class Widget
    {
        public const int ValueCount = 8;

        public string Name { get; private set; }
        public byte[] Values
        {
            get
            {
                return (byte[])_values.Clone();
            }
        }
        private byte[] _values;

        public Widget( string name , byte[] values )
        {
            if ( name==null )
                throw new ArgumentNullException( "name" );
            if ( name.Trim()=="" )
                throw new ArgumentOutOfRangeException( "name" );
            if ( values==null )
                throw new ArgumentNullException( "values" );
            if ( values.Length!=ValueCount )
                throw new ArgumentOutOfRangeException( "values" );

            this.Name=name;
            this._values=values;
            return;
        }
        /// <summary>
        /// private constructor for search instances
        /// </summary>
        /// <param name="column"></param>
        /// <param name="value"></param>
        private Widget( int column , byte value )
        {
            this.Name=null;
            this._values=new byte[Widget.ValueCount];

            this._values.Initialize();
            this._values[column]=value;

            return;
        }

        public class Comparer : IComparer<Widget> , IEqualityComparer<Widget>
        {
            private int ColumnToCompare;

            public Comparer( int colNum )
            {
                if ( colNum<0||colNum>=Widget.ValueCount )
                    throw new ArgumentOutOfRangeException( "colNum" );
                this.ColumnToCompare=colNum;
            }

            #region IComparer<Widget> Members

            public int Compare( Widget x , Widget y )
            {
                return (int)x._values[this.ColumnToCompare]-(int)y._values[this.ColumnToCompare];
            }

            #endregion

            #region IEqualityComparer<Widget> Members

            public bool Equals( Widget x , Widget y )
            {
                return ( x._values[this.ColumnToCompare]==x._values[this.ColumnToCompare] );
            }
            public int GetHashCode( Widget obj )
            {
                return obj._values[this.ColumnToCompare].GetHashCode();
            }
            #endregion
        }


        internal static Widget CreateSearchInstance( int column , byte value )
        {
            return new Widget( column , value );
        }

    }

    public class WidgetSearcher
    {
        private C5.TreeBag<Widget>[] lookups;

        public WidgetSearcher( IEnumerable<Widget> sourceData )
        {
            this.lookups=InstantiateLookups();
            PopulateLookups( sourceData );

        }

        private TreeBag<Widget>[] InstantiateLookups()
        {
            C5.TreeBag<Widget>[] instance =new C5.TreeBag<Widget>[Widget.ValueCount];

            for ( int i = 0 ; i<instance.Length ; ++i )
            {
                Widget.Comparer widgetComparer = new Widget.Comparer( i );

                instance[i]=new TreeBag<Widget>( widgetComparer , widgetComparer );

            }

            return instance;
        }

        private void PopulateLookups( IEnumerable<Widget> sourceData )
        {
            foreach ( Widget datum in sourceData )
            {
                for ( int i = 0 ; i<Widget.ValueCount ; ++i )
                {
                    lookups[i].Add( datum );
                }
            }
            return;
        }

        public IDirectedCollectionValue<Widget> Search( int column , byte value )
        {
            Widget limit = Widget.CreateSearchInstance( column , value );
            return lookups[column].RangeFrom( limit );
        }

    }

}

Your data is represented by a structure that resembles this:

public abstract class Widget
{
  public string Name   { get ; private set ; }
  public byte[] Values { get ; private set ; }
}

Whether it's an array of bytes, or a bit field that you draw from is, I suspect fairly irrelevant to performance.

One approach would be to build 8 parallel arrays containing references to your widgets with each such array being sorted on a different value column. Lookup then consisted of a binary search for the desired values.

Another approach is to read your source data once, and populate an array of height-balanced binary search trees, whose key is the value of the desired column and whose value is the list of Widgets sharing that particular column value. You'll need one such search tree for each column. Obviously, this approach eats memory to get speed — lookup in a height-balance binary tree being a O(log N) operation. Using trees means that items can be added to and removed from the collection without imposing a lot of overhead.

Whilst you could write your own tree implementation, I'd rather not. Here is an implementation, using the C5 Collections Library (something you should have in your toolbox anyway) since I hate having to invent the wheel:

using System;
using System.Collections.Generic;

using C5;

namespace ConsoleApplication13
{

    class Program
    {

        static void Main( string[] args )
        {
            IEnumerable<Widget>  sourceData  = ReadWidgets();
            WidgetSearcher lookup = new WidgetSearcher( sourceData );

            // find all the Widgets where column 2 is >= 5 ;
            Widget[] results1 = lookup.Search( 2 , 5 ).ToArray();

            // find all the Widgets where column 0 is >= 3 ;
            Widget[] results2 = lookup.Search( 0 , 3 ).ToArray();

            return ;

        }

        private static IEnumerable<Widget> ReadWidgets()
        {
            //TODO: we need source data from somewhere. It gets provided here.
            throw new NotImplementedException();
        }

    }

    public class Widget
    {
        public const int ValueCount = 8;

        public string Name { get; private set; }
        public byte[] Values
        {
            get
            {
                return (byte[])_values.Clone();
            }
        }
        private byte[] _values;

        public Widget( string name , byte[] values )
        {
            if ( name==null )
                throw new ArgumentNullException( "name" );
            if ( name.Trim()=="" )
                throw new ArgumentOutOfRangeException( "name" );
            if ( values==null )
                throw new ArgumentNullException( "values" );
            if ( values.Length!=ValueCount )
                throw new ArgumentOutOfRangeException( "values" );

            this.Name=name;
            this._values=values;
            return;
        }
        /// <summary>
        /// private constructor for search instances
        /// </summary>
        /// <param name="column"></param>
        /// <param name="value"></param>
        private Widget( int column , byte value )
        {
            this.Name=null;
            this._values=new byte[Widget.ValueCount];

            this._values.Initialize();
            this._values[column]=value;

            return;
        }

        public class Comparer : IComparer<Widget> , IEqualityComparer<Widget>
        {
            private int ColumnToCompare;

            public Comparer( int colNum )
            {
                if ( colNum<0||colNum>=Widget.ValueCount )
                    throw new ArgumentOutOfRangeException( "colNum" );
                this.ColumnToCompare=colNum;
            }

            #region IComparer<Widget> Members

            public int Compare( Widget x , Widget y )
            {
                return (int)x._values[this.ColumnToCompare]-(int)y._values[this.ColumnToCompare];
            }

            #endregion

            #region IEqualityComparer<Widget> Members

            public bool Equals( Widget x , Widget y )
            {
                return ( x._values[this.ColumnToCompare]==x._values[this.ColumnToCompare] );
            }
            public int GetHashCode( Widget obj )
            {
                return obj._values[this.ColumnToCompare].GetHashCode();
            }
            #endregion
        }


        internal static Widget CreateSearchInstance( int column , byte value )
        {
            return new Widget( column , value );
        }

    }

    public class WidgetSearcher
    {
        private C5.TreeBag<Widget>[] lookups;

        public WidgetSearcher( IEnumerable<Widget> sourceData )
        {
            this.lookups=InstantiateLookups();
            PopulateLookups( sourceData );

        }

        private TreeBag<Widget>[] InstantiateLookups()
        {
            C5.TreeBag<Widget>[] instance =new C5.TreeBag<Widget>[Widget.ValueCount];

            for ( int i = 0 ; i<instance.Length ; ++i )
            {
                Widget.Comparer widgetComparer = new Widget.Comparer( i );

                instance[i]=new TreeBag<Widget>( widgetComparer , widgetComparer );

            }

            return instance;
        }

        private void PopulateLookups( IEnumerable<Widget> sourceData )
        {
            foreach ( Widget datum in sourceData )
            {
                for ( int i = 0 ; i<Widget.ValueCount ; ++i )
                {
                    lookups[i].Add( datum );
                }
            }
            return;
        }

        public IDirectedCollectionValue<Widget> Search( int column , byte value )
        {
            Widget limit = Widget.CreateSearchInstance( column , value );
            return lookups[column].RangeFrom( limit );
        }

    }

}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文