Split an ICollection<T> with a delimiter sequence

Posted on 2024-11-16 11:11:58

This is for C# 3.5

I have ICollection that I'm trying to split into separate ICollections where the delimiter is a sequence.

For example

ICollection<byte> input = new byte[] { 234, 12, 12, 23, 11, 32, 23, 11, 123, 32 };
ICollection<byte> delimiter = new byte[] {23, 11};
List<ICollection<byte>> result = input.splitBy(delimiter);

would result in

result.item(0) = {234, 12, 12};
result.item(1) = {32};
result.item(2) = {123, 32};


Comments (5)

当梦初醒 2024-11-23 11:11:58
private static IEnumerable<IEnumerable<T>> Split<T>
    (IEnumerable<T> source, ICollection<T> delimiter)
{
    // window represents the last [delimiter length] elements in the sequence,
    // buffer is the elements waiting to be output when delimiter is hit

    var window = new Queue<T>();
    var buffer = new List<T>();

    foreach (T element in source)
    {
        buffer.Add(element);
        window.Enqueue(element);
        if (window.Count > delimiter.Count)
            window.Dequeue();

        if (window.SequenceEqual(delimiter))
        {
            // number of non-delimiter elements in the buffer
            int nElements = buffer.Count - window.Count;
            if (nElements > 0)
                yield return buffer.Take(nElements).ToArray();

            window.Clear();
            buffer.Clear();
        }
    }

    if (buffer.Any())
        yield return buffer;
}
猫七 2024-11-23 11:11:58

An optimal solution would not use SequenceEqual() to check each subrange; otherwise you could end up iterating over the full length of the delimiter for every item in the sequence, which can hurt performance, especially with long delimiter sequences. The match can instead be tracked as the source sequence is enumerated.

Here's what I'd write but there's always room for improvement. I aimed to have similar semantics to String.Split().

public enum SequenceSplitOptions { None, RemoveEmptyEntries }
public static IEnumerable<IList<T>> SequenceSplit<T>(
    this IEnumerable<T> source,
    IEnumerable<T> separator)
{
    return SequenceSplit(source, separator, SequenceSplitOptions.None);
}
public static IEnumerable<IList<T>> SequenceSplit<T>(
    this IEnumerable<T> source,
    IEnumerable<T> separator,
    SequenceSplitOptions options)
{
    if (source == null)
        throw new ArgumentNullException("source");
    if (options != SequenceSplitOptions.None
     && options != SequenceSplitOptions.RemoveEmptyEntries)
        throw new ArgumentException("Illegal option: " + (int)options);
    if (separator == null)
    {
        yield return source.ToList();
        yield break;
    }

    var sep = separator as IList<T> ?? separator.ToList();
    if (sep.Count == 0)
    {
        yield return source.ToList();
        yield break;
    }

    var buffer = new List<T>();
    var candidate = new List<T>(sep.Count);
    var sindex = 0;
    foreach (var item in source)
    {
        candidate.Add(item);
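        // EqualityComparer avoids a NullReferenceException when item is null.
        // Caveat: a failed partial match is flushed wholesale, so a delimiter that
        // begins inside it (e.g. {1, 2} within {1, 1, 2}) goes undetected.
        if (!EqualityComparer<T>.Default.Equals(item, sep[sindex]))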
        {   // item is not part of the delimiter
            buffer.AddRange(candidate);
            candidate.Clear();
            sindex = 0;
        }
        else if (++sindex >= sep.Count)
        {   // candidate is the delimiter
            if (options == SequenceSplitOptions.None || buffer.Count > 0)
                yield return buffer.ToList();
            buffer.Clear();
            candidate.Clear();
            sindex = 0;
        }
    }
    if (candidate.Count > 0)
        buffer.AddRange(candidate);
    if (options == SequenceSplitOptions.None || buffer.Count > 0)
        yield return buffer;
}
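
The single-index matcher above restarts from the beginning of the separator whenever an item fails to match, so a separator that begins partway through a failed partial match can be missed. The following is only a rough sketch of one way to handle that case, not the answer author's method: it keeps the pending items in a list and sheds them from the front until what remains is again a prefix of the separator. The names SequenceSplitSketch, IsPrefix and SplitWithBacktracking are hypothetical, and empty entries are always kept (the equivalent of SequenceSplitOptions.None). It trades the single comparison per item for a worst case proportional to the separator length per item.

```csharp
using System;
using System.Collections.Generic;

public static class SequenceSplitSketch
{
    // Hypothetical helper: true when 'candidate' equals the first
    // candidate.Count items of 'sep'. Callers keep candidate.Count <= sep.Count.
    private static bool IsPrefix<T>(List<T> candidate, IList<T> sep)
    {
        var comparer = EqualityComparer<T>.Default;
        for (int i = 0; i < candidate.Count; i++)
        {
            if (!comparer.Equals(candidate[i], sep[i]))
                return false;
        }
        return true;
    }

    // Hypothetical name. Because this is an iterator, the argument checks
    // run when enumeration starts, not when the method is called.
    public static IEnumerable<IList<T>> SplitWithBacktracking<T>(
        IEnumerable<T> source, IList<T> sep)
    {
        if (source == null)
            throw new ArgumentNullException("source");
        if (sep == null || sep.Count == 0)
            throw new ArgumentException("Separator must be non-empty.", "sep");

        var buffer = new List<T>();              // items of the current segment
        var candidate = new List<T>(sep.Count);  // items that may start a separator

        foreach (T item in source)
        {
            candidate.Add(item);

            // Shed items from the front until the remainder is a prefix of sep.
            while (candidate.Count > 0 && !IsPrefix(candidate, sep))
            {
                buffer.Add(candidate[0]);
                candidate.RemoveAt(0);
            }

            if (candidate.Count == sep.Count)
            {
                // A full separator was seen: emit the segment and start a new one.
                yield return buffer;
                buffer = new List<T>();
                candidate.Clear();
            }
        }

        // An unfinished partial match belongs to the last segment.
        buffer.AddRange(candidate);
        yield return buffer;
    }
}
```

With this sketch, splitting { 1, 1, 2, 3 } on { 1, 2 } gives the two segments { 1 } and { 3 }, whereas the single-index matcher returns the input unsplit.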
听你说爱我 2024-11-23 11:11:58
public IEnumerable<IEnumerable<T>> SplitByCollection<T>(IEnumerable<T> source, 
                                                        IEnumerable<T> delimiter)
{
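    var sourceArray = source.ToArray();
    var delimiterCount = delimiter.Count();

    // An empty delimiter never splits anything; return the input as one piece.
    if (delimiterCount == 0)
    {
        yield return sourceArray;
        yield break;
    }

    int lastIndex = 0;

    for (int i = 0; i < sourceArray.Length; i++)
    {
        if (delimiter.SequenceEqual(sourceArray.Skip(i).Take(delimiterCount)))
        {
            yield return sourceArray.Skip(lastIndex).Take(i - lastIndex);

            // Resume right after the delimiter. The loop's i++ supplies the last
            // step, so a delimiter that follows immediately is not skipped.
            lastIndex = i + delimiterCount;
            i = lastIndex - 1;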
        }
    }

    if (lastIndex < sourceArray.Length)
        yield return sourceArray.Skip(lastIndex);
}

Calling it ...

var result = SplitByCollection(input, delimiter);

foreach (var element in result)
{
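    // .NET 3.5 only has string.Join(string, string[]), so convert each item first.
    Console.WriteLine(string.Join(", ", element.Select(x => x.ToString()).ToArray()));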
}

returns

234, 12, 12
32
123, 32
我为君王 2024-11-23 11:11:58

Here is my take on it:

public static IEnumerable<IList<byte>> Split(IEnumerable<byte> input, IEnumerable<byte> delimiter)
{
    var l = new List<byte>();
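    // Note: the HashSet treats each delimiter byte as a separator in its own right,
    // so the input is split wherever any one of those bytes occurs, not only where
    // the whole delimiter sequence occurs.
    var set = new HashSet<byte>(delimiter);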
    foreach (var item in input)
    {
        if(!set.Contains(item))
            l.Add(item);
        else if(l.Count > 0)
        {
            yield return l;
            l = new List<byte>();
        }
    }
    if(l.Count > 0)
        yield return l;
}
笑咖 2024-11-23 11:11:58

There are probably better methods, but here's one I've used before; it's fine for relatively small collections:

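byte startDelimit = 23;
byte endDelimit = 11;
List<ICollection<byte>> result = new List<ICollection<byte>>();
int lastMatchingPosition = 0;
var inputAsList = input.ToList();

// Stop one short of the end so that inputAsList[i + 1] stays in range.
for (int i = 0; i < inputAsList.Count - 1; i++)
{
    if (inputAsList[i] == startDelimit && inputAsList[i + 1] == endDelimit)
    {
        // ICollection<byte> is an interface, so use a concrete List<byte>.
        ICollection<byte> temp = new List<byte>();
        // Copy everything since the previous delimiter, excluding this one.
        for (int j = lastMatchingPosition; j < i; j++)
        {
            temp.Add(inputAsList[j]);
        }
        result.Add(temp);
        lastMatchingPosition = i + 2;
        i++; // skip the second delimiter byte
    }
}

// Whatever follows the last delimiter is the final segment.
if (lastMatchingPosition < inputAsList.Count)
{
    result.Add(inputAsList.GetRange(lastMatchingPosition,
                                    inputAsList.Count - lastMatchingPosition));
}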

I don't have my IDE open at the moment, so this may not compile as-is, or may have some holes you'll need to plug. But it's where I start when I run into this problem. Again, as I said before, if this is for large collections it'll be slow, so better solutions may well exist.
