Splitting an array into size-limited CSV strings

Posted on 2025-01-07 03:14:06

I am looking for an efficient way to convert a large int[] into a string[] of CSV strings, where each CSV string is limited to a maximum of 4000 characters. The values in the array can be anything between 1 and int.MaxValue.

Here is my final code:

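public static string[] GetCSVsFromArray(int[] array, int csvLimit)
{
    List<string> parts = new List<string>();
    StringBuilder sb = new StringBuilder();
    foreach(int id in array)
    {
        string intId = id.ToString();
        if (sb.Length + intId.Length < csvLimit)
            sb.Append(intId).Append(",");
        else
        {
            if (sb.Length > 0)
            {
                sb.Length--;                  // drop the trailing comma
                parts.Add(sb.ToString());
                sb.Length = 0;
            }
            // Start the next part with the current value; without this line
            // the value that triggered the flush would be silently dropped.
            sb.Append(intId).Append(",");
        }
    }
    if (sb.Length > 0)
    {
        sb.Length--;                          // drop the trailing comma of the last part
        parts.Add(sb.ToString());
    }
    return parts.ToArray();
}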

Is there a more efficient way to do this?
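
Whichever variant is used, two properties should hold: no part is longer than the limit, and splitting the parts on commas gives back the input values in the same order. The following is a minimal sketch of such a check; the class and method names are hypothetical and not part of the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvSplitCheck
{
    // Hypothetical helper: returns true when `parts` is a valid split of `source`.
    public static bool IsValidSplit(int[] source, IList<string> parts, int csvLimit)
    {
        var roundTrip = new List<int>(source.Length);
        foreach (string part in parts)
        {
            // Every part must be non-empty and respect the length limit.
            if (part.Length == 0 || part.Length > csvLimit)
                return false;

            // A leading or trailing comma produces an empty field,
            // which int.TryParse rejects.
            foreach (string field in part.Split(','))
            {
                int value;
                if (!int.TryParse(field, out value))
                    return false;
                roundTrip.Add(value);
            }
        }
        // No value may be lost, duplicated, or reordered.
        return roundTrip.SequenceEqual(source);
    }
}
```

For example, `CsvSplitCheck.IsValidSplit(new[] { 1, 22, 333 }, new[] { "1,22", "333" }, 4)` accepts the split, while a part with a trailing comma such as "1,22," or a split that omits 22 is rejected.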

So here is what I am now using (I was able to change the return type to List<string> to save the ToArray() call at the end):

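public static List<string> GetCSVsFromArray(int[] array, int csvLimit)
{
    List<string> parts = new List<string>();
    StringBuilder sb = new StringBuilder();
    foreach(int id in array)
    {
        string intId = id.ToString();
        if (sb.Length + intId.Length < csvLimit)
            sb.Append(intId).Append(",");
        else
        {
            if (sb.Length > 0)
            {
                sb.Length--;                  // drop the trailing comma
                parts.Add(sb.ToString());
                sb.Length = 0;
            }
            // Start the next part with the current value; without this line
            // the value that triggered the flush would be silently dropped.
            sb.Append(intId).Append(",");
        }
    }
    if (sb.Length > 0)
    {
        sb.Length--;                          // drop the trailing comma of the last part
        parts.Add(sb.ToString());
    }
    return parts;
}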

Performance results:

10,000,000 items, CSV limit of 4000 characters:

  • Original: 2,887.488 ms
  • GetIntegerDigitCount: 3,105.355 ms
  • Final: 2,883.587 ms

While I only saved 4 ms by removing the ToArray() call on my development machine, this seems to make a significant difference on a much slower machine (it saved over 200 ms on a Dell D620).

Comments (3)

禾厶谷欠 2025-01-14 03:14:06

You are doing a lot of heap allocations by creating a new string for each number just to count its digits. Use the following method to compute the number of digits instead (see the method below).

So instead of

string intId = id.ToString();
if (sb.Length + intId.Length < csvLimit)

Just use:

if (sb.Length + this.GetIntegerDigitCount(id) < csvLimit)

Results:

  • About 2 times faster on 10 million numbers
  • Old: 4316 ms, New: 1983 ms, Diff: 2333 ms. The old time is 217.6% of the new time.

EDIT: More results with a large CSV limit (the percentage is the old time relative to the new time):

Items:10000000; csvLimit:4000; Old:2091ms, New:1868ms, Diff:223ms faster = 111.937901498929%


Code I've used to measure time:

double elapsedOld = 0;
double elapsedNew = 0;
int count = 10000000;
int csvLimit = 4000;
var items = Enumerable.Range(0, count).ToArray();

// Time the original implementation.
var watch = Stopwatch.StartNew();
this.GetCsVsFromArray(items, csvLimit);
watch.Stop();
elapsedOld = watch.ElapsedMilliseconds;

// Time the version that uses GetIntegerDigitCount.
watch = Stopwatch.StartNew();
this.GetCsVsFromArrayTuned(items, csvLimit);
watch.Stop();
elapsedNew = watch.ElapsedMilliseconds;

// The last value is the old time as a percentage of the new time.
var stat = String.Format(
    "Items:{0}; csvLimit:{1}; Old:{2}ms, New:{3}ms, Diff:{4}ms faster = {5}%",
    count,
    csvLimit,
    elapsedOld,
    elapsedNew,
    elapsedOld - elapsedNew,
    elapsedOld * 100 / elapsedNew);
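
A single Stopwatch run per variant also times JIT compilation and whatever garbage collection happens to occur, which may explain part of the spread between the figures reported in this thread. As a rough sketch (the `Bench` class and `BestOfMilliseconds` method are hypothetical names, not from the original answer), a warm-up call followed by the best of several runs tends to give steadier numbers:

```csharp
using System;
using System.Diagnostics;

public static class Bench
{
    // Hypothetical helper: runs `action` once to warm up, then `runs` more times,
    // and returns the fastest observed time in milliseconds.
    public static double BestOfMilliseconds(Action action, int runs)
    {
        if (runs < 1) throw new ArgumentOutOfRangeException("runs");

        action(); // warm-up: JIT compilation happens here and is not timed

        double best = double.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            // Settle the heap first so garbage left over from the previous
            // run is not charged to this one.
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            Stopwatch watch = Stopwatch.StartNew();
            action();
            watch.Stop();
            best = Math.Min(best, watch.Elapsed.TotalMilliseconds);
        }
        return best;
    }
}
```

It could be called as `Bench.BestOfMilliseconds(() => GetCsVsFromArray(items, csvLimit), 5)` for each variant. Taking the minimum is one common choice; a median would be an equally reasonable alternative.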

GetIntegerDigitCount:

public int GetIntegerDigitCount(int valueInt)
{
    long value = valueInt; // long rather than int, so that negating int.MinValue cannot overflow
    int sign = 0;
    if (value < 0)
    {
        value = -value;
        sign = 1;
    }

    if (value <= 9)
    {
        return sign + 1;
    }

    if (value <= 99)
    {
        return sign + 2;
    }

    if (value <= 999)
    {
        return sign + 3;
    }

    if (value <= 9999)
    {
        return sign + 4;
    }

    if (value <= 99999)
    {
        return sign + 5;
    }

    if (value <= 999999)
    {
        return sign + 6;
    }

    if (value <= 9999999)
    {
        return sign + 7;
    }

    if (value <= 99999999)
    {
        return sign + 8;
    }

    if (value <= 999999999)
    {
        return sign + 9;
    }

    return sign + 10;
}
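
The change above only replaces the length check; the value still has to be written into the StringBuilder. A minimal sketch of a complete method along these lines follows, using the `StringBuilder.Append(int)` overload so that no string is created explicitly. The names `CsvSplitter`, `Split`, and `DigitCount` are hypothetical. The sketch assumes non-negative input as stated in the question, treats `csvLimit` as an inclusive maximum, and gives a value that is longer than the limit its own oversized part. Whether `Append(int)` avoids a temporary string internally depends on the runtime, so any gain should be measured.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvSplitter
{
    // Number of characters the value occupies when formatted, computed without allocating.
    private static int DigitCount(int value)
    {
        long v = value;                       // widen so that negating int.MinValue cannot overflow
        int count = 1;
        if (v < 0) { v = -v; count++; }       // one extra character for the minus sign
        while (v >= 10) { v /= 10; count++; }
        return count;
    }

    public static List<string> Split(int[] array, int csvLimit)
    {
        var parts = new List<string>();
        var sb = new StringBuilder(csvLimit); // pre-size the buffer to the largest part
        foreach (int id in array)
        {
            // Characters needed for this value, plus one for the separating comma
            // when the current part already holds a value.
            int needed = DigitCount(id) + (sb.Length > 0 ? 1 : 0);
            if (sb.Length > 0 && sb.Length + needed > csvLimit)
            {
                parts.Add(sb.ToString());     // current part is full: emit it
                sb.Length = 0;
            }
            if (sb.Length > 0)
                sb.Append(',');
            sb.Append(id);                    // the current value is never dropped
        }
        if (sb.Length > 0)
            parts.Add(sb.ToString());
        return parts;
    }
}
```

For example, `CsvSplitter.Split(new[] { 1, 22, 333, 4444 }, 8)` returns the two parts "1,22,333" and "4444". Writing the comma before each value instead of after it means there is never a trailing comma to strip.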
风铃鹿 2025-01-14 03:14:06

LINQ can speed things up a bit here. After a few modifications, your code will look something like this:

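    // Requires: using System.Linq;
    public static string[] GetCSVsFromArray(int[] array, int csvLimit)
    {
        List<string> parts = new List<string>();
        StringBuilder sb = new StringBuilder();
        foreach (string intId in array.Select(id => id.ToString()))
        {
            if (sb.Length + intId.Length < csvLimit)
                sb.Append(intId).Append(",");
            else
            {
                if (sb.Length > 0)
                {
                    sb.Length--;              // drop the trailing comma
                    parts.Add(sb.ToString());
                    sb.Length = 0;
                }
                sb.Append(intId).Append(","); // keep the value that triggered the flush
            }
        }
        if (sb.Length > 0)
        {
            sb.Length--;                      // emit the last, partially filled part
            parts.Add(sb.ToString());
        }
        return parts.ToArray();
    }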
对风讲故事 2025-01-14 03:14:06
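using System.Linq;

public static string[] GetCSVsFromArray(int[] array, int limit)
{
    int part = 0;    // index of the CSV string currently being filled
    int length = 0;  // length of that string so far, commas included
    return array.Select(a => a.ToString())
                .GroupBy(a =>
                {
                    // Count the separating comma and start a new part before the
                    // limit is exceeded; grouping by (running length / limit) alone
                    // ignores commas and lets a value straddle the boundary.
                    int needed = a.Length + (length > 0 ? 1 : 0);
                    if (length > 0 && length + needed > limit) { part++; length = a.Length; }
                    else length += needed;
                    return part;
                })
                .Select(g => string.Join(",", g))
                .ToArray();
}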