What is the fastest way to parse this string?

Posted on 2024-08-07 23:15:14

I have a string in the following format:

[Season] [Year] [Vendor] [Geography]

So an example might be: Spring 2009 Nielsen MSA

I need to parse out the Season and Year in the fastest way possible. I don't care about prettiness or cleverness, just raw speed. The language is C# using VS2008, but the assembly is being built for .NET 2.0.

Answers (7)

幸福丶如此 2024-08-14 23:15:15
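string input = "Spring 2009 Nielsen MSA";

// Index of the first character after the first space, i.e. where the year starts.
int seasonIndex = input.IndexOf(' ') + 1;

// The season is everything before the first space (seasonIndex - 1 characters).
string season = input.Substring(0, seasonIndex - 1);
// The year runs from seasonIndex up to the second space.
string year = input.Substring(seasonIndex, input.IndexOf(' ', seasonIndex) - seasonIndex);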
挽清梦 2024-08-14 23:15:15
// split[0] is the season, split[1] is the year.
string[] split = stringName.Split(' ');
string seasonAndYear = split[0] + " " + split[1];
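A variation on the `Split` approach, offered as a sketch rather than a measured improvement: the `String.Split(char[], int)` overload (available in .NET 2.0) caps the number of pieces, so "Nielsen MSA" stays together in one trailing element instead of being split into two more strings. The `LimitedSplit` class and `SeasonAndYear` method are hypothetical names, and whether this is measurably faster would need profiling.

```csharp
using System;

// Hypothetical helper class; the names are illustrative only.
public static class LimitedSplit
{
    private static readonly char[] Space = new char[] { ' ' };

    // Returns "Season Year", e.g. "Spring 2009".
    // Split(Space, 3) yields at most three pieces:
    // [0] season, [1] year, [2] everything after the second space.
    public static string SeasonAndYear(string text)
    {
        string[] parts = text.Split(Space, 3);
        return parts[0] + " " + parts[1];
    }
}
```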
红ご颜醉 2024-08-14 23:15:15

Class Parser:

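using System.IO;
using System.Text;

// Reads whitespace-separated words from a string, one at a time.
public class Parser : StringReader {

    public Parser(string s) : base(s) {
    }

    // Returns the next word, or "" once the input is exhausted.
    public string NextWord() {
        // Skip any leading whitespace.
        while ((Peek() >= 0) && (char.IsWhiteSpace((char) Peek())))
            Read();
        StringBuilder sb = new StringBuilder();
        // Collect characters until whitespace or end of input.
        do {
            int next = Read();
            if (next < 0)
                break;
            char nextChar = (char) next;
            if (char.IsWhiteSpace(nextChar))
                break;
            sb.Append(nextChar);
        } while (true);
        return sb.ToString();
    }
}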

Use:

    string str = "Spring 2009 Nielsen MSA";
    Parser parser = new Parser(str);
    string season = parser.NextWord();
    string year = parser.NextWord();
    string vendor = parser.NextWord();
    string geography = parser.NextWord();
夏尔 2024-08-14 23:15:15

I'd go with Spidey's suggestion, which should give decent enough performance with simple, easy-to-follow, easy-to-maintain code.

But if you really need to push the performance envelope (and C# is the only tool available), then a couple of loops in series that search for the spaces, followed by pulling the strings out with Substring, would probably marginally outdo it.

You could do the same with IndexOf instead of the loops, but rolling your own may be slightly faster (you'd have to profile that).
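A minimal sketch of the "loops in series" idea, assuming the input always contains at least two spaces: each loop scans for the next space by hand, then `Substring` copies out the pieces. The `LoopParser` class and `Parse` method are hypothetical names, and whether this actually beats `IndexOf` is, as the answer says, something to profile.

```csharp
using System;

// Hypothetical helper class; the names are illustrative only.
public static class LoopParser
{
    // Finds the first two spaces with plain loops, then copies the
    // season and year out with Substring. Assumes well-formed input
    // ("Season Year Vendor Geography"); nothing is validated, so a
    // string with fewer than two spaces throws IndexOutOfRangeException.
    public static void Parse(string text, out string season, out string year)
    {
        int firstSpace = 0;
        while (text[firstSpace] != ' ')
            firstSpace++;

        int secondSpace = firstSpace + 1;
        while (text[secondSpace] != ' ')
            secondSpace++;

        season = text.Substring(0, firstSpace);
        year = text.Substring(firstSpace + 1, secondSpace - firstSpace - 1);
    }
}
```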

远山浅 2024-08-14 23:15:14

If you only need the season and year, then:

int firstSpace = text.IndexOf(' ');
string season = text.Substring(0, firstSpace);
int secondSpace = text.IndexOf(' ', firstSpace + 1);
int year = int.Parse(text.Substring(firstSpace + 1, 
                                    secondSpace - firstSpace - 1));

If you can assume the year is always four digits, this is even faster:

int firstSpace = text.IndexOf(' ');
string season = text.Substring(0, firstSpace);
int year = int.Parse(text.Substring(firstSpace + 1, 4));

If additionally you know that all years are in the 21st century, it can get stupidly optimal:

int firstSpace = text.IndexOf(' ');
string season = text.Substring(0, firstSpace);
int year = 2000 + 10 * (text[firstSpace + 3] - '0') 
                + text[firstSpace + 4] - '0';

which becomes even less readable but possibly faster (depending on what the JIT does) as:

int firstSpace = text.IndexOf(' ');
string season = text.Substring(0, firstSpace);
int year = 1472 + 10 * text[firstSpace + 3] + text[firstSpace + 4];

Personally I think that's at least one step too far though :)
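The constant 1472 is just the `- '0'` offsets folded into the 2000. In C#, `text[i]` is a `char` that promotes to its UTF-16 code in arithmetic, and the digits '0' to '9' have codes 48 to 57. Writing c_3 and c_4 for the codes of `text[firstSpace + 3]` and `text[firstSpace + 4]` (the tens and units digits of the year):

```latex
\begin{aligned}
\text{year} &= 2000 + 10\,(c_3 - 48) + (c_4 - 48) \\
            &= 2000 - 480 - 48 + 10\,c_3 + c_4 \\
            &= 1472 + 10\,c_3 + c_4
\end{aligned}
```

For 2009, c_3 = 48 ('0') and c_4 = 57 ('9'), giving 1472 + 480 + 57 = 2009.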

EDIT: Okay, taking this to extremes... you're only going to have a few seasons, right? Suppose they're "Spring", "Summer", "Fall" and "Winter"; then you can do:

string season;
int yearStart;
if (text[0] == 'S')
{
    season = text[1] == 'p' ? "Spring" : "Summer";
    yearStart = 7;
}
else if (text[0] == 'F')
{
    season = "Fall";
    yearStart = 5;
}
else
{
    season = "Winter";
    yearStart = 7;
}

int year = 1472 + 10 * text[yearStart + 2] + text[yearStart + 3];

This has the advantage that it reuses the same string objects: the season is always one of the string literals, so no new season string is allocated. Of course, it assumes that there's never anything wrong with the data...
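For trying it out directly, here is the same season dispatch wrapped into a self-contained method. It is a sketch under the answer's own assumptions: the season is exactly one of "Spring", "Summer", "Fall" or "Winter", followed by a space and a four-digit year from 2000 to 2099. The `SeasonYear` class and `Parse` method are hypothetical names.

```csharp
// Hypothetical helper class; the names are illustrative only.
public static class SeasonYear
{
    // Assumes the string starts with "Spring", "Summer", "Fall" or "Winter",
    // then a space and a four-digit year in 2000..2099.
    // Nothing is validated: malformed input gives wrong results or throws.
    public static void Parse(string text, out string season, out int year)
    {
        int yearStart;
        if (text[0] == 'S')
        {
            // The second letter tells "Spring" from "Summer".
            season = text[1] == 'p' ? "Spring" : "Summer";
            yearStart = 7;
        }
        else if (text[0] == 'F')
        {
            season = "Fall";
            yearStart = 5;
        }
        else
        {
            season = "Winter";
            yearStart = 7;
        }

        // 1472 = 2000 - 10 * 48 - 48: the '0' offsets of the tens and
        // units digits folded into a single constant.
        year = 1472 + 10 * text[yearStart + 2] + text[yearStart + 3];
    }
}
```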

Using Split as shown in Spidey's answer is certainly simpler than any of this, but I suspect it'll be slightly slower. To be honest, I'd at least try that first... have you measured the simplest code and found that it's too slow? The difference is likely to be very slight - certainly compared with whatever network or disk access you've got reading in the data in the first place.
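Since the question really comes down to "measure it", here is a rough timing harness using `System.Diagnostics.Stopwatch` (available since .NET 2.0) that compares the `Split` approach with the `IndexOf`/`Substring` approach on the example string. The class, delegate and method names and the iteration count are arbitrary, hypothetical choices. A micro-benchmark like this is only indicative: run it in a Release build, outside the debugger, and repeat it a few times before trusting the numbers.

```csharp
using System;
using System.Diagnostics;

// Hypothetical benchmark class; the names are illustrative only.
public static class ParseBenchmark
{
    // Custom delegate, since .NET 2.0 has no Func<> / Action<>.
    public delegate void ParseMethod(string text, out string season, out string year);

    public static void ViaSplit(string text, out string season, out string year)
    {
        string[] parts = text.Split(' ');
        season = parts[0];
        year = parts[1];
    }

    public static void ViaIndexOf(string text, out string season, out string year)
    {
        int firstSpace = text.IndexOf(' ');
        int secondSpace = text.IndexOf(' ', firstSpace + 1);
        season = text.Substring(0, firstSpace);
        year = text.Substring(firstSpace + 1, secondSpace - firstSpace - 1);
    }

    // Runs the parser 'iterations' times and returns elapsed milliseconds.
    public static long Time(ParseMethod parse, string text, int iterations)
    {
        string season;
        string year;
        parse(text, out season, out year); // warm-up call so JIT time is excluded
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            parse(text, out season, out year);
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Run()
    {
        const string text = "Spring 2009 Nielsen MSA";
        const int iterations = 1000000;
        Console.WriteLine("Split:   {0} ms", Time(ViaSplit, text, iterations));
        Console.WriteLine("IndexOf: {0} ms", Time(ViaIndexOf, text, iterations));
    }
}
```

Call `ParseBenchmark.Run()` from a console program's `Main`; the other variants in this thread can be added as further `ParseMethod`-compatible methods.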

靖瑶 2024-08-14 23:15:14

To add to the other answers, if you are expecting them to be in this format:

Spring xxxx
Summer xxxx
Autumn xxxx
Winter xxxx

then an even faster way would be:

// Assumes every season name is exactly six letters, so the year always starts at index 7.
string season = text.Substring(0, 6);
int year = int.Parse(text.Substring(7, 4));

That is rather nasty, though. :)

I wouldn't even consider coding like this.

尛丟丟 2024-08-14 23:15:14

Try this.

        string str = "Spring 2009 Nielsen MSA";
        string[] words = str.Split(' ');
        str = words[0] + " " + words[1];