将包含ascii字符串的byte[]快速转换为int/double/date等，无需new String

发布于 2025-01-03 17:36:15 字数 473 浏览 5 评论 0原文

我得到 FIX 消息字符串 (ASCII) 作为 ByteBuffer。我解析标签值对并将值存储为树形图中的原始对象，并以标签作为键。所以我需要根据其类型将 byte[] 值转换为 int/double/date 等。

最简单的方法是创建新的字符串并将其传递给标准转换器函数。例如，

int convertToInt(byte[] buffer, int offset, int length)
{
  String valueStr = new String(buffer, offset, length);
  return Integer.parseInt(valueStr);
}

我知道在 Java 中，创建新对象非常便宜，仍然有任何方法可以直接将此 ascii byte[] 转换为原始类型。我尝试了手写函数来执行此操作，但发现它非常耗时并且没有带来更好的性能。

是否有任何第三方库可以这样做，最重要的是它值得这样做吗？

原文

I get FIX message string (ASCII) as ByteBuffer. I parse tag value pairs and store values as primitive objects in the treemap with tag as key. So I need to convert byte[] value to int/double/date, etc. depending on its type.

Simplest way is to create new String and pass it to standard converter functions. e.g.

int convertToInt(byte[] buffer, int offset, int length)
{
  String valueStr = new String(buffer, offset, length);
  return Integer.parseInt(valueStr);
}

I understand that in Java, creating new object is very inexpensive, still is there any way to convert this ascii byte[] to primitive type directly. I tried hand written functions to do so, but found it to be time consuming and didn't result in better performance.

Are there any third party libraries for doing so and most of all is it worth doing?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

朦胧时间 2025-01-10 17:36:15

最重要的是它值得做吗？

几乎肯定不是 - 在采取重大措施缓解它之前，您应该采取措施检查这是否是一个性能瓶颈。

你现在的表现怎么样？它需要是什么？（“尽可能快”不是一个好的目标，否则你永远不会停止 - 弄清楚什么时候你可以说你“完成”了。）

分析代码 - 真正的问题是在字符串创建中？检查垃圾收集等的频率（再次使用分析器）。

每种解析类型可能具有不同的特征。例如，对于解析整数，如果您发现在很长一段时间内您得到的是单个数字，您可能想要特殊情况：

if (length == 1)
{
    char c = buffer[index];
    if (c >= '0' && c <= '9')
    {
        return c - '0';
    }
    // Invalid - throw an exception or whatever
}

...但是检查这种情况的频率发生在你走那条路之前。对实际上从未出现过的特定优化进行大量检查会适得其反。

most of all is it worth doing?

Almost certainly not - and you should measure to check that this is a performance bottleneck before going to significant effort to alleviate it.

What's your performance like now? What does it need to be? ("As fast as possible" isn't a good goal, or you'll never stop - work out when you can say you're "done".)

Profile the code - is the problem really in string creation? Check how often you're garbage-collecting etc (again, with a profiler).

Each parsing type is likely to have different characteristics. For example, for parsing integers, if you find that for a significant amount of the time you've got a single digit, you might want to special-case that:

if (length == 1)
{
    char c = buffer[index];
    if (c >= '0' && c <= '9')
    {
        return c - '0';
    }
    // Invalid - throw an exception or whatever
}

... but check how often this occurs before you go down that path. Applying lots of checks for particular optimizations that never actually crop up is counter-productive.

回复收藏 0 原文

染柒℉ 2025-01-10 17:36:15

同意 Jon 的观点，但是在处理许多 FIX 消息时，这种情况很快就会增加。
下面的方法将允许使用空格填充数字。如果您需要处理小数，那么代码会略有不同。两种方法之间的速度差异为 11 倍。ConvertToLong 导致 0 次 GC。下面的代码是c#中的：

///<summary>
///Converts a byte[] of characters that represent a number into a .net long type. Numbers can be padded from left
/// with spaces.
///</summary>
///<param name="buffer">The buffer containing the number as characters</param>
///<param name="startIndex">The startIndex of the number component</param>
///<param name="endIndex">The EndIndex of the number component</param>
///<returns>The price will be returned as a long from the ASCII characters</returns>
public static long ConvertToLong(this byte[] buffer, int startIndex, int endIndex)
{
    long result = 0;
    for (int i = startIndex; i <= endIndex; i++)
    {
        if (buffer[i] != 0x20)
        {
            // 48 is the decimal value of the '0' character. So to convert the char value
            // of an int to a number we subtract 48. e.g '1' = 49 -48 = 1
            result = result * 10 + (buffer[i] - 48);
        }
    }
    return result;
}

/// <summary>
/// Same as above but converting to string then to long
/// </summary>
public static long ConvertToLong2(this byte[] buffer, int startIndex, int endIndex)
{
    for (int i = startIndex; i <= endIndex; i++)
    {
        if (buffer[i] != SpaceChar)
        {
            return long.Parse(System.Text.Encoding.UTF8.GetString(buffer, i, (endIndex - i) + 1));
        }
    }
    return 0;
}

[Test]
public void TestPerformance(){
    const int iterations = 200 * 1000;
    const int testRuns = 10;
    const int warmUp = 10000;
    const string number = "    123400";
    byte[] buffer = System.Text.Encoding.UTF8.GetBytes(number);

    double result = 0;
    for (int i = 0; i < warmUp; i++){
        result = buffer.ConvertToLong(0, buffer.Length - 1);
    }
    for (int testRun = 0; testRun < testRuns; testRun++){
        Stopwatch sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < iterations; i++){
            result = buffer.ConvertToLong(0, buffer.Length - 1);
        }
        sw.Stop();
        Console.WriteLine("Test {4}: {0} ticks, {1}ms, 1 conversion takes = {2}μs or {3}ns. GCs: {5}", sw.ElapsedTicks,
            sw.ElapsedMilliseconds, (((decimal) sw.ElapsedMilliseconds)/((decimal) iterations))*1000,
            (((decimal) sw.ElapsedMilliseconds)/((decimal) iterations))*1000*1000, testRun,
            GC.CollectionCount(0) + GC.CollectionCount(1) + GC.CollectionCount(2));
    }
}
RESULTS
ConvertToLong:
Test 0: 9243 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 1: 8339 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 2: 8425 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 3: 8333 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 4: 8332 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 5: 8331 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 6: 8409 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 7: 8334 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 8: 8335 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 9: 8331 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
ConvertToLong2:
Test 0: 109067 ticks, 55ms, 1 conversion takes = 0.275000μs or 275.000000ns. GCs: 4
Test 1: 109861 ticks, 56ms, 1 conversion takes = 0.28000μs or 280.00000ns. GCs: 8
Test 2: 102888 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 9
Test 3: 105164 ticks, 53ms, 1 conversion takes = 0.265000μs or 265.000000ns. GCs: 10
Test 4: 104083 ticks, 53ms, 1 conversion takes = 0.265000μs or 265.000000ns. GCs: 11
Test 5: 102756 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 13
Test 6: 102219 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 14
Test 7: 102086 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 15
Test 8: 102672 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 17
Test 9: 102025 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 18

Agree with Jon however when processing many FIX messages this quickly adds up.
The method below will allow for space padded numbers. If you need to handle decimals then code would be slightly different. Speed difference between the two methods is a factor of 11. The ConvertToLong results in 0 GCs. Code below is in c#:

///<summary>
///Converts a byte[] of characters that represent a number into a .net long type. Numbers can be padded from left
/// with spaces.
///</summary>
///<param name="buffer">The buffer containing the number as characters</param>
///<param name="startIndex">The startIndex of the number component</param>
///<param name="endIndex">The EndIndex of the number component</param>
///<returns>The price will be returned as a long from the ASCII characters</returns>
public static long ConvertToLong(this byte[] buffer, int startIndex, int endIndex)
{
    long result = 0;
    for (int i = startIndex; i <= endIndex; i++)
    {
        if (buffer[i] != 0x20)
        {
            // 48 is the decimal value of the '0' character. So to convert the char value
            // of an int to a number we subtract 48. e.g '1' = 49 -48 = 1
            result = result * 10 + (buffer[i] - 48);
        }
    }
    return result;
}

/// <summary>
/// Same as above but converting to string then to long
/// </summary>
public static long ConvertToLong2(this byte[] buffer, int startIndex, int endIndex)
{
    for (int i = startIndex; i <= endIndex; i++)
    {
        if (buffer[i] != SpaceChar)
        {
            return long.Parse(System.Text.Encoding.UTF8.GetString(buffer, i, (endIndex - i) + 1));
        }
    }
    return 0;
}

[Test]
public void TestPerformance(){
    const int iterations = 200 * 1000;
    const int testRuns = 10;
    const int warmUp = 10000;
    const string number = "    123400";
    byte[] buffer = System.Text.Encoding.UTF8.GetBytes(number);

    double result = 0;
    for (int i = 0; i < warmUp; i++){
        result = buffer.ConvertToLong(0, buffer.Length - 1);
    }
    for (int testRun = 0; testRun < testRuns; testRun++){
        Stopwatch sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < iterations; i++){
            result = buffer.ConvertToLong(0, buffer.Length - 1);
        }
        sw.Stop();
        Console.WriteLine("Test {4}: {0} ticks, {1}ms, 1 conversion takes = {2}μs or {3}ns. GCs: {5}", sw.ElapsedTicks,
            sw.ElapsedMilliseconds, (((decimal) sw.ElapsedMilliseconds)/((decimal) iterations))*1000,
            (((decimal) sw.ElapsedMilliseconds)/((decimal) iterations))*1000*1000, testRun,
            GC.CollectionCount(0) + GC.CollectionCount(1) + GC.CollectionCount(2));
    }
}
RESULTS
ConvertToLong:
Test 0: 9243 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 1: 8339 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 2: 8425 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 3: 8333 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 4: 8332 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 5: 8331 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 6: 8409 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 7: 8334 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 8: 8335 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
Test 9: 8331 ticks, 4ms, 1 conversion takes = 0.02000μs or 20.00000ns. GCs: 2
ConvertToLong2:
Test 0: 109067 ticks, 55ms, 1 conversion takes = 0.275000μs or 275.000000ns. GCs: 4
Test 1: 109861 ticks, 56ms, 1 conversion takes = 0.28000μs or 280.00000ns. GCs: 8
Test 2: 102888 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 9
Test 3: 105164 ticks, 53ms, 1 conversion takes = 0.265000μs or 265.000000ns. GCs: 10
Test 4: 104083 ticks, 53ms, 1 conversion takes = 0.265000μs or 265.000000ns. GCs: 11
Test 5: 102756 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 13
Test 6: 102219 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 14
Test 7: 102086 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 15
Test 8: 102672 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 17
Test 9: 102025 ticks, 52ms, 1 conversion takes = 0.26000μs or 260.00000ns. GCs: 18

回复收藏 0 原文

享受孤独 2025-01-10 17:36:15

看一下 ByteBuffer。它具有执行此操作的功能，包括处理字节顺序（字节顺序）。

回复收藏 0 原文

树深时见影 2025-01-10 17:36:15

一般来说，我不喜欢粘贴这样的代码，但无论如何，100 行是如何完成的（生产代码）
我不建议使用它，但有一些参考代码很好（通常）

package t1;

import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;

public class IntParser {
    final static byte[] digits = {
        '0' , '1' , '2' , '3' , '4' , '5' ,
        '6' , '7' , '8' , '9' , 'a' , 'b' ,
        'c' , 'd' , 'e' , 'f' , 'g' , 'h' ,
        'i' , 'j' , 'k' , 'l' , 'm' , 'n' ,
        'o' , 'p' , 'q' , 'r' , 's' , 't' ,
        'u' , 'v' , 'w' , 'x' , 'y' , 'z'
    };

    static boolean isDigit(byte b) {
    return b>='0' &&  b<='9';
  }

    static int digit(byte b){
        //negative = error

        int result  = b-'0';
        if (result>9)
            result = -1;
        return result;
    }

    static NumberFormatException forInputString(ByteBuffer b){
        byte[] bytes=new byte[b.remaining()];
        b.get(bytes);
        try {
            return new NumberFormatException("bad integer: "+new String(bytes, "8859_1"));
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e);
        }
    }
    public static int parseInt(ByteBuffer b){
        return parseInt(b, 10, b.position(), b.limit());
    }
    public static int parseInt(ByteBuffer b, int radix, int i, int max) throws NumberFormatException{
        int result = 0;
        boolean negative = false;


        int limit;
        int multmin;
        int digit;      

        if (max > i) {
            if (b.get(i) == '-') {
                negative = true;
                limit = Integer.MIN_VALUE;
                i++;
            } else {
                limit = -Integer.MAX_VALUE;
            }
            multmin = limit / radix;
            if (i < max) {
                digit = digit(b.get(i++));
                if (digit < 0) {
                    throw forInputString(b);
                } else {
                    result = -digit;
                }
            }
            while (i < max) {
                // Accumulating negatively avoids surprises near MAX_VALUE
                digit = digit(b.get(i++));
                if (digit < 0) {
                    throw forInputString(b);
                }
                if (result < multmin) {
                    throw forInputString(b);
                }
                result *= radix;
                if (result < limit + digit) {
                    throw forInputString(b);
                }
                result -= digit;
            }
        } else {
            throw forInputString(b);
        }
        if (negative) {
            if (i > b.position()+1) {
                return result;
            } else {    /* Only got "-" */
                throw forInputString(b);
            }
        } else {
            return -result;
        }
    }

}

Generally I have no preference to paste such code but anyways, 100 lines how it's done (production code)
I'd not advise using it but having some reference code it's nice (usually)

package t1;

import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;

public class IntParser {
    final static byte[] digits = {
        '0' , '1' , '2' , '3' , '4' , '5' ,
        '6' , '7' , '8' , '9' , 'a' , 'b' ,
        'c' , 'd' , 'e' , 'f' , 'g' , 'h' ,
        'i' , 'j' , 'k' , 'l' , 'm' , 'n' ,
        'o' , 'p' , 'q' , 'r' , 's' , 't' ,
        'u' , 'v' , 'w' , 'x' , 'y' , 'z'
    };

    static boolean isDigit(byte b) {
    return b>='0' &&  b<='9';
  }

    static int digit(byte b){
        //negative = error

        int result  = b-'0';
        if (result>9)
            result = -1;
        return result;
    }

    static NumberFormatException forInputString(ByteBuffer b){
        byte[] bytes=new byte[b.remaining()];
        b.get(bytes);
        try {
            return new NumberFormatException("bad integer: "+new String(bytes, "8859_1"));
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e);
        }
    }
    public static int parseInt(ByteBuffer b){
        return parseInt(b, 10, b.position(), b.limit());
    }
    public static int parseInt(ByteBuffer b, int radix, int i, int max) throws NumberFormatException{
        int result = 0;
        boolean negative = false;


        int limit;
        int multmin;
        int digit;      

        if (max > i) {
            if (b.get(i) == '-') {
                negative = true;
                limit = Integer.MIN_VALUE;
                i++;
            } else {
                limit = -Integer.MAX_VALUE;
            }
            multmin = limit / radix;
            if (i < max) {
                digit = digit(b.get(i++));
                if (digit < 0) {
                    throw forInputString(b);
                } else {
                    result = -digit;
                }
            }
            while (i < max) {
                // Accumulating negatively avoids surprises near MAX_VALUE
                digit = digit(b.get(i++));
                if (digit < 0) {
                    throw forInputString(b);
                }
                if (result < multmin) {
                    throw forInputString(b);
                }
                result *= radix;
                if (result < limit + digit) {
                    throw forInputString(b);
                }
                result -= digit;
            }
        } else {
            throw forInputString(b);
        }
        if (negative) {
            if (i > b.position()+1) {
                return result;
            } else {    /* Only got "-" */
                throw forInputString(b);
            }
        } else {
            return -result;
        }
    }

}

回复收藏 0 原文

~没有更多了~