Converting an ASCII byte[] to a String

Posted on 2024-08-19 22:58:40

I am trying to pass a byte[] containing ASCII characters to log4j, to be logged into a file using the obvious representation. When I simply pass in the byte[], it is of course treated as an object and the logs are pretty useless. When I try to convert the arrays to strings using new String(byte[] data), the performance of my application is halved.

How can I pass them in efficiently, without incurring the approximately 30us time penalty of converting them to strings?

Also, why does it take so long to convert them?

Thanks.

Edit

I should add that I am optimising for latency here, and yes, 30us does make a difference! Also, these arrays vary from ~100 bytes all the way up to a few thousand bytes.

Comments (5)

晌融 2024-08-26 22:58:40

ASCII is one of the few encodings that can be converted to and from UTF-16 with no arithmetic or table lookups, so it's possible to convert manually:

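String convert(byte[] data) {
    // ASCII code points equal their UTF-16 code units, so each byte can be cast directly.
    StringBuilder sb = new StringBuilder(data.length);
    for (int i = 0; i < data.length; ++i) {
        // Bytes are signed in Java: a negative value means the high bit is set, i.e. not ASCII.
        if (data[i] < 0) throw new IllegalArgumentException();
        sb.append((char) data[i]);
    }
    return sb.toString();
}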

But make sure it really is ASCII, or you'll end up with garbage.
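To make the warning concrete, here is a minimal sketch that runs the same loop on a pure ASCII array and on an array containing a byte above 0x7F. The class name AsciiConvertDemo and the sample data are hypothetical, chosen only for illustration. It also shows what the "garbage" looks like without the range check: casting a negative byte to char sign-extends it.

```java
final class AsciiConvertDemo {
    // Same loop as in the answer: each ASCII byte maps to the identical UTF-16 code unit.
    static String convert(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length);
        for (int i = 0; i < data.length; ++i) {
            if (data[i] < 0) {
                throw new IllegalArgumentException("non-ASCII byte at index " + i);
            }
            sb.append((char) data[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] ascii = {'G', 'E', 'T', ' ', '/'};
        System.out.println(convert(ascii)); // prints "GET /"

        byte[] notAscii = {'a', (byte) 0xE9};
        // Without the range check the cast sign-extends: byte 0xE9 becomes char 0xFFE9.
        System.out.println(Integer.toHexString((char) notAscii[1])); // prints "ffe9"

        try {
            convert(notAscii);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Whether throwing is the right reaction inside a logging path is a design choice; substituting a placeholder character such as '?' would be an equally reasonable variant.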

像极了他 2024-08-26 22:58:40

What you want to do is delay processing of the byte[] array until log4j decides that it actually wants to log the message. This way you could log it at DEBUG level while testing and then disable that level in production. For example, you could:

final byte[] myArray = ...;
Logger.getLogger(MyClass.class).debug(new Object() {
    @Override public String toString() {
        return new String(myArray);
    }
});

Now you don't pay the speed penalty unless you actually log the data, because the toString method isn't called until log4j decides it'll actually log the message!
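The anonymous class above can also be written as a small reusable wrapper. The following is a sketch only: the class name LazyAscii and its demo counter are hypothetical, and a plain boolean stands in for the logger's level check so the example runs without log4j on the classpath. It assumes Java 7 or later for StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

final class LazyAscii {
    // Demo-only counter showing how often the conversion actually runs.
    static int conversions = 0;

    private final byte[] data;

    LazyAscii(byte[] data) {
        // The array is not copied; the caller must not modify it before it is logged.
        this.data = data;
    }

    @Override
    public String toString() {
        conversions++;
        return new String(data, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        Object message = new LazyAscii("hello".getBytes(StandardCharsets.US_ASCII));

        boolean debugEnabled = false; // stand-in for the logger's own level check
        if (debugEnabled) {
            System.out.println(message);
        }
        System.out.println(conversions); // prints 0: nothing was converted

        System.out.println(message);     // prints "hello"
        System.out.println(conversions); // prints 1
    }
}
```

With log4j the wrapper would be passed where the anonymous object is passed above, for example logger.debug(new LazyAscii(myArray)). The wrapper allocation itself is still paid on every call, and if the logger formats messages on another thread, the conversion cost moves there rather than disappearing.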

I'm not sure what you mean by "the obvious representation", so I've assumed that you mean converting to a String by interpreting the bytes in the platform default character encoding. If you are dealing with binary data, that is obviously worthless. In that case I'd suggest using Arrays.toString(byte[]) to create a formatted string along the lines of

[54, 23, 65, ...]

半仙 2024-08-26 22:58:40

If your data is in fact ASCII (i.e. 7-bit data), then you should be using new String(data, "US-ASCII") instead of depending on the platform default encoding. This may be faster than trying to interpret it as your platform default encoding (which could be UTF-8, which requires more introspection).

You could also speed this up by caching the Charset instance and calling new String(data, charset) instead, which avoids the charset lookup by name on every call.
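A minimal sketch of the cached-charset variant follows. The class name AsciiDecoder is hypothetical, and the code assumes Java 7 or later, where StandardCharsets.US_ASCII is available; on older runtimes the same constant could be initialised once with Charset.forName("US-ASCII").

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class AsciiDecoder {
    // Resolved once and reused, so no by-name charset lookup happens per call.
    private static final Charset ASCII = StandardCharsets.US_ASCII;

    static String decode(byte[] data) {
        return new String(data, ASCII);
    }

    public static void main(String[] args) {
        byte[] payload = {'O', 'K', ' ', '2', '0', '0'};
        System.out.println(decode(payload)); // prints "OK 200"
    }
}
```

Unlike the hand-written loop in the earlier answer, this constructor does not throw on bytes outside the ASCII range; the String documentation says malformed input is replaced with the charset's default replacement string. Whether this is actually faster than the default-encoding constructor depends on the JVM and should be measured on the target system.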

Having said that: it's been a very, very long time since I've seen real ASCII data in a production environment.

不即不离 2024-08-26 22:58:40

Halved performance? How large is this byte array? If it's 1 MB, for example, then there are certainly more factors to take into account than just "converting" bytes to chars (which should be fast enough anyway). Writing 1 MB of data to a log file instead of "just" 100 bytes (which byte[].toString() may generate) is obviously going to take some time. The disk file system is not as fast as RAM.

You'll need to change the string representation of the byte array. Maybe log some more meaningful information instead, e.g. the name associated with it (a filename?), its length and so on. After all, what does that byte array actually represent?
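As a rough sketch of that idea, the snippet below logs a short description instead of the full contents. The class name PayloadSummary, the describe method and the sample name are hypothetical; which fields are worth logging depends on what the array represents.

```java
final class PayloadSummary {
    // Builds a short, fixed-cost description instead of converting the whole array.
    static String describe(String name, byte[] data) {
        return name + " (" + data.length + " bytes)";
    }

    public static void main(String[] args) {
        byte[] payload = new byte[2048];
        System.out.println(describe("request", payload)); // prints "request (2048 bytes)"
    }
}
```

The cost of this representation does not grow with the size of the array, at the price of losing the actual contents in the log.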

Edit: I don't remember seeing the "approximately 30us" phrase in your question; maybe you edited it in within 5 minutes of asking. In any case, this is micro-optimization, and it should certainly not cause "halved performance" in general, unless you write these arrays a million times per second (and even then, why would you want to do that? Aren't you overusing logging?).
