Avoiding the creation of "new" String objects when converting byte[] to String with a specific charset

Posted 2024-08-08 11:19:36

I'm reading from a binary file and want to convert the bytes to US ASCII strings. Is there any way to do this without calling new on String to avoid multiple semantically equal String objects being created in the string literal pool? I'm thinking that it is probably not possible since introducing String objects using double quotes is not possible here. Is this correct?

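import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

private String nextString(DataInputStream dis, int size)
throws IOException
{
  byte[] bytesHolder = new byte[size];
  // readFully blocks until all size bytes are read; read() may return fewer
  dis.readFully(bytesHolder);
  return new String(bytesHolder, StandardCharsets.US_ASCII).trim();
}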
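To make the premise of the question concrete, here is a minimal sketch (the class and method names are illustrative, not from the original) showing that decoding the same bytes twice yields two distinct `String` objects that are `equals` to each other. Neither is the pooled literal: strings built with `new String(...)` are ordinary heap objects, and only literals and explicitly interned strings end up in the string pool.

```java
import java.nio.charset.StandardCharsets;

// Illustrative demo class, not part of the original question.
class DecodeIdentityDemo {
    // Decodes ASCII bytes the same way nextString does, minus the stream
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII).trim();
    }

    public static void main(String[] args) {
        byte[] data = "abc ".getBytes(StandardCharsets.US_ASCII);
        String a = decode(data);
        String b = decode(data);
        System.out.println(a.equals(b)); // true: same characters
        System.out.println(a == b);      // false: two separate objects
        System.out.println(a == "abc");  // false: neither is the pooled literal
    }
}
```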

Comments (3)

不离久伴 2024-08-15 11:19:36

You'd have to have a cache mapping byte arrays to strings, then search through the cache for any equal values before creating a new string.
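A minimal sketch of such a cache, assuming a single reader thread. One pitfall: a plain `byte[]` does not override `equals`/`hashCode`, so this sketch swaps in `java.nio.ByteBuffer` as the key, because its `equals` and `hashCode` are based on the buffer's remaining contents. On a cache hit no `String` is decoded at all; only a small wrapper is allocated for the lookup. The class name `AsciiStringCache` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: caches decoded strings keyed by their raw bytes.
class AsciiStringCache {
    private final Map<ByteBuffer, String> cache = new HashMap<>();

    String get(byte[] bytes) {
        // wrap() does not copy; the wrapper is only used for the lookup
        String cached = cache.get(ByteBuffer.wrap(bytes));
        if (cached != null) {
            return cached; // hit: no new String is created
        }
        String decoded = new String(bytes, StandardCharsets.US_ASCII).trim();
        // Store a private copy as the key so that later changes to the
        // caller's array (e.g. a reused read buffer) cannot corrupt the map
        cache.put(ByteBuffer.wrap(bytes.clone()), decoded);
        return decoded;
    }

    public static void main(String[] args) {
        AsciiStringCache c = new AsciiStringCache();
        byte[] buf = "hello".getBytes(StandardCharsets.US_ASCII);
        String first = c.get(buf);
        String second = c.get(buf.clone()); // equal contents, different array
        System.out.println(first == second); // true: same cached instance
    }
}
```

Copying the key costs one array allocation per distinct value, which is the price of letting the caller reuse its read buffer.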

You can intern existing strings with intern() as Yishai posted - that won't stop you from creating more strings, but it'll make all but the first one (for any char sequence) very short lived. On the other hand, it'll make all the distinct strings live for a very long time indeed.
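A short sketch of that behavior (the class name is illustrative): `intern()` returns the canonical pooled instance, so each later decoded copy becomes garbage immediately and every caller ends up holding the same object, which is also the object the matching string literal refers to.

```java
import java.nio.charset.StandardCharsets;

// Illustrative demo of String.intern(), not code from the answer.
class InternDemo {
    static String decodeInterned(byte[] bytes) {
        // new String(...) still allocates; intern() then returns the pooled copy
        return new String(bytes, StandardCharsets.US_ASCII).intern();
    }

    public static void main(String[] args) {
        byte[] bytes = "xyz".getBytes(StandardCharsets.US_ASCII);
        String a = decodeInterned(bytes);
        String b = decodeInterned(bytes);
        System.out.println(a == b);     // true: both are the pooled instance
        System.out.println(a == "xyz"); // true: literals share the same pool
    }
}
```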

You can have "pseudo-interning" by using a Map<String, String>:

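// cache is assumed to be a Map<String, String> field, e.g. new HashMap<>()
String tmp = new String(bytesHolder, Charset.forName("US-ASCII")).trim();
String cached = cache.get(tmp);
if (cached == null)
{
    // First time this value is seen: keep this instance as the canonical one
    cached = tmp;
    cache.put(tmp, tmp);
}
return cached;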

You could even put a bit more effort in and end up with an LRU cache so that it'll keep the N most recently fetched strings, discarding others when it needs to.
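One common way to get that LRU behavior in Java, sketched here as an assumption rather than the answerer's code, is a `LinkedHashMap` in access order that overrides `removeEldestEntry`. The capacity N is a constructor parameter; the class name `LruStringCache` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU "pseudo-intern" cache holding at most maxEntries strings.
class LruStringCache {
    private final Map<String, String> map;

    LruStringCache(int maxEntries) {
        // accessOrder = true: get() moves an entry to the most-recently-used end
        this.map = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Evict the least recently used entry once over capacity
                return size() > maxEntries;
            }
        };
    }

    String canonicalize(String s) {
        String cached = map.get(s);
        if (cached == null) {
            map.put(s, s);
            cached = s;
        }
        return cached;
    }

    int cachedCount() {
        return map.size();
    }

    public static void main(String[] args) {
        LruStringCache cache = new LruStringCache(2);
        String a1 = cache.canonicalize(new String("a"));
        cache.canonicalize(new String("b"));
        cache.canonicalize(new String("a")); // touches "a", so "b" becomes eldest
        cache.canonicalize(new String("c")); // evicts "b"
        System.out.println(cache.cachedCount());                      // 2
        System.out.println(cache.canonicalize(new String("a")) == a1); // true
    }
}
```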

None of that reduces the number of strings created in the first place, as I say - but is that likely to be a problem in your situation? GCs have been tuned to make it very cheap to create short-lived objects.

送你一个梦 2024-08-15 11:19:36

You can call the intern() method on the string to ensure there is only one instance of it for the whole JVM.

String s = new String(bytes, "US-ASCII").intern();

You won't avoid creating the initial string again, but you will save on the storage.

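That being said, interned strings have limited storage space, so use intern() with caution. (This mainly applies to JVMs before Java 7, where the pool lived in the fixed-size PermGen; since Java 7, interned strings are stored on the main heap.) A better option may be to implement a HashMap that uses the string as both key and value: check whether the string already exists, return the stored instance if it does, and insert it if it doesn't. That way you won't have such memory limitations.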
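A compact version of that check-then-insert, assuming a `ConcurrentHashMap` so several reader threads could share it (thread safety is an added assumption, not something the answer requires): `putIfAbsent` returns the existing value, or `null` if the new one was stored. The class name `MapInterner` is hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical map-based alternative to String.intern().
class MapInterner {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    String intern(String s) {
        // Atomically insert s unless an equal string is already present
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        MapInterner interner = new MapInterner();
        String first = interner.intern(new String("key"));
        String second = interner.intern(new String("key"));
        System.out.println(first == second); // true
    }
}
```

Unlike the JVM-wide pool, the entries live only as long as the `MapInterner` instance is reachable, so dropping it releases all of them at once.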

土豪我们做朋友吧 2024-08-15 11:19:36

You shouldn't be concerned about it unless you have profiled your application and determined that String creation is the actual source of your problem.

If you find out that the String creation is the source of your problem I would recommend what Jon Skeet proposed, i.e. a mapping from byte[] to String. That has about the same effect as interning your Strings while not hogging up valuable memory until you restart the VM.
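One caveat when taking "a mapping from byte[] to String" literally: Java arrays inherit identity-based `equals` and `hashCode` from `Object`, so a `HashMap<byte[], String>` only finds an entry when given the very same array instance. A small sketch of the pitfall (class and method names are illustrative), contrasted with wrapping the bytes in `java.nio.ByteBuffer`, whose equality is content-based:

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Illustrative demo: identity-based vs content-based map keys.
class ByteArrayKeyPitfall {
    static String lookupWithArrayKey(byte[] stored, byte[] probe) {
        Map<byte[], String> m = new HashMap<>();
        m.put(stored, "value");
        return m.get(probe); // only matches if probe is the same array object
    }

    static String lookupWithBufferKey(byte[] stored, byte[] probe) {
        Map<ByteBuffer, String> m = new HashMap<>();
        m.put(ByteBuffer.wrap(stored), "value");
        return m.get(ByteBuffer.wrap(probe)); // matches on equal remaining bytes
    }

    public static void main(String[] args) {
        byte[] original = {104, 105}; // "hi" in ASCII
        byte[] sameContent = original.clone();
        System.out.println(lookupWithArrayKey(original, sameContent));  // null
        System.out.println(lookupWithBufferKey(original, sameContent)); // value
    }
}
```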
