Size of an initialisation string in Java
Apparently there is a limit to the size of an initialisation string in javac. Can anyone help me in identifying what the maximum limit is please?
Thank you
edit:
We are building an initialisation string which will look something like this "{1,2,3,4,5,6,7,8......}" but ideally with 10,000 numbers. When we do this for 1,000 numbers it works, but 10,000 throws an error saying "code too large for try statement".
To produce this we are using a StringBuilder and looping over an array, appending the values. Apparently it is a limitation in javac. We have been told that we could rebuild the array in the method we are invoking if we pass it in small chunks. This, however, is not possible because we don't have control over the user method we are invoking.
I would like to post code but can't because this is a project for University. I am not looking for code solutions just some help in understanding what the actual problem here is.
It's the for loop that is the offender:
    Object o = new Object()
    {
        public String toString()
        {
            StringBuilder s = new StringBuilder();
            int length = MainInterfaceProcessor.this.valuesFromData.length;
            Object[] arrayToProcess = MainInterfaceProcessor.this.valuesFromData;
            if(length == 0)
            {
                //throw exception to do
            }
            else if(length == 1)
            {
                s.append("{" + Integer.toString((Integer)arrayToProcess[0])+"}");
            }
            else
            {
                s.append("{" + Integer.toString((Integer)arrayToProcess[0])+","); //opening statement
                for(int i = 1; i < length; i++)
                {
                    if(i == (length - 1))
                    {
                        //last element in the array so don't add a comma at the end
                        s.append(getArrayItemAsString(arrayToProcess, i)+"}");
                        break;
                    }
                    //append each array value at position i, followed
                    //by a comma to separate the values
                    s.append(getArrayItemAsString(arrayToProcess, i)+ ",");
                }
            }
            return s.toString();
        }
    };
    try
    {
        Object result = method.invoke(obj, new Object[] { o });
    }
Comments (3)
The length of a String literal (i.e. "...") is limited by the class file format's CONSTANT_Utf8_info structure, which is referred to by the CONSTANT_String_info structure. The limiting factor here is the length attribute, which is only 2 bytes large, i.e. has a maximum value of 65535. This number corresponds to the number of bytes in the modified UTF-8 representation of the string (this is actually almost CESU-8, but the 0 character is also represented in a two-byte form).
So, a pure ASCII string literal can have up to 65535 characters, while a string consisting of characters in the range U+0800 ... U+FFFF can have only one third of that, since each of them takes 3 bytes. Characters encoded as surrogate pairs in UTF-16 (i.e. U+10000 to U+10FFFF) take up 6 bytes each (real UTF-8 would take 4 here).
(The same limit is there for identifiers, i.e. class, method and variable names, and type descriptors for these, since they use the same structure.)
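To see how the 65535-byte limit plays out for different kinds of characters, here is a small sketch. It relies on DataOutputStream.writeUTF, which writes the same modified UTF-8 encoding preceded by a 2-byte length and rejects strings that need more than 65535 bytes. The class and method names are made up for illustration, and String.repeat assumes Java 11 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of any library.
class ModifiedUtf8Length {

    /**
     * Returns the number of bytes the string needs in modified UTF-8,
     * or -1 if it needs more than 65535 bytes.
     */
    static int encodedLength(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s); // 2-byte length prefix, then the encoded bytes
        } catch (UTFDataFormatException tooLong) {
            return -1;       // encoded form exceeds 65535 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size() - 2; // subtract the length prefix
    }

    public static void main(String[] args) {
        System.out.println(encodedLength("A"));            // 1 (ASCII)
        System.out.println(encodedLength("\u0000"));       // 2 (the 0 character)
        System.out.println(encodedLength("\u20AC"));       // 3 (U+0800 ... U+FFFF)
        System.out.println(encodedLength("\uD83D\uDE00")); // 6 (surrogate pair)
        System.out.println(encodedLength("A".repeat(65535))); // 65535
        System.out.println(encodedLength("A".repeat(65536))); // -1
    }
}
```

This only measures the encoding; it says nothing about how a particular compiler reacts to a literal that is too long.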
The Java Language Specification does not mention any limit for string literals:
So in principle a compiler could split a longer string literal into more than one CONSTANT_String_info structure and reconstruct it at runtime by concatenation (and .intern()-ing the result). I have no idea if any compiler is actually doing this.
It turns out that the problem does not relate to string literals, but to array initializers.
When passing an object to BMethod.invoke (and similarly to BConstructor.newInstance), it can either be a BObject (i.e. a wrapper around an existing object, in which case the wrapped object is passed), a String (which will be passed as is), or anything else. In the last case, the object will be converted to a string (by toString()), and this string is then interpreted as a Java expression. To do this, BlueJ wraps the expression in a class/method and compiles this method. In the method, the array initializer is simply converted to a long list of array assignments, and this finally makes the method longer than the maximum bytecode size of a Java method (the class file format requires the code of a single method to be shorter than 65536 bytes).
This is why it breaks for longer arrays.
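A rough back-of-the-envelope estimate shows why 1,000 elements work and 10,000 do not. The sketch below assumes each element is stored the way javac compiles an int[] initializer: dup, push the index, push the value, iastore. The element values 1..n are taken from the question, the class and method names are hypothetical, and what BlueJ actually generates may differ in detail.

```java
// Hypothetical estimator; the byte counts are approximations.
class ArrayInitSizeEstimate {

    // Upper bound for the code of one method in the class file format.
    static final int MAX_CODE_LENGTH = 65535;

    /** Size in bytes of the instruction that pushes an int constant. */
    static int pushSize(int v) {
        if (v >= -1 && v <= 5) return 1;                            // iconst_<n>
        if (v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE) return 2;   // bipush
        if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) return 3; // sipush
        return 3; // assumption: ldc_w (a plain ldc would take 2 bytes)
    }

    /** Approximate bytecode size of "new int[] {1, 2, ..., n}". */
    static long estimate(int n) {
        long size = pushSize(n) + 2;       // push the length, then newarray
        for (int i = 0; i < n; i++) {
            size += 1                      // dup
                  + pushSize(i)            // index
                  + pushSize(i + 1)        // value
                  + 1;                     // iastore
        }
        return size;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1000, 10000}) {
            long size = estimate(n);
            System.out.println(n + " elements: about " + size + " bytes, "
                    + (size <= MAX_CODE_LENGTH ? "fits" : "too large"));
        }
    }
}
```

With roughly 8 bytes per element, the limit is reached somewhere around 8,000 elements, which matches the behaviour described in the question.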
So, to pass larger arrays, we have to find some other way to pass them to BMethod.invoke. The BlueJ extension API has no way to create or access arrays wrapped in a BObject.
One idea we found in chat is this:
Create a new class inside the project (or in a new project, if they can interoperate), something like this:
(This is for the case of creating an int[]; if you need other types of array too, it can also be made more generic. Also, it could be made more efficient by using an internal int[] as storage, enlarging it occasionally as it grows, with makeArray doing a final arraycopy. This is a sketch, so it is the simplest implementation.)
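A minimal sketch of such a builder class could look like the following. The class name IntArrayBuilder is an assumption made for illustration; addElement and makeArray are the method names used in this answer. It uses the simple list-based storage, not the more efficient growing int[].

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class that lives in the user's project.
class IntArrayBuilder {

    // Simplest possible storage: boxed values in a list.
    private final List<Integer> elements = new ArrayList<>();

    /** Appends one value; called once per element from the extension. */
    public void addElement(int value) {
        elements.add(value);
    }

    /** Copies the collected values into a freshly allocated int[]. */
    public int[] makeArray() {
        int[] result = new int[elements.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = elements.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        IntArrayBuilder builder = new IntArrayBuilder();
        for (int i = 1; i <= 10000; i++) {
            builder.addElement(i);
        }
        System.out.println(builder.makeArray().length); // 10000
    }
}
```

Because each addElement call passes a single small value, no individual invocation comes anywhere near the method size limit, however long the final array is.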
From our extension, create an object of this class, and add elements to this object by calling its .addElement method. (For efficiency, the BClass/BMethod objects could actually be retrieved once and cached instead of once for each array conversion.)
If you generate the array's contents by some algorithm, you can do this generation here instead of first creating another wrapping object.
In our extension, call the method we actually want to call with the long array, passing our wrapped array:
The length of a String is limited by Integer.MAX_VALUE.
If the type is a number, the max length = Integer.MAX_VALUE; if the type is char, the max length = 65536.