StringBuilder ending with a large number of nul characters

Posted 2024-10-24 22:11:59

I'm having a very difficult time debugging a problem in an application I've been building. I cannot seem to reproduce the problem with a representative test program, which makes it difficult to demonstrate. Unfortunately I cannot share my actual source for security reasons; however, the following test represents fairly well what I am doing: the files and data use Unix-style EOLs, the output is written to a zip file with a PrintWriter, and StringBuilders are used to hold the text:

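import java.io.File;
import java.io.FileOutputStream;
import java.io.PrintWriter;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class Tester {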

    public static void main(String[] args) {
        // variables
        File target = new File("TESTSAVE.zip");
        PrintWriter printout1;
        ZipOutputStream zipStream;
        ZipEntry ent1;
        StringBuilder testtext1 = new StringBuilder();
        StringBuilder replacetext = new StringBuilder();
        // ensure file replace
        if (target.exists()) {
            target.delete();
        }
        try {
            // open the streams
            zipStream = new ZipOutputStream(new FileOutputStream(target, true));
            printout1 = new PrintWriter(zipStream);
            ent1 = new ZipEntry("testfile.txt");
            zipStream.putNextEntry(ent1);

            // construct the data
            for (int i = 0; i < 30; i++) {
                testtext1.append("Testing 1 2 3 Many! \n");
            }
            replacetext.append("Testing 4 5 6 LOTS! \n");
            replacetext.append("Testing 4 5 6 LOTS! \n");

            // the replace operation
            testtext1.replace(21, 42, replacetext.toString());

            // write it
            printout1 = new PrintWriter(zipStream);
            printout1.println(testtext1);
            // save it
            printout1.flush();
            zipStream.closeEntry();
            printout1.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The heart of the problem is that the file I see on my side is 16.3k characters long. My friend, whether he uses the app on his PC or looks at exactly the same file as me, sees a file of 19.999k characters, the extra characters being a CRLF followed by a massive number of nul characters. No matter what application, encoding or view I use, I cannot see these nul characters at all; I only see a single LF at the end of the last line, but I do see a file of 20k. In all cases there is a difference between what is seen for the exact same file on the two machines, even though both are Windows machines and both use the same editing software to view it.

I've not yet been able to reproduce this behaviour with any number of dummy programs. I have, however, been able to trace the final line's stray CRLF to my use of println on the PrintWriter. When I replaced println(s) with print(s + '\n'), the problem appeared to go away (the file size was 16.3k). However, when I changed the program back to println(s), the problem did not appear to return. I'm currently having the files verified by a friend in France to see whether the problem really did go away (since I cannot see the nuls but he can), but this behaviour has me thoroughly confused.
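For readers following along, the stray CRLF is consistent with println appending the platform line separator, which is CRLF on Windows. The following is only a minimal sketch of that difference; the class and method names (LineEndingDemo, viaPrintln, viaPrint, visible) are invented for illustration and are not part of the application described above.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

final class LineEndingDemo {

    /** Captures exactly what println(text) emits. */
    static String viaPrintln(String text) {
        StringWriter sink = new StringWriter();
        PrintWriter out = new PrintWriter(sink);
        out.println(text); // appends the platform line separator
        out.flush();
        return sink.toString();
    }

    /** Captures what print(text) followed by an explicit '\n' emits. */
    static String viaPrint(String text) {
        StringWriter sink = new StringWriter();
        PrintWriter out = new PrintWriter(sink);
        out.print(text);
        out.print('\n'); // always a single LF, regardless of platform
        out.flush();
        return sink.toString();
    }

    /** Makes CR and LF visible so the two results can be compared by eye. */
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        System.out.println("println: " + visible(viaPrintln("last line")));
        System.out.println("print:   " + visible(viaPrint("last line")));
    }
}
```

The first line of output depends on the operating system the program runs on, while the second always ends in a lone LF.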

I've also noticed that the documentation for StringBuilder's replace function states "This sequence will be lengthened to accommodate the specified String if necessary". Given that StringBuilder's setLength function pads with nul characters and that the ensureCapacity function sets the capacity to the greater of the input or (currentCapacity*2)+2, I suspected a relation somewhere. However, while testing this idea I have only once been able to get a result that resembled what I've seen, and I have not been able to reproduce it since.
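A small experiment can separate the three calls mentioned above. This is a sketch only, reusing the strings from the test program; the class and helper names (BuilderPaddingDemo, buildAndReplace, countNuls) are made up for the example. It suggests that replace and ensureCapacity leave no nul characters in the content, while setLength with a larger length does pad with them.

```java
final class BuilderPaddingDemo {

    /** Counts '\u0000' characters in the given text. */
    static int countNuls(CharSequence text) {
        int nuls = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\u0000') {
                nuls++;
            }
        }
        return nuls;
    }

    /** Rebuilds the data from the test program and applies the same replace. */
    static StringBuilder buildAndReplace() {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 30; i++) {
            text.append("Testing 1 2 3 Many! \n"); // 21 chars per line
        }
        String replacement = "Testing 4 5 6 LOTS! \n" + "Testing 4 5 6 LOTS! \n";
        text.replace(21, 42, replacement); // swaps one 21-char line for 42 chars
        return text;
    }

    public static void main(String[] args) {
        StringBuilder text = buildAndReplace();
        System.out.println("after replace:        length=" + text.length()
                + " nuls=" + countNuls(text));

        text.ensureCapacity(4096); // changes capacity only, not the content
        System.out.println("after ensureCapacity: length=" + text.length()
                + " nuls=" + countNuls(text));

        text.setLength(text.length() + 5); // growing the length pads with '\u0000'
        System.out.println("after setLength:      length=" + text.length()
                + " nuls=" + countNuls(text));
    }
}
```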

Does anyone have any idea what could be causing this error or at least have a suggestion on what direction to take the testing?

Edit since the comments section is broken for me:
Just to clarify, the output is required to be in Unix format regardless of the OS, hence the use of '\n' directly rather than through a formatter. The original StringBuilder that is inserted into is not in fact generated by me but holds the contents of a file read in by the program. I'm happy that the reading process works, as the information in it is used heavily throughout the application. I've done a little probing too and found that directly prior to saving, the buffer IS the correct capacity and that the output when toString() is invoked is the correct length (i.e. it contains no nul characters and is 16,363 long, not 19,999). This would put the cause of the error somewhere between generating the string and saving the zip file.
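Since two machines disagree about what the same file contains, one possible direction for testing is to bypass text editors and count raw byte values inside the saved zip entry. The sketch below is an illustration under assumptions: ZipEntryProbe, histogram and zipOf are hypothetical names, and the file and entry names in the usage comment are simply those from the test program.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZipEntryProbe {

    /** Returns counts per byte value (0..255) for the named entry, or null if it is absent. */
    static long[] histogram(InputStream zipData, String entryName) {
        try (ZipInputStream zip = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals(entryName)) {
                    continue;
                }
                long[] counts = new long[256];
                byte[] buf = new byte[8192];
                int n;
                while ((n = zip.read(buf, 0, buf.length)) != -1) {
                    for (int i = 0; i < n; i++) { // only the n bytes actually read
                        counts[buf[i] & 0xFF]++;
                    }
                }
                return counts;
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a one-entry zip in memory; used here only for the self-contained demo. */
    static byte[] zipOf(String entryName, byte[] content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content, 0, content.length);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Hypothetical usage: java ZipEntryProbe TESTSAVE.zip testfile.txt
    public static void main(String[] args) throws IOException {
        InputStream in;
        String entryName;
        if (args.length == 2) {
            in = new FileInputStream(args[0]);
            entryName = args[1];
        } else {
            entryName = "sample.txt";
            in = new ByteArrayInputStream(zipOf(entryName, new byte[] {'a', '\r', '\n', 0, 0}));
        }
        long[] counts = histogram(in, entryName); // closes the stream when done
        if (counts == null) {
            System.out.println("entry not found: " + entryName);
            return;
        }
        System.out.println("NUL=" + counts[0] + " CR=" + counts['\r'] + " LF=" + counts['\n']);
    }
}
```

A non-zero NUL count would confirm that the extra characters are really in the file, whatever a particular editor chooses to display.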

美人如玉 2024-10-31 22:11:59

Finally found the cause. I managed to reproduce the problem a few times and traced the cause not to the output side of the code but to the input side. My file-reading function was essentially this:

char[] buf;
int charcount = 0;
StringBuilder line = new StringBuilder(2048);
InputStreamReader reader = new InputStreamReader(stream);
BufferedReader file = new BufferedReader(reader); // provides a line-wise read
do { // capture loop
    try {
        buf = new char[2048];
        charcount = file.read(buf, 0, 2048); // may fill only part of buf; -1 at end of stream
    } catch (IOException e) {
        return null; // unknown IO error
    }
    // bug: appends all 2048 chars, including unfilled slots
    // (and a whole empty buffer on the final pass, when charcount is -1)
    line.append(buf);
} while (charcount != -1);
// close and output

The problem was appending a buffer that wasn't full, so the later elements were still at their initial value of '\u0000' (nul). The reason I couldn't reproduce it was that some data filled the buffers exactly and some didn't.

Why I couldn't see the problem in my text editors I still have no idea, but I should be able to resolve this now. Any suggestions on the best way to do so are welcome: this is part of one of my long-term utility libraries, and I want to keep it as generic and optimised as possible.
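One possible way to resolve it, offered as a sketch rather than the fix actually adopted, is to append only the characters that read reports and to stop before appending when it returns -1. The class and method names (StreamText, readAll) are hypothetical, the explicit UTF-8 charset is an assumption (the original relied on the platform default), and rethrowing as UncheckedIOException replaces the original "return null" purely for brevity.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class StreamText {

    /** Reads the whole stream into a String without padding from unfilled buffer slots. */
    static String readAll(InputStream stream) {
        StringBuilder text = new StringBuilder(2048);
        char[] buf = new char[2048]; // one buffer, reused for every read
        // Assumption: the data is UTF-8; substitute the charset the files really use.
        try (Reader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            int count;
            // read returns how many chars were stored, or -1 at end of stream
            while ((count = reader.read(buf, 0, buf.length)) != -1) {
                text.append(buf, 0, count); // append only what was actually read
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        StringBuilder sample = new StringBuilder();
        for (int i = 0; i < 250; i++) {
            sample.append("Testing 1 2 3 Many! \n"); // 5250 chars, not a multiple of 2048
        }
        byte[] bytes = sample.toString().getBytes(StandardCharsets.UTF_8);
        String result = readAll(new ByteArrayInputStream(bytes));
        System.out.println("input length:  " + sample.length());
        System.out.println("output length: " + result.length());
        System.out.println("contains nul:  " + (result.indexOf('\u0000') != -1));
    }
}
```

Testing the return value inside the loop condition means the end-of-stream pass never reaches append, and reusing a single buffer avoids allocating 2048 chars per iteration.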
