Encoding problem with the Tycho Surefire plugin

Posted on 2025-01-28 12:56:49

I've been racking my brain over this issue and cannot find an explanation for what is happening.
I'm using the Tycho Surefire plugin to build a set of Eclipse plugins and run some unit tests.
Here's the environment:

tycho-surefire-plugin: version 0.19.0 //very old I know, but I'm stuck with legacy code
maven 3.5.2
jdk 8
windows 10

I started with a simple test case for a method that replaces special characters with their plain ASCII equivalents:

String input = "á:Á-é:É-í:Í-ó:Ó-ú:Ú-ñ:Ñ  ";
System.out.println("testing input: " + input);
Assert.assertEquals("a A e E i I o O u U n N ",
    Utils.sanitize(input, true));

The problem is that when I run the JUnit test directly in Eclipse I get the expected result and the test passes, but when I run the Tycho build I get:

testing input: ?:?-?:?-?:?-?:?-?:?-?:?
Failed tests:   testSanitizeWithSpaces(com.fja.eos.automation.UtilsTest): expected:<[a A e E i I o O u U n N] > but was:<[o o o o o o o o o o o o] >

The output of

System.out.println("Default charset: " + Charset.defaultCharset());

is the same in both scenarios:

Default charset: windows-1252

My next attempt was to read the input value from a file, controlling the charset with:

InputStream is = UtilsTest.class
    .getResourceAsStream("sanitationTestSubjects.xml");
InputSource source = new InputSource(is);
source.setEncoding("ISO-8859-1");
Document doc = builder.parse(is);

The file being read is:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<SanitationTestSubjects>
    <Subject input="á:Á-é:É-í:Í-ó:Ó-ú:Ú-ñ:Ñ  " expected="a A e E i I o O u U n N " />
</SanitationTestSubjects>

Reading the input this way gave a slightly different result:

testing input: á:?-é:É-í:?-ó:?-ú:?-ñ:Ñ

which is still not correct. If I escape the input with

StringEscapeUtils.escapeJava(elem.getAttribute("input"))

I get what seems to be the correct Unicode sequence:

Escaped input: \u00E1:\u00C1-\u00E9:\u00C9-\u00ED:\u00CD-\u00F3:\u00D3-\u00FA:\u00DA-\u00F1:\u00D1
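The contrast above (correct escaped sequence, wrong printed text) can be reproduced in isolation. The following is only a minimal sketch, not code from the project: the class `CodePointDump` and its methods are hypothetical names, and US-ASCII merely stands in for whatever charset the build's output stream really uses. It dumps a string in a console-independent form and shows that encoding with a charset that lacks a character substitutes `?`, which would leave the string itself intact while the printed line looks broken.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original project.
final class CodePointDump {

    /** Renders each non-ASCII char as a backslash-u escape so the result survives any console encoding. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    /** Round-trips the string through a charset; unmappable chars come back as '?'. */
    static String asPrintedWith(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this literal independent of the source file encoding.
        String input = "\u00E1:\u00C1-\u00F1:\u00D1";
        System.out.println("In memory:        " + escape(input));
        // Assumption: US-ASCII stands in for a stream charset that lacks these characters.
        System.out.println("After US-ASCII:   " + asPrintedWith(input, StandardCharsets.US_ASCII));
    }
}
```

If the escaped form is right but the printed form is not, the damage is probably happening when the text is written out, not when it is read or compiled.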

I've tried setting all the character encoding options on the tycho-surefire-plugin, with no change in behavior:

<build>
    <plugins>
        <plugin>
            <groupId>org.eclipse.tycho</groupId>
            <artifactId>tycho-surefire-plugin</artifactId>
            <version>${tycho-version}</version>
            <configuration>
                <appArgLine>-Dfile.encoding=ISO-8859-1</appArgLine>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-resources-plugin</artifactId>
            <version>3.2.0</version>
            <configuration>
                <encoding>ISO-8859-1</encoding>
            </configuration>
        </plugin>
    </plugins>
</build>

One more test: compiling the files in Eclipse and with Maven produces binary-identical files.

UPDATE:
The encoding of the Java file itself was set to Cp1252. After changing it to ISO-8859-1, I get the same result as when reading the value from a file. Still not there.
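Since the source file encoding turned out to matter, one thing that can be checked is what a string literal looks like when its on-disk bytes are decoded with the wrong charset. This is only a sketch under stated assumptions: `SourceEncodingMismatch` is a hypothetical class, UTF-8 is an assumed "wrong" decoder chosen for illustration, and ISO-8859-1 is used for the on-disk bytes because it encodes these particular characters the same way as windows-1252. Nothing here confirms what the Tycho compiler step actually does.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration class, not part of the original project.
final class SourceEncodingMismatch {

    /** Encodes the literal as Latin-1 bytes, then decodes those bytes as UTF-8. */
    static String misreadAsUtf8(String literal) {
        byte[] onDisk = literal.getBytes(StandardCharsets.ISO_8859_1);
        // Bytes that are not valid UTF-8 are replaced with U+FFFD during decoding.
        return new String(onDisk, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String misread = misreadAsUtf8("\u00E1:\u00C1");
        // Print code points instead of the characters, so the console encoding cannot interfere.
        for (char c : misread.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

A replacement character (U+FFFD) has no mapping in windows-1252, so it would be printed as `?`. Writing test literals with Unicode escapes such as `"\u00E1"` sidesteps the source encoding entirely and can help narrow down which stage is at fault.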

I really feel that I'm looking at the wrong side of the problem.
Can anyone please help?
