Logback System.err output uses the wrong encoding

Posted 2025-02-02 07:22:26

I'm using Logback 1.2.11 with Java 17 on Windows 10. I'm using the following logback.xml:

<configuration>
  <property scope="context" name="COLORIZER_COLORS" value="boldred@,boldyellow@,boldcyan@,@,@" />
  <conversionRule conversionWord="colorize" converterClass="org.tuxdude.logback.extensions.LogColorizer" />
  <statusListener class="ch.qos.logback.core.status.NopStatusListener" />
  <appender name="STDERR" class="ch.qos.logback.core.ConsoleAppender">
    <target>System.err</target>
    <withJansi>true</withJansi>
    <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
      <pattern>[%colorize(%level)] %msg%n</pattern>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="STDERR" />
  </root>
</configuration>

If in my code I use System.out.println("é") or System.err.println("é"), I see an é (U+00E9, a small letter e with acute accent) on the console as expected. However, if I log through Logback (via SLF4J), it shows a Θ character (U+0398, a Greek capital letter theta) on the screen. This happens whether I use <target>System.out</target> or <target>System.err</target> in my logback.xml file.

By default the PatternLayoutEncoder for ConsoleAppender should be using the system default encoding. (See "LogBack default charset for LayoutWrappingEncoder?" for extensive discussion.) The Windows 10 console encoding in my locale should be Windows-1252 (or in PowerShell, ISO-8859-1). The Θ character doesn't even appear in either of those charsets.

Why is Logback printing a Θ character to the standard output when it should be printing an é character? More generally, why isn't Logback using the default encoding when printing to System.out or System.err?

Comments (1)

深海蓝天 2025-02-09 07:22:26

It looks like Logback is using the wrong "default charset". The API Javadoc for System.out says this about its default charset (which applies to System.err as well):

The "standard" output stream. This stream is already open and ready to accept output data. Typically this stream corresponds to display output or another output destination specified by the host environment or user. The encoding used in the conversion from characters to bytes is equivalent to Console.charset() if the Console exists, Charset.defaultCharset() otherwise.

On my Windows 10 Command Prompt, Charset.defaultCharset() returns windows-1252, while System.console().charset() returns IBM437. If I create a new OutputStreamWriter(System.out, System.console().charset()) and write the string "é", it produces é as expected. But sure enough, if I use new OutputStreamWriter(System.out, Charset.defaultCharset()) and write "é", it produces Θ! So that's where the Θ was coming from: it is part of the IBM437 charset! windows-1252 encodes é as the byte 0xE9, and a console that decodes its input as IBM437 renders that same byte as Θ.
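The mismatch can be reproduced without a console at all. The following is a minimal sketch, not Logback code: the class and method names are made up for illustration, and it assumes the JDK in use ships the IBM437 charset (standard JDK distributions do, via the jdk.charsets module). It encodes é with windows-1252 and then decodes the resulting byte as IBM437, which is what effectively happens when the logger and the console disagree.

```java
import java.nio.charset.Charset;

// Hypothetical demo class; not part of Logback or the original post.
public class MojibakeDemo {

    // Encode text with one charset, then decode the same bytes with another.
    static String reinterpret(String text, Charset writer, Charset reader) {
        byte[] bytes = text.getBytes(writer);
        return new String(bytes, reader);
    }

    public static void main(String[] args) {
        Charset windows1252 = Charset.forName("windows-1252");
        Charset ibm437 = Charset.forName("IBM437");

        // \u00E9 is é; the escape avoids depending on the source file encoding.
        String original = "\u00E9";

        byte[] encoded = original.getBytes(windows1252);
        System.out.printf("windows-1252 byte: 0x%02X%n", encoded[0] & 0xFF);

        String seen = reinterpret(original, windows1252, ibm437);
        System.out.printf("decoded as IBM437: U+%04X%n", (int) seen.charAt(0));
    }
}
```

The first line printed shows the byte 0xE9, and the second shows U+0398, the Greek capital theta from the question.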

I won't ask here why my Windows 10 Command Prompt is defaulting to IBM437 as its default charset; in the context of this issue, that's beside the point.

The root problem seems to be that Logback is retrieving the default character set erroneously. (It's a long story, but basically Logback is relying on the default charset of String.getBytes().) Ultimately Logback, in LayoutWrappingEncoder, is relying on the value of Charset.defaultCharset(), which doesn't match that of the console; instead it should be defaulting to System.console().charset() if it wants to match the default charset of the console.

Apparently the LayoutWrappingEncoder doesn't know if it's writing to the console or some other output stream that in fact uses Charset.defaultCharset(). Perhaps there needs to be some way that ch.qos.logback.core.OutputStreamAppender can expose its charset to LayoutWrappingEncoder, and ch.qos.logback.core.ConsoleAppender can override the default based on System.console().charset() instead of Charset.defaultCharset().

In any case, the culprit here seems to be Logback using the wrong default charset when writing to the console through System.out and System.err. (Does anyone know how I can tell Logback to use System.console().charset() instead of Charset.defaultCharset()? I certainly don't have any way of knowing the default console charset ahead of time, so I can't hard-code it into logback.xml.)
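One possible workaround, offered only as a sketch under stated assumptions, is to resolve the console charset at startup and publish it as a system property before Logback initializes. The encoder's charset could then be set in logback.xml through variable substitution, for example <charset>${console.charset}</charset>. The class name and the property name console.charset are hypothetical, and whether this interacts well with withJansi has not been verified here.

```java
import java.io.Console;
import java.nio.charset.Charset;

// Hypothetical helper; not part of Logback.
public class ConsoleCharsetResolver {

    // Mirrors the rule quoted from the System.out Javadoc:
    // Console.charset() if a console exists, Charset.defaultCharset() otherwise.
    // Console.charset() requires Java 17 or later.
    static Charset resolve() {
        Console console = System.console();
        return console != null ? console.charset() : Charset.defaultCharset();
    }

    // Publishes the resolved charset name under the given system property key.
    // Assumption: this runs before the first logger is created.
    static void exportAsProperty(String key) {
        System.setProperty(key, resolve().name());
    }

    public static void main(String[] args) {
        exportAsProperty("console.charset"); // hypothetical property name
        System.out.println("console.charset=" + System.getProperty("console.charset"));
    }
}
```

The call to exportAsProperty would need to be the first thing the application does, since Logback reads its configuration when the first logger is requested.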

I have filed Logback bug LOGBACK-1642.
