HTML entity converter in Ada

Posted 2024-12-23 04:30:52

I want to write an Ada program that replaces Latin-1 characters with the corresponding HTML entities, but my code does not work: test.txt and converted.txt are always identical. My tutor said that the code is correct.
Thanks in advance!

Here is my code:

with Ada.Text_IO;
procedure Entity_Converter is
   use Ada.Text_IO;

   Source : File_Type;
   Target : File_Type;
   Source_Char : Character;
begin
   Open (Source, In_File, "test.txt");
   Create (Target, Out_File, "converted.txt");
   while not End_Of_File (Source) loop
      Get (Source, Source_Char);
      case Source_Char is
         when 'ä' =>
            Put (Target, "&auml;");
         when 'Ä' =>
            Put (Target, "&Auml;");
         when 'ö' =>
            Put (Target, "&ouml;");
         when 'Ö' =>
            Put (Target, "&Ouml;");
         when 'ü' =>
            Put (Target, "&uuml;");
         when 'Ü' =>
            Put (Target, "&Uuml;");
         when 'ß' =>
            Put (Target, "&szlig;");
         when others =>
            Put (Target, Source_Char);
      end case;
   end loop;
   Close (Source);
   Close (Target);
end Entity_Converter;

Comments (2)

ゞ花落谁相伴 2024-12-30 04:30:52

The result depends on the encoding of both the Ada source text and the test file.

To address the former, use the constants of the package Ada.Characters.Latin_1:

with Ada.Characters.Latin_1;
use Ada.Characters.Latin_1;
...
   case Source_Char is
      when LC_A_Diaeresis =>
         Put (Target, "&auml;");
      when UC_A_Diaeresis =>
         Put (Target, "&Auml;");
      ...
      when LC_German_Sharp_S =>
         Put (Target, "&szlig;");
      when others =>
         Put (Target, Source_Char);
   end case;

The latter depends on your editor.
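
Putting the pieces above together, a minimal sketch of the whole converter might look like the following. The helper To_Entity and the line-by-line loop are illustrative additions that do not appear in the question; the file names are the ones from the question. Because the case choices are named constants from Ada.Characters.Latin_1, the program text is plain ASCII and compiles the same way whatever encoding the editor saves it in. The sketch still assumes that test.txt itself is Latin-1 encoded.

```ada
with Ada.Text_IO;
with Ada.Characters.Latin_1;

procedure Entity_Converter is
   use Ada.Text_IO;
   use Ada.Characters.Latin_1;

   Source : File_Type;
   Target : File_Type;

   --  Hypothetical helper: map one Latin-1 character to its HTML entity,
   --  or return the character unchanged as a one-element String.
   function To_Entity (C : Character) return String is
   begin
      case C is
         when LC_A_Diaeresis    => return "&auml;";
         when UC_A_Diaeresis    => return "&Auml;";
         when LC_O_Diaeresis    => return "&ouml;";
         when UC_O_Diaeresis    => return "&Ouml;";
         when LC_U_Diaeresis    => return "&uuml;";
         when UC_U_Diaeresis    => return "&Uuml;";
         when LC_German_Sharp_S => return "&szlig;";
         when others            => return (1 => C);
      end case;
   end To_Entity;
begin
   Open (Source, In_File, "test.txt");
   Create (Target, Out_File, "converted.txt");
   while not End_Of_File (Source) loop
      declare
         --  Read a whole line so that line breaks can be written back.
         Line : constant String := Get_Line (Source);
      begin
         for I in Line'Range loop
            Put (Target, To_Entity (Line (I)));
         end loop;
         New_Line (Target);
      end;
   end loop;
   Close (Source);
   Close (Target);
end Entity_Converter;
```

The loop reads with Get_Line and writes New_Line because Get on a Character skips line terminators, so a character-by-character loop would merge all input lines into one.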

枕花眠 2024-12-30 04:30:52

I'm running on a Mac and I copied your source. When I compiled it, the compiler complained that (for example) 'ä' needed double quotes, a hint that the source uses wide characters. It seems it's in UTF-8 [1], so I compiled with -gnatW8, which appeared to be successful.

I then ran the program on a copy of its own source text, and it failed to transform the text.

Compiling with -gnatdg, which makes GNAT produce a representation of its internal source tree, I get

  ada__text_io__get (source, source_char);
  case source_char is
     when '["e4"]' =>
        ada__text_io__put__3 (target, "&auml;");
     when '["c4"]' =>
        ada__text_io__put__3 (target, "&Auml;");

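which looks to me as though GNAT has read the UTF-8 encoding of ä and used the Latin-1 code for it in the case statement. That is not unreasonable given that the code says Character, and it is quite enough to explain why the program failed to convert itself: in a UTF-8 file, ä is stored as the two bytes C3 A4, so no single Character read from the file ever equals 16#E4#.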

I then tried using Ada.Wide_Text_IO and Wide_Character. Sadly the program failed, for the same reason as before. Could we be looking at a feature? Or even a bug?

[1] The file may have ended up in UTF-8 because of the roundabout way I downloaded it, of course.
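
To illustrate the mismatch described in this answer, here is a small sketch that assumes an Ada 2012 compiler, since it relies on the standard package Ada.Strings.UTF_Encoding.Wide_Strings. The procedure name UTF8_Entity_Demo and the hard-coded byte pair are illustrative only. It shows that the UTF-8 form of ä is two Characters as far as Ada.Text_IO is concerned, and that decoding them yields the single code point 16#E4# that the case statement is looking for.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Strings;

procedure UTF8_Entity_Demo is
   use Ada.Text_IO;

   --  The letter a-diaeresis as stored in a UTF-8 file: bytes C3 A4.
   Raw : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
     Character'Val (16#C3#) & Character'Val (16#A4#);

   --  Decoding turns the two bytes into one Wide_Character (16#E4#).
   Decoded : constant Wide_String :=
     Ada.Strings.UTF_Encoding.Wide_Strings.Decode (Raw);
begin
   Put_Line ("Characters seen by Text_IO:" & Integer'Image (Raw'Length));
   Put_Line ("Characters after decoding: " & Integer'Image (Decoded'Length));

   for I in Decoded'Range loop
      case Decoded (I) is
         when Wide_Character'Val (16#E4#) =>
            Put ("&auml;");
         when others =>
            Put ("?");   --  placeholder for characters this demo ignores
      end case;
   end loop;
   New_Line;
end UTF8_Entity_Demo;
```

In a real converter the same idea would apply per line: read the line as a String, decode it, then map each Wide_Character. As far as I know, GNAT also accepts a Form string such as "WCEM=8" when opening a file with Ada.Wide_Text_IO so that the decoding happens on input, but that is compiler-specific and worth checking against the GNAT documentation.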
