How do I flip text horizontally?

Posted 2024-12-28 19:51:33

I need to write a function that will flip all the characters of a string left-to-right.

e.g.:

Thė quiçk ḇrown fox jumṕềᶁ ovểr thë lⱥzy ȡog.

should become

.goȡ yzⱥl ëht rểvo ᶁềṕmuj xof nworḇ kçiuq ėhT

I can limit the question to UTF-16 (which has the same problems as UTF-8, just less often).

Naive solution

A naive solution might try to flip everything unit by unit: word-for-word, where a word is 16 bits (I would have said byte-for-byte if we could assume that a byte was 16 bits), or character-for-character, where a character is the data type Char, which holds a single 16-bit UTF-16 code unit:

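// Naive approach: prepend each 16-bit Char, reversing the string one code unit at a time.
String original = "ɗỉf̴ḟếr̆ęnͥt";
String flipped = "";
foreach (Char c in original)
{
   flipped = c + flipped;
}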

This results in incorrectly flipped text:

ɗỉf̴ḟếr̆ęnͥt
̨tͥnę̆rếḟ̴fỉɗ

This is because one user-perceived "character" can be made up of multiple "code points".

ɗỉf̴ḟếr̆ęnͥt
ɗ ỉ f ˜ ḟ ế r ˘ ę n i t ˛

and flipping each "code point" gives:

˛ t i n ę ˘ r ế ḟ ˜ f ỉ ɗ

Not only is the result potentially not valid UTF-16 (a reversed surrogate pair is ill-formed), it is also not the same characters.
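To see which parts of a string are separate combining marks, it can help to dump each 16-bit Char with its Unicode category. The following is a minimal sketch, assuming .NET's `System.Globalization.CharUnicodeInfo` is available; the class name `CodeUnitDump` and the method `Describe` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CodeUnitDump
{
    // Hypothetical helper: one description line per 16-bit Char in the string.
    public static List<string> Describe(string text)
    {
        var lines = new List<string>();
        foreach (char unit in text)
        {
            // Combining diacritics report NonSpacingMark, SpacingCombiningMark or EnclosingMark.
            UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(unit);
            lines.Add($"U+{(int)unit:X4} {category}");
        }
        return lines;
    }
}
```

For the two-unit string `"f\u0334"` (an f followed by a combining tilde overlay), this yields one line for the letter and a second line whose category is `NonSpacingMark`, which is the unit that a naive reversal detaches from its base letter.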

Failure

The problem happens in UTF-16 encoding when there are:

combining diacritics
characters outside the Basic Multilingual Plane (encoded as surrogate pairs)
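The surrogate-pair case can be demonstrated in isolation. This is a rough sketch, not a full validator: `SurrogateCheck`, `NaiveReverse` and `IsWellFormedUtf16` are hypothetical names, and the check only looks at whether every high surrogate is immediately followed by a low surrogate.

```csharp
using System;

public static class SurrogateCheck
{
    // Naive reversal: flips the 16-bit code units without looking at their meaning.
    public static string NaiveReverse(string text)
    {
        char[] units = text.ToCharArray();
        Array.Reverse(units);
        return new string(units);
    }

    // Returns true when every high surrogate is followed by a low surrogate
    // and no low surrogate appears on its own.
    public static bool IsWellFormedUtf16(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsHighSurrogate(text[i]))
            {
                if (i + 1 >= text.Length || !char.IsLowSurrogate(text[i + 1]))
                    return false;
                i++; // skip the low half of the pair
            }
            else if (char.IsLowSurrogate(text[i]))
            {
                return false; // low surrogate with no preceding high surrogate
            }
        }
        return true;
    }
}
```

With the input `"a\U0001D11Eb"` (U+1D11E is stored as the pair D834 DD1E), the original string passes the check, while the naively reversed string puts DD1E before D834 and fails it.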

The same issues occur in UTF-8 encoding, with one additional case:

any character outside the 0..127 ASCII range

I can limit myself to the simpler UTF-16 encoding, since that's the encoding used by the languages I'm working in (e.g. C#, Delphi).

The problem, it seems to me, is discovering whether a run of subsequent code points are combining characters that need to come along with the base glyph.
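One possible way to approach that discovery, working directly on UTF-16 with no UTF-32 conversion, is sketched below. It assumes that a "cluster" is one code point (one Char, or two for a surrogate pair) followed by any code points whose Unicode category is a mark; the names `ClusterReverser`, `CodePointLength` and `IsCombiningMark` are hypothetical. This is a simplification rather than full Unicode grapheme segmentation, so sequences such as zero-width-joiner emoji or Hangul jamo are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class ClusterReverser
{
    // Number of Chars (1 or 2) used by the code point that starts at index.
    private static int CodePointLength(string text, int index)
    {
        return char.IsSurrogatePair(text, index) ? 2 : 1;
    }

    // True when the code point at index is a combining mark of any kind.
    private static bool IsCombiningMark(string text, int index)
    {
        UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(text, index);
        return category == UnicodeCategory.NonSpacingMark
            || category == UnicodeCategory.SpacingCombiningMark
            || category == UnicodeCategory.EnclosingMark;
    }

    public static string Reverse(string text)
    {
        var clusters = new List<string>();
        int i = 0;
        while (i < text.Length)
        {
            int start = i;
            i += CodePointLength(text, i);              // the base code point
            while (i < text.Length && IsCombiningMark(text, i))
            {
                i += CodePointLength(text, i);          // marks that travel with it
            }
            clusters.Add(text.Substring(start, i - start));
        }
        clusters.Reverse();                             // reverse whole clusters, not Chars
        return string.Concat(clusters);
    }
}
```

Under these assumptions, `ClusterReverser.Reverse("f\u0334o")` returns `"of\u0334"`: the combining mark stays attached to the f, and a surrogate pair such as `"\U0001D11E"` is moved as a unit rather than split.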

It's also fun to watch an online text reverser site fail to take this into account.

Note:

any solution should assume that I don't have access to a UTF-32 encoding library (mainly because I don't have access to any UTF-32 encoding library)

access to a UTF-32 encoding library would solve the UTF-8/UTF-16 supplementary-plane problem, but not the combining diacritics problem
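If the target is C#, one option that needs no UTF-32 library is the text-element enumerator in `System.Globalization.StringInfo`, which walks a UTF-16 string one text element at a time and keeps surrogate pairs and combining sequences together. The sketch below assumes .NET; the wrapper `TextElementReverser` is a hypothetical name, and the exact segmentation rules vary between .NET versions. Delphi would need its own equivalent.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class TextElementReverser
{
    public static string Reverse(string text)
    {
        var elements = new List<string>();
        // Each text element is a base character plus whatever the runtime groups with it.
        TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(text);
        while (enumerator.MoveNext())
        {
            elements.Add(enumerator.GetTextElement());
        }
        elements.Reverse();
        return string.Concat(elements);
    }
}
```

This delegates the "which code points belong together" decision to the framework, at the cost of depending on whichever segmentation behaviour the installed runtime implements.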