PHP .rtf Polish character encoding problem

Posted on 2025-01-09 10:28:17


I have a problem replacing Polish characters in an RTF file with PHP.
I want to find tag words in the RTF file content and replace them with the relevant content. Here is what I'm doing:
    // Getting rtf file content
    $content = file_get_contents('<link_to_file_here>');

    // encoding to utf-8
    $content = mb_convert_encoding($content, 'UTF-8');

    // replacing tagword with relevant content
    $content = str_replace('[company_address]', 'Częstochowa', $content);

    // save rtf file with replaced content
    file_put_contents('uploads/test.rtf', $content);
    
    echo $content; 

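When I checked the RTF file content after running this code, I noticed that Częstochowa had been replaced with Cz\u0119stochowa.
Then I opened the newly created RTF file in MS Word and saw this: Czä™stochowa.
After that I decided to type Częstochowa manually in the RTF file and check what happens. I read the file content the same way (via file_get_contents) and noticed that MS Word had stored my manually typed Częstochowa as Cz\\'eastochowa. So I decided to do this: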

    // replacing tagword with relevant content
    $content = str_replace('[company_address]', 'Cz\\\'eastochowa', $content);

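After that I opened the file in MS Word and saw this: Czêstochowa.
I googled a bit and found that ê is a character from the Unicode block “Latin-1 Supplement” (U+0080 to U+00FF) with code U+00EA, but Polish characters are in the Unicode block “Latin Extended-A” (U+0100 to U+017F), so I need to encode the RTF file content for that block somehow.
I have tried a lot of things but still haven't solved the problem.
I hope you can help. Thanks for your attention.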
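
A likely explanation for the ê, offered as an assumption rather than something confirmed in the post: an RTF `\'hh` escape is a single byte, and the reader decodes it with whatever code page applies to that run of text. Byte `0xEA` is ę in Windows-1250 (Central European) but ê in Windows-1252 (Western), so the same escape can render either way. The sketch below only illustrates that byte-level difference using `iconv`; the variable names are made up for the example.

```php
<?php
// The single byte that the RTF escape \'ea stands for.
$byte = "\xEA";

// Decode the same byte under two different Windows code pages.
$asWestern = iconv('CP1252', 'UTF-8', $byte); // Western European
$asCentral = iconv('CP1250', 'UTF-8', $byte); // Central European

echo "CP1252: {$asWestern}", PHP_EOL;
echo "CP1250: {$asCentral}", PHP_EOL;
```

If this is what is happening, a code-page escape is only safe when the surrounding text already uses a Central European charset, which is why a Unicode escape is the more portable choice.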


蓝色星空 2025-01-16 10:28:17

Found a solution:

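    // Convert non-ASCII characters to HTML entities (ę becomes &#281;),
    // then turn each "&#" prefix into the RTF Unicode escape prefix "\u".
    $string = str_replace('&#', "\\u", mb_convert_encoding('Częstochowa', 'html'));
    $content = str_replace('[company_address]', $string, $content);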
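
The snippet above relies on the `HTML-ENTITIES` pseudo-encoding of mbstring, which is deprecated as of PHP 8.2 and emits named entities for some characters (ó becomes `&oacute;`, which the `&#` replacement would not catch). As an alternative, here is a minimal sketch that writes the `\uN` escapes directly. The function name `rtfEscape` and the sample `$content` are hypothetical, it needs PHP 7.4+ for `mb_str_split`, and it assumes the RTF file uses the default `\uc1`, so one fallback character (`?`) follows each escape.

```php
<?php
// Hypothetical helper, not part of the original answer:
// escape a UTF-8 string so it can be pasted into RTF body text.
function rtfEscape(string $utf8): string
{
    $out = '';
    foreach (mb_str_split($utf8, 1, 'UTF-8') as $char) {
        if ($char === '\\' || $char === '{' || $char === '}') {
            // These three characters are RTF syntax and must be backslash-escaped.
            $out .= '\\' . $char;
        } elseif (mb_ord($char, 'UTF-8') < 128) {
            // Plain ASCII passes through unchanged.
            $out .= $char;
        } else {
            // RTF \uN takes signed 16-bit decimals, one per UTF-16 code unit.
            $utf16 = mb_convert_encoding($char, 'UTF-16BE', 'UTF-8');
            foreach (unpack('n*', $utf16) as $unit) {
                if ($unit > 32767) {
                    $unit -= 65536;
                }
                // "?" is the fallback character (assumes \uc1).
                $out .= '\\u' . $unit . '?';
            }
        }
    }
    return $out;
}

// Sample content standing in for the real RTF file body (assumption).
$content = 'Address: [company_address]';
$content = str_replace('[company_address]', rtfEscape('Częstochowa'), $content);
// $content is now: Address: Cz\u281?stochowa
```

Going through UTF-16 code units means characters outside the Basic Multilingual Plane come out as two escapes (a surrogate pair), which is how RTF expects them.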
