PHP .rtf Polish character encoding problem
I have a problem replacing Polish characters in an RTF file using PHP.
I want to find tag words in the RTF file content and replace them with the relevant content.
This is what I'm doing:
// Getting rtf file content
$content = file_get_contents('<link_to_file_here>');
// encoding to utf-8
$content = mb_convert_encoding($content, 'UTF-8');
// replacing tagword with relevant content
$content = str_replace('[company_address]', 'Częstochowa', $content);
// save rtf file with replaced content
file_put_contents('uploads/test.rtf', $content);
echo $content;
When I check the RTF file content after this code runs, I see that Częstochowa has been replaced with Cz\u0119stochowa.
Then I open the newly created RTF file in MS Word and see this: CzÄ™stochowa.
After this I decided to type Częstochowa manually into the RTF file and check what happens. I read the file content the same way (via file_get_contents) and noticed that MS Word had stored my manually typed Częstochowa as Cz\\'eastochowa. So I decided to do this:
// replacing tagword with relevant content
$content = str_replace('[company_address]', 'Cz\\\'eastochowa', $content);
After this I open the file in MS Word and see this: Czêstochowa.
I googled a bit and found that ê is a character from the Unicode block “Latin-1 Supplement” (U+0080 to U+00FF) with code U+00EA, but Polish characters are in the Unicode block “Latin Extended-A” (U+0100 to U+017F), so I need to encode the RTF file content for that block somehow.
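One likely explanation for both wrong results is the code page used to interpret single bytes. The sketch below is only an illustration, under the assumption that Word read the file with a Western European code page (for example an RTF header declaring \ansicpg1252) rather than the Central European Windows-1250. It also assumes the local iconv build knows the names CP1250 and CP1252. The variable names are hypothetical.

```php
<?php
// The single byte 0xEA, which is what the RTF escape \'ea stands for.
$byte = "\xEA";

// Read as Windows-1250 (Central European), the byte is the Polish letter e-ogonek.
$asCp1250 = iconv('CP1250', 'UTF-8', $byte);   // U+0119

// Read as Windows-1252 (Western European), the same byte is e-circumflex.
$asCp1252 = iconv('CP1252', 'UTF-8', $byte);   // U+00EA

// Raw UTF-8 text stores e-ogonek as two bytes, 0xC4 0x99.
// Read as Windows-1252, those two bytes turn into two unrelated characters.
$mojibake = iconv('CP1252', 'UTF-8', "Cz\xC4\x99stochowa");

echo $asCp1250, ' ', $asCp1252, ' ', $mojibake, PHP_EOL;
```

If that assumption holds, the \'ea escape is only correct when the document's code page is 1250, and raw UTF-8 bytes are never correct inside an RTF body.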
I have tried a lot of things but still haven't solved the problem.
I hope you can help. Thanks for your attention.
Found a solution:
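The code for this solution is not included in this copy, so the following is not the answerer's code. It is a minimal sketch of one common approach, assuming the goal is to keep the RTF file pure ASCII: every non-ASCII character in the replacement text is written as an RTF Unicode escape (\uN followed by a fallback character, where N is the UTF-16 code unit as a signed 16-bit decimal number). The function name rtfEscape and the sample template string are hypothetical.

```php
<?php
// Hypothetical helper: turn a UTF-8 string into ASCII-only RTF text.
function rtfEscape(string $utf8): string
{
    if ($utf8 === '') {
        return '';
    }
    // Split the text into UTF-16 code units (big-endian, 16 bits each).
    $units = unpack('n*', mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8'));
    $out = '';
    foreach ($units as $unit) {
        if ($unit === 0x5C || $unit === 0x7B || $unit === 0x7D) {
            // Backslash and braces are RTF syntax, so they need a backslash prefix.
            $out .= '\\' . chr($unit);
        } elseif ($unit < 0x80) {
            // Plain ASCII is copied through unchanged.
            $out .= chr($unit);
        } else {
            // \uN takes a signed 16-bit decimal; "?" is the fallback for old readers.
            $signed = $unit > 0x7FFF ? $unit - 0x10000 : $unit;
            $out .= '\\u' . $signed . '?';
        }
    }
    return $out;
}

// Usage on an in-memory sample; the file itself is not re-encoded as a whole.
$content = '{\rtf1\ansi [company_address]}';
$content = str_replace('[company_address]', rtfEscape('Częstochowa'), $content);
echo $content, PHP_EOL;   // {\rtf1\ansi Cz\u281?stochowa}
```

Note that the number after \u is decimal, so ę (U+0119) becomes \u281? and not \u0119. With this approach the mb_convert_encoding call on the whole file content should not be needed, since the template stays ASCII.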