How do I create and write a UTF-16 text file using AppleScript?

Posted on 2024-10-16 22:12:54

I'm writing an AppleScript to parse an iOS localization file (/en.lproj/Localizable.strings), translate the values, and write the translation (/fr.lproj/Localizable.strings) to disk in UTF-16 (Unicode) encoding.

For some reason, the generated file has an extra space between every letter. After some digging, I found the cause of the problem in Learn AppleScript: The Comprehensive Guide to Scripting.

"If you accidentally read a UTF-16 file as MacRoman, the resulting value may look at first glance like an ordinary string, especially if it contains English text. You'll quickly discover that something is very wrong when you try to use it, however: a common symptom is that each visible character in your "string" seems to have an invisible character in front of it. For example, reading a UTF-16 encoded text file containing the phrase "Hello World!" as a string produces a string like " H e l l o W o r l d ! ", where each " " is really an invisible ASCII 0 character."

So for example my English localization string file has:

"Yes" = "Yes";

And the generated French localization string file has:

 " Y e s "  =  " O u i " ;

Here is my createFile method:

on createFile(fileFolder, fileName)
    tell application "Finder"
        if (exists file fileName of folder fileFolder) then
            set the fileAccess to open for access file fileName of folder fileFolder with write permission
            set eof of fileAccess to 0
            write ((ASCII character 254) & (ASCII character 255)) to fileAccess starting at 0
            --write «data rdatFEFF» to fileAccess starting at 0
            close access the fileAccess
        else
            set the filePath to make new file at fileFolder with properties {name:fileName}
            set the fileAccess to open for access file fileName of folder fileFolder with write permission
            write ((ASCII character 254) & (ASCII character 255)) to fileAccess starting at 0
            --write «data rdatFEFF» to fileAccess starting at 0
            close access the fileAccess
        end if
        return file fileName of folder fileFolder as text
    end tell
end createFile

And here is my writeFile method:

on writeFile(filePath, newLine)
    tell application "Finder"
        try
            set targetFileAccess to open for access file filePath with write permission
            write newLine to targetFileAccess as Unicode text starting at eof
            close access the targetFileAccess
            return true
        on error
            try
                close access file filePath
            end try
            return false
        end try
    end tell
end writeFile

Any idea what I'm doing wrong?

Comments (2)

烦人精 2024-10-23 22:12:54

Here are the handlers I use to read and write UTF-16. You don't need a separate "create file" handler; the write handler creates the file if it doesn't exist. Set the appendText parameter to true or false: false overwrites the file, and true adds the new text to the end of the file's current text. I hope this helps.

on writeTo_UTF16(targetFile, theText, appendText)
    try
        set targetFile to targetFile as text
        set openFile to open for access file targetFile with write permission
        -- when overwriting, truncate the file first
        if appendText is false then set eof of openFile to 0
        -- open for access has already created a missing file, so test for an
        -- empty file here instead of asking Finder whether the file exists
        if (get eof openFile) is 0 then
            write (ASCII character 254) & (ASCII character 255) to openFile starting at eof -- UTF-16 BOM
        end if
        write theText to openFile starting at eof as Unicode text
        close access openFile
        return true
    on error theError
        try
            close access file targetFile
        end try
        return theError
    end try
end writeTo_UTF16

on readFrom_UTF16(targetFile)
    try
        set targetFile to targetFile as text
        targetFile as alias -- if file doesn't exist then you get an error
        set openFile to open for access file targetFile
        set theText to read openFile as Unicode text
        close access openFile
        return theText
    on error
        try
            close access file targetFile
        end try
        return false
    end try
end readFrom_UTF16
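
A short usage sketch for the two handlers above, assuming they are defined in the same script. The Desktop location and the file name Localizable-demo.strings are made up for illustration; they are not from the original post.

```applescript
-- Hypothetical target file on the Desktop (HFS-style path, as the handlers expect)
set stringsFile to (path to desktop as text) & "Localizable-demo.strings"

-- Two sample localization entries, one per line
set lineOne to "\"Yes\" = \"Oui\";" & linefeed
set lineTwo to "\"No\" = \"Non\";" & linefeed

-- First call overwrites (or creates) the file, second call appends to it
set firstResult to writeTo_UTF16(stringsFile, lineOne, false)
set secondResult to writeTo_UTF16(stringsFile, lineTwo, true)

-- Read the file back as UTF-16; the handler returns false on failure
set roundTrip to readFrom_UTF16(stringsFile)
```

Note that writeTo_UTF16 returns true on success and the error message text on failure, so compare the result with true rather than treating it as a boolean.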

So要识趣 2024-10-23 22:12:54

If you're getting actual spaces between every character, you've probably got the '(characters i thru j of someText) as string' anti-pattern in your code [1]. That splits the string into a list of characters, then coerces the list back into a string with your current text item delimiter inserted between each character. The correct (i.e. fast and safe) way to get a substring is this: 'text i thru j of someText' (p179-181).
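
A minimal sketch of the difference, with the text item delimiter deliberately set to a space to stand in for whatever value earlier code may have left behind. The variable names are illustrative only.

```applescript
set someText to "Hello World!"

-- Simulate a delimiter left over from earlier code
set AppleScript's text item delimiters to " "

-- Anti-pattern: builds a list of characters, then joins it with the current delimiter
set viaList to (characters 1 thru 5 of someText) as string

-- Preferred: takes the substring directly, unaffected by the delimiter
set viaText to text 1 thru 5 of someText

-- Restore the default delimiter
set AppleScript's text item delimiters to ""

{viaList, viaText}
-- {"H e l l o", "Hello"}
```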

OTOH, if you are getting invisible characters between each character [2], then yes, that'll be an encoding issue, typically reading a UTF16-encoded file using MacRoman or other single-byte encoding. If your file has a valid Byte Order Mark then any Unicode-savvy text editor should read it using the correct encoding.
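
To see the encoding effect in isolation, here is a rough sketch that writes a small UTF-16 file with a byte order mark and then reads the same bytes back two ways. The scratch file name is hypothetical, and the exact result of the second read depends on the system's primary encoding, so no output is shown.

```applescript
-- Hypothetical scratch file in the temporary items folder
set demoPath to (path to temporary items as text) & "utf16-demo.txt"

set fileRef to open for access file demoPath with write permission
try
    set eof of fileRef to 0 -- start from an empty file
    write «data rdatFEFF» to fileRef -- UTF-16 big-endian byte order mark
    write "Yes" to fileRef as Unicode text -- UTF-16 payload
    close access fileRef
on error errMsg
    close access fileRef
    error errMsg
end try

-- Same bytes, two interpretations
set decodedAsUTF16 to read file demoPath as Unicode text
set decodedAsPrimary to read file demoPath as text -- primary (single-byte) encoding

{decodedAsUTF16, length of decodedAsPrimary}
```

If the second length comes out larger than the three characters of "Yes", the extra characters are the byte order mark and the zero bytes the quoted passage describes.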


[1] p179 states that this idiom is unsafe, but forgets to provide a practical demonstration of the problems it causes. [3]

[2] IIRC the example on p501 was meant to use rectangle symbols to represent invisible characters, i.e. "⃞H⃞e⃞l⃞l⃞o" not " H e l l o", but didn't come out quite that way so might be misread as meaning visible spaces. [3]

[3] Feel free to submit errata to Apress.
