AppleScript: cleaning a string
I have a string containing illegal characters that I want to remove, but I don't know in advance which characters may be present.
I built a list of the characters that I do not want filtered out, and I wrote this script (adapted from another one I found on the web).
on clean_string(TheString)
    --Store the current TIDs. To be polite to other scripts.
    set previousDelimiter to AppleScript's text item delimiters
    set potentialName to TheString
    set legalName to {}
    set legalCharacters to {"a", "b", "c", "d", "e", "f", ¬
        "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", ¬
        "s", "t", "u", "v", "w", "x", "y", "z", "A", "B", "C", "D", "E", ¬
        "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", ¬
        "S", "T", "U", "V", "W", "X", "Y", "Z", "1", "2", "3", "4", "5", ¬
        "6", "7", "8", "9", "0", "?", "+", "-", "Ç", "ç", "á", "Á", "é", ¬
        "É", "í", "Í", "ó", "Ó", "ú", "Ú", "â", "Â", "ã", "Ã", "ñ", "Ñ", ¬
        "õ", "Õ", "à", "À", "è", "È", "ü", "Ü", "ö", "Ö", "!", "$", "%", ¬
        "/", "(", ")", "&", "€", "#", "@", "=", "*", "+", "-", ",", ".", ¬
        "–", "_", " ", ":", ";", ASCII character 10, ASCII character 13}
    --Whatever you want to eliminate.
    --Now iterate through the characters checking them.
    repeat with thisCharacter in the characters of potentialName
        set thisCharacter to thisCharacter as text
        if thisCharacter is in legalCharacters then
            set the end of legalName to thisCharacter
            log (legalName as string)
        end if
    end repeat
    --Make sure that you set the TIDs before making the
    --list of characters into a string.
    set AppleScript's text item delimiters to ""
    --Check the name's length.
    if length of legalName is greater than 32 then
        set legalName to items 1 thru 32 of legalName as text
    else
        set legalName to legalName as text
    end if
    --Restore the current TIDs. To be polite to other scripts.
    set AppleScript's text item delimiters to previousDelimiter
    return legalName
end clean_string
The problem is that this script is extremely slow and times out.
What I am doing is checking character by character and comparing each one against the legalCharacters list. If the character is in the list, it is kept. If not, it is ignored.
Is there a fast way to do that? Something like "look at every char of TheString and remove those that are not in legalCharacters"?
Thanks for any help.
Comments (4)
What non-ASCII characters are you running into? What is your file encoding?
It's much, much more efficient to use a shell script and tr, sed or perl to process text. All of these tools are installed by default in OS X.
You can use a shell script with tr to strip returns, and you can also use sed to strip spaces.
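The original example for this answer did not survive on this page, so what follows is only a minimal sketch of the idea, not the answerer's code. The function names strip_returns and strip_spaces are hypothetical, chosen for illustration. The first deletes carriage returns with tr, the second deletes spaces with sed.

```shell
# Hypothetical helper: delete every carriage return (CR, \r) from the argument.
strip_returns() {
    printf '%s' "$1" | tr -d '\r'
}

# Hypothetical helper: delete every space character from the argument.
strip_spaces() {
    printf '%s' "$1" | sed 's/ //g'
}

# Example calls
strip_returns "$(printf 'first line\rsecond line')"; echo
strip_spaces "a b  c"; echo
```

From AppleScript, a pipeline like this would typically be run through do shell script, with the text passed via quoted form of so that the shell does not interpret any of its characters.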
Technical Note TN2065: do shell script in AppleScript
Or, with perl, you can strip non-printing characters.
Search around SO for other examples of using tr, sed and perl to process text with AppleScript. Or search MacScripter / AppleScript | Forums
Another shell script method might be to use sed to delete everything that isn't an alphanumeric character or a space. More regex reference here
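The sed command itself was lost from this page. A minimal sketch of the described approach, assuming that "alphanumeric" means the ASCII letters and digits only, might be written as below. The function name keep_alnum_and_spaces is hypothetical. LC_ALL=C is set here so the bracket expression is matched byte by byte, which means accented characters such as those in the asker's whitelist are deleted as well. The bracket expression would have to be extended to keep them.

```shell
# Hypothetical helper: delete everything except ASCII letters,
# ASCII digits and spaces. LC_ALL=C forces byte-wise matching,
# so bytes of non-ASCII characters are removed too.
keep_alnum_and_spaces() {
    printf '%s\n' "$1" | LC_ALL=C sed 's/[^a-zA-Z0-9 ]//g'
}

# Example call: punctuation and symbols are dropped.
keep_alnum_and_spaces "Hello, World! #123"
```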
Iterating in AppleScript is always slow, and there really isn't a faster way around these problems. Logging inside loops is a guaranteed way to slow things down. Use the log command judiciously.
In your specific case, however, you have a length limit, and moving the length check into the repeat loop, so that the loop stops as soon as 32 legal characters have been collected, will potentially cut the processing time down considerably (just under a second to run in Script Debugger regardless of the length of the text).
BBEdit or TextWrangler will be much, much faster at this. Download TextWrangler (it's free), then open up your file and run Text -> Zap Gremlins... on it. Does that do what you need? If it does, celebrate with a cold beverage. If not, try out BBEdit (it's not free) and create a new Text Factory with as many "Replace All" conditions as you need, then open up your file and run the Text Factory on it.