How to split a string on , (comma) using a regular expression in an iPhone application
I have to read a .csv file that has three columns. While parsing the .csv file, I get a string in this format: Christopher Bass,\"Cry the Beloved Country Final Essay\",[email protected]. I want to store the values of the three columns in an array, so I used the componentsSeparatedByString:@"," method. It successfully returns an array with three components:
- Christopher Bass
- Cry the Beloved Country Final Essay
- [email protected]
But when a column value already contains a comma, like this:
Christopher Bass,\"Cry, the Beloved Country Final Essay\",[email protected]
it separates the string into four components, because there is a , (comma) after Cry:
- Christopher Bass
- Cry
- the Beloved Country Final Essay
- [email protected]
So how can I handle this using a regular expression? I have the "RegexKitLite" classes, but which regular expression should I use? Please help!
Thanks-
Comments (5)
Any regular expression will probably run into the same problem. What you need is to sanitize your entries or strings, either by escaping your commas or by delimiting strings this way:
"My string"
Otherwise you will have the same problem. Good luck. For your example you would probably need to do something like:
That way you could use a regexp or even the same method from the NSString class.
Not related at all, but on the importance of sanitizing strings: http://xkcd.com/327/ hehehe.
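As a rough illustration of the quoting idea above, here is a minimal sketch in Python (chosen only for brevity; the thread itself targets Objective-C). The helper names `quote_all` and `split_quoted` are hypothetical, and the sketch assumes that no field contains a double quote. Once every field is wrapped in quotes, a plain string split on `","` is enough, which is roughly what calling componentsSeparatedByString: with that separator would do.

```python
def quote_all(fields):
    """Wrap every field in double quotes and join the fields with commas.

    Assumption: no field contains a double quote itself.
    """
    return '"' + '","'.join(fields) + '"'


def split_quoted(line):
    """Split a line produced by quote_all back into its fields."""
    # Drop the outer quotes, then split on the three-character separator ",".
    return line[1:-1].split('","')
```

A comma inside a field is harmless here because the separator is the whole sequence quote, comma, quote, not the comma alone.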
How about this:
This should split your string wherever " and , appear together in either order, resulting in a three-member array. This of course assumes that the second element in the string is always enclosed in quotation marks, and that the characters " and , never appear consecutively within the three components.
If either of these assumptions is incorrect, other methods to identify the string components may be used, but it should be made clear that no generic solution exists. If the three component strings can contain " and , anywhere, not even a limited solution is possible in such cases:
Hopefully there is nothing like the above in your CSV data. If there is, the data is basically unusable, and you should look into a better CSV exporter.
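The splitting rule described in this answer can be sketched as follows, in Python for brevity. This is not the answer's original code: the pattern and the function name `split_on_quote_comma` are assumptions for illustration, and the input is assumed to use plain `"` characters around the second field.

```python
import re

# Matches a quote followed by a comma, or a comma followed by a quote.
QUOTE_COMMA = re.compile(r'",|,"')


def split_on_quote_comma(line):
    """Split wherever " and , appear next to each other, in either order."""
    return QUOTE_COMMA.split(line)
```

The comma after "Cry" is followed by a space rather than a quote, so it does not count as a separator. The same limitation stated above applies: a component that itself contains `",` or `,"` would be split in the wrong place.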
The regex you're searching for is:
\\"(.*)\\"[ ^,]*|([^,]*),
in ObjC:
(('\"' && string_1 && '\"' && 0-n spaces) || string_2 except comma) && comma
RegexKitLite will add both strings to the array, so you will end up with empty objects in your array. removeObject:@"" will delete those, but if you need to keep true empty values (e.g. your source has val,,ue), you have to modify the code to the following:
$1 and $2 are the two strings mentioned above; ∏ is in this case a character that will most likely never appear in normal text (and is easy to remember: option-shift-p).
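A variation on this two-group idea, sketched in Python, is shown here. It is not the exact pattern from the answer: it assumes plain `"` quotes, uses `[^"]*` instead of a greedy `.*` so that a match cannot run across several quoted fields, and the name `fields` is hypothetical. In Python a capture group that did not take part in a match is `None`, so genuinely empty values can be kept without a sentinel character.

```python
import re

# Either a quoted field followed by a comma (group 1),
# or an unquoted field followed by a comma (group 2).
FIELD = re.compile(r'"([^"]*)",|([^,]*),')


def fields(line):
    """Return the fields of one CSV line, keeping empty fields as ''."""
    result = []
    # Append a comma so the last field is terminated like all the others.
    for match in FIELD.finditer(line + ","):
        quoted, plain = match.group(1), match.group(2)
        # The group that did not participate is None, so '' stays a real value.
        result.append(quoted if quoted is not None else plain)
    return result
```

With this sketch, `val,,ue` yields three fields with an empty string in the middle, and a quoted title containing a comma stays in one piece.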
The last part looks like it will never contain a comma. Neither will the first one as far as I can see...
What about splitting the string like this:
This will use the first and last string as is, and combine the rest into the content.
Kind of a hack, but a name and an email address will never contain a comma, right?
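The first-and-last approach described here could look roughly like this in Python. The function name `split_name_title_email` is hypothetical, and the sketch relies on the stated assumption that neither the name nor the email address contains a comma.

```python
def split_name_title_email(line):
    """Keep the first and last comma-separated parts; merge the rest."""
    parts = line.split(",")
    if len(parts) < 3:
        raise ValueError("expected at least three comma-separated parts")
    name, email = parts[0], parts[-1]
    # Re-join the middle parts with the commas that the split removed,
    # then drop the surrounding quotes if there are any.
    title = ",".join(parts[1:-1]).strip('"')
    return [name, title, email]
```

Because the middle parts are joined again with a comma, the title comes back exactly as it was written, however many commas it contained.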
Is the title guaranteed to have the quotation marks? And is it the only component that can have them? Because then
componentsSeparatedByString:@"\""
should get you this:
Then use
componentsSeparatedByString:@","
or substringFrom/ToIndex:
to get rid of the two commas in the first and last component. Here's a solution using substring:
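A minimal Python sketch of this quote-splitting approach follows. It is not the answer's original Objective-C code; the name `split_on_quotes` is hypothetical, and the sketch assumes the title is the only quoted component, so that splitting on `"` gives exactly three parts.

```python
def split_on_quotes(line):
    """Split on the quote character, then trim the commas left on the ends."""
    parts = line.split('"')
    if len(parts) != 3:
        raise ValueError("expected exactly one quoted field")
    before, title, after = parts
    # 'before' ends with a comma and 'after' starts with one; slice them off,
    # much like taking a substring to or from an index.
    return [before[:-1], title, after[1:]]
```

The length check makes the assumption explicit: a line with no quoted title, or with more than one, raises an error instead of returning wrong fields.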