Regex string for preg_replace
I need to do a "find and replace" on about 45k lines of a CSV file and then put this into a database.
I figured I should be able to do this with PHP and preg_replace but can't seem to figure out the expression...
The lines consist of one field and are all in the following format:
"./1/024/9780310320241/SPSTANDARD.9780310320241.jpg" or "./t/fla/8204909_flat/SPSTANDARD.8204909_flat.jpg"
The first part will always be a period, the second part will always be one alphanumeric character, the third will always be three alphanumeric characters and the fourth should always be between 1 and 13 alphanumeric characters.
I came up with the following, which seems right to me. That said, I openly admit I know very little about regular expressions; they are fairly new to me, so I'm probably making a whole load of silly mistakes here...
$pattern = "/^(\.\/[0-9a-zA-Z]{1}\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z]{1,13}\/)$/";
$new = preg_replace($pattern, " ", $i);
Anyway any and all help appreciated!
Thanks,
Phil
Comments (5)
The only mistake I can see is the anchor for the end of the string, $, which should be removed. Your expression is also missing the _ character.
A more general pattern would be to just exclude the / character.
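The code that accompanied this answer did not survive, so the following is only a minimal sketch of the two variants it describes, not the answerer's original patterns. The function names are hypothetical, and the replacement is a single space, as in the question.

```php
<?php
// Variant 1 (assumed): the question's pattern with the trailing $ removed
// and _ added to the last character class.
function stripPrefixStrict(string $line): string
{
    $pattern = '/^\.\/[0-9a-zA-Z]\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z_]{1,13}\//';
    // preg_replace returns null on error; fall back to the untouched line.
    return preg_replace($pattern, ' ', $line) ?? $line;
}

// Variant 2 (assumed): more general, each path part is "anything except /".
function stripPrefixGeneral(string $line): string
{
    $pattern = '/^\.\/[^\/]\/[^\/]{3}\/[^\/]{1,13}\//';
    return preg_replace($pattern, ' ', $line) ?? $line;
}

echo stripPrefixStrict('./t/fla/8204909_flat/SPSTANDARD.8204909_flat.jpg'), "\n";
echo stripPrefixGeneral('./1/024/9780310320241/SPSTANDARD.9780310320241.jpg'), "\n";
```

Both calls leave only the file name, preceded by the replacement space.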
You should use PHP's built-in CSV parser to extract the values from the CSV before matching any patterns.
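As a rough illustration, the field can be pulled out of a CSV line with str_getcsv (or with fgetcsv when reading straight from a file handle). Which function the answer had in mind is an assumption, and firstField is a hypothetical helper name.

```php
<?php
// Parse one CSV line and return its first (here: only) field,
// with the surrounding quotes already removed by the parser.
function firstField(string $csvLine): ?string
{
    // Separator, enclosure and escape are passed explicitly.
    $fields = str_getcsv($csvLine, ',', '"', '\\');
    return $fields[0] ?? null;
}

$path = firstField('"./1/024/9780310320241/SPSTANDARD.9780310320241.jpg"');
echo $path, "\n";
```

The regex then only has to deal with the bare path, not with quoting or delimiters.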
I'm not sure I understand what you're asking. Do you mean every line in the file looks like that, and you want to process all of them? If so, this regex would do the trick:
That simply matches everything up to and including the last slash, which is what your regex would do if it weren't for that rogue '$' everyone's talking about. If there are other lines in other formats that you want to leave alone, this regex will probably suit your needs:
Notice how I changed the regex delimiter from '/' to '#' so I don't have to escape the slashes inside. You can use almost any punctuation character for the delimiters (but of course they both have to be the same).
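The two regexes from this answer were lost, so what follows is a guess at their shape based on the description above: one that greedily matches up to the last slash, and a stricter one that leaves lines in other formats alone. Both use # as the delimiter; the function names are hypothetical.

```php
<?php
// Greedy version (assumed): removes everything up to and including
// the last slash on the line.
function stripToLastSlash(string $line): string
{
    return preg_replace('#^.*/#', ' ', $line) ?? $line;
}

// Stricter version (assumed): only touches lines that start with
// ./X/XXX/<1 to 13 word characters>/ ; \w also covers the underscore.
function stripKnownPrefix(string $line): string
{
    return preg_replace('#^\./\w/\w{3}/\w{1,13}/#', ' ', $line) ?? $line;
}

echo stripToLastSlash('./t/fla/8204909_flat/SPSTANDARD.8204909_flat.jpg'), "\n";
echo stripKnownPrefix('some/other/line.jpg'), "\n"; // left unchanged
```

With # as the delimiter, none of the slashes inside the pattern need a backslash.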
The $ means the end of the string. So your pattern would match ./1/024/9780310320241/ and ./t/fla/8204909_flat/ if they were alone on their line. Remove the $ and it will match the first four parts of your string, replacing them with a space.
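A small sketch of the effect of the anchor, using the pattern from the question with and without the trailing $. The variable names are only for illustration.

```php
<?php
// The question's pattern, and the same pattern without the end anchor.
$anchored   = '/^(\.\/[0-9a-zA-Z]{1}\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z]{1,13}\/)$/';
$unanchored = '/^(\.\/[0-9a-zA-Z]{1}\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z]{1,13}\/)/';

$dirOnly  = './1/024/9780310320241/';
$fullPath = './1/024/9780310320241/SPSTANDARD.9780310320241.jpg';

// With $, the directory part must be the whole string.
$anchoredDir  = preg_match($anchored, $dirOnly) === 1;   // matches
$anchoredFull = preg_match($anchored, $fullPath) === 1;  // does not match

// Without $, the prefix of the full path matches and is replaced by a space.
$new = preg_replace($unanchored, ' ', $fullPath);

var_dump($anchoredDir, $anchoredFull, $new);
```

Note that this unchanged character class still has no underscore, so the 8204909_flat example needs that fix as well.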
I just noticed that your example string doesn't end with /, so maybe you should remove it from the end of your pattern. Also, the underscore is used in the file name and should be in the character class.