php, preg_match, regular expression, extract specific text
I have a very large .txt file containing our clients' orders, and I need to move it into a MySQL database. However, I don't know what kind of regex to use, since the records are not very different from one another.
```
-----------------------
4046904
KKKKKKKKKKK
Laura Meyer
MassMutual Life Insurance
153 Vadnais Street
Chicopee, MA 01020
US
413-744-5452
[email protected]...
KKKKKKKKKKK
373074210772222
02/12
6213
NA
-----------------------
4046907
KKKKKKKKKKK
Venkat Talladivedula
6105 West 68th Street
Tulsa, OK 74131
US
9184472611
venkat.talladivedula...
KKKKKKKKKKK
373022121440000
06/11
9344
NA
-----------------------
```
I tried something, but I couldn't even extract the name. Here is a sample of my attempt, which doesn't work:
```php
$htmlContent = file_get_contents("orders.txt");
//print_r($htmlContent);
$pattern = "/KKKKKKKKKKK(.*)\n/s";
preg_match_all($pattern, $htmlContent, $matches);
print_r($matches);
$name = $matches[1][0];
echo $name;
```
Answers (4)
You may want to avoid regexes for something like this. Since the data is clearly organized by line, you could repeatedly read lines with fgets() and parse the data that way.
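A minimal sketch of that `fgets()` approach, assuming the file has one field per line and that records are separated by a line of dashes. The function name `parseOrders()` and the field positions are hypothetical, and the demo reads from an in-memory stream rather than `orders.txt`:

```php
<?php
// Hypothetical helper: groups lines into records, one record per block
// between "-----" separator lines. Assumes one field per line.
function parseOrders($handle): array
{
    $orders = [];
    $current = [];
    while (($line = fgets($handle)) !== false) {
        $line = trim($line);              // drop "\n" / "\r\n" and spaces
        if ($line === '') {
            continue;                     // skip blank lines
        }
        if (preg_match('/^-+$/', $line)) {
            // Separator: close the current record, if it has any fields.
            if ($current) {
                $orders[] = $current;
            }
            $current = [];
            continue;
        }
        $current[] = $line;
    }
    if ($current) {
        $orders[] = $current;             // file may not end with a separator
    }
    return $orders;
}

// Demo with an in-memory stream instead of orders.txt.
$sample = "-----------------------\n4046904\nKKKKKKKKKKK\nLaura Meyer\n"
        . "MassMutual Life Insurance\n153 Vadnais Street\nChicopee, MA 01020\nUS\n"
        . "-----------------------\n4046907\nKKKKKKKKKKK\nVenkat Talladivedula\n"
        . "6105 West 68th Street\nTulsa, OK 74131\nUS\n-----------------------\n";
$fh = fopen('php://memory', 'r+');
fwrite($fh, $sample);
rewind($fh);

$orders = parseOrders($fh);
fclose($fh);

foreach ($orders as $fields) {
    // Assumed layout: [0] order id, [1] marker, [2] name.
    echo $fields[0], ': ', $fields[2], "\n";
}
```

For the real file, pass `fopen('orders.txt', 'r')` instead. Because `fgets()` reads one line at a time, memory use stays flat even for a very large file.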
You could read this file with a regex, but it may be quite complicated to create a regex that reads all the fields.
I recommend that you read this file line by line and parse each line, detecting which kind of data it contains.
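One way to do the "detect which kind of data it contains" step, sketched under the assumption that the two sample records are representative. `classifyLine()` and its labels are hypothetical names, and the patterns are guesses based on the sample only:

```php
<?php
// Hypothetical classifier: guesses what a single trimmed line holds.
// The patterns are assumptions based on the two sample records only.
function classifyLine(string $line): string
{
    if (preg_match('/^-+$/', $line)) {
        return 'separator';
    }
    if ($line === 'KKKKKKKKKKK') {
        return 'marker';
    }
    if (preg_match('/^\d{7}$/', $line)) {
        return 'order_id';                 // e.g. 4046904
    }
    if (preg_match('/^.+, [A-Z]{2} \d{5}$/', $line)) {
        return 'city_state_zip';           // e.g. "Tulsa, OK 74131"
    }
    if (preg_match('/^\d{3}-?\d{3}-?\d{4}$/', $line)) {
        return 'phone';                    // 413-744-5452 or 9184472611
    }
    if (preg_match('#^\d{2}/\d{2}$#', $line)) {
        return 'mm_yy';                    // e.g. 02/12
    }
    if (strpos($line, '@') !== false) {
        return 'email';
    }
    return 'text';                         // name, company, street, ...
}

$lines = ['4046907', 'KKKKKKKKKKK', 'Tulsa, OK 74131', '9184472611', '06/11', 'US'];
foreach ($lines as $line) {
    echo str_pad($line, 20), classifyLine($line), "\n";
}
```

The order of the checks matters: the 7-digit order id is tested before the 10-digit phone pattern, and anything unrecognized falls through to `text`.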
Since you know exactly where your data is (i.e. which line it's on), why not just get it that way?
i.e. something like
or simply read the file line by line using fgets(), something like:
You could use regexes, but they are a bit awkward for this situation.
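A minimal sketch of the line-position idea (not Nico's original code), assuming a record always starts with a dash separator, then the order id, then the `KKKKKKKKKKK` marker, then the name. Positions are counted from each separator rather than from the top of the file, because the second sample record has no company line, so absolute offsets would drift:

```php
<?php
// Demo: write sample data to a temp file; in practice use 'orders.txt'.
$path = tempnam(sys_get_temp_dir(), 'orders');
file_put_contents($path, "-----------------------\n4046904\nKKKKKKKKKKK\nLaura Meyer\n"
    . "MassMutual Life Insurance\n-----------------------\n4046907\nKKKKKKKKKKK\n"
    . "Venkat Talladivedula\n6105 West 68th Street\n-----------------------\n");

// FILE_IGNORE_NEW_LINES strips the line endings; FILE_SKIP_EMPTY_LINES
// then drops blank lines, so indexes only count non-empty lines.
$lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
unlink($path);

$names = [];
foreach ($lines as $i => $line) {
    // Assumption: separator, order id, marker, name, so the name sits
    // 3 lines after the separator and the order id 1 line after it.
    if (preg_match('/^-+$/', $line) && isset($lines[$i + 3])) {
        $names[$lines[$i + 1]] = $lines[$i + 3];
    }
}
print_r($names);
```

Note that `file()` loads the whole file into memory; for a very large file, the `fgets()` loop may be the safer choice.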
Nico
For the record, here is the regex that will capture the names for you. (Granted, speed may very well be an issue.)
Explanation:
Here is a Regex Demo.
You will notice that my regex pattern changes slightly between the Regex Demo and my PHP Demo. Slight tweaking, depending on the environment, may be required to match the return/newline characters.
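A small sketch of one such tweak: PCRE's `\R` escape matches any newline sequence (`\r\n`, `\n`, `\r`, and a few Unicode line breaks), so a single pattern works for files saved with either Unix or Windows line endings. The pattern here, which takes the line after the first marker as the name, is an assumption for illustration, not the answer's original pattern:

```php
<?php
// The same record with Unix (\n) and Windows (\r\n) line endings.
$unix    = "4046904\nKKKKKKKKKKK\nLaura Meyer\nMassMutual Life Insurance\n";
$windows = str_replace("\n", "\r\n", $unix);

// /m lets ^ match at each line start; \R consumes "\n" or "\r\n";
// [^\r\n]+ keeps the captured name on a single line.
$pattern = '/^K+\R([^\r\n]+)/m';

foreach (['unix' => $unix, 'windows' => $windows] as $label => $text) {
    preg_match($pattern, $text, $m);
    echo $label, ': ', $m[1], "\n";
}
```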
Here is the PHP implementation (Demo):
By using `\K` in my pattern I avoid actually having to capture with parentheses. This cuts down array size by 50% and is a useful trick for many projects. The `\K` basically says "start the fullstring match from this point", so the matches go in the first subarray (fullstrings, key=`0`) of `$matches` instead of generating a fullstring match in `0` and the capture in `1`.

Output:
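A minimal sketch of the `\K` technique on the sample data, assuming the name is the line right after a marker that directly follows a 7-digit order id (this pattern is illustrative, not the answer's original one):

```php
<?php
$text = "4046904\nKKKKKKKKKKK\nLaura Meyer\nMassMutual Life Insurance\n"
      . "KKKKKKKKKKK\n373074210772222\n-----------------------\n"
      . "4046907\nKKKKKKKKKKK\nVenkat Talladivedula\n6105 West 68th Street\n"
      . "KKKKKKKKKKK\n373022121440000\n";

// ^\d{7}\R  : order id line (the second marker is not preceded by one)
// K+\R      : the marker line
// \K        : discard everything matched so far from the full match
// [^\r\n]+  : the name, which becomes the entire full match
preg_match_all('/^\d{7}\RK+\R\K[^\r\n]+/m', $text, $matches);
print_r($matches);
```

With a capture group instead, e.g. `/^\d{7}\RK+\R([^\r\n]+)/m`, the same call would fill both `$matches[0]` (order id, marker, and name) and `$matches[1]` (names only).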