PHP, preg_match, regex: extracting specific text

Posted on 2024-09-28 13:30:05

I have a very large .txt file containing our clients' orders, and I need to move it into a MySQL database. However, I don't know what kind of regex to use, since the records look very similar to one another.

-----------------------
4046904


KKKKKKKKKKK
Laura Meyer
MassMutual Life Insurance
153 Vadnais Street

Chicopee, MA 01020
US
413-744-5452
[email protected]...


KKKKKKKKKKK
373074210772222 02/12 6213 NA
-----------------------
4046907


KKKKKKKKKKK
Venkat Talladivedula

6105 West 68th Street

Tulsa, OK 74131
US
9184472611
venkat.talladivedula...


KKKKKKKKKKK
373022121440000 06/11 9344 NA
-----------------------

I tried something, but I couldn't even extract the name. Here is a sample of my unsuccessful attempt:


$htmlContent = file_get_contents("orders.txt");

//print_r($htmlContent);

$pattern = "/KKKKKKKKKKK(.*)\n/s";
preg_match_all($pattern, $htmlContent, $matches);
print_r($matches);
$name = $matches[1][0];
echo $name;
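Why the attempt prints too much: with the s modifier, . also matches line breaks, and .* is greedy, so (.*)\n runs from the first KKKKKKKKKKK all the way to the last newline in the file. A pattern anchored only on the marker would also catch the card-number line that follows each record's second marker. Below is a minimal sketch that anchors on the start of a record instead. It assumes every record begins with the dashed separator line, then the order number, blank lines and the marker, with the name on the next line, exactly as in the sample; extractNames() is a hypothetical helper name.

```php
<?php
// Sketch: pull out each customer's name. Assumes every record starts with a
// line of dashes, then the order number, blank lines and the KKKKKKKKKKK
// marker, with the name on the line right after that marker.
function extractNames(string $text): array // hypothetical helper
{
    // ^-+\R       the dashed separator line (m modifier: ^ = start of a line)
    // \d+\s+      the order number and the blank lines after it
    // K+\R        the marker line and its line break (\R = \r\n, \n or \r)
    // ([^\r\n]+)  the name: the rest of that line, without the line break
    $pattern = '/^-+\R\d+\s+K+\R([^\r\n]+)/m';
    preg_match_all($pattern, $text, $matches);
    return $matches[1];
}

// The two records from the question, joined with "\n".
$sample = implode("\n", [
    '-----------------------',
    '4046904', '', '',
    'KKKKKKKKKKK',
    'Laura Meyer',
    'MassMutual Life Insurance',
    '153 Vadnais Street',
    '',
    'Chicopee, MA 01020',
    'US',
    '413-744-5452',
    '[email protected]...', '', '',
    'KKKKKKKKKKK',
    '373074210772222 02/12 6213 NA',
    '-----------------------',
    '4046907', '', '',
    'KKKKKKKKKKK',
    'Venkat Talladivedula',
    '',
    '6105 West 68th Street',
    '',
    'Tulsa, OK 74131',
    'US',
    '9184472611',
    'venkat.talladivedula...', '', '',
    'KKKKKKKKKKK',
    '373022121440000 06/11 9344 NA',
    '-----------------------',
]);

print_r(extractNames($sample)); // Laura Meyer, Venkat Talladivedula
```

Because \R and [^\r\n] are used instead of a literal \n, the same pattern works whether the file has Unix or Windows line endings.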

Answers (4)

永言不败 2024-10-05 13:30:05

You may want to avoid regexes for something like this. Since the data is clearly organized by line, you could repeatedly read lines with fgets() and parse the data that way.
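A minimal sketch of that fgets() approach, assuming every order is delimited by the dashed separator line and that the fields always sit at the same offsets as in the two sample records (the company line is simply empty when missing). The helper names readRecords() and toOrder() and the array keys are hypothetical.

```php
<?php
// Sketch of the fgets() approach: collect the lines of each order between the
// dashed separator lines, then pick fields by their position in the record.
// The offsets in toOrder() are assumptions taken from the two sample records.
function readRecords($handle): array // $handle: a stream opened with fopen()
{
    $records = [];
    $current = [];
    while (($line = fgets($handle)) !== false) {
        $line = rtrim($line, "\r\n"); // fgets() keeps the line break
        if (preg_match('/^-+$/', $line)) {
            // a separator closes the record collected so far (if any)
            if (trim(implode('', $current)) !== '') {
                $records[] = $current;
            }
            $current = [];
            continue;
        }
        $current[] = $line;
    }
    if (trim(implode('', $current)) !== '') {
        $records[] = $current; // last record when the file lacks a final separator
    }
    return $records;
}

// Hypothetical field names; positions count from 0 after the separator line.
function toOrder(array $lines): array
{
    return [
        'order_id' => $lines[0] ?? null,
        'name'     => $lines[4] ?? null,
        'company'  => $lines[5] ?? null, // '' when the customer has no company
        'street'   => $lines[6] ?? null,
        'city_zip' => $lines[8] ?? null,
        'country'  => $lines[9] ?? null,
        'phone'    => $lines[10] ?? null,
        'email'    => $lines[11] ?? null,
    ];
}

// Usage, with the file name from the question:
// $fh = fopen('orders.txt', 'r');
// $orders = array_map('toOrder', readRecords($fh));
// fclose($fh);
```

Splitting on the separator rather than counting lines means one malformed record cannot shift every record after it.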

记忆で 2024-10-05 13:30:05

You could read this file with a regex, but it may be quite complicated to create one that reads all the fields.

I recommend reading this file line by line and parsing each line, detecting which kind of data it contains.
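One way to detect the kind of data on each line is to test it against a few shape patterns. The sketch below is an illustration only: classifyLine() and its labels are hypothetical, and every pattern (7-digit order numbers, 15/16-digit card lines, "City, ST 12345" lines) is an assumption inferred from the two sample records, not a known specification of the export.

```php
<?php
// Sketch: classify a single line of the export by its shape.
// All patterns are assumptions inferred from the two sample records.
function classifyLine(string $line): string
{
    $line = rtrim($line, "\r\n");
    if ($line === '') return 'blank';
    if (preg_match('/^-+$/', $line)) return 'separator';
    if (preg_match('/^K+$/', $line)) return 'marker';
    if (preg_match('/^\d{15,16} \d{2}\/\d{2} /', $line)) return 'card';
    if (preg_match('/^\d{7}$/', $line)) return 'order_id';
    if (preg_match('/^[^,]+, [A-Z]{2} \d{5}$/', $line)) return 'city_state_zip';
    if (preg_match('/^[\d-]{10,12}$/', $line)) return 'phone';
    if (preg_match('/^[A-Z]{2}$/', $line)) return 'country';
    return 'text'; // name, company, street or e-mail: shape alone cannot tell them apart
}

foreach (['4046904', 'KKKKKKKKKKK', 'Laura Meyer', 'Chicopee, MA 01020', '413-744-5452'] as $line) {
    echo str_pad($line, 30), classifyLine($line), "\n";
}
```

Names, companies, streets and e-mail addresses all come back as 'text', so in practice this would still be combined with position, for example treating the first 'text' line after the first marker as the name.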

说谎友 2024-10-05 13:30:05

Since you know exactly where your data is (i.e. which line it's on), why not just get it that way?

For example, something like:

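$htmlContent = file_get_contents("orders.txt");

$arrayofclients = explode("-----------------------", $htmlContent);
foreach ($arrayofclients as $client)
{
    $client = trim($client); // drop the line breaks around each record
    if ($client === "") {
        continue; // the chunk before the first separator (and after the last) is empty
    }
    $temp = preg_split('/\R/', $client); // split on \r\n, \n or \r
    $idnum = $temp[0];
    $name = $temp[4];
    $houseandstreet = $temp[6];
    //etc
}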

Or simply read the file line by line using fgets(), something like:

$i = 0; // line number within the current record
$j = 0; // index of the current client
$file = fopen("orders.txt", "r");
$clients = [];
while (($line = fgets($file)) !== false)
{
    $line = rtrim($line, "\r\n"); // fgets() keeps the line break, so strip it
    $i++;
    switch ($i)
    {
    case 2:
        $clients[$j]["idnum"] = $line;
        break;
    case 6:
        $clients[$j]["name"] = $line;
        break;
    //add more cases here for each line up to:
    case 17:
        $j++;
        $i = 0;
        break;
    //each client takes 17 lines (the "-----" line is shared with the next
    //record), so after line 17 increment $j and reset $i.
    }
}
fclose($file);

You could use regexes, but they are a bit awkward for this situation.

Nico

红焚 2024-10-05 13:30:05

For the record, here is a regex that will capture the names for you. (Granted, speed may well be an issue.)

(?<=K{10}\s{2})\K[^\r\n]++(?!\s{2}-)

Explanation:

(?<=K{10}\s{2})  #Positive lookbehind for KKKKKKKKKK then 2 return/newline characters
\K[^\r\n]++      #Possessively (greedy, no backtracking) match 1 or more non-return/newline characters
(?!\s{2}-)       #Negative lookahead for 2 return/newline characters then a dash

Here is a Regex Demo.

You will notice that my regex pattern changes slightly between the Regex Demo and my PHP Demo. Depending on the environment, slight tweaking may be required to match the return/newline characters: as written, \s{2} expects a two-character \r\n line break after the marker.

Here is the PHP implementation (Demo):

if (preg_match_all('/(?<=K{10}\s{2})\K[^\r\n]++(?!\s{2}-)/', $htmlContent, $matches)) {
    var_export($matches[0]);
} else {
    echo "no matches";
}

By using \K in my pattern I avoid having to capture with parentheses. This cuts the array size by 50% and is a useful trick for many projects. \K basically says "start the full-string match from this point", so the matches go into the first subarray of $matches (the full-string matches, key 0) instead of producing a full-string match in 0 and the capture in 1. (Strictly, in this pattern the lookbehind already keeps the marker out of the match, so either the lookbehind or \K alone would be enough.)

Output:

array (
  0 => 'Laura Meyer',
  1 => 'Venkat Talladivedula',
)
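To see what the \K trick described above changes, here is a sketch that runs the same extraction once with a capture group and once with \K, on a trimmed-down copy of the question's sample (most address lines omitted). It replaces the (?<=K{10}\s{2}) lookbehind with ^K+\R plus the m modifier; that swap is an assumption meant to make the pattern work with both \n and \r\n files, not the answer's tested pattern. As in the original, the negative lookahead relies on each card-number line being followed by the dashed separator.

```php
<?php
// Sketch: the same name extraction with a capture group and with \K, to
// compare the shape of $matches. ^K+\R with the m modifier is an assumption
// replacing the answer's (?<=K{10}\s{2}) lookbehind.
$sample = implode("\n", [
    '-----------------------',
    '4046904', '', '',
    'KKKKKKKKKKK',
    'Laura Meyer',
    'MassMutual Life Insurance',
    '', '',
    'KKKKKKKKKKK',
    '373074210772222 02/12 6213 NA',
    '-----------------------',
    '4046907', '', '',
    'KKKKKKKKKKK',
    'Venkat Talladivedula',
    '',
    '6105 West 68th Street',
    '', '',
    'KKKKKKKKKKK',
    '373022121440000 06/11 9344 NA',
    '-----------------------',
]);

// Capture group: $withGroup[0] = full matches (marker + name), [1] = names.
preg_match_all('/^K+\R([^\r\n]++)(?!\R-)/m', $sample, $withGroup);

// \K: the reported match starts at the name, so $withK only has key 0.
preg_match_all('/^K+\R\K[^\r\n]++(?!\R-)/m', $sample, $withK);

var_export($withK[0]); // Laura Meyer, Venkat Talladivedula
```

The possessive ++ matters here: with a plain + the engine could give back characters of the card-number line until the lookahead no longer sees a line break and a dash, and a truncated card number would be reported as a name.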