从 HTML 页面创建 CSV 文件

发布于 2025-01-06 23:28:18 字数 1048 浏览 0 评论 0原文

我从数据库中提取了记录并将它们存储在仅包含文本的 HTML 页面上。每条记录都存储在

段落字段中,并由换行符
和行 hr>. 例如:

Company Name<br/>
555-555-555<br />
Address Line 1<br />
Address Line 2<br />
Website: www.example.com<br />

我只需要把这些记录放入一个CSV文件中。我将 fputcsv 与 array() 和 file_get_contents() 结合使用,但它将网页的整个源代码读取到 .csv 文件中,并且还丢失了大量数据。这些是以相同格式存储的多个记录。因此,在如上所示的整个记录​​块之后,它由


行标记分隔。我想将公司名称读入“名称”列,将电话号码读入“电话”列,将地址读入“地址”列,将网站读入“网站”列,如下所示。

https://i.sstatic.net/00Gxw.png
我该怎么做?

HTML 的片段:

            1 Stop Signs<br />
            480-961-7446<br />
500 N. 56th Street<br />
        Chandler, AZ  85226<br />

<br />
                Website: www.1stopsigns.com<br />
            <br />
            </p><br /><hr><br />

它在 HTML 源代码中的间隔如下。

I have extracted records from a database and stored them on an HTML page with only text. Each record is stored in a <p> paragraph field and separated by a line break <br /> and a line <hr>.
For example:

Company Name<br/>
555-555-555<br />
Address Line 1<br />
Address Line 2<br />
Website: www.example.com<br />

I just need to place these records into a CSV file. I used fputcsv in combination with array() and file_get_contents() but it read my the entire source code of the webpage into a .csv file and alot of data was missing as well. These are multiple records stored in the same format. So after an entire record block as seen above, it is separate by an <hr> line tag. I want to read the company name into the Name column, the Phone number into the Phone column, the addresses into the Address column and the Website into the Website column as shown below.

https://i.sstatic.net/00Gxw.png

How can i do this?

Snippet of the HTML:

            1 Stop Signs<br />
            480-961-7446<br />
500 N. 56th Street<br />
        Chandler, AZ  85226<br />

<br />
                Website: www.1stopsigns.com<br />
            <br />
            </p><br /><hr><br />

It's spaced like this in the source of the HTML.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

丶视觉 2025-01-13 23:28:18

假设您的数据遵循一种模式,其中每条记录均由


标记分隔,并且其中的每个字段均由
分隔,那么您应该能够拆分数据。

有很多方法可以做到这一点,但使用 explode() 可能会起作用的一种简单方法可能是这样的:

// open a file pointer to csv
$fp = fopen('records.csv', 'w');

// first, split each record into a separate array element
$records = explode('<hr>', $str);

// then iterate over this array
foreach ($records as $record) {

    // strip tags and trim enclosing whitespace
    $stripped = trim(strip_tags($record));

    // explode by end-of-line
    $fields = explode(PHP_EOL, $stripped);

    // array walk over each field and trim whitespace
    array_walk($fields, function(&$field) {
        $field = trim($field);
    });

    // create row
    $row = array(
        $fields[0], // name
        $fields[1], // phone
        sprintf('%s, %s', $fields[2], $fields[3]), // address
        $fields[6], // web
    );

    // write cleaned array of fields to csv
    fputcsv($fp, $row);
}

// done
fclose($fp);

其中 $str 是您正在解析的页面数据。希望这有帮助。

编辑

最初没有注意到具体的字段要求。更新了示例。

Assuming that your data follows a pattern where every record is separated by a <hr> tag and every field within is separated by a <br /> then you should be able to split out the data.

There are loads of ways to do this, but a naive way that might work using explode() might be something like:

// open a file pointer to csv
$fp = fopen('records.csv', 'w');

// first, split each record into a separate array element
$records = explode('<hr>', $str);

// then iterate over this array
foreach ($records as $record) {

    // strip tags and trim enclosing whitespace
    $stripped = trim(strip_tags($record));

    // explode by end-of-line
    $fields = explode(PHP_EOL, $stripped);

    // array walk over each field and trim whitespace
    array_walk($fields, function(&$field) {
        $field = trim($field);
    });

    // create row
    $row = array(
        $fields[0], // name
        $fields[1], // phone
        sprintf('%s, %s', $fields[2], $fields[3]), // address
        $fields[6], // web
    );

    // write cleaned array of fields to csv
    fputcsv($fp, $row);
}

// done
fclose($fp);

Where $str is the page data you are parsing. Hope this helps.

EDIT

Didn't notice the specific field requirements originally. Updated the example.

路还长,别太狂 2025-01-13 23:28:18

假设上面显示的 html 格式良好,我解决这个问题的方法必须分两个阶段。
第一的。清除一点 html 文本可以更有效地导出或管理信息。在这里尝试清除您想要保存的项目,并删除那些您知道在不久的将来不需要的项目。

$html = preg_replace("|\s{2,}|si"," ",$html); // clear non neccesary spaces
$html = preg_replace("|\n{2,}|si","\n",$html); // convert more return line to only one
$html = preg_replace("|<br />|si","##",$html); // replace those tags with this one

然后你将有一个更干净的 html 可以使用,与此类似......

1 Stop Signs##
480-961-7446##
500 N. 56th Street##
Chandler, AZ  85226##
Website: www.1stopsigns.com##
##
</p>##<hr>##

第二。现在,您可以分解字段或将内爆分解为逗号分隔值以形成 csv

// here you'll have the fields to work with into the array called $csv_parts
$csv_parts = explode("##",$html);

// imploding, so there you have the formatted csv similar to 1 Stop Signs,480-961-7446,..
$csv = implode(",",$csv_parts);

现在,您将有两种方法使用 html 来提取字段或导出 csv。


希望这对您有所帮助或给您一个想法来开发您需要的东西。

Assuming the html that shown above is well formed,my approach to this problem must be in 2 phases.
First. Clear a little bit the html text to be more efficient to export or manage the information. Here try to clear the items you want to save and delete those you know you don't want to require in the near future.

$html = preg_replace("|\s{2,}|si"," ",$html); // clear non neccesary spaces
$html = preg_replace("|\n{2,}|si","\n",$html); // convert more return line to only one
$html = preg_replace("|<br />|si","##",$html); // replace those tags with this one

Then you'll have a more clean html to work with similar to this....

1 Stop Signs##
480-961-7446##
500 N. 56th Street##
Chandler, AZ  85226##
Website: www.1stopsigns.com##
##
</p>##<hr>##

Second. Now you can explode the fields or make an implode into a comma separate value to form a csv

// here you'll have the fields to work with into the array called $csv_parts
$csv_parts = explode("##",$html);

// imploding, so there you have the formatted csv similar to 1 Stop Signs,480-961-7446,..
$csv = implode(",",$csv_parts);

Now you'll have a two ways to work with the html for extracting the fields or exporting the csv.


Hope this helps or give you an idea to develop what you need.

金橙橙 2025-01-13 23:28:18

到目前为止,最简单的方法是简单地获取块,删除


标记中的所有内容,然后将字符串拆分为
< 上的字符串数组。 /代码> 标签。

By far the easiest way would be to simply take the block, drop everything from the <hr> tag forward then split the string as a string array on the <br /> tags.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文