Create a CSV file from an HTML page
I have extracted records from a database and stored them on an HTML page with only text. Each record is stored in a <p> paragraph element, with fields separated by a line break <br /> and records separated by a horizontal rule <hr>.
For example:
Company Name<br/>
555-555-555<br />
Address Line 1<br />
Address Line 2<br />
Website: www.example.com<br />
I just need to place these records into a CSV file. I used fputcsv in combination with array() and file_get_contents(), but it wrote the entire source code of the web page into the .csv file, and a lot of data was missing as well. There are multiple records stored in the same format, and each record block like the one above is separated from the next by an <hr> tag. I want to read the company name into the Name column, the phone number into the Phone column, the address lines into the Address column and the website into the Website column, as shown below.
https://i.sstatic.net/00Gxw.png
How can I do this?
Snippet of the HTML:
1 Stop Signs<br />
480-961-7446<br />
500 N. 56th Street<br />
Chandler, AZ 85226<br />
<br />
Website: www.1stopsigns.com<br />
<br />
</p><br /><hr><br />
It's spaced like this in the source of the HTML.
Comments (3)
Assuming that your data follows a pattern where every record is separated by an <hr> tag and every field within it is separated by a <br />, you should be able to split out the data. There are loads of ways to do this, but a naive way that might work is to call explode() on $str, the page data you are parsing: first on the <hr> tags to get the records, then on the <br /> tags to get the fields. Hope this helps.
EDIT: Didn't notice the specific field requirements originally. Updated the example.
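A minimal sketch of that explode() approach, for illustration only. The function name parseRecords() and the sample data in $str are assumptions made for this example, as is the rule that the first two non-empty fields are the name and phone and that the website line starts with "Website:". It also normalizes the "<br/>" spelling to "<br />" first, since both appear in the question.

```php
<?php
// Hypothetical sample standing in for the page data ($str in the answer above).
$str = <<<'HTML'
<p>1 Stop Signs<br />
480-961-7446<br />
500 N. 56th Street<br />
Chandler, AZ 85226<br />
<br />
Website: www.1stopsigns.com<br />
<br />
</p><br /><hr><br />
<p>Company Name<br/>
555-555-555<br />
Address Line 1<br />
Address Line 2<br />
Website: www.example.com<br />
</p><br /><hr><br />
HTML;

// Split the page into records on <hr>, then each record into fields on <br />.
function parseRecords(string $str): array
{
    $rows = [];
    $str = str_replace('<br/>', '<br />', $str); // normalize both <br> spellings
    foreach (explode('<hr>', $str) as $block) {
        $fields = [];
        foreach (explode('<br />', $block) as $piece) {
            $piece = trim(strip_tags($piece)); // drop <p>, </p> and whitespace
            if ($piece !== '') {
                $fields[] = $piece;
            }
        }
        if (count($fields) < 2) {
            continue; // not a record (e.g. the tail after the last <hr>)
        }
        // Assumption: name comes first, phone second, the rest is address or website.
        $name = array_shift($fields);
        $phone = array_shift($fields);
        $website = '';
        $address = [];
        foreach ($fields as $field) {
            if (stripos($field, 'Website:') === 0) {
                $website = trim(substr($field, strlen('Website:')));
            } else {
                $address[] = $field;
            }
        }
        $rows[] = [$name, $phone, implode(', ', $address), $website];
    }
    return $rows;
}

$rows = parseRecords($str);

// Write the header and rows as CSV; swap php://output for a file path as needed.
$out = fopen('php://output', 'w');
fputcsv($out, ['Name', 'Phone', 'Address', 'Website'], ',', '"', '\\');
foreach ($rows as $row) {
    fputcsv($out, $row, ',', '"', '\\');
}
fclose($out);
```

Letting fputcsv() do the quoting matters here because address lines such as "Chandler, AZ 85226" contain a comma.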
Assuming the HTML shown above is well formed, my approach to this problem would be in two phases.
First, clean up the HTML text a little so the information is easier to export or manage. Keep the items you want to save and delete those you know you won't need in the near future.
Then you'll have cleaner HTML to work with.
Second, you can now explode the fields, or implode them into comma-separated values to form a CSV.
You'll then have two ways to work with the HTML: extracting the fields or exporting the CSV.
Hope this helps or gives you an idea for developing what you need.
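One way the two phases could look, as a rough sketch rather than a definitive implementation. The helper names cleanRecord() and toCsvLine() and the sample in $html are hypothetical. Instead of a plain implode(), which would break on a comma inside a field, phase two uses fputcsv() on an in-memory stream so that quoting is handled.

```php
<?php
// Hypothetical sample: the markup of a single record.
$html = <<<'HTML'
<p>1 Stop Signs<br />
480-961-7446<br />
500 N. 56th Street<br />
Chandler, AZ 85226<br />
<br />
Website: www.1stopsigns.com<br />
<br />
</p><br />
HTML;

// Phase 1: reduce one record's markup to clean, newline-separated text.
function cleanRecord(string $html): string
{
    $text = preg_replace('/<br\s*\/?>/i', "\n", $html); // any <br> variant becomes a newline
    $text = strip_tags($text);                           // drop <p>, </p> and other tags
    $lines = array_filter(array_map('trim', explode("\n", $text)), 'strlen'); // drop blank lines
    return implode("\n", $lines);
}

// Phase 2: turn the cleaned text into one properly quoted CSV line.
function toCsvLine(string $clean): string
{
    $fields = explode("\n", $clean);
    $mem = fopen('php://memory', 'w+');
    fputcsv($mem, $fields, ',', '"', '\\');
    rewind($mem);
    $line = stream_get_contents($mem);
    fclose($mem);
    return rtrim($line, "\n");
}

$clean = cleanRecord($html);
$line = toCsvLine($clean);
echo $clean, "\n\n", $line, "\n";
```

This version keeps every cleaned line as its own field, including the "Website:" prefix. Mapping the lines onto Name, Phone, Address and Website columns would be an extra step on top of it.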
By far the easiest way would be to simply take the block, drop everything from the <hr> tag forward, then split the string into a string array on the <br /> tags.
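A short sketch of that idea, assuming the block is available as a string. The function name splitBlock() and the sample in $block are made up for this example. It uses strstr() with its third argument set to true to keep only the text before the first <hr>, and preg_split() so that both "<br/>" and "<br />" are treated as separators.

```php
<?php
// Hypothetical sample: one record followed by the start of the next.
$block = <<<'HTML'
<p>1 Stop Signs<br />
480-961-7446<br />
500 N. 56th Street<br />
Chandler, AZ 85226<br />
<br />
Website: www.1stopsigns.com<br />
<br />
</p><br /><hr><br />
<p>Next Company<br />
HTML;

function splitBlock(string $block): array
{
    // Keep only what comes before the first <hr>; fall back to the whole block if there is none.
    $before = strstr($block, '<hr>', true);
    if ($before === false) {
        $before = $block;
    }
    // Split on any <br> spelling, then strip leftover tags and whitespace.
    $parts = preg_split('/<br\s*\/?>/i', $before);
    $parts = array_map(fn($p) => trim(strip_tags($p)), $parts);
    // Drop empty entries and reindex from 0.
    return array_values(array_filter($parts, fn($p) => $p !== ''));
}

$fields = splitBlock($block);
print_r($fields);
```

The resulting array can then be passed to fputcsv() as one row, or rearranged first if the address lines should share a single column.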