Is it better to parse HTML on the server with PHP, or on the end user's side with JavaScript?

Posted on 2024-10-07 22:04:25

I need to write a script that takes a link and parses the HTML of the linked page to pull in the title and a few other pieces of data, such as a short description, much like when you link to something on Facebook.

It will be called when a user adds a link to the site, so it could see a decent number of hits when the client launches the site.

Should I do this on the server side with PHP or on the end user's side with JavaScript? I have been writing the logic that tries to figure out which areas of the markup hold potential content, and it made me wonder whether the load would be too much if I continue in PHP.

The client has just one decent web server, and I worry that parsing and analyzing HTML pages may be too much load for it, whereas we could do it in JavaScript and farm the work out to the user adding the link.

Any advice or thoughts on the matter would be awesome. Thank you.

Edit: This data is not going straight into the database. It is used to help the user by auto-filling the description of their link, which still goes through my regular vetting before being stored in the DB.

带刺的爱情 2024-10-14 22:04:25

Well, this is an easy one: doing this purely in client-side JavaScript just plain isn't an option, because the same-origin policy stops a script from reading a page fetched from another origin unless that site explicitly allows it.

Parsing HTML isn't that heavy a task, so you should be fine doing it in PHP.
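
As a rough illustration of how little work the parsing step is, here is a minimal sketch using PHP's bundled DOM extension. The function name `extract_link_preview` and the choice to prefer Open Graph tags (`og:title`, `og:description`) before falling back to `<title>` and the description meta tag are assumptions made for this example, not something from the question. Character encoding is not handled here.

```php
<?php

/**
 * Extract a title and short description from an HTML string.
 * Prefers Open Graph tags, then falls back to <title> and the
 * standard description meta tag. Missing values come back as null.
 * (Hypothetical helper, for illustration only.)
 */
function extract_link_preview(string $html): array
{
    if (trim($html) === '') {
        return ['title' => null, 'description' => null];
    }

    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so collect parser warnings
    // internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Return the trimmed string value of the first match, or null.
    $first = function (string $query) use ($xpath): ?string {
        $value = trim((string) $xpath->evaluate("string($query)"));
        return $value === '' ? null : $value;
    };

    return [
        'title' => $first('//meta[@property="og:title"]/@content')
            ?? $first('//title'),
        'description' => $first('//meta[@property="og:description"]/@content')
            ?? $first('//meta[@name="description"]/@content'),
    ];
}
```

The function only looks at the string it is given, so it can be measured on the server with a handful of saved pages before deciding whether load is a real concern.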

浪漫人生路 2024-10-14 22:04:25

I would offload this to the end user via JavaScript; with a listener you could then bind the result back to the server. The reasons are simple:

  • This is a helper to the front-end not the backend (values aren't stored or manipulated on the backend directly.)
  • The load is spread around rather than concentrated on your server. You'll also probably give a better user experience if the end user is only pulling one URL, versus the server pulling thousands.
  • Processing in the front-end also mitigates the possibility of malicious code being executed directly on your server.
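
If the values are produced in the browser and bound back to the server as described, the server still receives them as ordinary user input. A minimal sketch of normalizing such a field before it goes on to the regular vetting step might look like this; the function name `clean_preview_field` and the length limits are hypothetical.

```php
<?php

/**
 * Reduce one client-supplied preview field to plain text of bounded
 * length. Returns null when nothing usable is left.
 * (Hypothetical helper, for illustration only.)
 */
function clean_preview_field($value, int $maxLength): ?string
{
    if (!is_string($value)) {
        return null; // arrays or missing fields are rejected
    }

    // Drop markup and collapse runs of whitespace into single spaces.
    $text = strip_tags($value);
    $text = trim(preg_replace('/\s+/u', ' ', $text) ?? '');
    if ($text === '') {
        return null;
    }

    // Truncate by characters rather than bytes so that a multi-byte
    // UTF-8 sequence is never cut in half.
    preg_match('/^.{0,' . $maxLength . '}/us', $text, $match);
    return $match[0];
}
```

For example, `clean_preview_field($_POST['title'] ?? null, 200)` would yield either a plain-text string of at most 200 characters or null. The result should still be escaped with `htmlspecialchars` wherever it is later printed into a page.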

庆幸我还是我 2024-10-14 22:04:25

If you're thinking about having the client actually go and fetch some random site, parse it for you in JavaScript, grab the title, description and other data, and then submit that in your form, your form's submit time is going to be held hostage to your user's network connection speed for fetching that page, plus whatever overhead (likely minuscule) there is for parsing the data. If you do that server-side using cURL, the hit will be in parsing the document for what you need.

The best solution for speed would probably be to let the person enter the URL, receive it in PHP, have PHP hand it off to a Perl script (Perl has some wicked fast DOM parsers), and get the required data back from the Perl script. From personal experience, the Perl scripts outperform cURL all day long, and cURL generally outperforms JavaScript AJAX GETs by a wide margin, just by nature of being on a bigger pipe than a home user.
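
For the server-side route, the download step can be bounded so that a slow or very large page cannot tie up the server for long. The sketch below uses PHP's cURL extension; the function names, the 256 KB cap, and the 5-second timeout are assumptions chosen for illustration. It does not guard against URLs that point at internal network addresses, which a production version would need to consider.

```php
<?php

/** Accept only absolute http:// or https:// URLs that include a host. */
function is_http_url(string $url): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    $host   = parse_url($url, PHP_URL_HOST);

    return is_string($scheme) && is_string($host) && $host !== ''
        && in_array(strtolower($scheme), ['http', 'https'], true);
}

/**
 * Fetch at most $maxBytes of a page, or return null on failure.
 * (Hypothetical helper, for illustration only.)
 */
function fetch_html(string $url, int $maxBytes = 262144, int $timeoutSeconds = 5): ?string
{
    if (!is_http_url($url)) {
        return null; // rejected before any network activity
    }

    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }

    $body = '';
    curl_setopt_array($ch, [
        CURLOPT_FOLLOWLOCATION  => true,
        CURLOPT_MAXREDIRS       => 3,
        CURLOPT_CONNECTTIMEOUT  => $timeoutSeconds,
        CURLOPT_TIMEOUT         => $timeoutSeconds,
        CURLOPT_PROTOCOLS       => CURLPROTO_HTTP | CURLPROTO_HTTPS,
        CURLOPT_REDIR_PROTOCOLS => CURLPROTO_HTTP | CURLPROTO_HTTPS,
        // Buffer the body ourselves so an oversized page can be cut off.
        CURLOPT_WRITEFUNCTION   => function ($handle, $chunk) use (&$body, $maxBytes) {
            $body .= $chunk;
            // Returning a count other than strlen($chunk) aborts the transfer.
            return strlen($body) > $maxBytes ? 0 : strlen($chunk);
        },
    ]);
    curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    if ($status < 200 || $status >= 300 || $body === '') {
        return null;
    }
    // A truncated body is still returned: the <head> usually comes first.
    return substr($body, 0, $maxBytes);
}
```

Because the title and description normally sit in the `<head>`, cutting the transfer off early usually still leaves enough markup to build a preview.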

暖阳 2024-10-14 22:04:25

You can do both.

1) PHP:

  • check out HTML DOM Parser, which could be helpful
  • or fetch the page with PHP's cURL extension and then parse it with DOMDocument

2) JavaScript:

  • you don't have to bother your server (pro)
  • parsing content with jQuery is easy (pro)
  • you need to handle the cross-domain policy (con)
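
One way to combine the two approaches is to keep the form interaction in JavaScript while the browser asks a same-origin PHP endpoint for the preview data, so the cross-domain restriction never applies. The sketch below shows only the request handling; the endpoint path `/link-preview.php`, the `url` parameter, and the two callables passed in are hypothetical names for this example.

```php
<?php

/**
 * Build the response for a hypothetical same-origin endpoint such as
 * /link-preview.php?url=... Returns [HTTP status code, JSON body].
 *
 * $fetchHtml: callable(string $url): ?string  (server-side download)
 * $extract:   callable(string $html): array   (title/description lookup)
 */
function handle_preview_request(array $query, callable $fetchHtml, callable $extract): array
{
    $url = $query['url'] ?? null;
    if (!is_string($url) || !preg_match('#^https?://#i', $url)) {
        return [400, json_encode(['error' => 'An http(s) url parameter is required'])];
    }

    $html = $fetchHtml($url);
    if ($html === null) {
        return [502, json_encode(['error' => 'Could not fetch the page'])];
    }

    // Merge the requested URL with whatever fields the extractor found.
    return [200, json_encode(['url' => $url] + $extract($html))];
}
```

In the endpoint script, the returned status would be sent with `http_response_code()` and the body echoed with a `Content-Type: application/json` header. On the page, a jQuery call such as `$.getJSON('/link-preview.php', {url: value})` could then fill in the form fields.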