Finding a company name from a URL

Posted on 2024-08-10 21:23:17

Given the URL of a well-known company (e.g. http://mcdonalds.com/), how would you automatically and reliably find the company name (in this case "McDonald's")?

Thanks

Edit: someone voted to close this question, so maybe I need to explain the motivation. I have a large list of company URLs and I want to find data about each company using Google Maps. Searching Google Maps with the company name works much better than searching with the URL.

Removing 'http' and 'com' does work in a lot of cases, particularly for well-known companies, but not in all of them. I found the whois records were not very helpful.

I was hoping there was some kind of public database matching companies to URLs, but haven't come across one so far.
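
For reference, the "remove 'http' and 'com'" baseline mentioned above can be sketched roughly as follows. This is only a minimal sketch: the function name and the tiny `TWO_PART_SUFFIXES` set are illustrative assumptions, and a more robust version would consult the Public Suffix List instead of a hand-written set.

```python
from urllib.parse import urlparse

# Hypothetical, deliberately tiny set of two-part suffixes such as "co.uk".
# It is only here for illustration and is far from complete.
TWO_PART_SUFFIXES = {"co.uk", "com.au", "co.jp"}


def name_guess_from_url(url):
    """Return the host label just before the suffix, e.g. 'mcdonalds'."""
    # urlparse only fills in the host when a scheme or leading "//" is present.
    host = urlparse(url if "://" in url else "//" + url).hostname or ""
    labels = host.lower().split(".")
    if labels and labels[0] == "www":
        labels = labels[1:]
    # Skip two labels for suffixes like "co.uk", otherwise skip one.
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_PART_SUFFIXES:
        return labels[-3]
    if len(labels) >= 2:
        return labels[-2]
    return labels[0] if labels else ""
```

The result is a lowercase label rather than a display name, which is why it works for some companies and fails for others.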

Comments (7)

神回复 2024-08-17 21:23:17

You would need to create your own lookup table. For the most accurate data you would have to parse this information from the HTML at each URL, e.g. get the HTML page title, or look for the copyright message.
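
A rough sketch of that idea, using only Python's standard library, might look like the following. The class and function names are made up for illustration, and the copyright regular expression is a guess at common footer wording rather than a reliable pattern; it assumes the HTML has already been downloaded into a string.

```python
import re
from html.parser import HTMLParser


class PageText(HTMLParser):
    """Collect the <title> text and all other text chunks of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title_parts = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        (self.title_parts if self._in_title else self.text_parts).append(data)


# Assumed footer shape: a copyright marker, an optional year or year range,
# then the holder's name up to the next full stop, pipe, or line break.
COPYRIGHT_RE = re.compile(
    r"(?:(?:copyright\s*)?(?:©|\(c\))|copyright)\s*"
    r"(?:\d{4}(?:\s*-\s*\d{4})?)?[\s,]*"
    r"([^.|\n]{2,80})",
    re.IGNORECASE,
)


def title_and_copyright(html):
    """Return (page title, copyright holder or None) for an HTML string."""
    parser = PageText()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    match = COPYRIGHT_RE.search("\n".join(parser.text_parts))
    holder = match.group(1).strip() if match else None
    return title, holder
```

The two values can then be stored per URL to build the lookup table, with manual review for pages where neither looks like a company name.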

渔村楼浪 2024-08-17 21:23:17

They most likely have it in the <title> element. Parse it and compare it with the website's domain. If there is significant overlap, that is your match. If not, try some heuristics on the title (for example, treat everything before ">>" or a similar separator as the name).
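
The comparison described above could be sketched like this. The separator list, the `min_ratio` threshold of 0.6, and the function names are all assumptions chosen for illustration; `difflib.SequenceMatcher` stands in for whatever notion of "significant overlap" suits your data.

```python
import difflib
import re

# Assumed title separators: "|", "»", ">>", "::", or a hyphen with spaces.
SEPARATORS = re.compile(r"\s*(?:\||»|>>|::)\s*|\s+-\s+")


def squash(text):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def name_from_title(title, domain_label, min_ratio=0.6):
    """Pick the title segment that best overlaps the domain label.

    If no segment is similar enough, fall back to the heuristic that the
    name is everything before the first separator.
    """
    target = squash(domain_label)
    segments = [s.strip() for s in SEPARATORS.split(title) if s.strip()]
    best, best_score = None, 0.0
    for segment in segments:
        score = difflib.SequenceMatcher(None, squash(segment), target).ratio()
        if score > best_score:
            best, best_score = segment, score
    if best is not None and best_score >= min_ratio:
        return best
    return segments[0] if segments else None
```

Comparing squashed strings lets "McDonald's" match the domain label "mcdonalds" while keeping the nicely formatted segment as the returned name.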

If it is a larger company, you might also get lucky by looking at the NIC entry (aka Whois) for its domain.

木落 2024-08-17 21:23:17

The Whois database may be of some help, though there will always be edge cases that take more effort to handle.

沙沙粒小 2024-08-17 21:23:17

If you want to be accurate, I would say Amazon Mechanical Turk.

陈年往事 2024-08-17 21:23:17

Another option would be to use an API, for example https://developer.tuxx.co.uk/api-overview/company-name-api. There you can enter a URL and it extracts the most probable company name.

错々过的事 2024-08-17 21:23:17

Try using cURL and DOMDocument.

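<?php

    // Fetch the page, following redirects in case the URL redirects.
    $site = "http://mcdonalds.com/";
    $ch = curl_init($site);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $result = curl_exec($ch);
    curl_close($ch);

    if ($result === false || $result === "") {
        exit("Could not fetch $site\n");
    }

    // Parse the HTML, collecting markup warnings instead of silencing them with @.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($result);
    libxml_clear_errors();

    // Print the <title> text if the page has one.
    $title = $dom->getElementsByTagName("title");
    if ($title->length > 0) {
        echo trim($title->item(0)->nodeValue), "\n";
    }

?>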

Also take a look at the author meta tag: <meta name="author" content="McDonald's Corporation">
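
For comparison with the other sketches in this thread, reading that meta tag in Python could look roughly like this. Only `name="author"` comes from the example above; checking `og:site_name` and `application-name` as well is an added assumption, and the class and function names are illustrative.

```python
from html.parser import HTMLParser


class MetaNameParser(HTMLParser):
    """Remember the content of a few <meta> tags that may name the company."""

    # "author" is from the answer above; the other two keys are assumptions.
    WANTED = ("author", "og:site_name", "application-name")

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {key: (value or "") for key, value in attrs}
        key = (attributes.get("name") or attributes.get("property") or "").lower()
        content = attributes.get("content", "").strip()
        if key in self.WANTED and content:
            # Keep the first value seen for each key.
            self.found.setdefault(key, content)


def company_from_meta(html):
    """Return the first matching meta content in WANTED order, or None."""
    parser = MetaNameParser()
    parser.feed(html)
    parser.close()
    for key in MetaNameParser.WANTED:
        if key in parser.found:
            return parser.found[key]
    return None
```

Many sites leave these tags out or fill the author tag with an agency's name, so this is best combined with the title check rather than trusted alone.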

清晨说晚安 2024-08-17 21:23:17

You could use the whois information. There should be libraries that let you do that in a clean way. You didn't mention what type of technology you'll be using...
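
Without committing to any particular library, a bare-bones version in Python could query a WHOIS server directly over TCP port 43 and look for a registrant organisation line. This is a sketch under assumptions: the function names are made up, the caller has to supply the WHOIS server responsible for the domain, and the "Registrant Organization:" label is only one common field name that many records omit or redact.

```python
import re
import socket


def whois_query(domain, server, port=43, timeout=10.0):
    """Send one WHOIS query: the domain followed by CRLF, then read the reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Assumed field label; records differ between registries and registrars.
ORG_RE = re.compile(
    r"^[ \t]*Registrant Organi[sz]ation:[ \t]*(\S.*?)[ \t\r]*$",
    re.IGNORECASE | re.MULTILINE,
)


def registrant_org(whois_text):
    """Return the registrant organisation from a WHOIS reply, or None."""
    match = ORG_RE.search(whois_text)
    return match.group(1) if match else None
```

Usage would be along the lines of `registrant_org(whois_query(domain, server))`, with a fallback to one of the HTML-based approaches whenever the result is `None`.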
