Finding a website from a company name
I've got a list of 6,000 company names (along with their headquarters addresses), and I need to find the web address for each of them. I'm considering using the Google Web API to do this: search for "COMPANY_NAME CITY STATE" and take the first result. Obviously this will take a few days, since only 1,000 queries per day are allowed. However, I'm not 100% sure this will work, and I feel like there's a better way. I can do this in any language I know: C++, Java, PHP, or Python. This only has to be run once.
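A minimal Python sketch of the bookkeeping side of this plan, assuming the 1,000-queries-per-day limit quoted above: it builds the "COMPANY_NAME CITY STATE" strings and splits them into daily batches. The search call itself is deliberately left out, since the API details aren't given here; plug in whichever search client you end up using.

```python
import math
from typing import Iterator

DAILY_QUOTA = 1000  # per-day query limit quoted in the question


def build_query(name: str, city: str, state: str) -> str:
    """Build the "COMPANY_NAME CITY STATE" search string, skipping empty parts."""
    return " ".join(part.strip() for part in (name, city, state) if part.strip())


def daily_batches(companies: list[tuple[str, str, str]],
                  quota: int = DAILY_QUOTA) -> Iterator[list[str]]:
    """Yield one list of query strings per day, never exceeding the quota."""
    queries = [build_query(*company) for company in companies]
    for start in range(0, len(queries), quota):
        yield queries[start:start + quota]


def days_needed(n_companies: int, quota: int = DAILY_QUOTA) -> int:
    """Number of days required to issue one query per company."""
    return math.ceil(n_companies / quota)


if __name__ == "__main__":
    companies = [("Acme Widgets Inc", "Springfield", "IL")] * 6000
    print(days_needed(len(companies)))  # 6
    print([len(batch) for batch in daily_batches(companies)])
    # [1000, 1000, 1000, 1000, 1000, 1000]
```

Running one batch per day (for example from cron) and saving each first result as you go means an interrupted run can resume without spending quota twice.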
How would I use WHOIS to do this? I know how I would do it if I already knew the URL, but not the other way around (name to URL). And what would I do if the domain were privately registered?
BTW, these are US businesses.
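WHOIS only goes from a domain to its registration record, so going from name to URL means guessing candidate domains first and then checking each one. The Python sketch below generates such guesses; the suffix list and the `.com`-only default (reasonable for US businesses, but an assumption) are illustrative choices, not any standard.

```python
import re

# Legal-form suffixes commonly dropped from US company names (illustrative, not exhaustive)
SUFFIXES = {"inc", "incorporated", "llc", "corp", "corporation", "co", "company", "ltd"}


def candidate_domains(company: str, tlds: tuple[str, ...] = ("com",)) -> list[str]:
    """Guess plausible domains for a company name, most specific first.

    Each guess still has to be verified (e.g. with WHOIS); many will be
    wrong or registered by someone else.
    """
    words = re.findall(r"[a-z0-9]+", company.lower())  # drops punctuation like , . & '
    core = [w for w in words if w not in SUFFIXES]
    if not core:
        return []
    stems = ["".join(core), "-".join(core)]
    if len(core) > 1:
        stems.append(core[0])  # first word alone, e.g. "acme"
    domains: list[str] = []
    for stem in stems:
        for tld in tlds:
            domain = f"{stem}.{tld}"
            if domain not in domains:  # keep order, skip duplicates
                domains.append(domain)
    return domains
```

For "Acme Widgets, Inc." this yields `acmewidgets.com`, `acme-widgets.com` and `acme.com`; the short first-word guess is the most likely to belong to an unrelated company.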
Answers (2)
You can use WHOIS instead of the Google API for this.
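One way to apply this answer, sketched in Python under several assumptions: query WHOIS directly over TCP port 43 as described in RFC 3912, start at the `.com`/`.net` registry server `whois.verisign-grs.com`, follow its `Registrar WHOIS Server:` referral, and compare the `Registrant Organization:` field with the company name. Field names vary between registrars, and privately registered or redacted records carry no usable organization, so `registrant_matches` returns `None` in that case and those companies need another method (a search engine or manual checking). The privacy markers and the word-matching rule are rough heuristics.

```python
import re
import socket
from typing import Optional

# Illustrative heuristics, not standard lists
SUFFIXES = {"inc", "incorporated", "llc", "corp", "corporation", "co", "company", "ltd"}
PRIVACY_MARKERS = ("privacy", "redacted", "proxy", "whoisguard")


def whois_query(domain: str, server: str = "whois.verisign-grs.com",
                timeout: float = 10.0) -> str:
    """Raw WHOIS query (RFC 3912): send the query plus CRLF on TCP port 43, read until close."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall((domain + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def whois_field(whois_text: str, field: str) -> Optional[str]:
    """Return the value of the first 'Field: value' line, or None if absent or empty."""
    prefix = field.lower() + ":"
    for line in whois_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith(prefix):
            return stripped[len(prefix):].strip() or None
    return None


def lookup(domain: str) -> str:
    """Ask the .com/.net registry, then follow its referral to the registrar's WHOIS server."""
    registry_reply = whois_query(domain)
    registrar_server = whois_field(registry_reply, "Registrar WHOIS Server")
    return whois_query(domain, server=registrar_server) if registrar_server else registry_reply


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def registrant_matches(whois_text: str, company: str) -> Optional[bool]:
    """True/False if the registrant organization can be compared with the company name;
    None if it is missing or looks privacy-protected."""
    org = whois_field(whois_text, "Registrant Organization")
    if org is None or any(marker in org.lower() for marker in PRIVACY_MARKERS):
        return None
    wanted = _words(company) - SUFFIXES  # ignore "Inc", "LLC", etc.
    return bool(wanted) and wanted <= _words(org)
```

A typical loop would call `registrant_matches(lookup(domain), name)` for each guessed domain and stop at the first `True`. Space the queries out, since WHOIS servers commonly throttle bulk lookups.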
Use Amazon's Mechanical Turk. It's perfect for tasks like this, which are hard to automate and typically need a person to validate the results. It will cost a little, but it should be manageable, depending on how badly you want the results.
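If you go the Mechanical Turk route, the input is usually a CSV file. Assuming you use the requester site's batch upload, where each column header can be referenced in the HIT layout as `${column_name}`, a Python sketch like this prepares one row per company. The column names, and the optional `candidate_url` column holding an automated guess for workers to confirm or correct, are assumptions.

```python
import csv
import io

# Hypothetical column names; reference them in the HIT layout as ${company_name} etc.
HEADER = ["company_name", "address", "city", "state", "candidate_url"]


def mturk_input_csv(rows: list[tuple[str, str, str, str, str]]) -> str:
    """Serialize (name, address, city, state, candidate_url) rows as CSV text with a header row.

    The csv module quotes fields containing commas, so names like "Acme, Inc." stay intact.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()
```

Write the result to a file opened with `newline=""` before uploading; pre-filling `candidate_url` turns each task into a quick confirm-or-correct check, which is usually cheaper than asking workers to search from scratch.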