How to handle whois data

Posted on 2024-11-05 18:55:56

I need to put whois data into a table with fields like

  • registrant,
  • creation date,
  • expiry date, etc.

I have a script that extracts data from whois servers, but the output differs for each domain extension.

For example, for .com domains the registrant details come as a single address block, while for .org domains they come as separate fields: registrant name, street1, street2, street3, etc.

So I'm not able to extract the registrant details as a single unit to store in the database.

I heard somewhere that if we get the data as XML we can extract it. Can somebody help me get around this? Thanks!

Comments (3)

玩物 2024-11-12 18:55:56

Actually the problem is a bit larger than that.

  • there is no unified request syntax
  • nor a defined set of capabilities
  • there is no defined schema for answers
  • local legislation makes the contents differ
  • there is no standardized set of errors
  • the quality of the recorded information is weak
  • you must deal with internationalization

The WHOIS service is defined by RFC 3912. It is a very basic query/response protocol that does not define the format of the returned content at all. So the answers often reflect the format of the database holding the data, and you may get a different syntax for each database. Since WHOIS can be used for whatever content you want, you cannot make many assumptions about the format of the answer you will get. Hopefully, however, you can expect to receive parseable content, and similarly formatted answers for each request to the same server.
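As a hedged illustration only: many servers happen to answer with `Key: Value` lines, although RFC 3912 does not require it. A minimal Python sketch of a tolerant first pass, assuming that layout, could collect every key and keep repeated keys (for instance several `Registrant Street` lines) in order, so nothing is lost before any server-specific logic runs. The function name `parse_key_values` and the skipped comment prefixes are assumptions, not part of any standard.

```python
from collections import defaultdict
from typing import Dict, List

# Assumed comment/trailer prefixes; real servers vary.
SKIP_PREFIXES = ("%", "#", ">>>")


def parse_key_values(text: str) -> Dict[str, List[str]]:
    """Collect 'Key: Value' lines; repeated keys keep all values in order.

    Keys are lowercased. Blank lines, comment lines, lines without a colon
    and keys with an empty value are skipped.
    """
    fields: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(SKIP_PREFIXES):
            continue
        # Split on the first colon only, so values such as URLs or
        # timestamps that contain ':' stay intact.
        key, sep, value = line.partition(":")
        if not sep or not value.strip():
            continue
        fields[key.strip().lower()].append(value.strip())
    return dict(fields)
```

This is only a heuristic: a server that prints free-form text will produce meaningless keys, which is exactly why per-server logic is still needed.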

So you need to develop parsing logic for each server, which you will have to do in a very empirical manner.
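One way to organize that empirical work, sketched here under stated assumptions, is a registry that maps each WHOIS server name to its own hand-written parser. The server name `whois.example-registry.test`, the helper names and the field labels are hypothetical placeholders; the labels for a real server have to be discovered by inspecting its actual answers.

```python
from typing import Callable, Dict

Record = Dict[str, str]
Parser = Callable[[str], Record]

# Hypothetical registry: one empirically written parser per WHOIS server.
PARSERS: Dict[str, Parser] = {}


def parser_for(*servers: str) -> Callable[[Parser], Parser]:
    """Decorator that registers a parser for one or more WHOIS servers."""
    def register(func: Parser) -> Parser:
        for server in servers:
            PARSERS[server.lower()] = func
        return func
    return register


def labelled_value(text: str, label: str) -> str:
    """Return the value of the first 'label: value' line, or '' if absent."""
    prefix = label.lower() + ":"
    for line in text.splitlines():
        if line.strip().lower().startswith(prefix):
            return line.split(":", 1)[1].strip()
    return ""


@parser_for("whois.example-registry.test")  # hypothetical server name
def parse_example_registry(text: str) -> Record:
    # Assumed labels, found by looking at this server's answers.
    return {
        "created": labelled_value(text, "Creation Date"),
        "expires": labelled_value(text, "Registry Expiry Date"),
    }


def parse(server: str, text: str) -> Record:
    """Dispatch to the parser written for this server."""
    parser = PARSERS.get(server.lower())
    if parser is None:
        raise LookupError(f"no parser written yet for {server!r}")
    return parser(text)
```

Raising `LookupError` for an unknown server, instead of guessing, makes it obvious which servers still need a parser.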

However, here are a few tips for your development that come from the RFC:

  • you need to send the request over TCP port 43, as a single line terminated by the CR+LF ASCII characters

  • you must treat the closing of the TCP connection as the only signal that the answer is finished.
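These two rules translate into a very small client. The following Python sketch assumes an ASCII query and decodes the answer leniently as UTF-8, since RFC 3912 does not specify a character set; the function names are illustrative, not from any library.

```python
import socket

WHOIS_PORT = 43  # RFC 3912: WHOIS servers listen on TCP port 43


def build_query(name: str) -> bytes:
    """Encode a query as one line terminated by CR+LF.

    Non-ASCII names must be converted to their ASCII (Punycode) form first;
    otherwise encode() raises UnicodeEncodeError.
    """
    if "\r" in name or "\n" in name:
        raise ValueError("a WHOIS query must be a single line")
    return name.encode("ascii") + b"\r\n"


def read_until_close(sock: socket.socket, bufsize: int = 4096) -> bytes:
    """Read until the server closes the connection.

    The protocol has no length header or end marker: the closed
    connection is the only sign that the answer is complete.
    """
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:  # b"" means the peer closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)


def whois_query(name: str, server: str, timeout: float = 10.0) -> str:
    """Send one query to server:43 and return the raw text answer."""
    with socket.create_connection((server, WHOIS_PORT), timeout=timeout) as sock:
        sock.sendall(build_query(name))
        raw = read_until_close(sock)
    # No charset is defined by the RFC; replace undecodable bytes.
    return raw.decode("utf-8", errors="replace")
```

For example, `whois_query("example.com", "whois.verisign-grs.com")` would ask the registry server for `.com`. Finding the right server for a given TLD is a separate lookup that this sketch does not cover.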

About domain names specifically, you may want to note that the former restriction to ASCII encoding led some registrants to use Punycode to encode strings (with accented characters, for example) in the DNS, so you should be prepared to meet these in WHOIS answers as well. The existence of Internationalized Domain Names since 2003 will require you to support Unicode encoding. The algorithms to convert names are complex; RFC 3490 should give you some useful details about this.
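Python's built-in `idna` codec implements the IDNA 2003 conversion from RFC 3490, which is enough for a sketch of converting names in both directions: to the ASCII (Punycode) form before sending a query, and back to Unicode before storing a name found in an answer. The helper names are hypothetical; lowercasing first is an assumption made because some servers print names in upper case.

```python
def to_ascii(domain: str) -> str:
    """Unicode domain -> ASCII form with 'xn--' labels, usable in a query."""
    return domain.strip().encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """ASCII domain (possibly with 'xn--' labels) -> Unicode, for storage."""
    # Lowercase first: answers may show names such as 'XN--...'.
    return domain.strip().lower().encode("ascii").decode("idna")
```

IDNA 2008 (RFC 5890 and following) changed some of the mapping rules; the third-party `idna` package on PyPI implements it if the built-in codec is not sufficient.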

Good luck!

莫言歌 2024-11-12 18:55:56

You need to detect the format and use a different set of regular expressions for each one. Alternatively, as you mentioned, you can use XML or even JSON APIs:
http://whoisxmlapi.com/
http://www.domaintools.com/api/docs/
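A sketch of the detect-then-extract approach for the two layouts described in the question might look like the following. Both layouts, the regular expressions and the output format (one comma-separated string) are assumptions modelled on the question, not the documented output of any real server.

```python
import re

# Layout 1 (assumed): one field per line, e.g. "Registrant Street1: ...".
FIELD_LINE = re.compile(
    r"^Registrant (?:Name|Organization|Street\d*|City|Country):[ \t]*(.+)$",
    re.MULTILINE | re.IGNORECASE,
)
# Layout 2 (assumed): a bare "Registrant:" header, then an indented block.
BLOCK_HEADER = re.compile(r"^Registrant:[ \t]*$", re.MULTILINE | re.IGNORECASE)
INDENTED = re.compile(r"^[ \t]+(\S.*)$")


def extract_registrant(text: str) -> str:
    """Return the registrant details as one string, whichever layout is used."""
    text = text.replace("\r\n", "\n")  # answers often use CR+LF line ends
    if FIELD_LINE.search(text):
        # Layout 1: join the non-empty fields in the order they appear.
        parts = [m.group(1).strip() for m in FIELD_LINE.finditer(text)]
        return ", ".join(p for p in parts if p)
    header = BLOCK_HEADER.search(text)
    if header:
        # Layout 2: take the indented lines that follow the header.
        lines = []
        for line in text[header.end():].splitlines()[1:]:
            match = INDENTED.match(line)
            if not match:
                break  # the first non-indented line ends the block
            lines.append(match.group(1).strip())
        return ", ".join(lines)
    raise ValueError("unknown registrant layout")
```

Either way the result is a single value that fits one database column, which is what the question asks for.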

倾听心声的旋律 2024-11-12 18:55:56

You need to extend your database and processing to better deal with the problem.

The data provided by the remote service comes in different formats, as you've already noted. So you need to separate the concerns of fetching the data and parsing it, because the two are independent of each other. For example, the format for one TLD can change over time.

So first of all, fetch the plain-text data per domain and store it together with its metadata:

  • domain
  • whois server
  • timestamp of the fetch operation
  • response
  • status code (if the protocol provides one)
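A minimal sketch of that first step, assuming SQLite via Python's standard `sqlite3` module; the table and column names are hypothetical. The `status_code` column is left NULL for plain port-43 queries, since RFC 3912 defines no status codes.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

# Hypothetical table holding every raw answer, unparsed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS whois_raw (
    id           INTEGER PRIMARY KEY,
    domain       TEXT NOT NULL,
    whois_server TEXT NOT NULL,
    fetched_at   TEXT NOT NULL,  -- ISO 8601 timestamp in UTC
    response     TEXT NOT NULL,  -- the answer exactly as received
    status_code  INTEGER         -- NULL when the protocol has none
)
"""


def store_raw(conn: sqlite3.Connection, domain: str, server: str,
              response: str, status_code: Optional[int] = None) -> int:
    """Insert one fetch result with its metadata and return the row id."""
    fetched_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO whois_raw (domain, whois_server, fetched_at, response, status_code)"
        " VALUES (?, ?, ?, ?, ?)",
        (domain, server, fetched_at, response, status_code),
    )
    conn.commit()
    return cur.lastrowid
```

Keeping the response verbatim means a parser can be fixed or replaced later and simply rerun over the stored rows, without fetching anything again.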

Then, later, in a second processing step, you do the parsing. You can use the metadata that already exists to decide which parsing algorithm you need. That also helps you maintain your application over time.
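A sketch of such a metadata-driven choice, assuming parsers are registered per WHOIS server together with the date from which each format applies, so an answer fetched before a format change is still parsed with the older parser. All names here are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Callable, Dict, List, Tuple

Parser = Callable[[str], Dict[str, str]]

# Hypothetical: per server, (valid_from, parser) pairs sorted by date.
PARSER_VERSIONS: Dict[str, List[Tuple[datetime, Parser]]] = {}


def register(server: str, valid_from: datetime, parser: Parser) -> None:
    """Record that `parser` handles `server` answers from `valid_from` on."""
    versions = PARSER_VERSIONS.setdefault(server, [])
    versions.append((valid_from, parser))
    versions.sort(key=lambda item: item[0])


def choose_parser(server: str, fetched_at: datetime) -> Parser:
    """Pick the parser that was current for `server` at fetch time."""
    versions = PARSER_VERSIONS.get(server, [])
    dates = [valid_from for valid_from, _ in versions]
    # Index of the last version whose valid_from is <= fetched_at.
    index = bisect_right(dates, fetched_at) - 1
    if index < 0:
        raise LookupError(f"no parser for {server!r} at {fetched_at.isoformat()}")
    return versions[index][1]
```

The fetch timestamp and the registration dates must be compared consistently, either all timezone-aware or all naive; otherwise Python raises `TypeError` when comparing them.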

Once parsing has succeeded, you have the normalized format, which is what you are aiming for.
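What the normalized format contains is up to you. As one hypothetical example, a record with fixed fields and real `date` values could look like this; it expects a dict with the assumed keys `registrant`, `created` and `expires`, and the list of date layouts is an assumed, non-exhaustive sample of formats seen in WHOIS answers.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, Optional

# Assumed, non-exhaustive date layouts; %b month names expect the default C locale.
DATE_FORMATS = (
    "%Y-%m-%dT%H:%M:%SZ",  # 2024-11-05T18:55:56Z
    "%Y-%m-%d",            # 2024-11-05
    "%d-%b-%Y",            # 05-Nov-2024
    "%Y.%m.%d",            # 2024.11.05
)


def parse_date(value: str) -> Optional[date]:
    """Try each known layout; return None rather than guess."""
    value = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None


@dataclass
class DomainRecord:
    """Hypothetical normalized row: the same shape for every server."""
    domain: str
    registrant: str
    created: Optional[date]
    expires: Optional[date]


def normalize(domain: str, fields: Dict[str, str]) -> DomainRecord:
    """Build a normalized record from one server-specific parser's output."""
    return DomainRecord(
        domain=domain.lower(),
        registrant=fields.get("registrant", "").strip(),
        created=parse_date(fields.get("created", "")),
        expires=parse_date(fields.get("expires", "")),
    )
```

Returning `None` for an unrecognized date keeps bad values out of the table and makes it easy to query for rows whose parser needs another layout.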

Beyond these technical steps, you should respect the usage conditions of the WHOIS service(s). Not everything that is technically possible is legally or morally acceptable. Take care, and treat other people's personal records with the respect they deserve. Protect the data you collect, e.g. archive, and scramble or lock away, data you no longer need for your ongoing processing.
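As one possible way to scramble personal fields that are still needed for matching but not in clear text, the sketch below replaces them with a keyed hash (HMAC-SHA256, a pseudonymization technique swapped in here, not something the answer prescribes). The function name and the normalization (trim and lowercase) are assumptions; the key must be stored separately from the data.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a personal value with a keyed HMAC-SHA256 token (hex string).

    Equal inputs give equal tokens under the same key, so records can
    still be grouped; without the key, nobody can recompute a token to
    test a guessed value.
    """
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()
```

Generate the key once, e.g. with `secrets.token_bytes(32)`, and keep it out of the database; deleting the key later means new data can no longer be matched against the stored tokens.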

See also:
