2domain

Splits a URL into sub-domain, domain and the top-level domain.

Since domains are handled differently across countries and organizations, splitting a URL into sub-domain, domain and top-level-domain parts cannot be done with a simple regexp. 2domain uses a large list of known top-level domains from publicsuffix.org to recognize the different parts of the domain.

This module uses a trie data structure under the hood to ensure the smallest possible library size and the fastest lookup. The library is roughly 30KB minified and gzipped. Since publicsuffix.org is frequently updated, the data structure is built on npm install as a postinstall hook. If something goes wrong during that step, the library falls back to a prebuilt list that has been built at the time of publishing.
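To illustrate the idea of a trie-based suffix lookup, here is a minimal sketch. It is not the library's actual implementation: `SUFFIXES`, `buildTrie` and `longestSuffix` are hypothetical names, and the suffix list is a tiny hand-written sample rather than the real publicsuffix.org data.

```javascript
// Hypothetical sample data, NOT the real publicsuffix.org list.
const SUFFIXES = ["com", "uk", "co.uk"];

// Symbol key that marks "a complete suffix ends at this node".
const END = Symbol("end");

// Build a trie keyed by labels read right to left ("co.uk" -> "uk", "co").
function buildTrie(suffixes) {
    const root = new Map();
    for (const suffix of suffixes) {
        let node = root;
        for (const label of suffix.split(".").reverse()) {
            if (!node.has(label)) {
                node.set(label, new Map());
            }
            node = node.get(label);
        }
        node.set(END, true);
    }
    return root;
}

// Return the longest known suffix of hostname, or null if none matches.
function longestSuffix(trie, hostname) {
    const labels = hostname.split(".").reverse();
    let node = trie;
    let matched = 0;
    for (let i = 0; i < labels.length; i++) {
        node = node.get(labels[i]);
        if (!node) {
            break;
        }
        if (node.get(END)) {
            matched = i + 1;
        }
    }
    if (matched === 0) {
        return null;
    }
    return labels.slice(0, matched).reverse().join(".");
}
```

With this sample list, `longestSuffix(buildTrie(SUFFIXES), "example.co.uk")` picks `co.uk` over the shorter `uk`, which is the behaviour a plain regexp struggles to express.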


Installation

npm install 2domain

Updating Database

You might need to update the database from time to time. The database download can also fail, in which case you need to run the build script manually:

node node_modules/2domain/scripts/build-tries.js

Updating Database Strategy

I suggest updating the files regularly from crontab. build-tries.js exits with code 0 if the newly built tries pass the mocha tests, so the crontab command can be chained like this:

node <path>/scripts/build-tries.js && node <path>/scripts/write-pre.js

If the mocha tests succeed, this downloads the list and stores the built files in the pre/ folder. The pre/ folder exists because an internet outage while build-tries.js is running can leave you with invalid or broken data. In that case build-tries.js automatically takes the content from pre/. That is why it is important to keep pre/ up to date, so that you can recover from such failures.
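The fallback idea can be sketched as follows. This is only an illustration of the pattern, not the code in build-tries.js: `readJsonWithFallback` is a hypothetical helper, and the file paths passed to it are up to the caller.

```javascript
const fs = require("fs");

// Hypothetical helper: return the parsed JSON from primaryPath, or from
// fallbackPath when the primary file is missing or is not valid JSON.
function readJsonWithFallback(primaryPath, fallbackPath) {
    try {
        return JSON.parse(fs.readFileSync(primaryPath, "utf8"));
    } catch (err) {
        // A missing or corrupted primary file lands here.
        return JSON.parse(fs.readFileSync(fallbackPath, "utf8"));
    }
}
```

The fallback file is only useful if it holds a recent, known-good build, which is why the write step should run only after the tests have passed.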

If your services are long-running, you will probably need to restart them as well, because the .json files are read by the tries modules and kept in memory for the lifetime of the service. Either restart long-running services regularly or build something that detects changes and restarts them.
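One way to detect changes is to compare the modification time of a built file between checks. The sketch below assumes you know the path of the file to watch; `createChangeDetector` is a hypothetical helper and is not provided by 2domain.

```javascript
const fs = require("fs");

// Hypothetical helper: returns a function that reports whether the file's
// modification time has changed since the previous check.
function createChangeDetector(filePath) {
    let lastMtimeMs = fs.statSync(filePath).mtimeMs;
    return function hasChanged() {
        const currentMtimeMs = fs.statSync(filePath).mtimeMs;
        const changed = currentMtimeMs !== lastMtimeMs;
        lastMtimeMs = currentMtimeMs;
        return changed;
    };
}
```

A long-running service could call the returned function on a timer and exit when it returns true, leaving the restart to its process supervisor.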

Usage

// long subdomains can be handled
expect(parseDomain("some.subdomain.example.co.uk")).to.eql({
    subdomain: "some.subdomain",
    domain: "example",
    tld: "co.uk"
});

// protocols, usernames, passwords, ports, paths, queries and hashes are disregarded
expect(parseDomain("https://user:password@example.co.uk:8080/some/path?and&query#hash")).to.eql({
    subdomain: "",
    domain: "example",
    tld: "co.uk"
});

// unknown top-level domains are ignored
expect(parseDomain("unknown.tld.kk")).to.equal(null);

// invalid urls are also ignored
expect(parseDomain("invalid url")).to.equal(null);
expect(parseDomain({})).to.equal(null);

Introducing custom tlds

// custom top-level domains can optionally be specified
expect(parseDomain("mymachine.local",{ customTlds: ["local"] })).to.eql({
    subdomain: "",
    domain: "mymachine",
    tld: "local"
});

// a regexp can optionally be passed as customTlds (instead of an array)
expect(parseDomain("localhost",{ customTlds:/localhost|\.local/ })).to.eql({
    subdomain: "",
    domain: "",
    tld: "localhost"
});

It can sometimes be helpful to apply the customTlds argument using a helper function:

function parseLocalDomains(url) {
    return parseDomain(url, {
        customTlds: /localhost|\.local/
    });
}

expect(parseLocalDomains("localhost")).to.eql({
    subdomain: "",
    domain: "",
    tld: "localhost"
});
expect(parseLocalDomains("mymachine.local")).to.eql({
    subdomain: "",
    domain: "mymachine",
    tld: "local"
});


API

parseDomain(url: string, options: ParseOptions): ParsedDomain|null

Returns null if url has an unknown tld or if it's not a valid url.

ParseOptions

{
    // A list of custom tlds that are first matched against the url.
    // Useful if you also need to split internal URLs like localhost.
    customTlds: RegExp|Array<string>,

    // There are a lot of private domains that act like top-level domains,
    // like blogspot.com, googleapis.com or s3.amazonaws.com.
    // By default, these domains would be split into:
    // { subdomain: ..., domain: "blogspot", tld: "com" }
    // When this flag is set to true, the domain will be split into
    // { subdomain: ..., domain: ..., tld: "blogspot.com" }
    // See also https://github.com/peerigon/parse-domain/issues/4
    privateTlds: boolean - default: false
}

ParsedDomain

{
    tld: string,
    domain: string,
    subdomain: string
}
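Because parseDomain can return null and the domain part can be empty (as in the localhost example), callers usually need a small guard before using the result. The helper below is a hypothetical sketch and not part of the 2domain API. It takes an already parsed result and rebuilds the "domain.tld" part.

```javascript
// Hypothetical helper, not part of the 2domain API.
// Accepts a ParsedDomain object or null and returns "domain.tld",
// or null when there is no domain part to report.
function toRegistrableDomain(parsed) {
    if (parsed === null || parsed.domain === "") {
        return null;
    }
    return parsed.domain + "." + parsed.tld;
}
```

For the first usage example above, passing the parsed result of "some.subdomain.example.co.uk" would give "example.co.uk".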


Tests

cd <project_path>
node_modules/mocha/bin/mocha --recursive -R dot node_modules/2domain/test/

Forked

This repo was forked from peerigon's parse-domain and has been modified and extended.
