AWS Glue crawler exclude pattern functionality

Posted on 2025-02-09 19:42:28

We need to ignore a few paths while crawling through a specific path. Below are the details:

Include Path: s3://dev-bronze/api/sp/reports/xyz/
Exclude Path: brand=abc/client=xxx/**

Full path: "s3://dev-bronze/api/sp/reports/xyz/brand=abc/client=xxx/"

We want to ignore a few clients' data, so I am using the glob above, but it doesn't seem to work. Any help would be highly appreciated.

Comments (1)

故人爱我别走 2025-02-16 19:42:28

There is a difference between the exclude patterns brand=abc/client=xxx/** and brand=abc/client=xxx** (note the missing / in the second one).

Exclude pattern brand=abc/client=xxx/** matches:

s3://dev-bronze/api/sp/reports/xyz/brand=abc/client=xxx/<subfolder1>/file1.txt
s3://dev-bronze/api/sp/reports/xyz/brand=abc/client=xxx/<subfolder2>/file2.txt

This pattern will match objects in all subfolders of brand=abc/client=xxx/.

Exclude pattern brand=abc/client=xxx** matches:

s3://dev-bronze/api/sp/reports/xyz/brand=abc/client=xxx/file1.txt
s3://dev-bronze/api/sp/reports/xyz/brand=abc/client=xxx/file2.txt

This pattern will match all objects in brand=abc/client=xxx/.
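To try the two patterns locally before re-running the crawler, here is a rough Python sketch. It assumes simplified glob semantics (`*` stays within one folder level, `**` may cross folder boundaries) and that each exclude pattern is evaluated against the path relative to the include path. The helper names `glob_to_regex` and `is_excluded` are hypothetical, and this model is not guaranteed to reproduce the crawler's behavior exactly.

```python
import re

INCLUDE_PATH = "s3://dev-bronze/api/sp/reports/xyz/"


def glob_to_regex(pattern):
    """Translate a simplified glob into a compiled regex (assumed semantics)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")       # '**' may cross folder boundaries
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")    # '*' stays within one folder level
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_excluded(full_path, pattern, include_path=INCLUDE_PATH):
    """Check a full S3 path against an exclude pattern, relative to the include path."""
    if not full_path.startswith(include_path):
        return False
    relative = full_path[len(include_path):]
    return glob_to_regex(pattern).fullmatch(relative) is not None


samples = [
    INCLUDE_PATH + "brand=abc/client=xxx/file1.txt",
    INCLUDE_PATH + "brand=abc/client=xxx/<subfolder1>/file1.txt",
    INCLUDE_PATH + "brand=abc/client=yyy/file1.txt",
]
for pattern in ("brand=abc/client=xxx/**", "brand=abc/client=xxx**"):
    for path in samples:
        print(pattern, "|", path, "|", is_excluded(path, pattern))
```

If this local model and the crawler disagree, the crawler's result is the one that counts; the sketch is only a quick way to see which relative path a pattern is being compared against.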

If you want to exclude the files in brand=abc/client=xxx/, use the exclude pattern brand=abc/client=xxx**.
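If the crawler is managed from code rather than the console, the pattern can be set through boto3. The sketch below builds the `Targets` structure that Glue's `update_crawler` accepts for an S3 target. The crawler name passed to `apply_to_crawler` is a placeholder you would supply, and the helper names are hypothetical.

```python
import json

INCLUDE_PATH = "s3://dev-bronze/api/sp/reports/xyz/"
EXCLUDE_PATTERNS = ["brand=abc/client=xxx**"]


def build_s3_targets(include_path, exclusions):
    """Build the Targets structure for a single S3 include path with exclusions."""
    return {"S3Targets": [{"Path": include_path, "Exclusions": list(exclusions)}]}


def apply_to_crawler(crawler_name, targets):
    """Update an existing crawler; needs boto3 and AWS credentials (not run here)."""
    import boto3  # imported lazily so the rest of the sketch runs without AWS access

    glue = boto3.client("glue")
    glue.update_crawler(Name=crawler_name, Targets=targets)


targets = build_s3_targets(INCLUDE_PATH, EXCLUDE_PATTERNS)
print(json.dumps(targets, indent=2))
```

Calling `apply_to_crawler("<your-crawler-name>", targets)` would push the change. Exclude patterns are written relative to the include path, so the `s3://...` prefix is not repeated in `Exclusions`.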

Reference: Crawler Properties > Include and Exclude Patterns (AWS)
