Need to remove subdomains from domains
I am trying to get the last two fields, counting from right to left, with the cut command.
I have a large database of about 110 million domains and subdomains, like:
yahoo.com
mail.yahoo.com
a.yahoo.com
a.yahoo.co.uk
In simple words, I am trying to strip the subdomains and keep only the domain:
echo a.yahoo.aa | cut -d '.' -f 2,3
yahoo.aa
but when I try
echo yahoo.aa | cut -d '.' -f 2,3
aa
it gives me only aa.
The required output is:
yahoo.com
yahoo.com
yahoo.com
yahoo.co.uk
Edit: thanks to anubhava for the suggestion.
A TLD looks like:
xxxx.xx
xxx.xx
xx.xx
i.e. a ccTLD always has 2 characters in last.
Comments (5)
A long solution, but I think it does what you want: an executable awk script, domain.awk, run against a file of domains, domains.lst.
Using the sample input you provided, and accepting your statement that
a ccTLD always has 2 characters in last.
as your criterion for printing the last 3 segments of the input instead of the last 2, you can use GNU grep (needed for -o) or any awk.
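A minimal sketch of that rule, not the answerer's exact commands: when the last label has two characters, keep the last three labels, otherwise keep the last two. The function names registrable_grep and registrable_awk are made up for illustration, and both assume one domain per line on standard input.

```shell
# Hypothetical helpers; both read one domain per line on stdin.

# GNU grep: -o prints only the matched part, -E enables extended regex.
# Match two labels at end of line, plus an optional two-character label.
registrable_grep() {
  grep -oE '[^.]+\.[^.]+(\.[^.]{2})?$'
}

# Any POSIX awk: split on dots and rebuild the tail of the name.
registrable_awk() {
  awk -F. '{
    keep = (length($NF) == 2) ? 3 : 2   # two-letter last label: keep three labels
    if (keep > NF) keep = NF            # shorter names are printed unchanged
    out = $(NF - keep + 1)
    for (i = NF - keep + 2; i <= NF; i++) out = out "." $i
    print out
  }'
}

printf '%s\n' yahoo.com mail.yahoo.com a.yahoo.com a.yahoo.co.uk | registrable_awk
```

On the sample input both helpers print yahoo.com three times and then yahoo.co.uk. The rule treats every two-letter ending the same way, so a.yahoo.aa is printed unchanged as a.yahoo.aa.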
Try
I suggest using sed here, with the input stored in file.txt.
Explanation: the regular expression spans the whole line (^ marks the start, $ the end). I use a single capturing group that contains zero or more (*) non-dots, followed by a literal dot (\.), followed by zero or more non-dots, adjacent to the end of the line. I replace the whole line with the content of that group. Disclaimer: this solution assumes there is always at least one dot in each line.
(tested in GNU sed 4.2.2)
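The explanation above could be sketched as follows. This is a reconstruction from the description, not necessarily the exact command from the answer, and the helper name last_two_labels is made up for illustration.

```shell
# Hypothetical helper: keep only the last two dot-separated labels.
# \( ... \) captures "non-dots, dot, non-dots" at the end of the line;
# the leading ^.*\. consumes every label in front of that group.
# Lines with a single dot do not match and are printed unchanged.
last_two_labels() {
  sed 's/^.*\.\([^.]*\.[^.]*\)$/\1/'
}

printf '%s\n' yahoo.com mail.yahoo.com a.yahoo.com | last_two_labels
```

This prints yahoo.com three times. Because it always keeps exactly two labels, a.yahoo.co.uk becomes co.uk, so the two-letter ccTLD case from the question still needs separate handling.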
You are selecting only fields 2 and 3. You need to select from field 2 up to the end:
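A small sketch of that change, using an open-ended field range with cut. The helper name drop_first_label is made up for illustration.

```shell
# Hypothetical helper: drop the first label and keep everything after it.
# "-f 2-" selects field 2 through the last field.
drop_first_label() {
  cut -d '.' -f 2-
}

echo a.yahoo.co.uk | drop_first_label
```

This prints yahoo.co.uk. It only removes one leading label, so a bare yahoo.com is reduced to com and a.b.yahoo.com becomes b.yahoo.com; it fits inputs that carry exactly one subdomain label.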