Wget page title
Is it possible to Wget a page's title from the command line?
input:
$ wget http://bit.ly/rQyhG5 <<code>>
output:
If it’s broke, fix it right - Keeping it Real Estate. Home
This script would give you what you need:
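A minimal sketch of such a script, assuming the page is fetched with wget -qO- (quiet, document written to standard output) and that <title> and </title> sit on the same line. The function name title_basic is hypothetical:

```sh
# Sketch: print the text between <title> and </title> when both tags are on
# the same line. Assumes lowercase tags; title_basic is a hypothetical name.
title_basic() {
  # -n suppresses automatic printing; the p flag prints only lines where the
  # substitution matched. "!" is the delimiter, so "/" needs no escaping.
  sed -n 's!.*<title>\(.*\)</title>.*!\1!p'
}

# Local demo on a hard-coded page instead of a network fetch.
printf '<html><head><title>Hello, world</title></head></html>\n' | title_basic
```

For the page in the question this would be run as wget -qO- 'http://bit.ly/rQyhG5' | title_basic.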
But there are lots of situations where it breaks, including if there is a <title>...</title> in the body of the page, or if the title is on more than one line. This might be a little better:
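A sketch of restricting the match to the head section with a sed address range, under the same assumptions (lowercase tags, single-line title). The name title_in_head is hypothetical:

```sh
# Sketch: only look for the title between a <head> line and a </head> line.
# Assumes lowercase tags and a bare <head> tag; title_in_head is hypothetical.
title_in_head() {
  # The address range /<head>/,/<\/head>/ limits the substitution to the head,
  # so a <title> inside the body is ignored.
  sed -n '/<head>/,/<\/head>/s!.*<title>\(.*\)</title>.*!\1!p'
}

printf '%s\n' '<html>' '<head>' '<title>Real title</title>' '</head>' \
  '<body><title>Decoy</title></body>' '</html>' | title_in_head
```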
but it does not fit your case as your page contains the following head opening:
Again, this might be better:
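One way to let the opening head tag carry attributes is to loosen the start of the range. This is a sketch under the same assumptions; the attribute value below is made up for the demo, and title_in_head_attrs is hypothetical:

```sh
# Sketch: like the head-range version, but the opening tag may carry attributes.
# [[:space:]>] accepts "<head>" and "<head profile=...>" but not "<header>".
# Assumes lowercase tags; title_in_head_attrs is a hypothetical name.
title_in_head_attrs() {
  sed -n '/<head[[:space:]>]/,/<\/head>/s!.*<title>\(.*\)</title>.*!\1!p'
}

printf '%s\n' '<html>' '<head profile="http://example.com/profile">' \
  '<title>Real title</title>' '</head>' \
  '<body><header>Site</header><title>Decoy</title></body>' | title_in_head_attrs
```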
but there are still ways to break it, including a page with no head or title.
Again, a better solution might be:
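A sketch of one variant that does not depend on a head section: it acts on the first line containing <title> and quits straight away, so later titles in the body are ignored. It assumes lowercase tags and, like the sketches before it, prints nothing when the title spans several lines. The name first_title is hypothetical:

```sh
# Sketch: act on the first line that contains <title>, then quit, so no head
# section is needed and later titles are ignored. Assumes lowercase tags and a
# single-line title (a multi-line title prints nothing); first_title is hypothetical.
first_title() {
  sed -n '/<title>/{
s!.*<title>\(.*\)</title>.*!\1!p
q
}'
}

printf '%s\n' '<title>Real title</title>' '<body><title>Decoy</title></body>' | first_title
```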
but I am sure we can find a way to break it. This is why a true XML parser is the right solution, but as your question is tagged shell, the above is the best I can come up with. The paste and the two sed commands can be merged into a single sed, but it is less readable. However, this version has the advantage of working on multi-line titles:
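A sketch of a single GNU sed program that handles multi-line titles. It assumes GNU sed, since the I flag (case-insensitive matching) and the T command are GNU extensions; title_gnu is hypothetical:

```sh
# Sketch for GNU sed only. Steps: on the first line matching <title> (any case),
# keep appending the next input line with N until </title> is present; extract
# the text between the tags; if that substitution failed, T (no label) jumps to
# the end of the script so nothing is printed; otherwise turn embedded newlines
# into spaces, print, and quit. title_gnu is a hypothetical name.
title_gnu() {
  sed -n '/<title>/I{
:more
/<\/title>/I!{
N
b more
}
s/.*<title>\(.*\)<\/title>.*/\1/I
T
s/\n/ /g
p
q
}'
}

printf '%s\n' '<head>' '<TITLE>If it is broke,' 'fix it right</TITLE>' '</head>' | title_gnu
```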
As explained in the comments, the last sed above uses the T command, which is a GNU extension. If you do not have a compatible version, you can use:
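A sketch of the same idea without T, assuming only POSIX sed commands: t branches to a label when the extraction succeeded, and a bare b ends the cycle otherwise. Each command is its own -e argument because some sed implementations require a label name to end its line. Tags are matched in lowercase only; title_portable is hypothetical:

```sh
# Sketch using only POSIX sed commands (no T, no I flag), so tags must be
# lowercase. title_portable is a hypothetical name.
# "t found" jumps to :found only if the extraction matched; otherwise the bare
# "b" jumps to the end of the script, and -n means nothing is printed.
title_portable() {
  sed -n \
    -e '/<title>/{' \
    -e ':more' \
    -e '/<\/title>/!{' \
    -e 'N' \
    -e 'b more' \
    -e '}' \
    -e 's/.*<title>\(.*\)<\/title>.*/\1/' \
    -e 't found' \
    -e 'b' \
    -e ':found' \
    -e 's/\n/ /g' \
    -e 'p' \
    -e 'q' \
    -e '}'
}

printf '%s\n' '<head>' '<title>If it is broke,' 'fix it right</title>' '</head>' | title_portable
```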
If the above still does not work on a Mac, try:
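A sketch of another approach, assumed to work with both BSD sed (macOS) and GNU sed and assuming lowercase tags: read the whole page into the pattern space, join its lines, cut everything from the first </title> onward, then strip everything up to the last remaining <title>. The name title_mac is hypothetical:

```sh
# Sketch; title_mac is a hypothetical name.
title_mac() {
  # :next is a label; "\$!N" appends the next line unless this is the last one,
  # and "\$!b next" loops back, so the whole page ends up in the pattern space.
  # The backslash stops the shell from expanding "$!" inside double quotes.
  # Then: join the lines, drop everything from the first </title> onward, and
  # print what follows the last remaining <title>.
  sed -n -e ':next' -e "\$!N" -e "\$!b next" \
    -e 's/\n/ /g' \
    -e 's!</title>.*!!' \
    -e 's!.*<title>!!p'
}

printf '%s\n' '<html>' '<head profile="x">' '<title>If it is broke,' \
  'fix it right</title>' '</head>' '<body><title>Decoy</title></body>' | title_mac
```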
and/or
(Note the \ before the $ to avoid variable expansion.) It seems that :next does not like to be prefixed by a $, which could be a problem in some sed versions.
The following will pull whatever lynx thinks the title of the page is, saving you from all of the regex nonsense. Assuming the page you are retrieving is standards compliant enough for lynx, this should not break.