Pythonic way to conditionally iterate over items in a list

Posted on 2024-10-15 22:15:39

New to programming in general, so I'm probably going about this the wrong way. I'm writing an lxml parser where I want to omit HTML table rows that have no content from the parser output. This is what I've got:

for row in doc.cssselect('tr'):
    for cell in row.cssselect('td'):
        sys.stdout.write(cell.text_content() + '\t')
    sys.stdout.write('\n')

The write() calls are temporary. What I want is for the loop to only visit rows where row.text_content() != ''. So I guess I'm asking how to write what my brain thinks should be 'for a in b if a != x', which doesn't work.

Thanks!

烟燃烟灭 2024-10-22 22:15:39

for row in doc.cssselect('tr'):
    cells = [ cell.text_content() for cell in row.cssselect('td') ]
    if any(cells):
        sys.stdout.write('\t'.join(cells) + '\n')

This prints the row only if at least one cell has text content.
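As a minimal sketch of the same idea without lxml, assume each row has already been reduced to a list of cell strings (the `rows` data and the `non_empty_rows` helper are made up for illustration). Note that `any()` treats a whitespace-only string as content, so this version strips each cell first; drop the `strip()` call if whitespace should count as content.

```python
def non_empty_rows(rows):
    """Yield only the rows in which at least one cell has visible text."""
    for cells in rows:
        # any() is True as soon as one stripped cell is a non-empty string.
        if any(cell.strip() for cell in cells):
            yield cells


# Hypothetical sample data standing in for the text of each row's td cells.
rows = [
    ["a", "b"],
    ["", ""],
    [" ", "\n"],
    ["", "c"],
]

for cells in non_empty_rows(rows):
    print("\t".join(cells))
```

Writing the filter as a generator keeps the printing loop free of the condition, which is close to the "for a in b if a != x" shape asked about in the question.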

柳若烟 2024-10-22 22:15:39

ReEdit:

You know, I really don't like my answer at all. I upvoted the other answer, but I liked his original answer because it was not only clean but also self-explanatory, without getting "fancy", which is the trap I fell into:

for row in doc.cssselect('tr'):
    for cell in row.cssselect('td'):
        if cell.text_content() != '':
            pass  # do stuff here

There isn't a much more elegant solution than that.

Original-ish:

You can transform the second for loop as follows:

[cell for cell in row.cssselect('td') if cell.text_content() != '']

which turns it into a list comprehension. That way you get a pre-filtered list. You can take that even further, as in the following example:

a = [[1,2],[2,3],[3,4]]
newList = [y for x in a for y in x]

which flattens it into [1, 2, 2, 3, 3, 4]. You can then add an if clause at the end to filter out values, which reduces the whole nested loop to a single line.
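A small sketch of that last step, reusing the list `a` from above; the filter conditions are arbitrary and only there for illustration. The second half shows a generator expression, which gives the 'for a in b if a != x' loop from the question without building an intermediate list.

```python
a = [[1, 2], [2, 3], [3, 4]]

# Flatten and filter in one comprehension.
# The for clauses read left to right, and the if clause comes last.
evens = [y for x in a for y in x if y % 2 == 0]
print(evens)

# A generator expression filters lazily while the loop runs.
for y in (y for x in a for y in x if y != 2):
    print(y)
```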

Then again, if you were to look at itertools:

from itertools import ifilter  # Python 2 only
ifilter(lambda x: x.text_content() != '', row.cssselect('td'))

produces an iterator which you can iterate over, skipping all items you don't want.

Edit:

And before I get more downvotes: if you're using Python 3, the built-in filter works the same way and returns an iterator. No need to import ifilter.
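A minimal Python 3 sketch of the filter approach, using a plain list of strings (`cells` is hypothetical data) in place of the lxml cells so it runs on its own:

```python
cells = ["a", "", "b", ""]

# In Python 3, filter() returns a lazy iterator, so nothing is evaluated
# until the result is consumed (here by list()).
non_empty = filter(lambda text: text != "", cells)
result = list(non_empty)
print(result)

# Passing None as the function keeps every truthy item, which also drops ''.
truthy = list(filter(None, cells))
print(truthy)
```

With lxml elements the predicate would call `text_content()` on each cell, as in the ifilter line above.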
