Pythonic way to conditionally iterate over items in a list

Posted on 2024-10-15 22:15:39

New to programming in general, so I'm probably going about this the wrong way. I'm writing an lxml parser where I want to omit HTML table rows that have no content from the parser output. This is what I've got:

import sys

# doc is an lxml.html document parsed earlier
for row in doc.cssselect('tr'):
    for cell in row.cssselect('td'):
        sys.stdout.write(cell.text_content() + '\t')
    sys.stdout.write('\n')

The write() stuff is temporary. What I want is for the loop to process only rows where row.text_content() != ''. So I guess I'm asking how to write what my brain thinks should be 'for a in b if a != x', but that doesn't work.

Thanks!
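Python has no "for a in b if a != x:" statement, but putting a generator expression in the for header gets the same effect. A minimal sketch, assuming the standard library's xml.etree.ElementTree as a stand-in for lxml (so it runs without lxml or cssselect installed); the sample TABLE markup and the text_content helper are hypothetical, and the helper only approximates lxml's text_content() method:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample table; kept well-formed so ElementTree can parse it.
TABLE = (
    "<table>"
    "<tr><td>a</td><td>b</td></tr>"
    "<tr><td></td><td></td></tr>"
    "<tr><td>c</td><td></td></tr>"
    "</table>"
)


def text_content(elem):
    """Hypothetical helper approximating lxml's text_content(): all inner text."""
    return ''.join(elem.itertext())


doc = ET.fromstring(TABLE)

# "for a in b if a != x" is not valid syntax, but a generator expression
# in the for header skips the empty rows in the same way.
kept = []
for row in (r for r in doc.iter('tr') if text_content(r) != ''):
    kept.append([text_content(cell) for cell in row.iter('td')])

print(kept)  # → [['a', 'b'], ['c', '']]
```

With lxml itself, the same loop header would read: for row in (r for r in doc.cssselect('tr') if r.text_content() != ''):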

Comments (2)

烟燃烟灭 2024-10-22 22:15:39
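import sys

for row in doc.cssselect('tr'):
    # Collect the text of every <td> in this row.
    cells = [cell.text_content() for cell in row.cssselect('td')]
    # any() is False when every string is empty (or the row has no cells).
    if any(cells):
        sys.stdout.write('\t'.join(cells) + '\n')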

This prints the row only if at least one of its cells has non-empty text content.
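A self-contained sketch of the same any(cells) check, again using xml.etree.ElementTree and a hypothetical text_content helper in place of lxml. It also shows one caveat: a cell holding only whitespace is a non-empty string, so any(cells) keeps that row. The ignore_whitespace flag is a hypothetical option that strips each cell before the test.

```python
import xml.etree.ElementTree as ET


def text_content(elem):
    """Hypothetical stand-in for lxml's text_content(): concatenated inner text."""
    return ''.join(elem.itertext())


def nonempty_rows(doc, ignore_whitespace=False):
    """Yield each <tr> as a tab-joined line, skipping rows with no cell text.

    ignore_whitespace is a hypothetical option: when True, cells that contain
    only spaces or newlines also count as empty.
    """
    for row in doc.iter('tr'):
        cells = [text_content(cell) for cell in row.iter('td')]
        tested = [c.strip() for c in cells] if ignore_whitespace else cells
        # any() is False if every string is '' (or the row has no cells).
        if any(tested):
            yield '\t'.join(cells)


doc = ET.fromstring(
    "<table>"
    "<tr><td>x</td><td>1</td></tr>"
    "<tr><td></td><td></td></tr>"
    "<tr><td> </td></tr>"
    "</table>"
)

print(list(nonempty_rows(doc)))                          # → ['x\t1', ' ']
print(list(nonempty_rows(doc, ignore_whitespace=True)))  # → ['x\t1']
```

Whether whitespace-only cells should count as empty depends on the HTML being scraped; pretty-printed markup often puts newlines inside cells.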

柳若烟 2024-10-22 22:15:39

ReEdit:

You know, I really don't like my answer at all. I upvoted the other answer, but I liked his original version because it was not only clean but also self-explanatory without getting "fancy", which is the trap I fell into:

for row in doc.cssselect('tr'):
    for cell in row.cssselect('td'):
        if cell.text_content() != '':
            pass  # do stuff here

There isn't a much more elegant solution.
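Note that this loop tests each cell rather than each row, which is not quite what the question asks: an all-empty row still passes through the outer loop (so per-row output such as the question's newline is still written), and empty cells inside a row that does have text are skipped, which can shift columns. A minimal comparison, using plain lists of strings as hypothetical stand-ins for each cell's text_content():

```python
# Hypothetical cell texts, one inner list per <tr>.
rows = [['a', '', 'b'], ['', ''], ['c', 'd']]

# Cell-level filter (the loop above): drops empty cells inside every row,
# but still produces an entry for the all-empty row.
cell_filtered = [[cell for cell in row if cell != ''] for row in rows]

# Row-level filter (what the question asks for): keeps whole rows that
# contain any text, with their empty cells intact.
row_filtered = [row for row in rows if ''.join(row) != '']

print(cell_filtered)  # → [['a', 'b'], [], ['c', 'd']]
print(row_filtered)   # → [['a', '', 'b'], ['c', 'd']]
```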

Original-ish:

You can transform the second for loop as follows:

[cell for cell in row.cssselect('td') if cell.text_content() != '']

and turn it into a list comprehension. That way you've got a prescreened list. You can take that even further by looking at the following example:

a = [[1, 2], [2, 3], [3, 4]]
newList = [y for x in a for y in x]

which transforms it into [1, 2, 2, 3, 3, 4]. Then you can add an if clause at the end to screen out values, reducing the whole thing to a single line.
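A runnable version of that example, with an if clause appended at the end as described (the y != 2 condition is only an illustrative choice), alongside the explicit nested loops it is equivalent to:

```python
a = [[1, 2], [2, 3], [3, 4]]

# The for clauses read left to right, like nested for loops.
newList = [y for x in a for y in x]

# An if clause at the end screens out values.
no_twos = [y for x in a for y in x if y != 2]

# The same filtering written as explicit nested loops.
loops = []
for x in a:
    for y in x:
        if y != 2:
            loops.append(y)

print(newList)           # → [1, 2, 2, 3, 3, 4]
print(no_twos)           # → [1, 3, 3, 4]
print(loops == no_twos)  # → True
```

Applied to the question, the same shape collects every non-empty cell in the table in one line, [cell for row in doc.cssselect('tr') for cell in row.cssselect('td') if cell.text_content() != ''], though this flattens away the row boundaries.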

Then again, if you were to look at itertools:

from itertools import ifilter  # Python 2 only; removed in Python 3
ifilter(lambda x: x.text_content() != '', row.cssselect('td'))

produces an iterator which you can iterate over, skipping all items you don't want.

Edit:

And before I get more downvotes: if you're using Python 3, the built-in filter works the same way (it returns a lazy iterator), and itertools.ifilter no longer exists, so there's no need to import it.
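A minimal Python 3 sketch of that point, with a hypothetical list of strings standing in for the text_content() results: filter() hands items out lazily just as ifilter did, passing None as the function keeps only truthy items, and a generator expression is an equivalent alternative.

```python
# Hypothetical cell texts standing in for text_content() results.
cells = ['a', '', 'b', '']

# In Python 3, filter() is lazy, like Python 2's itertools.ifilter.
it = filter(lambda text: text != '', cells)
first = next(it)  # items are produced only as they are requested
rest = list(it)   # consumes whatever is left

# With None as the function, filter() keeps only truthy items ('' is falsy).
truthy = list(filter(None, cells))

# A generator expression does the same job and is often more readable.
gen = list(text for text in cells if text != '')

print(first, rest, truthy, gen)  # → a ['b'] ['a', 'b'] ['a', 'b']
```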
