Iterating over a dictionary in Python and stripping white space
I am working with the web scraping framework Scrapy, and I am wondering how to iterate over all of the scraped items, which seem to be in a dictionary, and strip the white space from each one.
Here is the code I have been playing with in my item pipeline:
for info in item:
    info[info].lstrip()
But this code does not work, because I cannot select items individually. So I tried to do this:
for key, value in item.items():
    value[1].lstrip()
This second method works to a degree, but the problem is that I have no idea how then to loop over all of the values.
I know this is probably such an easy fix, but I cannot seem to find it.
In a dictionary comprehension (available in Python >=2.7):
Python 3.X:
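A minimal sketch of such a comprehension in Python 3, assuming every value is a string. The contents of item are made up for illustration, and a plain dict stands in for the Scrapy item. Under Python 2.7 the same expression would typically use iteritems() in place of items().

```python
# Sample data standing in for a scraped item (hypothetical values).
item = {"name": "  widget ", "price": " 9.99"}

# Build a new dict whose values have leading and trailing whitespace removed.
clean = {key: value.strip() for key, value in item.items()}
```

The comprehension leaves item untouched and returns a new dictionary, so the result has to be bound to a name (or assigned back to item).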
Try
or with a comprehension, as suggested by monkut:
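Both forms might look like the sketch below. It is written for Python 3 and assumes every value is a string; the sample data is invented, and a plain dict stands in for the Scrapy item.

```python
item = {"title": "  Dune", "author": "\tFrank Herbert"}

# Explicit loop: assign the stripped copy back under the same key.
# Reassigning existing keys while iterating is safe because the dict's size does not change.
for key, value in item.items():
    item[key] = value.lstrip()

# Comprehension: build a new dict instead of updating in place.
source = {"title": "  Dune", "author": "\tFrank Herbert"}
stripped = {key: value.lstrip() for key, value in source.items()}
```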
Not a direct answer to the question, but I would suggest you look at Item Loaders and input/output processors. A lot of your cleanup can be taken care of here.
An example which strips each entry would be:
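In Scrapy this kind of cleanup is usually expressed with a processor such as MapCompose attached to a loader field. The block below is only a plain-Python sketch of the idea, not Scrapy's implementation: map_compose is a hypothetical stand-in that applies each function to every extracted value before it would be stored.

```python
def map_compose(*functions):
    """Hypothetical stand-in for an input processor: apply each function to every value."""
    def process(values):
        for func in functions:
            values = [func(value) for value in values]
        return values
    return process


# An input processor that strips whitespace from each extracted entry.
strip_input = map_compose(str.strip)

scraped = ["  first entry ", "\nsecond entry\t"]  # made-up extraction results
cleaned = strip_input(scraped)
```

Doing the stripping at load time keeps the pipeline free of per-field cleanup code.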
What you should note is that lstrip() returns a copy of the string rather than modifying the object. To actually update your dictionary, you'll need to assign the stripped value back to the item. For example:
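A small sketch of that point, written for Python 3, so it uses items() where Python 2 code would use iteritems(). The data is invented and a plain dict stands in for the Scrapy item.

```python
title = "  Scrapy basics"
title.lstrip()          # returns a new string; the result is discarded here
still_padded = title    # title itself is unchanged

item = {"title": "  Scrapy basics", "url": "  https://example.com"}
for key, value in item.items():
    # Store the stripped copy back under the same key.
    item[key] = value.lstrip()
```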
Note the use of .iteritems(), which returns an iterator instead of a list of key-value pairs. This makes it somewhat more efficient. I should add that in Python 3, .items() has been changed to return "views", so .iteritems() is not required (and no longer exists there).
I use the following. You can pass any object as an argument, including a string, list or dictionary.
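A function of that kind could be sketched as follows. The name strip_all and the exact set of handled types are assumptions for illustration; this version returns a stripped copy rather than modifying its argument, and it leaves dictionary keys as they are.

```python
def strip_all(obj):
    """Hypothetical helper: return a copy of obj with whitespace stripped from every string."""
    if isinstance(obj, str):
        return obj.strip()
    if isinstance(obj, dict):
        return {key: strip_all(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [strip_all(element) for element in obj]
    return obj  # numbers, None, etc. pass through untouched


cleaned = strip_all({"name": " Ann ", "tags": [" a", "b "], "age": 41})
```

Because the function recurses into dicts and lists, nested structures are handled as well.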
Assuming you would like to strip the values of yourDict, creating a new dict called newDict:
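One way this might be written in Python 3, as a sketch with made-up sample values; the isinstance check is what keeps non-string values out of strip():

```python
yourDict = {"name": "  Alice ", "age": 30, "ratio": 0.5}

# Strip only the string values; copy everything else unchanged.
newDict = {
    key: value.strip() if isinstance(value, str) else value
    for key, value in yourDict.items()
}
```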
This code can handle multi-type values, so it will avoid stripping int, float, etc.
Although @zquare had the best answer for this question, I feel I need to chime in with a Pythonic method that also accounts for dictionary values that are not strings. This is not recursive, mind you, as it only works with one-dimensional dictionary objects.
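A minimal sketch of such a loop in Python 3. The name record and its contents are invented for illustration.

```python
record = {"title": "  Scrapy", "pages": 12, "tags": None}

for key, value in record.items():
    # Only touch string values that actually begin with a space.
    if isinstance(value, str) and value.startswith(" "):
        record[key] = value.lstrip()
```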
This updates the original dictionary value if the value is a string and starts with a space.
UPDATE:
If you want to use regular expressions and avoid startswith() and endswith(), you can use this:
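A regular-expression variant might look like this sketch. The pattern name EDGE_WS and the sample data are assumptions, not part of the original answer.

```python
import re

# Matches a whitespace character at the very start or the very end of a string.
EDGE_WS = re.compile(r"^\s|\s$")

record = {"title": "  Scrapy ", "author": "someone", "pages": 12}

for key, value in record.items():
    # Non-string values are skipped; strings are stripped only when the pattern matches.
    if isinstance(value, str) and EDGE_WS.search(value):
        record[key] = value.strip()
```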
This version strips if the value has a leading or trailing white space character.