Iterating over a dictionary in Python and stripping whitespace

Posted on 2024-12-27 15:05:59

I am working with the web scraping framework Scrapy, and I am wondering how to iterate over all of the scraped items, which seem to be in a dictionary, and strip the whitespace from each one.

Here is the code I have been playing with in my item pipeline:

for info in item:
   info[info].lstrip()

But this code does not work, because I cannot select items individually. So I tried to do this:

for key, value in item.items():
   value[1].lstrip()

This second method works to a degree, but the problem is that I have no idea how then to loop over all of the values.

I know this is probably such an easy fix, but I cannot seem to find it.

Comments (7)

木格 2025-01-03 15:05:59

In a dictionary comprehension (available in Python >=2.7):

clean_d = { k:v.strip() for k, v in d.iteritems()}

Python 3.X:

clean_d = { k:v.strip() for k, v in d.items()}
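
The question's `value[1]` hints that each value may be a list of strings rather than a single string, in which case `v.strip()` would raise an `AttributeError`. A minimal sketch under that assumption (the `item` data below is made up for illustration and is a plain dict, not a real Scrapy item):

```python
# Hypothetical scraped data: each field holds a list of strings.
item = {
    "title": ["  Hello world "],
    "tags": [" python", "scrapy  "],
}

# Strip every string inside every list, building a new dict.
clean_item = {k: [s.strip() for s in v] for k, v in item.items()}

print(clean_item)
```

The nested list comprehension leaves the original `item` untouched and returns a new dictionary.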
深陷 2025-01-03 15:05:59

Try

for k,v in item.items():
   item[k] = v.replace(' ', '')

or as a dictionary comprehension, as suggested by monkut:

newDic = {k: v.replace(' ', '') for k, v in item.items()}
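
Note that `replace(' ', '')` and `strip()` are not interchangeable. The short comparison below, using a made-up sample string, shows the difference:

```python
value = "  New   York  "

# replace removes every space character, including the ones between words
without_spaces = value.replace(" ", "")

# strip removes only leading and trailing whitespace
stripped = value.strip()

print(repr(without_spaces))
print(repr(stripped))
```

`replace(' ', '')` also leaves tabs and newlines alone, whereas `strip()` with no argument removes any leading or trailing whitespace character.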
心是晴朗的。 2025-01-03 15:05:59

Not a direct answer to the question, but I would suggest you look at Item Loaders and input/output processors. A lot of your cleanup can be taken care of there.

An example which strips each entry would be:

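from scrapy.loader import ItemLoader
from itemloaders.processors import MapCompose  # older Scrapy: scrapy.loader.processors

class StrippingItemLoader(ItemLoader):
    # str.strip on Python 3; the original answer used unicode.strip (Python 2)
    default_output_processor = MapCompose(str.strip)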
旧话新听 2025-01-03 15:05:59

What you should note is that lstrip() returns a copy of the string rather than modify the object. To actually update your dictionary, you'll need to assign the stripped value back to the item.

For example:

for k, v in your_dict.iteritems():
    your_dict[k] = v.lstrip()

Note the use of .iteritems() which returns an iterator instead of a list of key value pairs. This makes it somewhat more efficient.

I should add that in Python 3, .items() has been changed to return a "view", so .iteritems() is no longer needed (and no longer exists).
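
Applied to the item pipeline from the question, the assign-back idea might look like the sketch below. The class name `StripWhitespacePipeline` is hypothetical, and a plain dict stands in for the Scrapy item so the example runs without Scrapy installed; it assumes the item supports the usual dict-style `items()` and item assignment.

```python
class StripWhitespacePipeline:
    """Hypothetical pipeline; Scrapy calls process_item(item, spider) for each item."""

    def process_item(self, item, spider):
        # list(...) takes a snapshot of the pairs before values are reassigned
        for key, value in list(item.items()):
            if isinstance(value, str):
                item[key] = value.strip()
            elif isinstance(value, list):
                # strip strings inside lists, leave other elements alone
                item[key] = [v.strip() if isinstance(v, str) else v for v in value]
        return item


# Stand-alone check with a plain dict in place of a Scrapy item.
item = {"name": "  Widget ", "sizes": [" S", "M "], "price": 9.99}
cleaned = StripWhitespacePipeline().process_item(item, spider=None)
print(cleaned)
```

The pipeline modifies the item in place and returns it, so non-string values such as the price pass through unchanged.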

江湖正好 2025-01-03 15:05:59

I use the following. You can pass any object as an argument, including a string, list or dictionary.

# strip any type of object
def strip_all(x):
  if isinstance(x, str): # if using python2 replace str with basestring to include unicode type
    x = x.strip()
  elif isinstance(x, list):
    x = [strip_all(v) for v in x]
  elif isinstance(x, dict):
    for k, v in list(x.items()):  # snapshot: the loop body mutates x
      x.pop(k)  # also strip keys
      x[ strip_all(k) ] = strip_all(v)
  return x
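
If mutating the input is undesirable, the same recursive idea can return a stripped copy instead. This is a sketch for Python 3; `strip_nested` and the sample data are hypothetical and not part of the answer above.

```python
def strip_nested(obj):
    """Return a stripped copy of obj, recursing into lists and dicts."""
    if isinstance(obj, str):
        return obj.strip()
    if isinstance(obj, list):
        return [strip_nested(v) for v in obj]
    if isinstance(obj, dict):
        # keys are stripped as well as values
        return {strip_nested(k): strip_nested(v) for k, v in obj.items()}
    return obj  # numbers, None, etc. pass through unchanged


data = {" name ": " Ada ", "langs": [" python ", {"level ": 3}]}
result = strip_nested(data)
print(result)
```

Because a new structure is built at every level, `data` keeps its original keys and values.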
寄居人 2025-01-03 15:05:59

Assuming you would like to strip the values of yourDict creating a new dict called newDict:

newDict = dict(zip(yourDict.keys(), [v.strip() if isinstance(v,str) else v for v in yourDict.values()]))

This code can handle multi-type values, so will avoid stripping int, float, etc.

眼趣 2025-01-03 15:05:59

Although @zquare had the best answer for this question, I feel I need to chime in with a Pythonic method that will also account for dictionary values that are not strings. This is not recursive mind you, as it only works with one dimensional dictionary objects.

d.update({k: v.lstrip() for k, v in d.items() if isinstance(v, str) and v.startswith(' ')})

This updates the original dictionary value if the value is a string and starts with a space.

UPDATE:
If you want to use regular expressions and avoid startswith and endswith, you can use this:

import re
rex = re.compile(r'^\s|\s$')
d.update({k: v.strip() for k, v in d.items() if isinstance(v, str) and rex.search(v)})

This version strips if the value has a leading or trailing whitespace character.
