How do I read a file containing newlines and tabs into a string in Python?

Posted on 2024-11-26 09:22:03

I am trying to read a file that contains tabs, newlines, and similar whitespace. The data is in JSON format.

When I read it using file.read(), readlines(), etc., all the newlines and tabs are read as well.

I have tried rstrip(), split(), and so on, but in vain. Maybe I am missing something.

Here is essentially what I am doing:

 f = open('/path/to/file.txt')
 line = f.readlines()
 line.split('\n')

This is the data (including the raw tabs, hence the poor formatting):

        {
      "foo": [ {
       "id1" : "1",
   "blah": "blah blah",
       "id2" : "5885221122",
      "bar" : [
              {  
         "name" : "Joe JJ", 
          "info": [                 {
         "custid": "SSN",    
         "type" : "String",             }        ]
        }     ]     }     ]  }

I was wondering whether the whitespace can be ignored elegantly.

I am also hoping to use json.dumps() on the result.


Comments (6)

離殇 2024-12-03 09:22:03

Why not just use json.load() if the data is JSON?

import json
with open('myfile.txt', 'r') as f:  # the with block closes the file afterwards
    d = json.load(f)
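
As a quick check that the whitespace itself is not the obstacle: JSON parsers skip spaces, tabs, and newlines between tokens, so nothing needs to be stripped before parsing. A minimal sketch, assuming an inline string (a made-up sample, not the asker's file) in place of a real file:

```python
import json

# Hypothetical sample: valid JSON padded with tabs, newlines, and spaces.
raw = '\t{\n\t  "foo": [ {\n\t\t"id1" : "1",\n   "blah": "blah blah"\t}\n ]  }\n'

# Whitespace between tokens is ignored; whitespace inside strings is kept.
data = json.loads(raw)
print(data["foo"][0]["blah"])
```
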
野却迷人 2024-12-03 09:22:03

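A quick hack, and probably an inefficient one. Note that removing every space also changes string values ("blah blah" becomes "blahblah"):

with open("/path/to/file.txt") as f:
    # Strip newlines, tabs, and spaces (including spaces inside string values)
    lines = f.read().replace("\n", "").replace("\t", "").replace(" ", "")

print(lines)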
相权↑美人 2024-12-03 09:22:03

Where did that structure come from? My condolences. Anyway, as a start you might try this:

import re
cleanedData = re.sub(r'[\n\t]', '', f.read())  # f is the already-open file object

That is a brute-force removal of newline and tab characters. The result may be suitable for passing to json.loads, but only if the file's contents are actually valid JSON once the extra whitespace and line breaks are removed.
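
To illustrate that last caveat, here is a small sketch under the assumption that the data keeps a trailing comma like the one in the question. The `raw` string is a shortened stand-in, not the full file:

```python
import json
import re

# Shortened stand-in for the asker's data, keeping the trailing comma.
raw = '{\n\t"custid": "SSN",\n\t"type" : "String",\t}\n'

# Remove newlines and tabs only; the trailing comma is untouched.
cleaned = re.sub(r'[\n\t]', '', raw)

try:
    json.loads(cleaned)
    parsed_ok = True
except json.JSONDecodeError as exc:
    parsed_ok = False
    print("still invalid:", exc)
```

Stripping whitespace does not change whether the text is valid JSON, so the decoder still rejects it.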

优雅的叶子 2024-12-03 09:22:03

If you want to loop over each line, you can do this:

for line in open('path/to/file.txt'):
  # Remove whitespace from both ends of line
  line = line.strip()

  # Do whatever you want with line
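
If the goal is a single string rather than per-line processing, the stripped lines can be joined back together. A minimal sketch, assuming `io.StringIO` as a stand-in for the open file so the example runs on its own:

```python
import io

# io.StringIO stands in for an open file so the sketch is self-contained.
fake_file = io.StringIO('  {\n\t"name" : "Joe JJ",\n\t"id1" : "1"\n  }\n')

# Strip whitespace from both ends of each line, then concatenate the pieces.
joined = "".join(line.strip() for line in fake_file)
print(joined)
```

Unlike replacing every space, this keeps the spaces inside values such as "Joe JJ".
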
长亭外,古道边 2024-12-03 09:22:03

What about using the json module?

import json

with open("/path/to/file.txt", "r") as infile:
    tmp = json.load(infile)  # json.load takes a file object; json.loads takes a string

with open("/path/to/file2.txt", "w") as output:
    output.write(json.dumps(tmp, sort_keys=True, indent=4))
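
For reference, json.dumps() controls the whitespace of its output, so the original tabs and newlines never come back once the data has been parsed. A short sketch with a hypothetical `data` dictionary, showing a compact form and an indented form:

```python
import json

# Hypothetical data, standing in for the parsed file contents.
data = {"foo": [{"id1": "1", "blah": "blah blah"}]}

# Compact output: no spaces after the separators.
compact = json.dumps(data, separators=(",", ":"))

# Readable output: sorted keys, four-space indentation.
pretty = json.dumps(data, sort_keys=True, indent=4)

print(compact)
print(pretty)
```
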
甲如呢乙后呢 2024-12-03 09:22:03
$ cat foo.json | python -mjson.tool
Expecting property name: line 11 column 41

The trailing comma after "type" : "String" is causing the JSON decoder to choke. Were it not for that, you could load the file directly with json.load().

In other words, your JSON is malformed, so you need to perform a replacement before passing it to json.loads(). Since you have to read the whole file into a string to do the replacement anyway, use json.loads(jsonstr) rather than json.load(jsonfilep):

    >>> import json, re
    >>> jsonfilep = open('foo.json')
    >>> jsonstr = re.sub(r'''(["'0-9.]\s*),\s*}''', r'\1}', jsonfilep.read())
    >>> jsonobj = json.loads(jsonstr)
    >>> jsonstr = json.dumps(jsonobj)
    >>> print(jsonstr)
    {"foo": [{"blah": "blah blah", "id2": "5885221122", "bar": [{"info":
    [{"type": "String", "custid": "SSN"}], "name": "Joe JJ"}], "id1": "1"}]}

I used the re module only because the trailing comma could follow any kind of value, whether a number or a string.
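
The same idea could be wrapped in a small helper. The sketch below is an assumption-laden variant, not the pattern used above: the name `loads_lenient` is hypothetical, and the regex is broader in that it drops any comma followed only by whitespace and a closing `}` or `]`.

```python
import json
import re

# Hypothetical helper pattern: a comma followed only by whitespace and } or ].
# Caveat: a regex cannot tell structure from string contents, so a string
# value containing text such as ",}" would be altered as well.
TRAILING_COMMA = re.compile(r',(\s*[}\]])')

def loads_lenient(text):
    """Remove trailing commas, then parse the result as JSON."""
    return json.loads(TRAILING_COMMA.sub(r'\1', text))

sample = '{ "info": [ { "custid": "SSN",\t"type" : "String",   }  ,\n ] }'
result = loads_lenient(sample)
print(json.dumps(result))
```

For data you do not control, a tolerant parser is likely a safer choice than a regex, since the caveat in the comment applies to any pattern-based cleanup.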
