How to recover a broken python "cPickle" dump?

Posted on 2024-07-15 12:35:50

I am using rss2email for converting a number of RSS feeds into mail for easier consumption. Or rather, I was using it, until it broke in a horrible way today: on every run, it gives me only this backtrace:

Traceback (most recent call last):
  File "/usr/share/rss2email/rss2email.py", line 740, in <module>
    elif action == "list": list()
  File "/usr/share/rss2email/rss2email.py", line 681, in list
    feeds, feedfileObject = load(lock=0)
  File "/usr/share/rss2email/rss2email.py", line 422, in load
    feeds = pickle.load(feedfileObject)
TypeError: ("'str' object is not callable", 'sxOYAAuyzSx0WqN3BVPjE+6pgPU', ((2009, 3, 19, 1, 19, 31, 3, 78, 0), {}))

The only helpful fact that I have been able to deduce from this backtrace is that the file ~/.rss2email/feeds.dat, in which rss2email keeps all its configuration and runtime state, is somehow broken. Apparently, rss2email reads its state and dumps it back using cPickle on every run.

I have even found the line containing the string 'sxOYAAuyzSx0WqN3BVPjE+6pgPU' mentioned above in the giant (>12MB) feeds.dat file. To my untrained eye, the dump does not appear to be truncated or otherwise damaged.

What approaches could I try in order to reconstruct the file?

The Python version is 2.5.4 on a Debian/unstable system.

EDIT

Peter Gibson and J.F. Sebastian have suggested loading directly from the
pickle file, which I had already tried. Apparently, the Feed class
defined in rss2email.py is needed, so here's my script:

#!/usr/bin/python

import sys
# import pickle
import cPickle as pickle
sys.path.insert(0,"/usr/share/rss2email")
from rss2email import Feed

feedfile = open("feeds.dat", 'rb')
feeds = pickle.load(feedfile)

The "plain" pickle variant produces the following traceback:

Traceback (most recent call last):
  File "./r2e-rescue.py", line 8, in <module>
    feeds = pickle.load(feedfile)
  File "/usr/lib/python2.5/pickle.py", line 1370, in load
    return Unpickler(file).load()
  File "/usr/lib/python2.5/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.5/pickle.py", line 1133, in load_reduce
    value = func(*args)
TypeError: 'str' object is not callable

The cPickle variant produces essentially the same thing as calling
r2e itself:

Traceback (most recent call last):
  File "./r2e-rescue.py", line 10, in <module>
    feeds = pickle.load(feedfile)
TypeError: ("'str' object is not callable", 'sxOYAAuyzSx0WqN3BVPjE+6pgPU', ((2009, 3, 19, 1, 19, 31, 3, 78, 0), {}))

EDIT 2

Following J.F. Sebastian's suggestion, I put "printf debugging" into
Feed.__setstate__ in my test script. These are the last few lines of
output before Python bails out:

          u'http:/com/news.ars/post/20080924-everyone-declares-victory-in-smutfree-wireless-broadband-test.html': u'http:/com/news.ars/post/20080924-everyone-declares-victory-in-smutfree-wireless-broadband-test.html'},
 'to': None,
 'url': 'http://arstechnica.com/'}
Traceback (most recent call last):
  File "./r2e-rescue.py", line 23, in ?
    feeds = pickle.load(feedfile)
TypeError: ("'str' object is not callable", 'sxOYAAuyzSx0WqN3BVPjE+6pgPU', ((2009, 3, 19, 1, 19, 31, 3, 78, 0), {}))

The same thing happens on a Debian/etch box using python 2.4.4-2.
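
One way to look at such a stream without running anything from it is the standard-library pickletools module, which prints every opcode together with its byte offset. The sketch below is written for current Python 3 rather than the 2.5 interpreter from the question, and the short stream in it is made up for illustration; it is not taken from feeds.dat. It suggests how a REDUCE opcode applied to a string leads to the same kind of TypeError as in the tracebacks above.

```python
import io
import pickle
import pickletools

# Hypothetical protocol-0 stream: push a string, build the tuple (1,),
# then REDUCE, i.e. call the string with that tuple as its arguments.
TINY = b"S'abc'\n(I1\ntR."


def disassemble(raw):
    """Return the opcode listing of a pickle without executing it."""
    out = io.StringIO()
    pickletools.dis(raw, out=out)
    return out.getvalue()


def try_load(raw):
    """Return the unpickled object, or the exception that stopped the load."""
    try:
        return pickle.loads(raw)
    except Exception as exc:  # a damaged stream can raise almost anything
        return exc


if __name__ == "__main__":
    print(disassemble(TINY))     # one line per opcode, with byte offsets
    print(repr(try_load(TINY)))  # the TypeError raised by REDUCE
```

On a file the size of feeds.dat the listing is long, so it is probably best redirected to a file and searched for the REDUCE lines near the string in question.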

Comments (4)

孤寂小茶 2024-07-22 12:35:50

How I solved my problem

A Perl port of pickle.py

Following J.F. Sebastian's comment about how simple the pickle
format is, I set out to port parts of pickle.py to Perl. A couple
of quick regular expressions would have been a faster way to access my
data, but I felt that the hack value and an opportunity to learn more
about Python would be worth it. Plus, I still feel much more
comfortable using (and debugging code in) Perl than Python.

Most of the porting effort (simple types, tuples, lists, dictionaries)
was very straightforward. Perl's and Python's different notions of
classes and objects have been the only issue so far where a bit more
than simple translation of idioms was needed. The result is a module
called Pickle::Parse which, after a bit of polishing, will be
published on CPAN.

A module called Python::Serialise::Pickle existed on CPAN, but I
found its parsing capabilities lacking: It spews debugging output all
over the place and doesn't seem to support classes/objects.
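
To give an idea of how little is needed for the simple types, here is a minimal sketch of the same idea in Python 3. It is not the Pickle::Parse module described above: it handles only a handful of protocol-0 opcodes, the names Global, pop_to_mark and parse are made up for this example, and it relies on pickletools.genops from the standard library to split the stream into opcodes. Instead of calling anything, it records each REDUCE and reports the byte offset when the thing to be called is not a module-level name.

```python
import pickletools

MARK = object()  # sentinel pushed by the MARK opcode


class Global(object):
    """Placeholder for a 'module name' reference; it is never imported."""

    def __init__(self, name):
        self.name = name


def pop_to_mark(stack):
    """Pop and return, in stream order, everything above the topmost MARK."""
    items = []
    while True:
        top = stack.pop()
        if top is MARK:
            break
        items.append(top)
    items.reverse()
    return items


def parse(raw):
    """Interpret a small subset of protocol 0 without calling anything."""
    stack, memo = [], {}
    for opcode, arg, pos in pickletools.genops(raw):
        name = opcode.name
        if name == "MARK":
            stack.append(MARK)
        elif name in ("INT", "STRING"):
            stack.append(arg)
        elif name == "GLOBAL":
            stack.append(Global(arg))
        elif name == "TUPLE":
            stack.append(tuple(pop_to_mark(stack)))
        elif name == "LIST":
            stack.append(pop_to_mark(stack))
        elif name == "DICT":
            items = pop_to_mark(stack)
            stack.append(dict(zip(items[0::2], items[1::2])))
        elif name == "APPEND":
            value = stack.pop()
            stack[-1].append(value)
        elif name == "SETITEM":
            value = stack.pop()
            key = stack.pop()
            stack[-1][key] = value
        elif name == "PUT":
            memo[arg] = stack[-1]
        elif name == "GET":
            stack.append(memo[arg])
        elif name == "REDUCE":
            args = stack.pop()
            func = stack.pop()
            if not isinstance(func, Global):
                raise ValueError("REDUCE at byte %d applied to %r, which is "
                                 "not a global" % (pos, func))
            stack.append((func.name, args))  # record the call, do not run it
        elif name == "STOP":
            return stack.pop()
        else:
            raise ValueError("unsupported opcode %s at byte %d" % (name, pos))
    raise ValueError("stream ended without STOP")


if __name__ == "__main__":
    # A list holding a string and a one-entry dictionary.
    print(parse(b"(lp0\nS'abc'\np1\na(dp2\nS'k'\nI7\nsa."))
```

Class instances are the part this sketch leaves out, which matches the observation that objects were the only place where more than a translation of idioms was needed.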

Parsing, transforming data, detecting actual errors in the stream

Based upon Pickle::Parse, I tried to parse the feeds.dat file.
After a few iterations of fixing trivial bugs in my parsing code, I got
an error message that was strikingly similar to pickle.py's original
"object is not callable" error message:

Can't use string ("sxOYAAuyzSx0WqN3BVPjE+6pgPU") as a subroutine
ref while "strict refs" in use at lib/Pickle/Parse.pm line 489,
<STDIN> line 187102.

Ha! Now we're at a point where it's quite likely that the actual data
stream is broken. Plus, we get an idea where it is broken.

It turned out that the first line of the following sequence was wrong:

g7724
((I2009
I3
I19
I1
I19
I31
I3
I78
I0
t(dtRp62457

Position 7724 in the "memo" pointed to that string
"sxOYAAuyzSx0WqN3BVPjE+6pgPU". From similar records earlier in the
stream, it was clear that a time.struct_time object was needed
instead. All later records shared this wrong pointer. With a simple
search/replace operation, it was trivial to fix this.
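
The failure and the repair can be reproduced on a small scale. The following Python 3 sketch builds a hypothetical miniature of such a stream by hand (memo slots 1 and 2 stand in for slot 7724 and the slot that really held time.struct_time, which is not given above), breaks it by pointing a GET at the string, and repairs it with a byte-level search/replace. The replace pattern assumes that the wrong GET is always followed directly by the argument tuple, as in the sequence shown above; on a real file that assumption would need checking first.

```python
import pickle
import time

# Arguments of one time.struct_time call, copied from the sequence above.
TIME_ARGS = b"((I2009\nI3\nI19\nI1\nI19\nI31\nI3\nI78\nI0\nt(dtR"

# Hypothetical miniature of feeds.dat: memo slot 1 holds the string,
# memo slot 2 holds the time.struct_time callable.
GOOD = (
    b"(lp0\n"
    b"S'sxOYAAuyzSx0WqN3BVPjE+6pgPU'\np1\na"
    b"ctime\nstruct_time\np2\n" + TIME_ARGS + b"p3\na"
    b"g2\n" + TIME_ARGS + b"p4\na."
)

# Simulate the damage: the GET now fetches the string instead of the callable.
BROKEN = GOOD.replace(b"g2\n((I", b"g1\n((I")


def repair(raw, wrong=1, right=2):
    """Re-point GET opcodes that directly precede a struct_time argument tuple."""
    old = b"g%d\n((I" % wrong
    new = b"g%d\n((I" % right
    return raw.replace(old, new)


def load_or_error(raw):
    """Return the unpickled object, or the exception that stopped the load."""
    try:
        return pickle.loads(raw)
    except Exception as exc:
        return exc


if __name__ == "__main__":
    print(repr(load_or_error(BROKEN)))   # TypeError from calling the string
    print(load_or_error(repair(BROKEN)))  # string plus two struct_time values
```

Working on a copy of the file and comparing the number of replacements with the number of failing records seems a sensible precaution before overwriting anything.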

I find it ironic that I found the source of the error by accident
through Perl's feature that tells the user its position in the input
data stream when it dies.

Conclusion

  1. I will move away from rss2email as soon as I find time to
    automatically transform its pickled configuration/state mess to
    another tool's format.
  2. pickle.py needs more meaningful error messages that tell the user
    the position in the data stream (not the position in its own
    code) where things go wrong.
  3. Porting parts of pickle.py to Perl was fun and, in the end, rewarding.
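
As a rough workaround for the second point, the load can be wrapped so that a failure reports how far the unpickler had read. This is a Python 3 sketch under stated assumptions: PickleStreamError, load_with_offset and failure_offset are names invented here, and the offset comes from the tell() of an in-memory io.BytesIO, so it should be read as "near this byte" rather than as an exact opcode position.

```python
import io
import pickle


class PickleStreamError(Exception):
    """Hypothetical wrapper error that carries a byte offset."""

    def __init__(self, cause, offset):
        Exception.__init__(self, "%s: %s (near byte %d)"
                           % (type(cause).__name__, cause, offset))
        self.cause = cause
        self.offset = offset


def load_with_offset(raw):
    """Unpickle raw bytes; on failure, report how far the stream was read."""
    stream = io.BytesIO(raw)
    try:
        return pickle.load(stream)
    except Exception as exc:
        raise PickleStreamError(exc, stream.tell())


def failure_offset(raw):
    """Return the offset where loading fails, or None if it succeeds."""
    try:
        load_with_offset(raw)
    except PickleStreamError as err:
        return err.offset
    return None


if __name__ == "__main__":
    # Made-up stream that applies REDUCE to a string.
    print(failure_offset(b"S'abc'\n(I1\ntR."))
```
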
爱要勇敢去追 2024-07-22 12:35:50

Have you tried manually loading the feeds.dat file using both cPickle and pickle? If the output differs it might hint at the error.

Something like (from your home directory):

import cPickle, pickle
f = open('.rss2email/feeds.dat', 'r')
obj1 = cPickle.load(f)
f.seek(0)  # rewind, otherwise the second load starts at the end of the file
obj2 = pickle.load(f)

(you might need to open in binary mode 'rb' if rss2email doesn't pickle in ascii).

Pete

Edit: The fact that cPickle and pickle give the same error suggests that the feeds.dat file is the problem. Probably a change in the Feed class between versions of rss2email as suggested in the Ubuntu bug J.F. Sebastian links to.

画离情绘悲伤 2024-07-22 12:35:50

Sounds like the internals of cPickle are getting tangled up. This thread (http://bytes.com/groups/python/565085-cpickle-problems) looks like it might have a clue.

小巷里的女流氓 2024-07-22 12:35:50
  1. 'sxOYAAuyzSx0WqN3BVPjE+6pgPU' is most probably unrelated to the pickle's problem
  2. Post the error traceback for the following command (to determine which class defines the attribute that can't be called, the one that leads to the TypeError):

    python -c "import pickle; pickle.load(open('feeds.dat'))"
    

EDIT:

Add the following to your code and run (redirect stderr to file then use 'tail -2' on it to print last 2 lines):

import sys
from pprint import pprint

def setstate(self, dict_):
    # Dump each Feed's state to stderr before restoring it.
    pprint(dict_, stream=sys.stderr, depth=None)
    self.__dict__.update(dict_)

Feed.__setstate__ = setstate

If the above doesn't yield an interesting output then use general troubleshooting tactics:

Confirm that 'feeds.dat' is the problem:

  • back up the ~/.rss2email directory
  • install rss2email into a virtualenv/pip sandbox (or use zc.buildout) to isolate the environment (make sure you are using feedparser.py from the trunk)
  • add a couple of feeds, and keep adding feeds until the size of 'feeds.dat' is greater than the current one; run some tests
  • try the old 'feeds.dat'
  • try the new 'feeds.dat' on the existing rss2email installation

See the "r2e bails out with TypeError" bug on Ubuntu.
