用Python解析文本文件?
我必须做一个作业,其中我有一个包含类似内容的 .txt 文件
p
没有人热爱痛苦本身、追求痛苦并想要 拥有它,只是因为它很痛苦...
h1
这是这个文本文件的另一个例子,
我想编写一个 python 代码来解析这个文本文件并创建 xhtml 文件
我需要为这个项目找到一个起点,因为我对 python 很陌生,对很多东西都不熟悉。
这个Python代码应该从这个文本文件中获取每个“标签”并将它们放入一个xhtml文件中我希望我的要求对你有意义。
非常感谢任何帮助,
预先感谢!
-博扬
I have to do an assignment where i have a .txt file that contains something like this
p
There is no one who loves pain itself, who seeks after it and wants to
have it, simply because it is pain...
h1
this is another example of what this text file looks like
i am suppose to write a python code that parses this text file and creates and xhtml file
I need to find a starting point for this project because i am very new to python and not familiar with alot of this stuff.
This python code is suppose to take each of these "tags" from this text file and put them into an xhtml file I hope that what i ask makes sense to you.
Any help is greatly appreciated,
Thanks in advance!
-bojan
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
你说你对 Python 很陌生,所以我将从非常低的级别开始。您可以在 Python 中非常简单地迭代文件中的行,
现在如何解析它。如果每个格式指令(例如 p、h1)位于单独的行上,您可以轻松检查。我会建立一个处理程序字典并像这样获取处理程序:
您可以执行以下操作 Daniel Pryden 建议 首先创建一个内存数据结构,然后序列化该 XHTML。在这种情况下,处理程序将知道如何构建与每个标签相对应的对象。但我认为更简单的解决方案,特别是如果您没有很多时间,您可以直接转到 XHTML,保留当前包含的标签的堆栈。在这种情况下,您的“处理程序”可能只是一些将标签写入输出文件/字符串的简单逻辑。
在不了解您问题的具体情况的情况下,我无法说更多。此外,我不想为你做所有的作业。这应该会给你一个好的开始。
You say you're very new to Python, so I'll start at the very low-level. You can iterate over the lines in a file very simply in Python
Now how to parse it. If each formatting directive (e.g. p, h1), is on a separate line, you can check that easily. I'd build up a dictionary of handlers and get the handler like so:
You could do what Daniel Pryden suggested and create an in-memory data structure first, and then serialize that the XHTML. In that case, the handlers would know how to build the objects corresponding to each tag. But I think the simpler solution, especially if you don't have lots of time, you have is just to go straight to XHTML, keeping a stack of the current enclosed tags. In that case your "handler" may just be some simple logic to write the tags to the output file/string.
I can't say more without knowing the specifics of your problem. And besides, I don't want to do all your homework for you. This should give you a good start.
我不会直接从您描述的文本文件转换为 XHTML 文件,而是首先将其转换为中间内存表示形式。
因此,我将构建类来表示
p
和h1
标记,然后遍历文本文件并构建这些对象并将它们放入列表中(或者甚至是更复杂的列表)对象,但从文件的外观来看,列表应该足够了)。然后,我将该列表传递给另一个函数,该函数将循环遍历p
和h1
对象并将它们输出为 XHTML。作为额外的好处,我将使每个标记对象(例如
Paragraph
和Heading1
类)实现一个as_xhtml()
方法,并委托对该方法的实际格式化。那么 XHTML 输出循环可能类似于:Rather than going directly from the text file you describe to an XHTML file, I would transform it into an intermediate in-memory representation first.
So I would build classes to represent the
p
andh1
tags, and then go through the text file and build those objects and put them into a list (or even a more complex object, but from the looks of your file a list should be sufficient). Then I would pass the list to another function that would loop through thep
andh1
objects and output them as XHTML.As an added bonus, I would make each tag object (say,
Paragraph
andHeading1
classes) implement anas_xhtml()
method, and delegate the actual formatting to that method. Then the XHTML output loop could be something like: