用Python解析文本文件?

发布于 2024-08-08 10:54:55 字数 299 浏览 3 评论 0原文

我必须做一个作业,其中我有一个包含类似内容的 .txt 文件
p
没有人热爱痛苦本身、追求痛苦并想要 拥有它,只是因为它很痛苦...

h1
这是这个文本文件的另一个例子,

我想编写一个 python 代码来解析这个文本文件并创建 xhtml 文件
我需要为这个项目找到一个起点,因为我对 python 很陌生,对很多东西都不熟悉。
这个Python代码应该从这个文本文件中获取每个“标签”并将它们放入一个xhtml文件中我希望我的要求对你有意义。
非常感谢任何帮助,
预先感谢!
-博扬

I have to do an assignment where i have a .txt file that contains something like this
p
There is no one who loves pain itself, who seeks after it and wants to
have it, simply because it is pain...

h1
this is another example of what this text file looks like

i am suppose to write a python code that parses this text file and creates and xhtml file
I need to find a starting point for this project because i am very new to python and not familiar with alot of this stuff.
This python code is suppose to take each of these "tags" from this text file and put them into an xhtml file I hope that what i ask makes sense to you.
Any help is greatly appreciated,
Thanks in advance!
-bojan

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

伤感在游骋 2024-08-15 10:54:55

你说你对 Python 很陌生,所以我将从非常低的级别开始。您可以在 Python 中非常简单地迭代文件中的行,

fyle = open("contents.txt")
for lyne in fyle :
    # Do string processing here
fyle.close()

现在如何解析它。如果每个格式指令(例如 p、h1)位于单独的行上,您可以轻松检查。我会建立一个处理程序字典并像这样获取处理程序:

handlers= {"p": # p tag handler
           "h1": # h1 tag handler
          }

# ... in the loop
    if lyne.rstrip() in handlers :  # strip to remove trailing whitespace
        # close current handler?
        # start new handler?
    else :
        # pass string to current handler

您可以执行以下操作 Daniel Pryden 建议 首先创建一个内存数据结构,然后序列化该 XHTML。在这种情况下,处理程序将知道如何构建与每个标签相对应的对象。但我认为更简单的解决方案,特别是如果您没有很多时间,您可以直接转到 XHTML,保留当前包含的标签的堆栈。在这种情况下,您的“处理程序”可能只是一些将标签写入输出文件/字符串的简单逻辑。

在不了解您问题的具体情况的情况下,我无法说更多。此外,我不想为你做所有的作业。这应该会给你一个好的开始。

You say you're very new to Python, so I'll start at the very low-level. You can iterate over the lines in a file very simply in Python

fyle = open("contents.txt")
for lyne in fyle :
    # Do string processing here
fyle.close()

Now how to parse it. If each formatting directive (e.g. p, h1), is on a separate line, you can check that easily. I'd build up a dictionary of handlers and get the handler like so:

handlers= {"p": # p tag handler
           "h1": # h1 tag handler
          }

# ... in the loop
    if lyne.rstrip() in handlers :  # strip to remove trailing whitespace
        # close current handler?
        # start new handler?
    else :
        # pass string to current handler

You could do what Daniel Pryden suggested and create an in-memory data structure first, and then serialize that the XHTML. In that case, the handlers would know how to build the objects corresponding to each tag. But I think the simpler solution, especially if you don't have lots of time, you have is just to go straight to XHTML, keeping a stack of the current enclosed tags. In that case your "handler" may just be some simple logic to write the tags to the output file/string.

I can't say more without knowing the specifics of your problem. And besides, I don't want to do all your homework for you. This should give you a good start.

他是夢罘是命 2024-08-15 10:54:55

我不会直接从您描述的文本文件转换为 XHTML 文件,而是首先将其转换为中间内存表示形式。

因此,我将构建类来表示 ph1 标记,然后遍历文本文件并构建这些对象并将它们放入列表中(或者甚至是更复杂的列表)对象,但从文件的外观来看,列表应该足够了)。然后,我将该列表传递给另一个函数,该函数将循环遍历 ph1 对象并将它们输出为 XHTML。

作为额外的好处,我将使每个标记对象(例如 ParagraphHeading1 类)实现一个 as_xhtml() 方法,并委托对该方法的实际格式化。那么 XHTML 输出循环可能类似于:

for tag in input_tags:
    xhtml_file.write(tag.as_xhtml())

Rather than going directly from the text file you describe to an XHTML file, I would transform it into an intermediate in-memory representation first.

So I would build classes to represent the p and h1 tags, and then go through the text file and build those objects and put them into a list (or even a more complex object, but from the looks of your file a list should be sufficient). Then I would pass the list to another function that would loop through the p and h1 objects and output them as XHTML.

As an added bonus, I would make each tag object (say, Paragraph and Heading1 classes) implement an as_xhtml() method, and delegate the actual formatting to that method. Then the XHTML output loop could be something like:

for tag in input_tags:
    xhtml_file.write(tag.as_xhtml())
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文