Generating HTML from plain text with formatting markers in Python 3
I have written a set of Python 3 scripts to take a formatted text file and move the data into a SQLite database. The data in the database is then used as a part of a PHP application. The data in my text file has formatting markers for bold and italics, but not in anything intelligible to a browser. The formatting scheme is like this:
fi:xxxx (italics on the word xxxx (turned off at the word break))
fi:{xxx…xxx} (italics on the word or phrase in the curly brackets {})
fb:xxxx (bold on the word xxxx (turned off at the word break))
fb:{xxx} (bold on the word or phrase in the brackets {})
fv:xxxx (bold on the word xxxx (turned off at the word break))
fv:{xxx…xxx} (bold on the word or phrase in the brackets {})
fn:{xxx…xxx} (no formatting)
I would like to convert each line of source text into two lines: (1) a line containing the string with HTML tags in place of the source formatting, and (2) another line containing the string stripped of all formatting markers. I need a formatted and a stripped line for each source line, even if no formatting markers are used on that line. In the source data, multiple formatting markers of different (or the same) sort may show up in a single line, but no marker extends past the end of its line.
Comments (1)
To format the bracketed sections, you could replace each marker and its curly brackets with the corresponding HTML tags.
This will convert "other fb:{bold text} text" to "other <b>bold text</b> text".
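A minimal sketch of that bracketed substitution, assuming a regular-expression approach with `re.sub`. The names `TAGS`, `BRACED`, and `convert_braced` are made up for illustration, and mapping `fv` to `<b>` simply follows the question's description of it as bold:

```python
import re

# Assumed mapping from marker letter to HTML tag. "v" is treated as bold
# because the question describes it that way; "n" means no formatting.
TAGS = {"i": "i", "b": "b", "v": "b", "n": None}

# Matches f<letter>:{...} where the braces contain no nested braces.
BRACED = re.compile(r"\bf([ibvn]):\{([^{}]*)\}")

def convert_braced(line):
    """Replace brace-delimited markers with HTML tags."""
    def repl(match):
        tag = TAGS[match.group(1)]
        text = match.group(2)
        # fn:{...} keeps the text and drops the marker.
        return text if tag is None else f"<{tag}>{text}</{tag}>"
    return BRACED.sub(repl, line)

print(convert_braced("other fb:{bold text} text"))
```

This sketch does not handle nested brackets and does not escape characters such as `<` or `&` that may already be in the source text.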
Then you could convert the space-separated sections, that is, the single-word markers such as fi:xxxx, in the same way.
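A possible sketch for the single-word markers, again using `re.sub`. It assumes that "word break" means the end of a run of letters, digits, or underscores (`\w+`), so trailing punctuation stays outside the tag; `WORD` and `convert_words` are hypothetical names:

```python
import re

# Assumed mapping; the question lists no single-word form for "fn".
TAGS = {"i": "i", "b": "b", "v": "b"}

# Matches f<letter>:word. \w+ cannot match "{", so braced markers are left alone.
WORD = re.compile(r"\bf([ibv]):(\w+)")

def convert_words(line):
    """Wrap single-word markers in HTML tags, closing at the word break."""
    def repl(match):
        tag = TAGS[match.group(1)]
        return f"<{tag}>{match.group(2)}</{tag}>"
    return WORD.sub(repl, line)

print(convert_words("an fi:italic word"))
```

If a word in the source data can contain hyphens or apostrophes, the `\w+` part would need to be widened accordingly.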
If you want plain text, just replace the tags such as "<b>" and "</b>" with an empty string "".
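One way to do that replacement in a single pass, assuming the only tags produced are `<b>` and `<i>`. The names `HTML_TAG` and `strip_tags` are illustrative:

```python
import re

# Matches the opening and closing forms of the two tags the conversion emits.
HTML_TAG = re.compile(r"</?[bi]>")

def strip_tags(html_line):
    """Return the line with <b>, </b>, <i> and </i> removed."""
    return HTML_TAG.sub("", html_line)

print(strip_tags("other <b>bold text</b> text"))
```

Note that this would also remove a literal `<b>` or `<i>` that happened to be in the source text. If that can occur, stripping the markers directly from the source line is the safer route.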
If the formatting doesn't span multiple lines, you will probably get better performance by reading and converting the file line by line.
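A rough sketch of the line-by-line approach, combining both marker forms in one pattern and yielding a formatted and a stripped string for every source line. All names here are hypothetical, and `io.StringIO` stands in for an open file so the example runs on its own:

```python
import io
import re

# Assumed mapping; "n" (no formatting) just drops the marker.
TAGS = {"i": "i", "b": "b", "v": "b", "n": None}
# Group 2 holds braced text, group 3 holds a single word.
MARKER = re.compile(r"\bf([ibvn]):(?:\{([^{}]*)\}|(\w+))")
HTML_TAG = re.compile(r"</?[bi]>")

def to_html(line):
    """Convert every marker on one line to HTML tags."""
    def repl(m):
        tag = TAGS[m.group(1)]
        text = m.group(2) if m.group(2) is not None else m.group(3)
        return text if tag is None else f"<{tag}>{text}</{tag}>"
    return MARKER.sub(repl, line)

def convert_lines(lines):
    """Yield (formatted, stripped) for each line of an iterable such as an open file."""
    for raw in lines:
        formatted = to_html(raw.rstrip("\n"))
        yield formatted, HTML_TAG.sub("", formatted)

# Stand-in for: with open(path, encoding="utf-8") as source: ...
source = io.StringIO("plain line\nfi:word and fb:{two words}\n")
for formatted, stripped in convert_lines(source):
    print(formatted, "|", stripped)
```

Because `convert_lines` is a generator, only one line is held in memory at a time, and lines without markers still produce both outputs, which matches the requirement in the question.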