Parsing or extracting data for input into a database

Posted on 2024-12-04 20:07:55

I have the following text file:

VERDICT: 
MR. FOREMAN:  Guilty.        
THE COURT:  Accused and, you have been found guilty on the charges as you have heard the Foreman for the jury say.  You are remanded.  I have requested a probation report and you are remanded until sentencing, until the Court receives the probation report. 
THE COURT:  Mr. Foreman and members of the jury, on behalf of the administration of justice   
THE CLERK:  Joh Doe the jury have found you guilty.  Have you anything to say before Her Ladyship, the Judge, proceeds to sentence you?                      
SENTENCE:
THE COURT:  John Doe.

I would like to use keywords such as verdict, foreman, court, clerk and sentence as tags to enter this information in a database. How can I extract these words to create tags, form an XML document from them, and store it in the database? I have been searching for regex and data-extraction approaches but have not found anything yet.
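A minimal Python sketch of one way to do this, using only the standard library (`re` and `xml.etree.ElementTree`). It assumes that a label is an upper-case run at the start of a line followed by a colon, that a bare label such as `VERDICT:` opens a section, and that a label followed by text (`THE COURT:  ...`) is a speaker turn. The names `transcript_to_xml`, `label_to_tag` and `PREFIXES` are hypothetical, not an existing API.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a label is an upper-case run (letters, dots, spaces) at the
# start of a line, followed by a colon. The lazy groups keep trailing
# whitespace out of the captured text.
LABEL_RE = re.compile(r"^([A-Z][A-Z. ]*?):\s*(.*?)\s*$")

# Hypothetical list of articles/honorifics dropped when building a tag name.
PREFIXES = ("THE ", "MR. ", "MRS. ", "MS. ")


def label_to_tag(label: str) -> str:
    """Turn e.g. 'MR. FOREMAN' into 'foreman' and 'THE COURT' into 'court'."""
    label = label.strip()
    for prefix in PREFIXES:
        if label.startswith(prefix):
            label = label[len(prefix):]
    # Lower-case and replace anything that is not a letter/digit with "_".
    return re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")


def transcript_to_xml(text: str) -> ET.Element:
    root = ET.Element("transcript")
    section = root  # turns before any section header go directly under root
    last = None     # most recent speaker element, for continuation lines
    for line in text.splitlines():
        if not line.strip():
            continue
        match = LABEL_RE.match(line)
        if match is None:
            # No label: append the line to the previous speaker turn.
            if last is not None:
                last.text = (last.text or "") + " " + line.strip()
            continue
        label, body = match.groups()
        tag = label_to_tag(label)
        if not body:
            # A bare "LABEL:" line starts a new section, e.g. <verdict>.
            section = ET.SubElement(root, tag)
            last = None
        else:
            # "LABEL:  text" becomes a child element of the current section.
            last = ET.SubElement(section, tag)
            last.text = body
    return root


sample = """VERDICT:
MR. FOREMAN:  Guilty.
THE COURT:  You are remanded.
SENTENCE:
THE COURT:  John Doe.
"""
print(ET.tostring(transcript_to_xml(sample), encoding="unicode"))
# <transcript><verdict><foreman>Guilty.</foreman><court>You are remanded.</court></verdict><sentence><court>John Doe.</court></sentence></transcript>
```

The resulting XML string can then be stored in the database, for example in a text or XML column; the exact call depends on the database and driver you use.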

平安喜乐 2024-12-11 20:07:55

Do you have a list of expected tags?

  • If yes, what part is not clear?
    • Just extract all relevant strings from the XML (using any parser; you haven't mentioned a language, so I can't give examples).
    • Apply a regex for each allowed tag and, if it matches, add that tag.
    • PS: If you have many tags and/or a lot of data, applying one regex per tag to each input string may not be the most performant approach.
  • If no, then I suppose you're expected to assume some words are tags and add them. Though I don't like the idea (usually I would expect the user to think and give me the tags he wants to mark his inputs with), one way I can think of is to make a list of words you do NOT want to use as tags (e.g. "and", "or", "I", "we", ...), remove all these words using a regex replace, and take the remaining words as tags.
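Both branches can be sketched in Python. This is a minimal sketch, assuming the relevant strings have already been extracted as plain text; `ALLOWED_TAGS`, `STOP_WORDS` and the two function names are hypothetical. For the first branch it folds all allowed tags into one case-insensitive alternation regex rather than one regex per tag, which is one way to address the performance concern in the PS.

```python
import re

# Hypothetical whitelist of expected tags (the "if yes" case).
ALLOWED_TAGS = ["verdict", "foreman", "court", "clerk", "sentence"]

# One combined regex instead of one regex per tag.
# \b stops "court" from matching inside a longer word such as "courtroom".
TAG_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, ALLOWED_TAGS)) + r")\b", re.IGNORECASE
)

# Hypothetical stop-word list (the "if no" case); extend as needed.
STOP_WORDS = ["and", "or", "i", "we", "the", "you", "have", "to", "a", "of"]
STOP_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, STOP_WORDS)) + r")\b", re.IGNORECASE
)


def tags_from_whitelist(text: str) -> set[str]:
    """Return every allowed tag that occurs in text, lower-cased."""
    return {m.group(1).lower() for m in TAG_RE.finditer(text)}


def tags_from_stopwords(text: str) -> set[str]:
    """Remove stop words with a regex replace; the remaining words are tags."""
    remaining = STOP_RE.sub(" ", text)
    return {w.lower() for w in re.findall(r"[A-Za-z]+", remaining)}


line = "THE CLERK:  Joh Doe the jury have found you guilty."
print(tags_from_whitelist(line))  # {'clerk'}
print(tags_from_stopwords(line))
```

The stop-word approach tends to produce many noisy tags (here "joh", "doe", "found" and so on), which is why a curated whitelist is usually preferable when one is available.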