What techniques are available for formatted, structured data input and output?

Posted on 2024-09-18 10:43:17

I am working on a project here that ingests internal resumes from people at my company, strips out the skills and relevant content from them and stores it in a database. This was all done using docx4j and Grails. This required the resumes to first be submitted via a template that formatted everything just right so that the ingest tool knew what to look for to strip the data.

The second part is getting a "reduced" resume back out of the database. In other words, I want to search the uploaded content I now have and print new resumes only for people who have, say, Java programming experience. So I can go into my database, find the people who originally listed Java as a skill, and output a new set of resumes that are still in a nice templated format but contain only the relevant info instead of ALL the content.

I have been writing some software in Java to do this. It basically takes a docx template and overwrites the items in the customXML that are bound to the content controls in the document, so the new data shows up and can be saved as a new docx containing that custom data.

This seems really cumbersome to me, and it has some limitations. For one, let's say my template has a place for 3 skills, and a particular person has 8 skills. There seems to be no good way to add those 5 additional skills to the docx other than painstakingly inserting the data with all of the formatting XML tags and such. This is a real pain, because if the template changes, I don't want to have to go back into my software and edit source code to change the XML for that additional data from italic to bold.

I was reading up on using InfoPath to create a form that I could use to get the input, connecting to a SharePoint data source or something similar to store the stripped-out data. However, I can't seem to find out whether it is possible to get the data back out of SharePoint in a nicely formatted way. What would the general steps for this be? I couldn't find much about this topic with some quick googling.

Thanks

Comments (1)

长亭外,古道边 2024-09-25 10:43:17

You could set up the skills:

<skills>
  <skill>..</skill>
  <skill>..</skill>
</skills>

and use a "repeat" content control pointing to the container. This would handle any number of <skill> entries.
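As a rough illustration of the data side of this suggestion, here is a minimal sketch that builds the `<skills>` container with one `<skill>` child per entry, using only the JDK's built-in XML APIs. The class name `SkillsXml` and the method `build` are hypothetical, and the sketch assumes the skills have already been read from the database into a list. Attaching the resulting XML to the docx as a custom XML part and binding the repeat content control to it is not shown.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: produces the <skills> container for any number of skills.
final class SkillsXml {

    /** Returns <skills> with one <skill> element per list entry. */
    static String build(List<String> skills) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("skills");
            doc.appendChild(root);
            for (String name : skills) {
                Element skill = doc.createElement("skill");
                skill.setTextContent(name); // special characters are escaped on output
                root.appendChild(skill);
            }
            // Serialize the DOM to a string without the XML declaration.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build skills XML", e);
        }
    }

    public static void main(String[] args) {
        // Eight skills, although the template was designed with three in mind.
        System.out.println(build(List.of("Java", "Groovy", "Grails", "SQL",
                "XML", "XSLT", "Linux", "Git")));
    }
}
```

The point of this split is that the code only produces data. How each repeated entry looks (bold, italic, and so on) stays in the template's content control, so a formatting change should not require touching the source code.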
