Highlighting text in Java
We are developing a plagiarism detection framework in which I have to highlight the possibly plagiarized phrases in a document. The document is first preprocessed with stop-word removal, stemming and number removal, so it is difficult to locate the preprocessed tokens in the original text in order to highlight them.
As an example:
Original text: "Extreme programming is one approach of agile software development which emphasizes on frequent releases in short development cycles which are called time boxes. This result in reducing the costs spend for changes, by having multiple short development cycles, rather than one long one. Extreme programming includes pair-wise programming (for code review, unit testing). Also it avoids implementing features which are not included in the current time box, so the schedule creep can be minimized. "
Phrase to highlight: Extreme programming includes pair-wise programming
Preprocessed tokens: Extrem program pair-wise program
Is there any way I can highlight the preprocessed tokens in the original document?
Thanks
Comments (3)
You'd better use JTextPane or JEditorPane instead of JTextArea.
A text area is a "plain" text component, which means that although it can display text in any font, all of the text is in the same font.
So JTextArea is not a convenient component for text formatting. By contrast, with JTextPane or JEditorPane it is quite easy to change the style (highlight) of any part of the loaded text. See How to Use Editor Panes and Text Panes for details.
Update:
The following code highlights the desired part of your text.
It's not exactly what you want: it simply finds the exact phrase in the text.
But I hope that once you apply your algorithms, you can easily modify it to fit your needs.
This example is based on Highlighting Words in a JTextComponent.
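A minimal sketch of exact-phrase highlighting with the Swing `Highlighter` API might look like the following. The class and method names (`PhraseHighlighter`, `findOccurrences`, `highlightAll`) are illustrative assumptions, not taken from the tutorial mentioned above, and the yellow painter is an arbitrary choice.

```java
import java.awt.Color;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultHighlighter;
import javax.swing.text.Document;
import javax.swing.text.Highlighter;
import javax.swing.text.JTextComponent;

// Hypothetical helper: finds an exact phrase and highlights every occurrence.
final class PhraseHighlighter {

    /** Returns {start, end} offsets (end exclusive) of each case-insensitive match. */
    static List<int[]> findOccurrences(String text, String phrase) {
        List<int[]> hits = new ArrayList<>();
        if (phrase.isEmpty()) {
            return hits;
        }
        int i = 0;
        while (i + phrase.length() <= text.length()) {
            if (text.regionMatches(true, i, phrase, 0, phrase.length())) {
                hits.add(new int[] {i, i + phrase.length()});
                i += phrase.length(); // continue after this match
            } else {
                i++;
            }
        }
        return hits;
    }

    /** Adds a yellow highlight over every occurrence of phrase in the component. */
    static void highlightAll(JTextComponent component, String phrase)
            throws BadLocationException {
        Document doc = component.getDocument();
        // Read from the document model so offsets match what the highlighter expects.
        String text = doc.getText(0, doc.getLength());
        Highlighter highlighter = component.getHighlighter();
        Highlighter.HighlightPainter painter =
                new DefaultHighlighter.DefaultHighlightPainter(Color.YELLOW);
        for (int[] hit : findOccurrences(text, phrase)) {
            highlighter.addHighlight(hit[0], hit[1], painter);
        }
    }
}
```

Calling `PhraseHighlighter.highlightAll(textPane, "Extreme programming includes pair-wise programming")` on a `JTextPane` that already holds the document would mark the phrase. It only works for text that appears literally in the document, so stemmed tokens still need to be mapped back to offsets first.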
From a technical point of view: you can either choose or develop a markup language and add annotations or tags to the original document, or you can create a second file that records all potential plagiarisms.
With markup, your text could look like this:
(with ref referring to some metadata record that describes the original)
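As a rough sketch of the markup approach, the snippet below wraps suspicious character ranges in a tag that carries a `ref` attribute. The element name `plagiarism`, the `Span` class and the `annotate` method are hypothetical, chosen only for illustration. The sketch assumes the spans do not overlap and does not escape XML special characters in the text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper that adds inline markup around suspicious spans.
final class MarkupAnnotator {

    /** A suspicious range (end exclusive) plus the id of a metadata record. */
    static final class Span {
        final int start;
        final int end;
        final String ref;

        Span(int start, int end, String ref) {
            this.start = start;
            this.end = end;
            this.ref = ref;
        }
    }

    /** Wraps every span in <plagiarism ref="...">...</plagiarism>. */
    static String annotate(String text, List<Span> spans) {
        List<Span> ordered = new ArrayList<>(spans);
        // Work from the back of the text so earlier offsets stay valid.
        ordered.sort(Comparator.comparingInt((Span s) -> s.start).reversed());
        StringBuilder out = new StringBuilder(text);
        for (Span s : ordered) {
            out.insert(s.end, "</plagiarism>");
            out.insert(s.start, "<plagiarism ref=\"" + s.ref + "\">");
        }
        return out.toString();
    }
}
```

The second option, a separate file, would store the same `start`, `end` and `ref` triples without touching the original document.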
You could use java.text.AttributedString to annotate the preprocessed tokens in the original document.
Then apply TextAttribute values (java.awt.font.TextAttribute) to the relevant ranges, which would take effect in the original document.
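One way this could work is to have the preprocessing step remember where each surviving token came from, then attach an attribute to that range of the original text. The sketch below assumes exactly that. The stop-word list and the `stem` method are deliberately naive stand-ins for the real pipeline, stems are compared in lower case, and all names (`TokenAnnotator`, `Token`, `preprocess`, `findSpan`, `annotate`) are hypothetical.

```java
import java.awt.Color;
import java.awt.font.TextAttribute;
import java.text.AttributedString;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenAnnotator {

    /** A preprocessed token plus the offsets (end exclusive) of its source word. */
    static final class Token {
        final String stem;
        final int start;
        final int end;

        Token(String stem, int start, int end) {
            this.stem = stem;
            this.start = start;
            this.end = end;
        }
    }

    // Stand-ins for the real pipeline: a tiny stop-word list and a word pattern
    // that skips digits, which doubles as number removal.
    private static final Set<String> STOP_WORDS = Set.of("is", "the", "of", "a", "includes");
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+(?:-[A-Za-z]+)*");

    /** Naive placeholder stemmer, not a real stemming algorithm. */
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.contains("-")) return w;
        if (w.endsWith("ming")) return w.substring(0, w.length() - 4);
        if (w.endsWith("e")) return w.substring(0, w.length() - 1);
        return w;
    }

    /** Tokenizes text, drops stop words, and keeps each token's original offsets. */
    static List<Token> preprocess(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            String word = m.group().toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(word)) {
                tokens.add(new Token(stem(word), m.start(), m.end()));
            }
        }
        return tokens;
    }

    /** Finds the first run of tokens matching stems; returns {start, end} or null. */
    static int[] findSpan(List<Token> tokens, List<String> stems) {
        if (stems.isEmpty()) return null;
        for (int i = 0; i + stems.size() <= tokens.size(); i++) {
            boolean match = true;
            for (int j = 0; j < stems.size() && match; j++) {
                match = tokens.get(i + j).stem.equals(stems.get(j));
            }
            if (match) {
                return new int[] {tokens.get(i).start, tokens.get(i + stems.size() - 1).end};
            }
        }
        return null;
    }

    /** Marks the span of the original text with a yellow background attribute. */
    static AttributedString annotate(String text, int[] span) {
        AttributedString annotated = new AttributedString(text);
        annotated.addAttribute(TextAttribute.BACKGROUND, Color.YELLOW, span[0], span[1]);
        return annotated;
    }
}
```

Because the span runs from the first matched token to the last, stop words that were removed in between (such as "includes" in the example phrase) are covered by the highlight too. The result of `annotate(...).getIterator()` can be passed to `Graphics2D.drawString`, or the same offsets can be fed to a Swing `Highlighter`.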