java中的高亮文本

发布于 2024-11-17 17:25:50 字数 347 浏览 3 评论 0原文

我们正在开发抄袭检测框架。在那里我必须强调文档中可能抄袭的短语。首先对文档进行预处理,包括停用词删除、词干提取和数字删除。因此,预处理标记的突出显示变得困难 例如:

原始文本:“极限编程是敏捷软件开发的一种方法,它强调在称为时间盒的短开发周期中频繁发布。这通过多个短开发周期而不是极端编程包括成对编程(用于代码审查、单元测试),它还避免实现当前时间框中未包含的功能,因此可以最大限度地减少进度蔓延。 ”

这句话想要强调的是: 极限编程包括成对编程

预处理标记:极端程序成对编程

是否有我可以在原始文档中突出显示预处理标记???

谢谢

We are developing a plagiarism detection framework. In there i have to highlight the possible plagiarized phrases in the document. The document gets preprocessed with stop word removal, stemming and number removal first. So the highlighting gets difficult with the preprocessed token
As and example:

Orginal Text: "Extreme programming is one approach of agile software development which emphasizes on frequent releases in short development cycles which are called time boxes. This result in reducing the costs spend for changes, by having multiple short development cycles, rather than one long one. Extreme programming includes pair-wise programming (for code review, unit testing). Also it avoids implementing features which are not included in the current time box, so the schedule creep can be minimized. "

phrase want to highlight: Extreme programming includes pair-wise programming

preprocessed token : Extrem program pair-wise program

Is there anyway I can highlight the preprocessed token in the original document????

Thanx

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

笑叹一世浮沉 2024-11-24 17:25:50

您最好使用 JTextPaneJEditorPane,而不是 JTextArea

文本区域是一个“纯”文本组件,这意味着虽然它可以以任何字体显示文本,但所有文本都采用相同的字体。

因此,JTextArea不是一个方便进行任何文本格式设置的组件。

相反,使用 JTextPaneJEditorPane,可以很容易地更改加载文本任何部分的样式(突出显示)。

有关详细信息,请参阅如何使用编辑器窗格和文本窗格

更新:

以下代码突出显示文本中所需的部分。
这不完全是你想要的。它只是在文本中找到确切的短语。

但我希望如果你应用你的算法,你可以轻松地
修改它以满足您的需要。

import java.lang.reflect.InvocationTargetException;
import javax.swing.*;
import javax.swing.text.*;
import java.awt.*;

public class LineHighlightPainter {

    String revisedText = "Extreme programming is one approach "
            + "of agile software development which emphasizes on frequent"
            + " releases in short development cycles which are called "
            + "time boxes. This result in reducing the costs spend for "
            + "changes, by having multiple short development cycles, "
            + "rather than one long one. Extreme programming includes "
            + "pair-wise programming (for code review, unit testing). "
            + "Also it avoids implementing features which are not included "
            + "in the current time box, so the schedule creep can be minimized. ";
    String token = "Extreme programming includes pair-wise programming";

    public static void main(String args[]) {
        try {
            SwingUtilities.invokeAndWait(new Runnable() {

                public void run() {
                    new LineHighlightPainter().createAndShowGUI();
                }
            });
        } catch (InterruptedException ex) {
            // ignore
        } catch (InvocationTargetException ex) {
            // ignore
        }
    }

    public void createAndShowGUI() {
        JFrame frame = new JFrame("LineHighlightPainter demo");
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

        JTextArea area = new JTextArea(9, 45);
        area.setLineWrap(true);
        area.setWrapStyleWord(true);
        area.setText(revisedText);

        // Highlighting part of the text in the instance of JTextArea
        // based on token.
        highlight(area, token);

        frame.getContentPane().add(new JScrollPane(area), BorderLayout.CENTER);
        frame.pack();
        frame.setVisible(true);
    }

    // Creates highlights around all occurrences of pattern in textComp
    public void highlight(JTextComponent textComp, String pattern) {
        // First remove all old highlights
        removeHighlights(textComp);

        try {
            Highlighter hilite = textComp.getHighlighter();
            Document doc = textComp.getDocument();
            String text = doc.getText(0, doc.getLength());

            int pos = 0;
            // Search for pattern
            while ((pos = text.indexOf(pattern, pos)) >= 0) {
                // Create highlighter using private painter and apply around pattern
                hilite.addHighlight(pos, pos + pattern.length(), myHighlightPainter);
                pos += pattern.length();
            }

        } catch (BadLocationException e) {
        }
    }

    // Removes only our private highlights
    public void removeHighlights(JTextComponent textComp) {
        Highlighter hilite = textComp.getHighlighter();
        Highlighter.Highlight[] hilites = hilite.getHighlights();

        for (int i = 0; i < hilites.length; i++) {
            if (hilites[i].getPainter() instanceof MyHighlightPainter) {
                hilite.removeHighlight(hilites[i]);
            }
        }
    }
    // An instance of the private subclass of the default highlight painter
    Highlighter.HighlightPainter myHighlightPainter = new MyHighlightPainter(Color.red);

    // A private subclass of the default highlight painter
    class MyHighlightPainter
            extends DefaultHighlighter.DefaultHighlightPainter {

        public MyHighlightPainter(Color color) {
            super(color);
        }
    }
}

此示例基于突出显示 JTextComponent 中的单词

You'd better use JTextPane or JEditorPane, instead of JTextArea.

A text area is a "plain" text component, which means taht although it can display text in any font, all of the text is in the same font.

So, JTextArea is not a convenient component to make any text formatting.

On the contrary, using JTextPane or JEditorPane, it's quite easy to change style (highlight) of any part of loaded text.

See How to Use Editor Panes and Text Panes for details.

Update:

The following code highlights the desired part of your text.
It's not exectly what you want. It simply finds the exact phrase in the text.

But I hope that if you apply your algorithms, you can easily
modify it to fit your needs.

import java.lang.reflect.InvocationTargetException;
import javax.swing.*;
import javax.swing.text.*;
import java.awt.*;

public class LineHighlightPainter {

    String revisedText = "Extreme programming is one approach "
            + "of agile software development which emphasizes on frequent"
            + " releases in short development cycles which are called "
            + "time boxes. This result in reducing the costs spend for "
            + "changes, by having multiple short development cycles, "
            + "rather than one long one. Extreme programming includes "
            + "pair-wise programming (for code review, unit testing). "
            + "Also it avoids implementing features which are not included "
            + "in the current time box, so the schedule creep can be minimized. ";
    String token = "Extreme programming includes pair-wise programming";

    public static void main(String args[]) {
        try {
            SwingUtilities.invokeAndWait(new Runnable() {

                public void run() {
                    new LineHighlightPainter().createAndShowGUI();
                }
            });
        } catch (InterruptedException ex) {
            // ignore
        } catch (InvocationTargetException ex) {
            // ignore
        }
    }

    public void createAndShowGUI() {
        JFrame frame = new JFrame("LineHighlightPainter demo");
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

        JTextArea area = new JTextArea(9, 45);
        area.setLineWrap(true);
        area.setWrapStyleWord(true);
        area.setText(revisedText);

        // Highlighting part of the text in the instance of JTextArea
        // based on token.
        highlight(area, token);

        frame.getContentPane().add(new JScrollPane(area), BorderLayout.CENTER);
        frame.pack();
        frame.setVisible(true);
    }

    // Creates highlights around all occurrences of pattern in textComp
    public void highlight(JTextComponent textComp, String pattern) {
        // First remove all old highlights
        removeHighlights(textComp);

        try {
            Highlighter hilite = textComp.getHighlighter();
            Document doc = textComp.getDocument();
            String text = doc.getText(0, doc.getLength());

            int pos = 0;
            // Search for pattern
            while ((pos = text.indexOf(pattern, pos)) >= 0) {
                // Create highlighter using private painter and apply around pattern
                hilite.addHighlight(pos, pos + pattern.length(), myHighlightPainter);
                pos += pattern.length();
            }

        } catch (BadLocationException e) {
        }
    }

    // Removes only our private highlights
    public void removeHighlights(JTextComponent textComp) {
        Highlighter hilite = textComp.getHighlighter();
        Highlighter.Highlight[] hilites = hilite.getHighlights();

        for (int i = 0; i < hilites.length; i++) {
            if (hilites[i].getPainter() instanceof MyHighlightPainter) {
                hilite.removeHighlight(hilites[i]);
            }
        }
    }
    // An instance of the private subclass of the default highlight painter
    Highlighter.HighlightPainter myHighlightPainter = new MyHighlightPainter(Color.red);

    // A private subclass of the default highlight painter
    class MyHighlightPainter
            extends DefaultHighlighter.DefaultHighlightPainter {

        public MyHighlightPainter(Color color) {
            super(color);
        }
    }
}

This example is based on Highlighting Words in a JTextComponent.

分分钟 2024-11-24 17:25:50

从技术角度来看:您可以选择或开发一种标记语言,并向原始文档添加注释或标签。或者您想创建第二个文件来记录所有潜在的抄袭行为。

使用标记,您的文本可能如下所示:(

[...] rather than one long one. <plag ref="1234">Extreme programming 
includes pair-wise programming</plag> (for code review, unit testing). [...]

ref 引用描述原始内容的一些元数据记录)

From a technical point of view: You can either choose or develop a markup language and add annotations or tags to the original document. Or you want to create a second file that records all potential plagiarisms.

With markup, your text could look like this:

[...] rather than one long one. <plag ref="1234">Extreme programming 
includes pair-wise programming</plag> (for code review, unit testing). [...]

(with ref referencing to some metadata record that describes the original)

吹梦到西洲 2024-11-24 17:25:50

您可以使用 java.text.AttributedString 来注释原始文档中的预处理标记。
然后将 TextAttributes 应用于相关的(在原始文档中生效。

You could use java.text.AttributedString to annotate the preprocessed tokens in the original document.
Then apply TextAttributes to the relevant ones (which whould take effect in the original document.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文