HTML to formatted text

Posted on 2025-01-07 06:59:51 · word count 531 · 3 views · 0 comments

Is there a Java API that does something similar to Html.fromHtml() on Android? jsoup does parse the HTML and strip the tags, but the resulting text is not formatted.
For example:

<ol type="1">
 <li>Test1</li>
 <ol type="a">
  <li>TestA1</li>
  <li>TestB1</li>
 </ol>
 <li>Test2</li>
 <ol type="a">
  <li>TestA2</li>
  <li>TestB2</li>
 </ol>
</ol>

This should give me something like:

  1. Test1

    a. TestA1

    b. TestB1

  2. Test2

    a. TestA2

    b. TestB2

Comments (1)

烟花肆意 2025-01-14 06:59:51

jsoup has no API for converting HTML to "formatted text", but you can convert lists yourself:

  1. Iterate over all children of the ul / ol element that is the root of the list.
  2. If a child is an item: format it and append it to the output string.
  3. If a child is a sublist: repeat step 1 with the sublist element and append the result.

Example:

In this example I use the type attribute to determine which kind of marker is required, and I use a char (!) as the counter for the items. If the list has no type attribute, the char '1' is used.

Implementation:

/**
 * Convert the list element <code>root</code> to a formatted string representation.
 * 
 * @param root      Root element of the list (normally a 'ul' or 'ol' tag)
 * @param depth     Depth of the list (<code>=0</code> for root element)
 * @return          List as String
 */
public String createList(Element root, int depth)
{
    final String indentation = createIndentation(depth); // create indentation
    StringBuilder sb = new StringBuilder();

    final String typeAttr = root.attr("type"); // Get the character used as bullet (= 'type' attribute)
    char type = typeAttr.isEmpty() ? '1' : typeAttr.charAt(0); // if 'type' attribute: use it, else: use '1' instead

    for( Element sub : root.children() ) // Iterate over all children
    {
        // If Java < 7: use if/else if/else here
        switch( sub.tagName() ) // Check if the element is an item or a sublist
        {
            case "li": // List item: format and append
                sb.append(indentation).append(type++).append(". ").append(sub.ownText()).append("\n");
                break;
            case "ol": // Sublist
            case "ul":
                if( !sub.children().isEmpty() ) // If the sublist is not empty (contains further items)
                {
                    sb.append(createList(sub, depth + 1)); // Recursive call for the sublist
                }
                break;
            default: // "Illegal" tag, do further processing if required - printed here as an example
                System.err.println("Not implemented tag: " + sub.tagName());
        }
    }

    return sb.toString(); // Return the formatted list
}
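
One caveat of the char counter: `type++` on a char walks through the character table, so a decimal list yields ':' after '9', and a list with type="a" yields '{' after 'z'. The following is a minimal sketch of an alternative that builds the marker from an integer index instead. The class `ListMarkers` and its methods are hypothetical names, not part of jsoup, and any type other than "a" or "A" (including the Roman types "i" / "I") simply falls back to decimal numbers here.

```java
final class ListMarkers {

    /**
     * Build the marker text for one list item.
     *
     * @param typeAttr  value of the list's 'type' attribute (may be empty)
     * @param index     zero-based position of the item in its list
     * @return          marker without the trailing ". "
     */
    static String marker(String typeAttr, int index) {
        if ("a".equals(typeAttr)) {
            return alphabetic(index, 'a');
        }
        if ("A".equals(typeAttr)) {
            return alphabetic(index, 'A');
        }
        return Integer.toString(index + 1); // assumption: everything else is decimal
    }

    // Spreadsheet-style letters: 0 -> a, 25 -> z, 26 -> aa, 27 -> ab, ...
    private static String alphabetic(int index, char first) {
        StringBuilder sb = new StringBuilder();
        int n = index + 1;
        while (n > 0) {
            n--;                                  // shift to 0..25 for this digit
            sb.append((char) (first + n % 26));
            n /= 26;
        }
        return sb.reverse().toString();           // digits were produced lowest first
    }
}
```

To use it in `createList`, one would keep an `int` counter per list and append `ListMarkers.marker(typeAttr, counter++)` in place of `type++`.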


/**
 * Create an indentation string of <code>length</code> blanks.
 * 
 * @param length    Size of indentation
 * @return          Indentation string
 */
private String createIndentation(int length)
{
    StringBuilder sb = new StringBuilder(length);

    for( int i=0; i<length; i++ )
    {
        sb.append(' ');
    }

    return sb.toString();
}
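
The helper above produces one blank per nesting level, while the output asked for in the question indents sublists more deeply. A minimal sketch of a variant with a configurable width per level, assuming Java 11 or newer for `String.repeat`; the class name `Indent` and the `width` parameter are hypothetical.

```java
final class Indent {

    /**
     * Create an indentation string for a nesting level.
     *
     * @param depth  nesting depth of the list (0 for the root list)
     * @param width  number of blanks per nesting level
     * @return       depth * width blanks (empty if that product is not positive)
     */
    static String of(int depth, int width) {
        // String.repeat throws on a negative count, so clamp to zero
        return " ".repeat(Math.max(0, depth * width));
    }
}
```

Calling `Indent.of(depth, 2)` in place of `createIndentation(depth)` would indent each sublist by two blanks per level.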

Test code:

    Document doc = ... // Load / parse your document here

    Element listRoot = doc.select("ol").first(); // Select the root-element (!) of the list here. 
    final String output = createList(listRoot, 0); // Convert the list

    System.out.println(output); // Output

Result:

Input (HTML):

<ol type="1">
    <li>Test1</li>
    <ol type="a">
        <li>TestA1</li>
        <li>TestB1</li>
    </ol>
    <li>Test2</li>
    <ol type="a">
        <li>TestA2</li>
        <li>TestB2</li>
    </ol>
</ol>

Output:

1. Test1
 a. TestA1
 b. TestB1
2. Test2
 a. TestA2
 b. TestB2

That's it! :-)
