SAX RSS parsing: setLink fails when the link is surrounded by whitespace

Posted on 2024-11-14 14:50:55 | Length: 1185 | Views: 0 | Comments: 0

I have made an RSS reader and could use some help with a small problem. When the RSS XML is set up like this:

<link>http://www.grants.gov/search/search.do?mode=VIEW&amp;oppId=98616</link>

my reader can pull the link fine.

But some feeds I am trying to read are set up like this:

<link>
http://www.ornl.gov/info/ornlreview/v44_1_11/article06.shtml
</link>

which causes my reader to miss the link.

I have narrowed the problem down to:

    @Override
    public void characters(char[] ch, int start, int length) {
        String strCharacters = new String(ch, start, length);
        if (itemFound == true) {
            // "item" tag found, it's item's parameter
            switch (currentState) {
            case state_title:
                item.setTitle(strCharacters);
                break;
            case state_description:
                item.setDescription(strCharacters);
                break;
            case state_link:
                item.setLink(strCharacters);
                break;
            case state_pubdate:
                item.setPubdate(strCharacters);
                break;
            default:
                break;
            }
        }
    }

strCharacters receives the text of the current line, but for feeds where <link> is followed by a line break it contains only whitespace. Any ideas on how to get it to skip the whitespace and pick up the link on the next line?

Comments (2)

初吻给了烟 2024-11-21 14:50:55

DOM:

  1. Create a DocumentBuilderFactory:

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

  2. Create a DocumentBuilder:

    DocumentBuilder builder = factory.newDocumentBuilder();

  3. Get an input stream:

    ClassLoader cls = DomReader.class.getClassLoader();
    InputStream is = cls.getResourceAsStream("xml file");

  4. Parse the XML file and get a Document object by calling the parse method on the DocumentBuilder object:

    Document document = builder.parse(is);

  5. Traverse the DOM tree using the Document object.

SAX:

  - Simple XML parsing.
  - It parses node by node.
  - Traversal is from top to bottom.
  - Low memory usage.
  - Back navigation is not possible with SAX.

    // implement the required handler callbacks in this class
    public class SaxParse extends DefaultHandler {
    }

    // new instance of SAXParserFactory
    SAXParserFactory factory = SAXParserFactory.newInstance();
    // new instance of the SAX parser
    SAXParser saxparser = factory.newSAXParser();
    // parse the XML document, passing an instance of the handler above
    saxparser.parse(new File("file to be parsed"), new SaxParse());
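
For the feed in the question, the DOM steps above could be put together roughly as follows. This is only a sketch: the class name `DomLinkReader` and the method `readLinks` are made up for illustration, the XML is read from a string instead of a classpath resource, and it assumes the feed is small enough to load into memory. The relevant part is the `trim()` on the element's text content, which drops the line breaks around the URL.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of the question's code.
class DomLinkReader {

    // Returns the trimmed text of every <link> element in the document.
    // Note: this also matches a channel-level <link>, if the feed has one.
    static List<String> readLinks(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            InputStream is = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
            Document document = builder.parse(is);

            NodeList nodes = document.getElementsByTagName("link");
            List<String> links = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // getTextContent() returns the whole text, line breaks included
                links.add(nodes.item(i).getTextContent().trim());
            }
            return links;
        } catch (Exception e) {
            // Wrapped so that callers of this sketch need no throws clause
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String rss = "<rss><channel><item><link>\n"
                + "http://www.ornl.gov/info/ornlreview/v44_1_11/article06.shtml\n"
                + "</link></item></channel></rss>";
        for (String link : readLinks(rss)) {
            System.out.println(link);
        }
    }
}
```

DOM keeps the whole tree in memory, so for large feeds the SAX route is likely the better fit.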

把时间冻结 2024-11-21 14:50:55

Your parser looks weird, try doing this instead:

  private StringBuilder builder;   

  @Override
  public void startDocument() throws SAXException {
    super.startDocument();
    builder = new StringBuilder();
  }

  @Override
  public void characters(char[] ch, int start, int length) throws SAXException {
    super.characters(ch, start, length);
    builder.append(ch, start, length);
  }

  @Override
  public void endElement(String uri, String localName, String name) throws SAXException {
    if (currentState == state_link) {
      item.setLink(builder.toString().trim());
    }
    builder.setLength(0);
  }

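That way you wait until the element's content has been completely consumed, instead of acting on a single chunk of text. A SAX parser may call characters() several times for one element, so any one call can contain just the whitespace around the link.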
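
To try the idea outside the original reader, here is a minimal self-contained sketch. The class `LinkCollector`, its `parseLinks` method, and the `inItem` flag are hypothetical names standing in for the question's `item`, `itemFound`, and `currentState`, which are not shown in full. It assumes the default, non-namespace-aware parser, so element names arrive in `qName`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the trimmed text of each <link> inside an <item>.
class LinkCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> links = new ArrayList<>();
    private boolean inItem = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("item".equals(qName)) {
            inItem = true;
        }
        text.setLength(0); // start with an empty buffer for every element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so only accumulate here
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (inItem && "link".equals(qName)) {
            links.add(text.toString().trim()); // the full content is known only now
        }
        if ("item".equals(qName)) {
            inItem = false;
        }
        text.setLength(0);
    }

    static List<String> parseLinks(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            LinkCollector handler = new LinkCollector();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.links;
        } catch (Exception e) {
            // Wrapped so that callers of this sketch need no throws clause
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String rss = "<rss><channel>"
                + "<item><link>http://www.grants.gov/search/search.do?mode=VIEW&amp;oppId=98616</link></item>"
                + "<item><link>\nhttp://www.ornl.gov/info/ornlreview/v44_1_11/article06.shtml\n</link></item>"
                + "</channel></rss>";
        for (String link : parseLinks(rss)) {
            System.out.println(link);
        }
    }
}
```

Both feed layouts from the question go through the same code path here: the single-line link and the link wrapped in line breaks are each buffered until `endElement` and then trimmed.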
