Error parsing XML with SAX after <br>
<description>
SEBI : Decision taken by a listed investment company to dispose of a part of its
investment is not “price sensitive information” within meaning of SEBI
(Prohibition of Insider Trading) Regulations, 1992<br>;
By <b> [2011] 15 taxmann.com 229 (SAT)</b>
</description>
This is the XML. I want to parse the data after <br>. I am able to parse the text before <br>, but not the text after it.
This is my handler class code:
package com.exercise;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class RSSHandler extends DefaultHandler {

    final int state_unknown = 0;
    final int state_title = 1;
    final int state_description = 2;
    final int state_link = 3;
    final int state_pubdate = 4;

    int currentState = state_unknown;

    RSSFeed feed;
    RSSItem item;

    boolean itemFound = false;

    RSSHandler(){
    }

    RSSFeed getFeed(){
        return feed;
    }

    @Override
    public void startDocument() throws SAXException {
        // TODO Auto-generated method stub
        feed = new RSSFeed();
        item = new RSSItem();
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        // TODO Auto-generated method stub
        if (localName.equalsIgnoreCase("item")){
            itemFound = true;
            item = new RSSItem();
            currentState = state_unknown;
        }
        else if (localName.equalsIgnoreCase("title")){
            currentState = state_title;
        }
        else if (localName.equalsIgnoreCase("description")){
            currentState = state_description;
        }
        else if (localName.equalsIgnoreCase("link")){
            currentState = state_link;
        }
        else if (localName.equalsIgnoreCase("pubdate")){
            currentState = state_pubdate;
        }
        else{
            currentState = state_unknown;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        // TODO Auto-generated method stub
        currentState = state_unknown;
        if (localName.equalsIgnoreCase("item")){
            feed.addItem(item);
        }
    }

    @Override
    public void characters(char ch[], int start, int length)
            throws SAXException {
        //super.characters(ch, start, length);
        // TODO Auto-generated method stub
        StringBuilder buf=new StringBuilder();
        if (buf!=null) {
            for (int i=start; i<start+length; i++) {
                buf.append(ch[i]);
            }
            String strCharacters=buf.toString();

            if (itemFound==true){
                // "item" tag found, it's item's parameter
                switch(currentState){
                case state_title:
                    item.setTitle(strCharacters);
                    break;
                case state_description:
                    item.setDescription(strCharacters); //here data coming
                    break;
                case state_link:
                    item.setLink(strCharacters);
                    break;
                case state_pubdate:
                    item.setPubdate(strCharacters);
                    break;
                default:
                    break;
                }
            }
            else{
                // not "item" tag found, it's feed's parameter
                switch(currentState){
                case state_title:
                    feed.setTitle(strCharacters);
                    break;
                case state_description:
                    feed.setDescription(strCharacters);
                    break;
                case state_link:
                    feed.setLink(strCharacters);
                    break;
                case state_pubdate:
                    feed.setPubdate(strCharacters);
                    break;
                default:
                    break;
                }
            }

            currentState = state_unknown;
        }
    }
}
Something is wrong with the first text you pasted.
Try posting the XML again in code mode (four spaces at the beginning of each line).
My suspicion is that your XML is in URL-encoded format and that you will have to decode it before you start handling it.
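If the feed text really does arrive percent-encoded, decoding it before handing it to the parser could look roughly like the sketch below. The class name UrlDecodeSketch and the sample strings are made up for illustration, and UTF-8 is an assumption about the feed's encoding. Note that URLDecoder also turns "+" into a space, so this only makes sense if the input is actually URL-encoded.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper, not part of the question's code.
class UrlDecodeSketch {

    // Decodes a percent-encoded string, e.g. "%3Cbr%3E" becomes "<br>".
    // Assumes the bytes behind the escapes are UTF-8.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("%3Cdescription%3Ehello%3C%2Fdescription%3E"));
    }
}
```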
&amp; is an XML entity reference and means &. By default, SAX will do the conversion for you, so if your source XML says hello&amp;goodbye you should see hello&goodbye.
Go through this link. It might solve your problem.
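To see the entity conversion in isolation, here is a small sketch that records every chunk the parser passes to characters(). The class name EntityDemo and the one-element test document are invented for the example. A SAX parser is allowed to deliver the text of a single element in several characters() calls, and many parsers split at entity references, so the sketch joins the chunks rather than relying on any one call.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the question's code.
class EntityDemo {

    // Parses the XML and returns every chunk passed to characters(), in order.
    static List<String> chunks(String xml) {
        final List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                out.add(new String(ch, start, length));
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> parts = chunks("<d>hello&amp;goodbye</d>");
        // The number of chunks depends on the parser implementation.
        System.out.println(parts.size() + " chunk(s): " + parts);
        // Joined together, the text is hello&goodbye.
        System.out.println(String.join("", parts));
    }
}
```

If the chunk count printed is greater than one, a handler that keeps only the first chunk and then resets its state will lose everything after the entity.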
As posted, that XML is not valid; you will probably need to escape the quotes in the document as well.
I don't know if that is your issue, but it will be a contributing factor.
(The quotes are around "price sensitive information".)
I think in your case the problem is that you are initializing the StringBuilder inside characters(), so a new object is created every time. Instead of initializing it in characters(), try initializing it in startElement().
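A minimal sketch of that suggestion, under a few assumptions: the class DescriptionHandler and the method parseDescription are hypothetical names, only the description element is handled, and the sample feed is invented. The buffer is created in startElement(), characters() only appends, and the value is stored in endElement(). It matches on qName because the default JDK SAXParserFactory is not namespace-aware and may report an empty localName; the question's handler uses localName, which suits a namespace-aware parser.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, reduced to a single field for illustration.
class DescriptionHandler extends DefaultHandler {

    private StringBuilder buf = new StringBuilder();
    private String description;

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) {
        // Start a fresh buffer for each element.
        buf = new StringBuilder();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so only append here.
        buf.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The element's text is complete only once its end tag is reached.
        if ("description".equalsIgnoreCase(qName)) {
            description = buf.toString();
        }
    }

    // Parses the XML and returns the text of the last description element.
    static String parseDescription(String xml) {
        DescriptionHandler handler = new DescriptionHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.description;
    }

    public static void main(String[] args) {
        String xml = "<item><description>before &lt;br&gt; after</description></item>";
        System.out.println(parseDescription(xml)); // prints: before <br> after
    }
}
```

Resetting the buffer on every start tag is fine for flat RSS fields like these, but it would drop a parent's text if an element contained both text and child elements.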