How to view the XML documents sent to Solr
We're having problems with UTF-8 in Solr and need to debug the documents that are sent for indexing. Is there a way to see them?
I've searched every log I could find and enabled debug="1" in the app XML in the tomcat6 / Catalina directory. I even tried Wireshark, but no dice. Please help!
Everything looks good on the PHP side, and this has been working fine until now. But international characters turn into ?, the classic headache.
Comments (2)
Be sure that the PHP side really is correct. Did you open the XML file in an editor with the encoding explicitly set to UTF-8? What is your default system encoding? I bet converting the file from that encoding to UTF-8 would solve the problem (e.g. with iconv).
Solr only accepts UTF-8, and because of the nature of XML, only a subset of those characters is allowed. You can also scan the XML generated from PHP through the following code, i.e. look for invalid (XML) chars there ...
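The code this answer pointed to is not reproduced here. As a rough stand-in, here is a minimal Python sketch of the two checks described: whether the bytes are valid UTF-8, and whether the decoded text contains characters outside the XML 1.0 `Char` range. The names `check_payload` and `to_utf8` are made up for illustration, and `to_utf8` only mimics what an iconv conversion would do, assuming you know the real source encoding.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def check_payload(raw, source_encoding="utf-8"):
    """Decode raw bytes and list encoding / XML character problems.

    Returns (text, problems); text is None when decoding fails.
    """
    problems = []
    try:
        text = raw.decode(source_encoding)
    except UnicodeDecodeError as exc:
        problems.append(
            "not valid %s: byte offset %d" % (source_encoding, exc.start)
        )
        return None, problems
    for match in _INVALID_XML_CHAR.finditer(text):
        problems.append(
            "invalid XML char U+%04X at index %d"
            % (ord(match.group()), match.start())
        )
    return text, problems


def to_utf8(raw, source_encoding):
    """Re-encode bytes from a known source encoding to UTF-8 (iconv-style)."""
    return raw.decode(source_encoding).encode("utf-8")
```

Running `check_payload` on the exact bytes PHP produces (for example, read from a dump file opened in binary mode) shows whether the damage happens before the document ever reaches Solr.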
You could use Tcpmon.
I use it a lot, as it lets me see the HTTP headers and payload being sent to Solr (or any web app).
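If Tcpmon is not at hand, the same idea can be approximated with a throwaway capture endpoint. The sketch below is not Tcpmon and does not forward anything to Solr: it only records the path, Content-Type header and raw body bytes of each POST and replies 200. `CaptureHandler`, `start_capture_server` and the `captured` list are hypothetical names, and it assumes you can temporarily point the PHP client at this address instead of Solr.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# One dict per request: path, content_type, body (raw bytes).
captured = []


class CaptureHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        entry = {
            "path": self.path,
            "content_type": self.headers.get("Content-Type"),
            "body": body,
        }
        captured.append(entry)
        # repr() shows the exact bytes, e.g. b'\xc3\xa9' for a UTF-8 "é"
        # versus b'?' if the character was already lost on the client side.
        print(entry["path"], entry["content_type"], repr(body))
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(b"captured\n")

    def log_message(self, format, *args):
        pass  # silence the default access log


def start_capture_server(host="127.0.0.1", port=0):
    """Start the capture server in a background thread.

    port=0 lets the OS pick a free port; read it from server.server_address.
    """
    server = HTTPServer((host, port), CaptureHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Call `start_capture_server(port=...)` with a port of your choice, send one indexing request from PHP to it, and inspect the printed bytes. If the body already contains a literal `?` where an international character should be, the problem is on the sending side rather than in Solr or Tomcat.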