Special characters (like curly apostrophes) in MySQL database are breaking my XML

Posted on 2024-09-02 02:34:49

I have a MySQL database of newspaper articles. There's a volume table, an issue table, and an article table. I have a PHP file that generates a property list that is then pulled in and read by an iPhone app. The plist holds each article as a dictionary inside each issue, and each issue as a dictionary inside each volume. The plist doesn't actually hold the whole article -- just a title and URL.

Some article titles contain special characters, like curly apostrophes. Looking at the generated XML plist, whenever it hits a special character, it unpredictably gobbles up a whole bunch of text, leaving the XML mangled and unreadable.

(...in Chrome, anyway, and I'm guessing on the iPhone. Firefox actually handles it pretty well, showing a white ? in a black diamond in place of any special characters and not gobbling anything.)

Example well-formed plist snippet:

<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd"> 
<plist version="1.0"> 
<dict> 
    <key>Rows</key> 
    <array>     
        <dict> 
            <key>Title</key> 
            <string>Vol. 133 (2003-2004)</string> 
            <key>Children</key> 
            <array>         
                <dict> 
                    <key>Title</key> 
                    <string>No. 18 (Apr 2, 2004)</string> 
                    <key>Children</key> 
                    <array>                 
                        <dict> 
                            <key>Title</key> 
                            <string>Basketball concludes historic season</string> 
                            <key>URL</key> 
                            <string>http://orient.bowdoin.edu/orient/article_iphone.php?date=2004-04-02&amp;section=1&amp;id=1</string>
                        </dict>

                        <!-- ... -->

                    </array>
                </dict>     
            </array>
        </dict>
    </array>
</dict>
</plist>
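For comparison, a generator that builds this nesting as plain data and lets a plist library do the serialization never emits a raw `&` or a stray byte. The sketch below uses Python's standard `plistlib` purely for illustration (the question's generator is PHP, and this is not that code); `article` and `group` are hypothetical helper names, and only the titles and URLs come from the question. It assumes the titles arrive as correctly decoded Unicode strings.

```python
import plistlib


def article(title: str, url: str) -> dict:
    # Leaf entry: one article, like the innermost <dict> in the snippet.
    return {"Title": title, "URL": url}


def group(title: str, children: list) -> dict:
    # Volumes and issues share one shape: a Title plus a Children array.
    return {"Title": title, "Children": children}


basketball = article(
    "Basketball concludes historic season",
    "http://orient.bowdoin.edu/orient/article_iphone.php?date=2004-04-02&section=1&id=1",
)
tree = {
    "Rows": [
        group("Vol. 133 (2003-2004)", [
            group("No. 18 (Apr 2, 2004)", [basketball]),
        ]),
    ]
}

# plistlib writes the XML declaration and DOCTYPE, encodes as UTF-8, and
# escapes &, < and > itself; sort_keys=False keeps Title before Children.
data = plistlib.dumps(tree, sort_keys=False)
print(data.decode("utf-8"))

# A title holding a real U+2019 is written as the UTF-8 bytes E2 80 99.
farrell = article(
    "Singer-songwriter Farrell \u201905 finds success beyond the bubble",
    "http://orient.bowdoin.edu/orient/article_iphone.php?date=2009-09-18&section=1&id=9",
)
farrell_xml = plistlib.dumps(farrell, sort_keys=False)
print(plistlib.loads(farrell_xml)["Title"])
```

The library only guarantees well-formed output for the strings it is given: a title that already contains a wrong code point (such as U+0092) is written out unchanged, so the stored data still has to be correct.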

Example of what happens when it hits a curly apostrophe: This is from Chrome. This time it ate 5,998 characters, by MS Word's count, skipping down to midway through the opening `<string>` tag of a pizza story's title; if I reload, it'll behave differently, eating some other amount. The proper title is: Singer-songwriter Farrell ’05 finds success beyond the bubble

                    <dict> 
                        <key>Title</key> 
                        <string>Singer-songwriter Farrell ing>Students embrace free pizza, College objects to solicitation</string> 
                        <key>URL</key> 
                        <string>http://orient.bowdoin.edu/orient/article_iphone.php?date=2009-09-18&amp;section=1&amp;id=9</string>
                    </dict> 

In MySQL that title is stored as (raw bytes, in hexadecimal):

53 69 6E 67 |65 72 2D 73 |6F 6E 67 77 |72 69 74 65
72 20 46 61 |72 72 65 6C |6C 20 C2 92 |30 35 20 66
69 6E 64 73 |20 73 75 63 |63 65 73 73 |20 62 65 79
6F 6E 64 20 |74 68 65 20 |62 75 62 62 |6C
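Decoding those bytes shows what is actually stored. The sketch below (Python, for illustration only) reads the dump as UTF-8: the pair `C2 92` is the UTF-8 form of U+0092, an invisible C1 control character, whereas the single byte `0x92` in Windows-1252 is the curly apostrophe ’ (U+2019). That pattern suggests, though the question does not confirm it, that a Windows-1252 apostrophe was treated as Latin-1 somewhere on the way into MySQL and then stored as UTF-8. `repair_c1_controls` is a hypothetical one-off cleanup under that assumption; the more durable route is presumably to make every layer agree on a single encoding.

```python
import unicodedata

# Bytes copied from the hex dump above (the dump itself stops at "bubbl").
raw = bytes.fromhex(
    "53 69 6E 67 65 72 2D 73 6F 6E 67 77 72 69 74 65"
    " 72 20 46 61 72 72 65 6C 6C 20 C2 92 30 35 20 66"
    " 69 6E 64 73 20 73 75 63 63 65 73 73 20 62 65 79"
    " 6F 6E 64 20 74 68 65 20 62 75 62 62 6C"
)

# Read as UTF-8, C2 92 is one code point, U+0092: a C1 control character,
# not an apostrophe.
text = raw.decode("utf-8")
pos = len("Singer-songwriter Farrell ")
print(hex(ord(text[pos])))  # 0x92

# In Windows-1252, the single byte 0x92 is the curly apostrophe.
print(unicodedata.name(b"\x92".decode("cp1252")))  # RIGHT SINGLE QUOTATION MARK


def repair_c1_controls(s: str) -> str:
    """Hypothetical helper: reinterpret U+0080..U+009F as Windows-1252 bytes."""
    out = []
    for ch in s:
        if "\x80" <= ch <= "\x9f":
            try:
                out.append(bytes([ord(ch)]).decode("cp1252"))
            except UnicodeDecodeError:
                # 0x81, 0x8D, 0x8F, 0x90 and 0x9D have no cp1252 mapping; keep as-is.
                out.append(ch)
        else:
            out.append(ch)
    return "".join(out)


fixed = repair_c1_controls(text)
print(fixed)  # Singer-songwriter Farrell ’05 finds success beyond the bubbl
```

Only code points in the C1 range are touched, so already-correct characters elsewhere in a title are left alone.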

Any ideas how I can encode/decode things properly? If not, any idea how I can get around the problem some other way?

I don't have a clue what I'm talking about, haha; let me know if there's any way I can help you help me. :) And many thanks!


Comments (1)

行至春深 2024-09-09 02:34:49

Here are a few options:

  • use htmlentities() to encode special characters when inserting in the table
  • change everything to UTF-8
  • try using CDATA around the titles, i.e.:

    <string><![CDATA[ BLAH BLAH BLAH ]]></string>
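The three escaping approaches can be sketched with Python's standard library as a stand-in for the PHP calls (this is not PHP code, and `cdata` is a hypothetical helper). One caveat: PHP's `htmlentities()` produces named HTML entities such as `&rsquo;` by default, while XML predefines only five (`&amp;`, `&lt;`, `&gt;`, `&apos;`, `&quot;`), so a plist parser would likely reject them; numeric references such as `&#8217;` are safe. CDATA only changes how markup characters are parsed; it does not fix bytes stored in the wrong encoding.

```python
from xml.sax.saxutils import escape

title = "Singer-songwriter Farrell \u201905 finds success beyond the bubble"
url = "http://orient.bowdoin.edu/orient/article_iphone.php?date=2004-04-02&section=1&id=1"

# 1. Escape only XML's markup characters (&, <, >) and serve the file as UTF-8.
print(escape(url))
# http://orient.bowdoin.edu/orient/article_iphone.php?date=2004-04-02&amp;section=1&amp;id=1

# 2. Numeric character references keep the output pure ASCII, so the curly
#    apostrophe survives even if a charset is misdeclared somewhere.
#    Escape first, so the & of each reference is not escaped again.
ascii_title = escape(title).encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_title)  # Singer-songwriter Farrell &#8217;05 finds success beyond the bubble


# 3. CDATA: markup characters inside need no escaping, but the bytes must still
#    be valid in the document's encoding, and a literal "]]>" has to be split
#    across two sections.
def cdata(text: str) -> str:
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


print("<string>" + cdata(title) + "</string>")
```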
