有效读取SO的数据转储

发布于 2024-07-29 05:44:55 字数 818 浏览 9 评论 0原文

我目前使用 Vim 来读取 SO 的数据转储。 然而,当我只向下滚动几行时,我的 Macbook 速度就会变慢。 这表明我必须有更有效的方法来读取数据。

我对MySQL知之甚少。 这些文件采用 .xml 格式。 目前读取 .xml 中的数据相当困难。 将 xml 文件转换为 MySQL 然后读取文件可能会更有效。 我只知道 MS db -tool 可以执行此类操作。 不过,我也想知道另一个工具。

问题

  1. 将 .xml 解析为 SQL 查询以便 MySQL 理解它的 。 我们需要了解数据的数据结构。
  2. 要在MySQL中运行数据,
  3. 需要找到一些类似于MS db -tool的工具,通过它我们可以有效地读取数据

如何读取SO的数据有效转储?

--

[编辑]

  1. 如何运行 523 SQL查询在您的终端中创建数据库? 我现在有一个文本文件中的命令。
  2. 如何在数据库中“切换到[恢复模式]到简单恢复模式?”

I use currently Vim to read SO's data dump. However, my Macbook slows down when I roll down just a few rows. This suggests me that there must be more efficient ways to read the data.

I know little MySQL. The files are in .xml -format. It is rather hard to read the data at the moment in .xml. It may be more efficient to convert the xml -files to MySQL and then read the files. I know only MS db -tool for such actions. However, I would like to know another tool too.

Problems

  1. to parse .xml to SQL -queries such that MySQL understand it. We need to know data structures of the data.
  2. to run the data in MySQL
  3. to find some tool similar to MS db -tool by which we can read the data effectively

How do you read SO's data dump effectively?

--

[edit]

  1. How can you run the 523 SQL queries to create the database in your terminal? I have the commands at the moment in a text -file.
  2. How can you "switch to [the recovery mode] to a simple recovery mode in the database?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

呆萌少年 2024-08-05 05:44:55

我制作了我的第一个 python 程序来读取它们并输出 SQL 插入语句以与 mysql 一起使用(它很丑陋但有效)。 您需要先手动创建表格。

import xml.sax.handler
import xml.sax
import sys
class SOHandler(xml.sax.handler.ContentHandler):
        def __init__(self):
                self.errParse = 0

        def startElement(self, name, attributes):
                if name != "row":
                        self.table = name;
                        self.outFile = open(name+".sql","w")
                        self.errfile = open(name+".err","w")
                else:
                        skip = 0
                        currentRow = u"insert into "+self.table+"("
                        for attr in attributes.keys():
                                currentRow += str(attr) + ","
                        currentRow = currentRow[:-1]
                        currentRow += u") values ("
                        for attr in attributes.keys():
                                try:
                                        currentRow += u'"{0}",'.format(attributes[attr].replace('\\','\\\\').replace('"', '\\"').replace("'", "\\'"))
                                except UnicodeEncodeError:
                                        self.errParse += 1;
                                        skip = 1;
                                        self.errfile.write(currentRow)
                        if skip != 1:
                                currentRow = currentRow[:-1]
                                currentRow += u");"
                                #print len(attributes.keys())
                                self.outFile.write(currentRow.encode("utf-8"))
                                self.outFile.write("\n")
                                self.outFile.flush()
                                print currentRow.encode("utf-8");

        def characters(self, data):
                pass

        def endElement(self, name):
                pass

if len(sys.argv) < 2:
        print "Give me an xml file argument!"
        sys.exit(1)

parser = xml.sax.make_parser()
handler = SOHandler()
parser.setContentHandler(handler)
parser.parse(sys.argv[1])
print handler.errParse

I made my first ever python program to read them and output SQL insert statements for use with mysql (It's ugly but worked). You'll need to create the tables first though by hand.

import xml.sax.handler
import xml.sax
import sys
class SOHandler(xml.sax.handler.ContentHandler):
        def __init__(self):
                self.errParse = 0

        def startElement(self, name, attributes):
                if name != "row":
                        self.table = name;
                        self.outFile = open(name+".sql","w")
                        self.errfile = open(name+".err","w")
                else:
                        skip = 0
                        currentRow = u"insert into "+self.table+"("
                        for attr in attributes.keys():
                                currentRow += str(attr) + ","
                        currentRow = currentRow[:-1]
                        currentRow += u") values ("
                        for attr in attributes.keys():
                                try:
                                        currentRow += u'"{0}",'.format(attributes[attr].replace('\\','\\\\').replace('"', '\\"').replace("'", "\\'"))
                                except UnicodeEncodeError:
                                        self.errParse += 1;
                                        skip = 1;
                                        self.errfile.write(currentRow)
                        if skip != 1:
                                currentRow = currentRow[:-1]
                                currentRow += u");"
                                #print len(attributes.keys())
                                self.outFile.write(currentRow.encode("utf-8"))
                                self.outFile.write("\n")
                                self.outFile.flush()
                                print currentRow.encode("utf-8");

        def characters(self, data):
                pass

        def endElement(self, name):
                pass

if len(sys.argv) < 2:
        print "Give me an xml file argument!"
        sys.exit(1)

parser = xml.sax.make_parser()
handler = SOHandler()
parser.setContentHandler(handler)
parser.parse(sys.argv[1])
print handler.errParse
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文