StackOverflow 导入错误:LOB 超出 2,147,483,647 字节?

发布于 08-05 04:38 字数 1085 浏览 9 评论 0原文

下载2009 年 9 月 StackOverflow 数据转储并运行 Brent 的 < a href="http://www.brentozar.com/archive/2009/06/how-to-import-the-stackoverflow-xml-into-sql-server/" rel="nofollow noreferrer">导入查询,我收到以下消息:

Msg 7119, Level 16, State 1, Procedure sp_xml_preparedocument, Line 1
Attempting to grow LOB beyond maximum allowed size of 2,147,483,647 bytes.
Msg 8179, Level 16, State 5, Procedure usp_ETL_Load_Posts, Line 59
Could not find prepared statement with handle 0.
The statement has been terminated.
Msg 7102, Level 20, State 99, Procedure usp_ETL_Load_Posts, Line 121
Internal Error: Text manager cannot continue with current statement. 
    Run DBCC CHECKTABLE.

布伦特的查询基于 7 月的数据,我怀疑这是 9 月数据库更大的结果。

除了获取旧数据之外,有人知道如何解决此问题或以其他方式导入数据吗?

更新:我正在运行“版本:Microsoft SQL Server 2005 - 9.00.1399.06 (Intel X86) 2005 年 10 月 14 日 00:33:37 版权所有 (c) 1988-2005 Microsoft Corporation Developer Edition on Windows NT 5.1(内部版本 2600:Service Pack 3)”

After downloading the September 2009 StackOverflow data-dump and running Brent's import query, I'm getting the following message:

Msg 7119, Level 16, State 1, Procedure sp_xml_preparedocument, Line 1
Attempting to grow LOB beyond maximum allowed size of 2,147,483,647 bytes.
Msg 8179, Level 16, State 5, Procedure usp_ETL_Load_Posts, Line 59
Could not find prepared statement with handle 0.
The statement has been terminated.
Msg 7102, Level 20, State 99, Procedure usp_ETL_Load_Posts, Line 121
Internal Error: Text manager cannot continue with current statement. 
    Run DBCC CHECKTABLE.

Brent's queries were based on July's data and I suspect this is a result of the September database that much bigger.

Aside from getting the older data, anyone know how to fix this or otherwise import the data?

UPDATE: I'm running "Version: Microsoft SQL Server 2005 - 9.00.1399.06 (Intel X86) Oct 14 2005 00:33:37 Copyright (c) 1988-2005 Microsoft Corporation Developer Edition on Windows NT 5.1 (Build 2600: Service Pack 3)"

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

自控2024-08-12 04:38:12

我通过在 .Net 中编写一个小型控制台应用程序(代码如下)解决了这个问题。它一次导入 1 条记录(甚至没有花时间搞乱 sqlbulkcopy 对象),并在我午休时运行。我忘记将时间戳写入控制台,所以我不知道到底花了多长时间。我的最佳估计是 20 多分钟。请注意,下一个问题是 tempdb:保留默认设置,tempdb 在导入过程中将变得非常大。完成后您需要重新启动 sql server 服务。

using System;
using System.Collections.Generic;
using System.Text;
using System.Diagnostics;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Xml;
namespace ImportPostsTable
{
    class Program
    {
        //TODO: pull connection string, data path from app.config or command line
        static string cnString = "Data Source=localhost;Database=SO;Trusted_Connection=True;";
        static string dataPath = @"C:\temp"; 
        static string insertString = "INSERT INTO Posts VALUES (@Id, @PostTypeID, @AcceptedAnswerId, @CreationDate, @Score, @ViewCount, @Body, @OwnerUserId, @OwnerDisplayName, @LastEditorUserId, @LastEditDate, @LastActivityDate, @Title, @Tags, @AnswerCount, @CommentCount, @FavoriteCount, @ClosedDate, @ParentId)";
        static void Main(string[] args)
        {
            Trace.Listeners.Add(new ConsoleTraceListener());

            try
            {
                 ImportPosts(dataPath, cnString);
            }
            catch (Exception e)
            {
                Trace.WriteLine(e.Message);
                Trace.WriteLine(e.StackTrace);
            }
            Console.ReadKey(true);
        }

        public static void ImportPosts(string XmlPath, string ConnectionString)
        {
            using (StreamReader sr = new StreamReader(Path.Combine(XmlPath, "posts.xml")))
            using (XmlTextReader rdr = new XmlTextReader(sr))
            using (SqlConnection cn = new SqlConnection(ConnectionString))
            using (SqlCommand cmd = new SqlCommand(insertString, cn))
            {
                cmd.Parameters.Add("@Id", SqlDbType.Int);
                cmd.Parameters.Add("@PostTypeId", SqlDbType.Int);
                cmd.Parameters.Add("@AcceptedAnswerId", SqlDbType.Int);
                cmd.Parameters.Add("@CreationDate", SqlDbType.DateTime);
                cmd.Parameters.Add("@Score", SqlDbType.Int);
                cmd.Parameters.Add("@ViewCount", SqlDbType.Int);
                cmd.Parameters.Add("@Body", SqlDbType.NVarChar);
                cmd.Parameters.Add("@OwnerUserId", SqlDbType.Int);
                cmd.Parameters.Add("@OwnerDisplayName", SqlDbType.NVarChar, 40);
                cmd.Parameters.Add("@LastEditorUserId", SqlDbType.Int);
                cmd.Parameters.Add("@LastEditDate", SqlDbType.DateTime);
                cmd.Parameters.Add("@LastActivityDate", SqlDbType.DateTime);
                cmd.Parameters.Add("@Title", SqlDbType.NVarChar, 250);
                cmd.Parameters.Add("@Tags", SqlDbType.NVarChar, 150);
                cmd.Parameters.Add("@AnswerCount", SqlDbType.Int);
                cmd.Parameters.Add("@CommentCount", SqlDbType.Int);
                cmd.Parameters.Add("@FavoriteCount", SqlDbType.Int);
                cmd.Parameters.Add("@ClosedDate", SqlDbType.DateTime);
                cmd.Parameters.Add("@ParentId", SqlDbType.Int);

                Trace.Write(DateTime.Now.ToString() + Environment.NewLine + "Reading");
                int count = 0;
                cn.Open();
                while (rdr.Read())
                {
                    if (rdr.AttributeCount <= 5) continue; //everything but the xml declaration and the root element will have at least 5 attributes

                    cmd.Parameters[0].Value = rdr["Id"];
                    cmd.Parameters[1].Value = rdr["PostTypeId"];
                    cmd.Parameters[2].Value = rdr["AcceptedAnswerId"];
                    cmd.Parameters[3].Value = ParseDate(rdr["CreationDate"]);
                    cmd.Parameters[4].Value = rdr["Score"];
                    cmd.Parameters[5].Value = rdr["ViewCount"];
                    cmd.Parameters[6].Value = rdr["Body"];
                    cmd.Parameters[7].Value = rdr["OwnerUserId"];
                    cmd.Parameters[8].Value = rdr["OwnerDisplayName"];
                    cmd.Parameters[9].Value = rdr["LastEditorUserId"];
                    cmd.Parameters[10].Value = ParseDate(rdr["LastEditDate"]);
                    cmd.Parameters[11].Value = ParseDate(rdr["LastActivityDate"]);
                    cmd.Parameters[12].Value = rdr["Title"];
                    cmd.Parameters[13].Value = rdr["Tags"];
                    cmd.Parameters[14].Value = rdr["AnswerCount"];
                    cmd.Parameters[15].Value = rdr["CommentCount"];
                    cmd.Parameters[16].Value = rdr["FavoriteCount"];
                    cmd.Parameters[17].Value = ParseDate(rdr["ClosedDate"]);
                    cmd.Parameters[18].Value = rdr["ParentId"];

                    for (int i = 0; i < cmd.Parameters.Count; i++)
                        if (cmd.Parameters[i].Value == null)
                            cmd.Parameters[i].Value = DBNull.Value;

                    cmd.ExecuteNonQuery();

                    if (count++ % 5000 == 0) Trace.Write(".");
                }
                Trace.WriteLine(string.Format("\n\n{0:d}\nFinished {1} records.", DateTime.Now, count));
            }
        }

        public static object ParseDate(string dateValue)
        {
            if (string.IsNullOrEmpty(dateValue)) return DBNull.Value;
            return DateTime.ParseExact(dateValue, "yyyy-MM-ddTHH:mm:ss.fff", null);
        }
    }
}

I got past the problem by writing a small console app in .Net (code to follow). It imports records 1 at a time (didn't even take the time to mess with sqlbulkcopy object) and ran while I went on lunch break. I forgot to write timestamps to the console, so I don't know exactly how long it took. My best estimate is just over 20 minutes. Be warned that the next issue is tempdb: left with the default settings, tempdb will grow very large during the import. You'll want to restart the sql server service when it's finished.

using System;
using System.Collections.Generic;
using System.Text;
using System.Diagnostics;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Xml;
namespace ImportPostsTable
{
    class Program
    {
        //TODO: pull connection string, data path from app.config or command line
        static string cnString = "Data Source=localhost;Database=SO;Trusted_Connection=True;";
        static string dataPath = @"C:\temp"; 
        static string insertString = "INSERT INTO Posts VALUES (@Id, @PostTypeID, @AcceptedAnswerId, @CreationDate, @Score, @ViewCount, @Body, @OwnerUserId, @OwnerDisplayName, @LastEditorUserId, @LastEditDate, @LastActivityDate, @Title, @Tags, @AnswerCount, @CommentCount, @FavoriteCount, @ClosedDate, @ParentId)";
        static void Main(string[] args)
        {
            Trace.Listeners.Add(new ConsoleTraceListener());

            try
            {
                 ImportPosts(dataPath, cnString);
            }
            catch (Exception e)
            {
                Trace.WriteLine(e.Message);
                Trace.WriteLine(e.StackTrace);
            }
            Console.ReadKey(true);
        }

        public static void ImportPosts(string XmlPath, string ConnectionString)
        {
            using (StreamReader sr = new StreamReader(Path.Combine(XmlPath, "posts.xml")))
            using (XmlTextReader rdr = new XmlTextReader(sr))
            using (SqlConnection cn = new SqlConnection(ConnectionString))
            using (SqlCommand cmd = new SqlCommand(insertString, cn))
            {
                cmd.Parameters.Add("@Id", SqlDbType.Int);
                cmd.Parameters.Add("@PostTypeId", SqlDbType.Int);
                cmd.Parameters.Add("@AcceptedAnswerId", SqlDbType.Int);
                cmd.Parameters.Add("@CreationDate", SqlDbType.DateTime);
                cmd.Parameters.Add("@Score", SqlDbType.Int);
                cmd.Parameters.Add("@ViewCount", SqlDbType.Int);
                cmd.Parameters.Add("@Body", SqlDbType.NVarChar);
                cmd.Parameters.Add("@OwnerUserId", SqlDbType.Int);
                cmd.Parameters.Add("@OwnerDisplayName", SqlDbType.NVarChar, 40);
                cmd.Parameters.Add("@LastEditorUserId", SqlDbType.Int);
                cmd.Parameters.Add("@LastEditDate", SqlDbType.DateTime);
                cmd.Parameters.Add("@LastActivityDate", SqlDbType.DateTime);
                cmd.Parameters.Add("@Title", SqlDbType.NVarChar, 250);
                cmd.Parameters.Add("@Tags", SqlDbType.NVarChar, 150);
                cmd.Parameters.Add("@AnswerCount", SqlDbType.Int);
                cmd.Parameters.Add("@CommentCount", SqlDbType.Int);
                cmd.Parameters.Add("@FavoriteCount", SqlDbType.Int);
                cmd.Parameters.Add("@ClosedDate", SqlDbType.DateTime);
                cmd.Parameters.Add("@ParentId", SqlDbType.Int);

                Trace.Write(DateTime.Now.ToString() + Environment.NewLine + "Reading");
                int count = 0;
                cn.Open();
                while (rdr.Read())
                {
                    if (rdr.AttributeCount <= 5) continue; //everything but the xml declaration and the root element will have at least 5 attributes

                    cmd.Parameters[0].Value = rdr["Id"];
                    cmd.Parameters[1].Value = rdr["PostTypeId"];
                    cmd.Parameters[2].Value = rdr["AcceptedAnswerId"];
                    cmd.Parameters[3].Value = ParseDate(rdr["CreationDate"]);
                    cmd.Parameters[4].Value = rdr["Score"];
                    cmd.Parameters[5].Value = rdr["ViewCount"];
                    cmd.Parameters[6].Value = rdr["Body"];
                    cmd.Parameters[7].Value = rdr["OwnerUserId"];
                    cmd.Parameters[8].Value = rdr["OwnerDisplayName"];
                    cmd.Parameters[9].Value = rdr["LastEditorUserId"];
                    cmd.Parameters[10].Value = ParseDate(rdr["LastEditDate"]);
                    cmd.Parameters[11].Value = ParseDate(rdr["LastActivityDate"]);
                    cmd.Parameters[12].Value = rdr["Title"];
                    cmd.Parameters[13].Value = rdr["Tags"];
                    cmd.Parameters[14].Value = rdr["AnswerCount"];
                    cmd.Parameters[15].Value = rdr["CommentCount"];
                    cmd.Parameters[16].Value = rdr["FavoriteCount"];
                    cmd.Parameters[17].Value = ParseDate(rdr["ClosedDate"]);
                    cmd.Parameters[18].Value = rdr["ParentId"];

                    for (int i = 0; i < cmd.Parameters.Count; i++)
                        if (cmd.Parameters[i].Value == null)
                            cmd.Parameters[i].Value = DBNull.Value;

                    cmd.ExecuteNonQuery();

                    if (count++ % 5000 == 0) Trace.Write(".");
                }
                Trace.WriteLine(string.Format("\n\n{0:d}\nFinished {1} records.", DateTime.Now, count));
            }
        }

        public static object ParseDate(string dateValue)
        {
            if (string.IsNullOrEmpty(dateValue)) return DBNull.Value;
            return DateTime.ParseExact(dateValue, "yyyy-MM-ddTHH:mm:ss.fff", null);
        }
    }
}
帅哥哥的热头脑2024-08-12 04:38:12

SQL Server 没有任何数据类型能够在行的列中存储超过 2 GB 的数据。

如果您有足够的磁盘空间,数据库(除 Express Edition 之外)可以存储超过 2 GB 的数据,但存在行列数据量限制。

SQL Server does not have any data type capable of storing more than 2 gigabytes of data in a column of a row.

Databases (other than Express Editions) can store much more than 2 GB provided you have enough disk space, but the limitation on the amount of data in a column of a row applies.

玩物2024-08-12 04:38:12

您所需要做的就是将 post 文件拆分为 2-3 个文件,并且您仍然可以使用您使用过的导入查询。

例如,您可以通过使用 EditPad 打开文件(不要使用 Notepad++,因为文件太大而失败,而使用 EditPad 会立即打开),然后剪切并粘贴几个文件(不过,请使它们正确的 XML,复制标头和结束标记)。

All you need to do is split the post file into 2-3 files and you can still use the import query you used.

You can do this by opening the file with EditPad for example (don't use Notepad++ for this it'll fail because the file is too big, while with EditPad it'll open instantly) and just cutting and pasting in a few files (make them proper XML though, copy the header and end tag).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文