LINQ SubmitChanges out of memory
I have a database with about 180,000 records, and I'm trying to attach a PDF file to each of those records. Each PDF is about 250 KB in size. However, after about a minute my program is using about a GB of memory and I have to stop it. I tried writing it so that the reference to each LINQ object is removed once it's updated, but that doesn't seem to help. How can I make it clear the references?
Thanks for your help
Private Sub uploadPDFs(ByVal args() As String)
    Dim indexFiles = (From indexFile In dataContext.IndexFiles
                      Where indexFile.PDFContent = Nothing
                      Order By indexFile.PDFFolder).ToList
    Dim currentDirectory As IO.DirectoryInfo
    Dim currentFile As IO.FileInfo
    Dim tempIndexFile As IndexFile
    While indexFiles.Count > 0
        tempIndexFile = indexFiles(0)
        indexFiles = indexFiles.Skip(1).ToList
        currentDirectory = 'I set the directory that I need
        currentFile = 'I get the file that I need
        writePDF(currentDirectory, currentFile, tempIndexFile)
    End While
End Sub

Private Sub writePDF(ByVal directory As IO.DirectoryInfo, ByVal file As IO.FileInfo, ByVal indexFile As IndexFile)
    Dim bytes() As Byte
    bytes = getFileStream(file)
    indexFile.PDFContent = bytes
    dataContext.SubmitChanges()
    counter += 1
    If counter Mod 10 = 0 Then Console.WriteLine(" saved file " & file.Name & " at " & directory.Name)
End Sub

Private Function getFileStream(ByVal fileInfo As IO.FileInfo) As Byte()
    Dim fileStream = fileInfo.OpenRead()
    Dim bytesLength As Long = fileStream.Length
    Dim bytes(bytesLength) As Byte
    fileStream.Read(bytes, 0, bytesLength)
    fileStream.Close()
    Return bytes
End Function
3 Answers
I suggest you perform this in batches, using Take (before the call to ToList) to process a particular number of items at a time. Read (say) 10, set the PDFContent on all of them, call SubmitChanges, and then start again. (I'm not sure offhand whether you should start with a new DataContext at that point, but it might be cleanest to do so.)

As an aside, your code to read the contents of a file is broken in at least a couple of ways - but it would be simpler just to use File.ReadAllBytes in the first place.

Also, your way of handling the list gradually shrinking is really inefficient - after fetching 180,000 records, you're then building a new list with 179,999 records, then another with 179,998 records, and so on.
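The batching approach this answer describes can be sketched in VB like this. It is a sketch only: MyDataContext and getPdfPathFor are illustrative placeholder names, not from the question; IndexFiles and PDFContent are taken from it.

' Batching sketch: process a small number of records per pass, with a fresh
' DataContext each time so its change tracker never grows large.
' "MyDataContext" and "getPdfPathFor" are illustrative placeholder names.
Private Sub uploadPDFsInBatches()
    Const batchSize As Integer = 10
    Do
        Using dc As New MyDataContext()
            Dim batch = (From indexFile In dc.IndexFiles
                         Where indexFile.PDFContent Is Nothing
                         Order By indexFile.PDFFolder).Take(batchSize).ToList()
            If batch.Count = 0 Then Exit Do
            For Each item In batch
                ' File.ReadAllBytes replaces the hand-rolled stream code.
                item.PDFContent = IO.File.ReadAllBytes(getPdfPathFor(item))
            Next
            dc.SubmitChanges()
        End Using
    Loop
End Sub

Because each pass re-queries for records whose PDFContent is still Nothing, no Skip is needed: each SubmitChanges removes that batch from the next query's results.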
Does the DataContext have ObjectTrackingEnabled set to true (the default value)? If so, then it will try to keep a record of essentially all the data it touches, thus preventing the garbage collector from being able to collect any of it.
If so, you should be able to fix the situation by periodically disposing the DataContext and creating a new one, or turning object tracking off.
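The flag mentioned here is a property on DataContext, and it must be set before the context runs any query. One caveat worth knowing: a context with tracking disabled is effectively read-only, so SubmitChanges cannot be used on it. A minimal sketch (MyDataContext is an illustrative name):

' A DataContext with object tracking disabled won't accumulate tracked
' entities, but it becomes read-only: use it for fetching, not for updates.
Using readOnlyDc As New MyDataContext()         ' "MyDataContext" is illustrative
    readOnlyDc.ObjectTrackingEnabled = False    ' must be set before any query
    Dim pendingCount = (From f In readOnlyDc.IndexFiles
                        Where f.PDFContent Is Nothing).Count()
End Using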
OK. To use the smallest amount of memory we have to update the data context in blocks. I've put sample code below. It might have syntax errors, since I'm using Notepad to type it in.

To further optimise the speed, you can get the record IDs into an array from all the items as an anonymous type, and set delay loading on for that PDF field.
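The sample code this answer refers to does not appear on this page. The following is a hedged reconstruction of the approach it describes - fetch the record IDs up front, then update each record through a short-lived context, with the PDF column marked delay-loaded so fetching a record never pulls existing blobs. MyDataContext, ID, and getPdfPathFor are assumed names, not from the original answer.

' Sketch of the ID-array variant: fetch only the keys of the records that
' still need a PDF, then update each one through a short-lived DataContext.
' "MyDataContext", "ID" and "getPdfPathFor" are assumed/illustrative names.
Private Sub uploadPDFsById()
    Dim ids As Integer()
    Using dc As New MyDataContext()
        dc.ObjectTrackingEnabled = False          ' read-only fetch of keys
        ids = (From f In dc.IndexFiles
               Where f.PDFContent Is Nothing
               Select f.ID).ToArray()
    End Using
    For Each id In ids
        Using dc As New MyDataContext()
            Dim record = dc.IndexFiles.Single(Function(f) f.ID = id)
            record.PDFContent = IO.File.ReadAllBytes(getPdfPathFor(record))
            dc.SubmitChanges()
        End Using                                 ' context disposed per record
    Next
End Sub

With the PDFContent column marked Delay Loaded in the designer, the Single call above does not read any existing PDF bytes back from the database.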