String concatenation and threads in .NET
(Out of pure curiosity) In VB.NET, I tested concatenating 100k strings and found that one thread alone did it in 23 milliseconds. Two threads (each concatenating 50k), with the two results joined at the end, took 30 milliseconds. Performance-wise, using multiple threads did not seem beneficial for only 100k concatenations. Then I tried 3 million string concatenations, and two threads each handling 1.5 million always demolished one thread handling all 3 million. I imagine at some point using 3 threads becomes beneficial, then 4, and so on. Is there a more efficient way to concatenate millions of strings in .NET? Are threads worth using?
FYI, this is the code I wrote:
Imports System.Data
Imports System.Diagnostics
Imports System.IO
Imports System.Text
Imports System.Threading

Public Class Form1
    Dim sbOne As StringBuilder
    Dim sbTwo As StringBuilder
    Dim roof As Integer
    Dim results As DataTable

    Sub clicked(s As Object, e As EventArgs) Handles Button1.Click
        results = New DataTable
        results.Columns.Add("one thread")
        results.Columns.Add("two threads")
        results.Columns.Add("roof")
        ' Run the comparison for increasing numbers of appends.
        For i As Integer = 1 To 3000000 Step 100000
            roof = i
            Dim test() As Double = runTest()
            results.Rows.Add(test(0), test(1), i)
            Console.WriteLine(roof)
        Next
        ' Write the timings out as tab-separated text.
        Dim output As New StringBuilder
        For Each C As DataColumn In results.Columns
            output.Append(C.ColumnName)
            output.Append(Chr(9))
        Next
        output.Append(vbCrLf)
        For Each R As DataRow In results.Rows
            For Each C As DataColumn In results.Columns
                output.Append(R(C))
                output.Append(Chr(9))
            Next
            output.Append(vbCrLf)
        Next
        File.WriteAllText("c:\users\username\desktop\sbtest.xls", output.ToString)
        Console.WriteLine("done")
    End Sub

    ' Returns {single-thread milliseconds, two-thread milliseconds} for the current roof.
    Function runTest() As Double()
        ' Single-threaded baseline: append 1..roof with one StringBuilder.
        Dim sb As New StringBuilder
        Dim timer As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To roof
            sb.Append(i)
        Next
        Dim result As String = sb.ToString
        Dim test1 As Double = timer.Elapsed.TotalMilliseconds

        ' Two threads, each filling its own StringBuilder with half of the range.
        sbOne = New StringBuilder
        sbTwo = New StringBuilder
        Dim one As New Thread(AddressOf tOne)
        Dim two As New Thread(AddressOf tTwo)
        timer.Restart()
        one.Start()
        two.Start()
        ' Block until both workers finish rather than spinning in a busy loop.
        one.Join()
        two.Join()
        ' Combine the contents of the two builders into the final string.
        result = String.Concat(sbOne.ToString, sbTwo.ToString)
        Dim test2 As Double = timer.Elapsed.TotalMilliseconds
        Return {test1, test2}
    End Function

    ' Appends the first half of the range: 1 .. roof \ 2.
    Sub tOne()
        For i As Integer = 1 To roof \ 2
            sbOne.Append(i)
        Next
    End Sub

    ' Appends the second half of the range: roof \ 2 + 1 .. roof.
    Sub tTwo()
        For i As Integer = roof \ 2 + 1 To roof
            sbTwo.Append(i)
        Next
    End Sub
End Class
Comments (2)
Threads are designed for tasks more expensive than string concatenation.
String concatenation involves allocating and copying memory; it's not a very compute-intensive task.
Multi-threading should be used when dealing with computationally intensive tasks, and to avoid blocking the UI thread.
Threading can also be useful to parallelize tasks that wait on different things (e.g., network IO to multiple slow servers, or network vs. disk IO).
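As a rough illustration of that last point, here is a minimal sketch in which two threads each wait on a simulated slow call. `SlowCall` and `RunOverlapped` are hypothetical names, and `Thread.Sleep(300)` is only a stand-in for real network or disk IO. Because the two waits overlap, the elapsed time should come out close to the duration of one wait rather than the sum of both.

```vbnet
Imports System.Diagnostics
Imports System.Threading

Module OverlappedWaitSketch
    ' Hypothetical stand-in for a slow network or disk call.
    Sub SlowCall()
        Thread.Sleep(300)
    End Sub

    ' Runs two simulated waits on separate threads and returns the elapsed milliseconds.
    Function RunOverlapped() As Long
        Dim timer As Stopwatch = Stopwatch.StartNew()
        Dim a As New Thread(AddressOf SlowCall)
        Dim b As New Thread(AddressOf SlowCall)
        a.Start()
        b.Start()
        ' Join blocks until each worker has finished.
        a.Join()
        b.Join()
        timer.Stop()
        Return timer.ElapsedMilliseconds
    End Function

    Sub Main()
        Console.WriteLine("Elapsed ms: " & RunOverlapped())
    End Sub
End Module
```

The same structure applied to string appends gains far less, since that work is bound by memory allocation and copying rather than by waiting.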
Check out the EnsureCapacity method on StringBuilder. If you are doing a lot of concatenation and know roughly how many characters the result will contain, you can size the StringBuilder's buffer once up front instead of letting it grow dynamically. You should see some further improvement.
http://msdn.microsoft.com/en-us/library/system.text.stringbuilder.ensurecapacity.aspx
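A minimal sketch of that idea, applied to the loop from the question. `ConcatNumbers` is a hypothetical helper, and the size estimate (every number is assumed to be as wide as the largest one) is my own assumption, not something the original test used. It assumes `count` is non-negative and small enough for the buffer to fit in memory.

```vbnet
Imports System.Text

Module PreSizedBuilderSketch
    ' Concatenates the integers 1..count, reserving the buffer once up front.
    ' Assumes count >= 0.
    Function ConcatNumbers(count As Integer) As String
        ' Upper bound on the output length: assume every number has as many
        ' digits as the largest one. Computed as Long to avoid overflow.
        Dim digits As Integer = count.ToString().Length
        Dim estimatedChars As Integer =
            CInt(Math.Min(CLng(count) * digits, CLng(Integer.MaxValue)))

        Dim sb As New StringBuilder()
        sb.EnsureCapacity(estimatedChars)
        For i As Integer = 1 To count
            sb.Append(i)
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Console.WriteLine(ConcatNumbers(5)) ' prints 12345
    End Sub
End Module
```

Passing the estimate to the `StringBuilder(Integer)` constructor should have the same effect as calling `EnsureCapacity` right after construction. Whether either one gives a measurable gain depends on how accurate the estimate is.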