Slow Elasticsearch query responses on iSCSI disks on Oracle Cloud
I'm migrating from Elasticsearch 7.1 on AWS to Elasticsearch 8 on Oracle Cloud (OCI).
I took a snapshot and the index was restored successfully, but Elasticsearch takes a long time to respond when there are many simultaneous connections.
The machines on AWS work fine; here are their specs:
AWS: 3 nodes
8 GB RAM, 2 CPUs, NVMe SSD disk, JVM heap size 5 GB
Elasticsearch version 7.1 *Query time: 500 ms*
iowait: 15.92% (AWS, NVMe SSD disk)
AWS disk: NVMe SSD (physical)
[root@es-4-node-1_subnet-1 ec2-user]# fio --name TEST --eta-newline=5s --filename=temp.file --rw=read --size=2g --io_size=10g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting
TEST: (g=0): rw=read, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=32
fio-2.14
Starting 1 process
Jobs: 1 (f=1): [R(1)] [18.4% done] [246.0MB/0KB/0KB /s] [246/0/0 iops] [eta 00m:31s]
Jobs: 1 (f=1): [R(1)] [30.0% done] [246.0MB/0KB/0KB /s] [246/0/0 iops] [eta 00m:28s]
Jobs: 1 (f=1): [R(1)] [41.5% done] [245.0MB/0KB/0KB /s] [245/0/0 iops] [eta 00m:24s]
Jobs: 1 (f=1): [R(1)] [53.7% done] [239.0MB/0KB/0KB /s] [239/0/0 iops] [eta 00m:19s]
Jobs: 1 (f=1): [R(1)] [65.9% done] [247.0MB/0KB/0KB /s] [247/0/0 iops] [eta 00m:14s]
Jobs: 1 (f=1): [R(1)] [78.0% done] [242.0MB/0KB/0KB /s] [242/0/0 iops] [eta 00m:09s]
Jobs: 1 (f=1): [R(1)] [88.1% done] [241.0MB/0KB/0KB /s] [241/0/0 iops] [eta 00m:05s]
Jobs: 1 (f=1): [R(1)] [100.0% done] [251.0MB/0KB/0KB /s] [251/0/0 iops] [eta 00m:00s]
TEST: (groupid=0, jobs=1): err= 0: pid=29174: Thu Apr 14 04:52:41 2022
read : io=10240MB, bw=255246KB/s, iops=249, runt= 41081msec
slat (usec): min=26, max=41738, avg=3994.68, stdev=6172.41
clat (msec): min=9, max=181, avg=123.92, stdev=22.70
lat (msec): min=9, max=189, avg=127.91, stdev=23.31
clat percentiles (msec):
| 1.00th=[ 13], 5.00th=[ 99], 10.00th=[ 106], 20.00th=[ 116],
| 30.00th=[ 123], 40.00th=[ 126], 50.00th=[ 128], 60.00th=[ 129],
| 70.00th=[ 131], 80.00th=[ 137], 90.00th=[ 145], 95.00th=[ 151],
| 99.00th=[ 159], 99.50th=[ 167], 99.90th=[ 180], 99.95th=[ 180],
| 99.99th=[ 182]
lat (msec) : 10=0.02%, 20=2.03%, 50=0.87%, 100=2.49%, 250=94.59%
cpu : usr=0.11%, sys=1.15%, ctx=9640, majf=0, minf=8204
IO depths : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.4%, 16=0.8%, 32=98.5%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued : total=r=10240/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
READ: io=10240MB, aggrb=255245KB/s, minb=255245KB/s, maxb=255245KB/s, mint=41081msec, maxt=41081msec
Disk stats (read/write):
nvme0n1: ios=46378/222, merge=0/30, ticks=1544352/5552, in_queue=1500556, util=99.15%
My problem is with the OCI machines below, where the 7.1 snapshot was restored into Elasticsearch 8: the request response time is far too long. I don't know whether the cause is the virtualized disk that OCI uses, with its slow reads and writes. My Elasticsearch data is 2 TB in size, with more than 5 billion documents.
Oracle: 3 nodes
16 GB RAM, 4 CPUs, iSCSI disk, JVM heap size 10 GB
Elasticsearch version 8 *Query time: up to 10 seconds / 20 seconds / 30 seconds / more than 1 minute*
(The time only increases, and the response either never arrives or takes too long)
iowait: 39.71% (Oracle, iSCSI "network storage" disk)
Oracle iSCSI (network storage disk)
root@es-master-1:/home# fio --name TEST --eta-newline=5s --filename=temp.file --rw=read --size=2g --io_size=10g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting
TEST: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=32
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [R(1)][16.7%][r=239MiB/s][r=239 IOPS][eta 00m:35s]
Jobs: 1 (f=1): [R(1)][31.0%][r=234MiB/s][r=234 IOPS][eta 00m:29s]
Jobs: 1 (f=1): [R(1)][45.2%][r=196MiB/s][r=196 IOPS][eta 00m:23s]
Jobs: 1 (f=1): [R(1)][59.5%][r=237MiB/s][r=237 IOPS][eta 00m:17s]
Jobs: 1 (f=1): [R(1)][73.8%][r=264MiB/s][r=264 IOPS][eta 00m:11s]
Jobs: 1 (f=1): [R(1)][88.1%][r=251MiB/s][r=251 IOPS][eta 00m:05s]
Jobs: 1 (f=1): [R(1)][100.0%][r=190MiB/s][r=190 IOPS][eta 00m:00s]
TEST: (groupid=0, jobs=1): err= 0: pid=14554: Thu Apr 14 04:52:48 2022
read: IOPS=238, BW=239MiB/s (250MB/s)(10.0GiB/42923msec)
slat (usec): min=12, max=275, avg=26.39, stdev=12.34
clat (msec): min=15, max=350, avg=134.02, stdev=99.43
lat (msec): min=15, max=350, avg=134.05, stdev=99.43
clat percentiles (msec):
| 1.00th=[ 24], 5.00th=[ 40], 10.00th=[ 51], 20.00th=[ 53],
| 30.00th=[ 55], 40.00th=[ 58], 50.00th=[ 73], 60.00th=[ 94],
| 70.00th=[ 245], 80.00th=[ 259], 90.00th=[ 266], 95.00th=[ 288],
| 99.00th=[ 313], 99.50th=[ 330], 99.90th=[ 347], 99.95th=[ 347],
| 99.99th=[ 351]
bw ( KiB/s): min=151552, max=417792, per=99.70%, avg=243557.27, stdev=53597.76, samples=85
iops : min= 148, max= 408, avg=237.84, stdev=52.35, samples=85
lat (msec) : 20=0.31%, 50=10.01%, 100=51.48%, 250=10.07%, 500=28.12%
cpu : usr=0.14%, sys=0.88%, ctx=8661, majf=0, minf=8203
IO depths : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.4%, 16=0.8%, 32=98.5%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=10240,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
READ: bw=239MiB/s (250MB/s), 239MiB/s-239MiB/s (250MB/s-250MB/s), io=10.0GiB (10.7GB), run=42923-42923msec
Disk stats (read/write):
sda: ios=10521/236, merge=0/544, ticks=1399849/35181, in_queue=1435030, util=96.10%
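As a rough sanity check on the two fio runs above, the average latency can be compared with what Little's law predicts (requests in flight divided by completion rate), and the spread of completion latency can be compared through the p99/p50 ratio. The sketch below is only an illustration in Python: the numbers are copied from the fio summaries, while the function and variable names are made up for this example. It assumes both runs were in steady state at the configured iodepth of 32.

```python
# Figures copied from the two fio summaries above.
# The dictionary layout and key names are illustrative, not fio output fields.
RUNS = {
    "AWS NVMe": {
        "iops": 249, "iodepth": 32, "lat_avg_ms": 127.91,
        "clat_p50_ms": 128, "clat_p99_ms": 159,
    },
    "OCI iSCSI": {
        "iops": 238, "iodepth": 32, "lat_avg_ms": 134.05,
        "clat_p50_ms": 73, "clat_p99_ms": 313,
    },
}


def expected_latency_ms(iodepth, iops):
    """Little's law: mean latency = requests in flight / completion rate."""
    return iodepth / iops * 1000.0


def summarize(runs):
    """Return one row per run with predicted latency and the p99/p50 ratio."""
    rows = []
    for name, r in runs.items():
        rows.append({
            "name": name,
            "expected_lat_ms": round(expected_latency_ms(r["iodepth"], r["iops"]), 1),
            "reported_lat_ms": r["lat_avg_ms"],
            "p99_over_p50": round(r["clat_p99_ms"] / r["clat_p50_ms"], 2),
        })
    return rows


if __name__ == "__main__":
    for row in summarize(RUNS):
        print(row)
```

With these figures, both disks deliver nearly the same sequential throughput and the reported averages (127.91 ms and 134.05 ms) are close to the predicted ones, but the OCI volume shows a much wider latency spread (p99 is about 4.3 times p50, versus about 1.2 on AWS). This fio job issues sequential 1 MiB reads, which may not resemble the I/O pattern of search queries.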
What could be causing these slowdowns? Is the problem on OCI due to a slow disk? The response time increases with the number of connections. The AWS machines are less powerful, yet they return results very quickly, while on OCI queries take forever. How can I determine whether the problem is in the Elasticsearch configuration or in the machines?
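To track iowait on both clusters while traffic is redirected, a figure like the ones quoted above can be derived from two samples of the aggregate `cpu` line in Linux `/proc/stat`. This is a minimal sketch: the helper names are hypothetical, and the calculation assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal).

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two 'cpu' line samples."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    # Only the first eight counters are summed: guest and guest_nice are
    # already accounted for inside user and nice.
    deltas = [y - x for x, y in zip(b[:8], a[:8])]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as handle:
        return handle.readline()


# Example usage on a Linux host (not executed here):
#   import time
#   first = read_cpu_line(); time.sleep(5); second = read_cpu_line()
#   print(iowait_percent(first, second))
```

Sampling this on an AWS node and an OCI node under the same query load would show whether the gap in iowait persists outside the fio test.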
On OCI my benchmark runs fine with multiple connections, but when I redirect traffic from the old cluster on AWS to OCI, the application starts taking a long time to respond until Elasticsearch is completely frozen, and it takes up to 10 minutes to return a response.
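Because the slowdown grows with the number of connections, one thing that might be worth checking is whether the search thread pool is queueing or rejecting requests on the OCI nodes. The sketch below assumes the `_cat/thread_pool` API with JSON output; the base URL, the function names, and the sample rows are hypothetical, and a real Elasticsearch 8 cluster would normally also need TLS and credentials, which are omitted here.

```python
import json
import urllib.request

# Columns requested from the cat thread pool API for the search pool.
CAT_PATH = "/_cat/thread_pool/search?format=json&h=node_name,name,active,queue,rejected"


def fetch_search_pools(base_url):
    """Fetch search thread pool rows as a list of dicts.

    Assumption: base_url points at a node reachable without authentication.
    """
    with urllib.request.urlopen(base_url + CAT_PATH, timeout=10) as resp:
        return json.load(resp)


def saturated_nodes(rows):
    """Return (node, queue, rejected) for nodes with queued or rejected searches."""
    flagged = []
    for row in rows:
        # The cat API returns numeric columns as strings in JSON output.
        queue = int(row["queue"])
        rejected = int(row["rejected"])
        if queue > 0 or rejected > 0:
            flagged.append((row["node_name"], queue, rejected))
    return flagged


# Hypothetical sample rows, shaped like the JSON the cat API returns.
SAMPLE_ROWS = [
    {"node_name": "node-a", "name": "search", "active": "7", "queue": "120", "rejected": "35"},
    {"node_name": "node-b", "name": "search", "active": "2", "queue": "0", "rejected": "0"},
]
```

Calling `saturated_nodes(fetch_search_pools("http://localhost:9200"))` while production traffic is redirected would list the nodes where searches are piling up. A growing queue together with high iowait would point toward storage, while a growing queue with idle disks would point elsewhere.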
This is the result from the Rally benchmark; I don't know whether it is good.
Rally benchmark for Elasticsearch:
https://pastebin.com/vjhDEtR4