©XSIBackup-Free: Free Backup Software for ©VMWare ©ESXi

Forum ©XSIBackup: ©VMWare ©ESXi Backup Software



#1 2021-07-04 09:51:35

jakorsme
Member
Registered: 2021-06-18
Posts: 12

While restoring a VM other Windows VMs unusably slow.

I was restoring a VM to the SSD from a repository on our NAS across our internal 1 Gb network. Three VMs share this SSD as their primary drive.
During the entire restore (some 8 hours), our Windows server, running in another VM on the same ESXi host, was unusable. Services wouldn't start. Any command in the Windows UI or the Windows command prompt (even just hitting Enter) took 20-30 seconds to be recognized.
Completely unusable.
As soon as the restore completed, the Windows machine returned to normal.

I ran esxtop:
CPU was low (which surprised me; I expected it to be high).
SSD disk was active, but not extremely high.
Network was high (70-90% utilization).

I'm obviously concerned if I essentially can't keep the other VMs on the box functional while a restore runs.
Maybe I should nice the restore, but then it's going to take much longer.


#2 2021-07-04 12:22:01

admin
Administrator
Registered: 2017-04-21
Posts: 2,055

Re: While restoring a VM other Windows VMs unusably slow.

Well, you don't mention the most important fact: the size of your VM. If it took more than 8 h to restore, we assume it must be bigger than 700 GB, probably much more than that.

You don't say whether the other VM was running from the NAS; we assume it wasn't.

You obviously need to size your infrastructure to your needs. A full restore is hopefully something you won't have to do very often; still, you need to make sure that your other production VMs remain functional in the meantime.

As you may imagine, we perform this kind of operation constantly, and what we observe is quite different: the rest of the production VMs keep running. They become slower, as expected, but they still run O.K.

The figures you should have paid the most attention to are MEM overcommit avg and CPU load average:

12:22:52pm up 138 days  1:27, 470 worlds, 1 VMs, 2 vCPUs; MEM overcommit avg: 0.00, 0.00, 0.00
12:08:05pm up 138 days  1:12, 473 worlds, 1 VMs, 2 vCPUs; CPU load average: 0.01, 0.01, 0.01

If you say the load average was low, we assume this triplet stayed below 2.00 the whole time.
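To make that rule of thumb easy to apply while a restore runs, here is a minimal sketch of a hypothetical helper (`triplet_status` is not part of esxtop or XSIBackup) that takes one of the esxtop header lines quoted above, finds the triplet after a given label, and reports whether any of the three values exceeds a threshold. The 2.00 default for CPU load average is the figure from this post; the label strings are copied from the esxtop lines shown. Any other threshold is your own assumption to choose.

```shell
#!/bin/sh
# Hypothetical helper: triplet_status LINE LABEL [THRESHOLD]
#   LINE      an esxtop header line, e.g.
#             "12:08:05pm up 138 days  1:12, 473 worlds, 1 VMs, 2 vCPUs; CPU load average: 0.01, 0.01, 0.01"
#   LABEL     the text preceding the triplet, e.g. "CPU load average:" or "MEM overcommit avg:"
#   THRESHOLD defaults to 2.00 (the rule of thumb used in this thread)
# Prints OK, HIGH (any of the three values above THRESHOLD) or UNKNOWN (label not found).
triplet_status() {
    printf '%s\n' "$1" | awk -v label="$2" -v t="${3:-2.00}" '
        {
            i = index($0, label)
            if (i == 0) { print "UNKNOWN"; exit }
            # Everything after the label is the comma-separated triplet.
            n = split(substr($0, i + length(label)), v, ",")
            status = "OK"
            for (k = 1; k <= n; k++)
                if (v[k] + 0 > t + 0) status = "HIGH"   # "+ 0" forces numeric comparison
            print status
        }'
}
```

For memory, calling it with the label `MEM overcommit avg:` and a threshold of 0 flags any nonzero overcommit at all; whether that is the right threshold for your host is a judgment call, not something stated in this thread.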

Still, we don't know what kind of hardware you have or how much memory it has (you may need more), or, in case you have an HP server, whether you have enabled the read/write cache on the controller (enable HP controller cache).

You can always use a different NIC for backup/restore operations; this will alleviate the load on the NIC. The nice command is not available in (c)ESXi.


#3 2021-07-05 00:06:33

jakorsme
Member
Registered: 2021-06-18
Posts: 12

Re: While restoring a VM other Windows VMs unusably slow.

The VM files participating in the restore total 71 GB.
All three VMs run on ESXi from the SSD, on their dedicated host: i7-3930K CPU, 32 GB RAM.
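The size/time figures in this exchange can be turned into an average transfer rate with simple arithmetic. The sketch below uses a hypothetical helper, `effective_rate`, and decimal units (1 GB = 1000 MB). It assumes the 71 GB figure is what was actually transferred and that the restore took about 8 hours; it does not account for sparse or thin disk handling.

```shell
#!/bin/sh
# Hypothetical helper: effective_rate SIZE_GB HOURS
# Prints the average throughput in MB/s (decimal units, 1 GB = 1000 MB)
# needed to move SIZE_GB in HOURS.
effective_rate() {
    awk -v gb="$1" -v h="$2" 'BEGIN { printf "%.2f\n", (gb * 1000) / (h * 3600) }'
}
```

With these numbers, `effective_rate 700 8` gives 24.31, which matches the admin's implied assumption of roughly 24 MB/s behind the ">700 GB" estimate, while `effective_rate 71 8` gives 2.47. That is about ten times slower than the admin's assumption, which already suggests a bottleneck somewhere in the chain.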


Yes, the triplet values were around 0.03 or less.
I don't recall the overcommit numbers, but I'll watch them going forward. There was no paging.

Now, when I tried to bring up the restored VM, it came up as a really old version of the VM, even though the XSIBackup backup was taken only a few days ago.

I did notice that this backup has multiple VMDKs and snapshots. It looks as if it came up on the wrong one.

In the ESXi web console, Manage Snapshots just hangs while looking for snapshots.
The VM came up using JAKWEB_2.vmdk.

Here's an ls -la of the VM files:

drwxr-xr-x    1 root     root          3780 Jul  4 22:43 .
drwxr-xr-t    1 root     root          2520 May  4 03:29 ..
-rw-r--r--    1 root     root     2147483648 Jul  4 07:12 JAKWEB-Snapshot1.vmem
-rw-r--r--    1 root     root       5772757 Jul  4 00:43 JAKWEB-Snapshot1.vmsn
-rw-r--r--    1 root     root         28762 Jul  4 07:12 JAKWEB-Snapshot2.vmsn
-rw-r--r--    1 root     root     107374182400 Jul  3 22:48 JAKWEB-flat.vmdk
-rw-r--r--    1 root     root          8684 Jul  4 22:43 JAKWEB.nvram
-rw-r--r--    1 root     root           589 Jul  3 22:48 JAKWEB.vmdk
-rw-r--r--    1 root     root           671 Jul  4 03:48 JAKWEB.vmsd
-rw-r--r--    1 root     root           671 Jul  4 07:12 JAKWEB.vmsd.tmp
-rw-r--r--    1 root     root          3345 Jul  4 22:43 JAKWEB.vmx
-rw-r--r--    1 root     root          3353 Jul  4 07:12 JAKWEB.vmx.tmp
-rw-r--r--    1 root     root           150 Jul  4 03:48 JAKWEB.vmxf
-rw-r--r--    1 root     root     107374182400 Jul  4 00:43 JAKWEB_1-flat.vmdk
-rw-r--r--    1 root     root           591 Jul  4 00:43 JAKWEB_1.vmdk
-rw-r--r--    1 root     root     22246797312 Jul  4 06:56 JAKWEB_2-000001-delta.vmdk
-rw-r--r--    1 root     root           379 Jul  4 06:56 JAKWEB_2-000001.vmdk
-rw-r--r--    1 root     root     107374182400 Jul  4 22:43 JAKWEB_2-flat.vmdk
-rw-r--r--    1 root     root           644 Jul  4 22:24 JAKWEB_2.vmdk
-rw-r--r--    1 root     root        246829 Jul  4 07:12 vmware-48.log
-rw-r--r--    1 root     root        258347 Jul  4 06:56 vmware-49.log
-rw-r--r--    1 root     root        713218 Jun 27 05:05 vmware-50.log
-rw-r--r--    1 root     root        610720 Jul  3 20:28 vmware-51.log
-rw-r--r--    1 root     root        205432 Jul  3 20:30 vmware-52.log
-rw-r--r--    1 root     root        651121 Jul  4 07:12 vmware-53.log
-rw-r--r--    1 root     root        211475 Jul  4 22:43 vmware.log
-rw-r--r--    1 root     root     115343360 Jul  4 06:56 vmx-JAKWEB-812182546-2.vswp
[root@elevation:/vmfs/volumes/588366f8-5f993700-3530-902b3451b465/JAKWEB]

I'm now trying to create a new VM using an OVF backup from April 2021.


#4 2021-07-05 08:26:35

admin
Administrator
Registered: 2017-04-21
Posts: 2,055

Re: While restoring a VM other Windows VMs unusably slow.

Your hardware is clearly underperforming. Please make sure that you have activated the RAID controller cache, especially if you own an HP server; a 70-80% / 20-30% read/write split will do.

Our software backs your VMs up. We don't know what snapshots you have or when they were taken. The backup snapshot is taken to back up the VM and is deleted once the backup finishes. If you abruptly interrupt a backup/replica process, (c)XSIBackup will still be able to detect it and delete the backup snapshot automatically.

If for whatever reason you hit some kind of unhandled segfault, you will have to delete the backup snapshot manually. It is clearly marked as XSINNNNNN..., so you can't mistake it for anything else.
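To see which snapshots a VM folder actually records, you can read the display names stored in its .vmsd file (or use `vim-cmd vmsvc/snapshot.get <vmid>` on the host). Below is a minimal sketch of a hypothetical helper, `list_snapshots`, that does the file-based version. It assumes the standard `snapshotN.displayName = "..."` lines in the .vmsd and, following the description above, tags names beginning with `XSI` as backup snapshots; that prefix test is an assumption based on this post, not a documented format.

```shell
#!/bin/sh
# Hypothetical helper: list_snapshots VMSD_FILE
# Prints one line per snapshot display name found in a .vmsd file.
# Names starting with "XSI" are tagged BACKUP (assumed backup-snapshot naming),
# everything else is tagged USER.
list_snapshots() {
    sed -n 's/^snapshot[0-9][0-9]*\.displayName *= *"\(.*\)"$/\1/p' "$1" |
    while IFS= read -r name; do
        case $name in
            XSI*) printf 'BACKUP %s\n' "$name" ;;
            *)    printf 'USER   %s\n' "$name" ;;
        esac
    done
}
```

Running it against a folder's .vmsd (for example `list_snapshots JAKWEB.vmsd`) shows at a glance whether a leftover backup snapshot is sitting among the user snapshots.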

Files are copied as they are; there is no magical mechanism that can take the VM back to some earlier state other than your own user snapshots.

Why those snapshots were taken or what they are doing there is something we can't get into.


#5 2021-07-05 09:36:28

jakorsme
Member
Registered: 2021-06-18
Posts: 12

Re: While restoring a VM other Windows VMs unusably slow.

There is no RAID controller, and therefore no RAID controller cache, on the ESXi server.
The XSIBackup restore has brought the VM back to a very old state (about a year ago). I had backed it up with XSIBackup less than a month ago. There were snapshots on it, and it looks as if, after the restore, the snapshots are ignored and the VM comes up on the base revision of the VMDK.

Last edited by jakorsme (2021-07-05 10:00:36)


#6 2021-07-05 14:25:00

admin
Administrator
Registered: 2017-04-21
Posts: 2,055

Re: While restoring a VM other Windows VMs unusably slow.

We don't know what your setup is, but we do know it's abnormally slow. If you don't want to share it with the rest of us, that's OK; nonetheless, the less information you offer, the less other people will be able to help you.

We don't really understand what you mean.

Why do you attribute to (c)XSIBackup-DC capabilities it doesn't have?
It's a backup/replication program. It copies what's there. It doesn't decide what should come up for you; it's you who has to be in control of what happens.

If you have snapshots containing intermediate data, just check that the .vmx file points to the snapshot you want to use. Changing it to use the base disk or any of the snapshots in a chain is trivial, and we won't get into it.

In any case, if the .vmx file was pointing to the base disk and you powered the VM on, you should restore again so that you start from the original, untouched base disk: writing to a base disk that has snapshot deltas on top of it breaks the chain and can corrupt the data. Restore the VM and edit the .vmx file to select the snapshot to use before you switch it on.
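Checking which disk file the .vmx points to before powering on comes down to reading its `<device>.fileName` entries (for example `scsi0:0.fileName = "JAKWEB_2.vmdk"`). The sketch below is a hypothetical read-only helper, `vmx_disks`, that lists them. It assumes the usual `controllerN:M.fileName` key format; CD/DVD entries that point at ISO files will be listed too.

```shell
#!/bin/sh
# Hypothetical helper: vmx_disks VMX_FILE
# Prints "device -> file" for every "<controller>N:M.fileName" entry in a .vmx,
# e.g. "scsi0:0 -> JAKWEB_2.vmdk". Read-only: the .vmx is not modified.
vmx_disks() {
    sed -n 's/^\([a-z]*[0-9]*:[0-9]*\)\.fileName *= *"\(.*\)"$/\1 -> \2/p' "$1"
}
```

For the restored VM in this thread, an entry like `scsi0:0 -> JAKWEB_2.vmdk` would mean the guest boots from the base disk (consistent with the old state reported above), whereas `JAKWEB_2-000001.vmdk` would point it at the snapshot delta of that disk.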


#7 2021-07-05 22:23:15

jakorsme
Member
Registered: 2021-06-18
Posts: 12

Re: While restoring a VM other Windows VMs unusably slow.

I thought I had explained the topology in previous replies, but here it is:
ESXi ------------- NAS
NET = 1 Gb, same subnet
ESXi = i7-3930K 6-core, 32 GB RAM, SSD (spec: 550 MB/s read, 530 MB/s write)
NAS = WD My Cloud Mirror, 5400 rpm

I think the My Cloud may be the bottleneck (I'm going to run dd tests). I am also considering backing up in two phases (a headache): backing up to and restoring from a fast network disk (with the repo on that disk), and only archiving repos on the NAS. Basically, what you described as the enterprise scheme. But I'm a bit concerned about the window of risk during which the repo sits on a single drive, without the RAID 1 of the NAS.

I get what you're saying about XSIBackup not affecting the contents of files (at least not things like changes to a single data field, though corruption may be another matter). After restoring again, I'll change the .vmdk entry in the .vmx to point to the most recent snapshot and see whether it comes up with more current data. However, there seem to be data problems: the ESXi UI hangs under Manage Snapshots when trying to enumerate snapshots, so there appears to be corruption, ostensibly from the backup/restore, since everything was fine before the backup. That doesn't mean it's XSIBackup's fault, just that it likely occurred during the backup/restore process.

Last edited by jakorsme (2021-07-05 22:26:34)


#8 2021-07-06 17:49:01

admin
Administrator
Registered: 2017-04-21
Posts: 2,055

Re: While restoring a VM other Windows VMs unusably slow.

WD My Cloud is not the kind of professional NAS device from which you should expect close to theoretical speeds; still, you should expect 20-30 MB/s of sustained throughput, especially if it's not a My Cloud NAS from the home line of appliances.

Your i7-3930K CPU is from 2011. That's still not bad; you should expect decent performance from it.

Saying "1 Gb NIC" is close to saying nothing. Cheap devices, like the popular Realtek chipsets, will not yield more than a few MB/s when used in ESXi. Use Intel NICs; even cheap desktop ones will deliver good average rates.

Check your switch; it might be the culprit. You should not expect more than 15-20 MB/s of effective throughput from cheap devices.
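To put these figures in context, a 1 Gb/s link tops out at 1000 / 8 = 125 MB/s before protocol overhead, so 15-20 MB/s is only about 12-16% of line rate. The hypothetical helper below, `pct_of_line_rate`, does that conversion for any measured throughput. It uses decimal units and ignores Ethernet/NFS overhead, so real-world ceilings will be somewhat lower than 100%.

```shell
#!/bin/sh
# Hypothetical helper: pct_of_line_rate MEASURED_MBps LINK_Mbps
# Converts the link speed from megabits to megabytes per second (divide by 8)
# and prints the measured throughput as a percentage of that theoretical ceiling.
pct_of_line_rate() {
    awk -v m="$1" -v l="$2" 'BEGIN { printf "%.1f\n", 100 * m / (l / 8) }'
}
```

For example, `pct_of_line_rate 10.9 1000` prints 8.7 and `pct_of_line_rate 63 1000` prints 50.4, which is a quick way to tell whether a path is anywhere near saturating a gigabit link.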

We don't know the state of your data; what matters here, though, are the -flat.vmdk files and the snapshots containing intermediate data. Try not to back up VMs with pre-existing snapshots, to keep things simple. Also, if you know the topology of a chain of snapshots, which is rather simple, you can consolidate from any snapshot by using the vmkfstools clone feature and generate a new, consolidated VM.
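The topology of a snapshot chain can be read from the VMDK descriptor files themselves: each delta descriptor names its parent in a `parentFileNameHint="..."` line, and the base descriptor has none. The sketch below is a hypothetical read-only helper, `snapshot_chain`, that follows those hints from a given descriptor back to the base disk. It assumes text descriptors (never pass a `-flat` or `-delta` data file) and stops after 64 links as a guard against loops.

```shell
#!/bin/sh
# Hypothetical helper: snapshot_chain DIR DESCRIPTOR
# Follows parentFileNameHint from DESCRIPTOR back to the base disk and prints
# each descriptor name, newest first. Fails if a descriptor in the chain is missing.
snapshot_chain() {
    dir=$1
    disk=$2
    depth=0
    while [ -n "$disk" ]; do
        case $disk in
            /*) path=$disk ;;        # absolute hint (parent in another folder/datastore)
            *)  path=$dir/$disk ;;   # relative hint (parent in the same folder)
        esac
        [ -f "$path" ] || { echo "missing descriptor: $path" >&2; return 1; }
        printf '%s\n' "$disk"
        depth=$((depth + 1))
        [ "$depth" -le 64 ] || { echo "chain too deep, possible loop" >&2; return 1; }
        # Empty result (no parentFileNameHint) means we reached the base disk.
        disk=$(sed -n 's/^parentFileNameHint *= *"\(.*\)"$/\1/p' "$path")
    done
}
```

Once you know which descriptor sits at the head of the chain you want, a command of the form `vmkfstools -i JAKWEB_2-000001.vmdk /vmfs/volumes/<datastore>/<new-folder>/JAKWEB_2.vmdk -d thin` clones the disk as seen through the whole chain into a single new disk; `<datastore>` and `<new-folder>` are placeholders for your own paths.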


#9 2021-07-06 22:51:45

jakorsme
Member
Registered: 2021-06-18
Posts: 12

Re: While restoring a VM other Windows VMs unusably slow.

The ESXi host is using an Intel on-board NIC.

<<dd test from ESXi to My Cloud Mirror>>

[root@elevation:~] time dd if=/dev/zero bs=1000000 count=4000 of=/vmfs/volumes/NAS1/testfile
4000+0 records in
4000+0 records out
real    6m 5.43s
user    0m 2.24s
sys    0m 0.00s
Approx. 10.9 MB/s

<<dd test from ESXi to external disk drive on Mac Pro (Gdrive 7200, ~255 MB/s R/W)>>
[root@elevation:~] time dd if=/dev/zero bs=1000000 count=4000 of=/vmfs/volumes/macpro/testfile
4000+0 records in
4000+0 records out
real    1m 9.10s
user    0m 2.16s
sys    0m 0.00s
Approx. 57.9 MB/s
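The "Approx." figures above follow from the dd parameters: bs=1000000 × count=4000 = 4,000,000,000 bytes, divided by the "real" time. The sketch below is a hypothetical helper, `dd_rate`, that does this division. It assumes the `real` value is in the "Xm Y.YYs" form printed by `time` in these tests, and it uses decimal megabytes (1 MB = 1,000,000 bytes).

```shell
#!/bin/sh
# Hypothetical helper: dd_rate BYTES REAL_TIME
# REAL_TIME is the "real" value printed by time, e.g. "6m 5.43s" or "1m 9.10s".
# Prints the throughput in decimal MB/s (1 MB = 1,000,000 bytes).
dd_rate() {
    printf '%s\n' "$2" | awk -v b="$1" '
        {
            secs = 0
            for (i = 1; i <= NF; i++) {
                v = $i
                if (v ~ /m$/)      { sub(/m$/, "", v); secs += v * 60 }  # minutes
                else if (v ~ /s$/) { sub(/s$/, "", v); secs += v }       # seconds
            }
            if (secs <= 0) exit 1
            printf "%.2f\n", b / secs / 1000000
        }'
}
```

With the two runs above, `dd_rate 4000000000 '6m 5.43s'` prints 10.95 and `dd_rate 4000000000 '1m 9.10s'` prints 57.89, matching the approximate figures quoted.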

<<XSIBackup to external (USB 3) Gdrive 7200 on Mac Pro, ~255 MB/s R/W>>
Backup end date: 2021-07-06T06:13:35
-------------------------------------------------------------------------------------------------------------
Time taken: 02:10:42 (7842 sec.)
-------------------------------------------------------------------------------------------------------------
Total time:     7842 sec.
-------------------------------------------------------------------------------------------------------------
Full file speed:                                                                        19.60 mb/s
-------------------------------------------------------------------------------------------------------------
Real data speed:                                                                        30.67 mb/s

During the above backup, the CPU triplet was approx. 39, 30, 37. The memory overcommit you mentioned was always 0.


XSIBackup backups/restores to the My Cloud run at approx. 10 MB/s or less.

-------------------------------------------
Now I'm completely confused.
I just repeated the same restore that previously took 8 hours to the ESXi SSD.
I ran it again from the My Cloud Mirror, but this time to the ESXi host's vmdata disk drive (not the SSD).

It processed at 63 MB/s. Maybe there's a problem with my ESXi host's SSD: it's the common denominator in all the very slow backups and restores.

Last edited by jakorsme (2021-07-07 02:08:32)


#10 2021-07-07 08:34:52

admin
Administrator
Registered: 2017-04-21
Posts: 2,055

Re: While restoring a VM other Windows VMs unusably slow.

Great, you narrowed your issue down a bit more. Maybe the SSD is not supported by ESXi and it's working in some sort of compatibility mode.
Use this kind of syntax when running dd tests:

time dd if=/dev/random bs=1000000 count=4000 of=/vmfs/volumes/NAS1/testfile

The /dev/random device ensures that random byte strings are read; using the /dev/zero device might not give real insight, as optimizations in the NFS/iSCSI layer can blur the results.
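One possible refinement: when dd reads a random device on the fly, the speed of the random number generator (and, on some systems, /dev/random blocking) becomes part of what is being timed. The sketch below is a hypothetical helper, `random_write_test`, that generates the random data once on a source directory (for example a local datastore) and then times only the copy to the target (for example the NAS mount). It assumes `/dev/urandom` is available and uses `date +%s`, so timing resolution is one second and the result is only meaningful for runs lasting well over a few seconds.

```shell
#!/bin/sh
# Hypothetical helper: random_write_test SRC_DIR DEST_DIR COUNT
# 1) Writes COUNT x 1,000,000 random bytes to SRC_DIR/ddtest.rnd (RNG cost paid up front).
# 2) Times copying that file to DEST_DIR/ddtest.rnd and prints decimal MB/s.
random_write_test() {
    src=$1/ddtest.rnd
    dst=$2/ddtest.rnd
    dd if=/dev/urandom of="$src" bs=1000000 count="$3" 2>/dev/null || return 1
    start=$(date +%s)
    dd if="$src" of="$dst" bs=1000000 2>/dev/null || return 1
    end=$(date +%s)
    elapsed=$((end - start))
    [ "$elapsed" -gt 0 ] || elapsed=1   # date +%s has 1 s resolution
    # COUNT blocks of 1,000,000 bytes = COUNT MB, so MB/s = COUNT / seconds.
    awk -v c="$3" -v s="$elapsed" 'BEGIN { printf "%.2f\n", c / s }'
}
```

For a run comparable to the tests above, something like `random_write_test /vmfs/volumes/<local-datastore> /vmfs/volumes/NAS1 4000` would apply, where `<local-datastore>` is a placeholder; remember to delete both `ddtest.rnd` files afterwards.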

