I noticed that for just one VM (out of 20), XSI has not removed the snapshots it took:
Each disk has 2 snapshots, from the 2 backup jobs that ran.
For every other VM, "xsi" snapshots are being removed after backup is completed.
What could be causing this? Is it safe to remove and merge them through vSphere?
Last edited by Marcin (2022-02-07 14:02:12)
You need to take immediate action. Stop the backup for that VM and solve the issue.
You should see some error in the backup report or log, unless those snapshots were there before you started running the backups.
Errors when taking snapshots usually have to do with some rogue service on the guest not responding to VMware Tools. I believe you are taking the snapshot with --quiesce, aren't you?
Check the Event Viewer or the VMware Tools log.
There were no snapshots beforehand. Also, the email report is "green", and there are no errors in the logs. I do not use "--quiesce", only "backup" and "exclude". This VM runs MSSQL, and here: https://33hops.com/xsibackup-using-backup-programs.html , under OneDiff, it says to use "quiesce" for database servers?
What could be the next steps to solve this? Is deleting those snapshots safe?
Last edited by Marcin (2022-02-07 14:17:21)
Yes, deleting the snapshots is safe. If you have an active DB server you need to quiesce the system, otherwise your replicated DB might get corrupted. It's nothing serious, but it would still require running a repair.
Can you share the backup log? It's weird that it didn't register any error.
Nothing in error.log
Oddly, backupdb.log shows 2 VMs in state 0, with wrong disk information.
The two VMs above have 40 GB disks and do not have VMware Tools installed.
I don't know if this is related, but snapshots were being removed correctly for that particular VM when the backup command was defined with only that single VM.
That (problematic) task is executed as a single command with 11 VMs defined.
The issue is probably not related to "--quiesce", but I will take the chance to ask: should all VMs be backed up with "--backup" plus "--quiesce"?
Last edited by Marcin (2022-02-07 16:14:46)
Quiescing issues are always relative to the OS, version, running services and many other circumstances. Still, the most important thing is to comprehend what it is and how to deal with it. We have covered this in many posts and articles before, I will offer an excerpt here:
Some services like databases, especially on busy systems, require their pages to be fully written to disk to be 100% consistent. Let's say you have a series of SQL statements that add some data to your DB. If the snapshot is taken in the middle of some I/O operation to disk, the cut point may not include the final statements and the closing bytes of the transaction. As a result, after restoring the VM backup, the DB system may report part of the database as corrupted.
Do not panic. When someone uses the word corruption one tends to think of data that is fully corrupted from beginning to end. In the case of non-quiesced snapshots, what you have is some unclosed page written to the DB. Fixing it is easy: it just requires running some standard repair tool that will get rid of the partially written data. Still, that is a hassle, and in the case of big DBs it might take some valuable time.
This is where quiescing comes in handy. Quiescing is a concept; it may involve many different pieces of software depending on the OS and DB system you are using. Microsoft OSs are in general trickier to quiesce than Linux. You will usually need the Virtual Disk service running plus the Volume Shadow Copy service in automatic mode and, of course, the latest version of VMware Tools correctly installed.
The checklist below will do it most of the time:
- Virtual Disk service is started and its startup type is Automatic.
- VMware Snapshot Provider service is stopped and disabled.
- VMware Tools services are running.
- Volume Shadow Copy service startup type is Automatic.
There exist additional helper services for MS SQL Server and Exchange that may be required in your case.
But the above are just the steps to take to get (c)VMWare (c)ESXi to quiesce your system without errors. You need to comprehend what's behind this, which in the end is very simple and easy to understand.
What the different quiescing mechanisms do is put the DB system in Read Only mode, flush any pending DB I/O buffers (namely: write any pending data to disk) and take the snapshot. Once the snapshot has been taken, the DB system is put back in RW mode. This usually takes just a few seconds to complete, 3-5 seconds in our own experience. During that time the DB system is still available for reading, so there is only a short glitch for writes, which can easily be addressed from the application layer by just delaying a write when you get a RO message from the database server (to be continued)...
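To make the order of operations concrete, here is a minimal Python sketch of the sequence just described: read-only on, flush, snapshot, read-only off. Everything in it is hypothetical and for illustration only: `FakeDB`, `quiesced` and `take_snapshot` are made-up names that just record calls, they are not part of XSIBackup, VMware Tools or any DB driver.

```python
from contextlib import contextmanager


class FakeDB:
    """Hypothetical stand-in for a database server; it only records calls."""

    def __init__(self):
        self.read_only = False
        self.events = []

    def set_read_only(self, flag):
        self.read_only = flag
        self.events.append("read_only=%s" % flag)

    def flush(self):
        self.events.append("flush")


@contextmanager
def quiesced(db):
    """Keep db read-only and flushed while the body (the snapshot) runs."""
    db.set_read_only(True)
    try:
        db.flush()
        yield db
    finally:
        # Runs whether the snapshot succeeded or failed, so the DB
        # is never left stuck in read-only mode.
        db.set_read_only(False)


def take_snapshot(db):
    """Placeholder for the hypervisor snapshot; records the DB state seen."""
    db.events.append("snapshot(read_only=%s)" % db.read_only)


db = FakeDB()
with quiesced(db):
    take_snapshot(db)
print(db.events)
```

The `finally` branch is the important design choice: the DB goes back to RW mode both when the snapshot completes and when it fails.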
Now, the thing is how to accomplish the above for every database system and OS.
As said, there exist helper services that will do that for you and may add more sophisticated logic to minimize the downtime. Still, there will always be some minimal downtime. As stated, this can be addressed from the application layer.
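As a rough idea of what "addressing it from the application layer" could look like, the Python sketch below retries a write a few times when the server answers that it is read-only. It is only a sketch under assumptions: `ReadOnlyError`, `write_with_retry` and `make_flaky_write` are hypothetical names, and a real application would catch whatever read-only error its own DB driver raises.

```python
import time


class ReadOnlyError(Exception):
    """Hypothetical error raised when the DB rejects a write in RO mode."""


def write_with_retry(write, attempts=5, delay=1.0, sleep=time.sleep):
    """Call write(); on ReadOnlyError wait `delay` seconds and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return write()
        except ReadOnlyError:
            if attempt == attempts:
                raise  # still read-only after the last attempt
            sleep(delay)


def make_flaky_write(failures):
    """Build a fake write that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def write():
        if state["left"] > 0:
            state["left"] -= 1
            raise ReadOnlyError("server is read-only")
        return "written"

    return write


# Simulate a DB that is read-only for the first two attempts.
result = write_with_retry(make_flaky_write(2), delay=0.01)
print(result)
```

With a 3-5 second quiesce window, a handful of attempts one second apart would normally be enough to ride out the read-only period; the exact values are a judgment call for each application.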
What if I have a DB system that does not offer a helper service to accomplish the above? Well, you can do it on your own with the help of a script run by (c)VMWare Tools. This script will handle three events: freeze, thaw and freezeFail.
Windows batch version:
Create the dir: C:\Program Files\VMware\VMware Tools\backupScripts.d
Create a file inside it containing the code below, adapted to your own DB system. We are using code suitable for MariaDB or MySQL. The scripts in the above dir will be run in alphabetical order.
FREEZE is the event that will be run just before taking the snapshot.
THAW is the event that will be run once the snapshot has been taken.
FREEZEFAIL is the event that will be run when the snapshot fails to be taken.
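@echo off
rem Run by VMware Tools with one argument: freeze, freezeFail or thaw.
if "%~1" == "" goto USAGE

rem Make the mysql client reachable without discarding the existing PATH.
set "PATH=C:\Program Files\MariaDB 10.6\bin;%PATH%"

if /i "%~1" == "freeze" goto FREEZE
if /i "%~1" == "freezeFail" goto FREEZEFAIL
if /i "%~1" == "thaw" goto THAW

:USAGE
echo "Usage: %~nx0 [ freeze | freezeFail | thaw ]"
goto END

:FREEZE
rem Just before the snapshot: flush tables and set the server read-only.
rem Note: the READ LOCK is released when this mysql session ends;
rem it is read_only = 1 that stays in effect until THAW or FREEZEFAIL.
mysql -uroot -p"yourpassword" -e "FLUSH TABLES WITH READ LOCK;SET GLOBAL read_only = 1;"
goto END

:FREEZEFAIL
rem The snapshot could not be taken: put the server back in RW mode.
mysql -uroot -p"yourpassword" -e "SET GLOBAL read_only = 0;UNLOCK TABLES;"
goto END

:THAW
rem The snapshot has been taken: put the server back in RW mode.
mysql -uroot -p"yourpassword" -e "SET GLOBAL read_only = 0;UNLOCK TABLES;"
goto END

:END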
What is explained here is basically the same concept applied to make some consistent backup of a running database server.
There exist some wrapper SQL commands like FLUSH TABLES FOR EXPORT that basically combine the flush and the set-to-read-only operations.