Is there a way to use XSIBackup DC to successfully replicate a MS Windows Domain Controller?
In my case there is only a PDC, no secondaries. So there is no USN rollback to worry about, and with no secondaries I can't transfer roles.
Will quiesce work? Or does a warm/cold backup need to be taken?
At the moment I need to use DSRM to make the server functional. Any replicas will get a 0xc00002e2 BSOD on boot.
(e.g. https://community.spiceworks.com/topic/ … 0xc00002e2 )
That happens because the AD DB gets corrupted by some pending I/O operation.
"This error is an indication that the Active Directory database (NTDS.DIT) is corrupt."
How to fix AD 0xc00002e2 error
It's not difficult to fix it, still obviously the best approach is to have a 100% functional DC after restoring.
You have a number of ways to ensure the integrity of your DC:
1/ The easiest way is through a warm backup, if you can afford to stop the DC for 30 sec. to 1 minute at most.
2/ Review the MS documents to find out how you must configure your DC so that the AD DB is quiesced in coordination with VMWare Tools, namely: make sure that it writes any pending data to disk before any snapshot is taken.
3/ Take multiple VSS snapshots during the day and revert to the latest after restoring (not very convenient).
4/ Use pre and post snapshot scripts to stop the AD service or put it in read-only mode before taking the snapshot and start it up or put it back in R/W mode after the snapshot has been taken. This is what the related MS services should do, still you can easily implement it on your own.
Regarding fixing - I have tried this on a replica. In case anyone else needs this, quick instructions are below.
NB only use if you have only one DC
F8 into DSRM (F8 may bring up blue screen first if so choose Boot Normally and keep hitting F8)
Choose Directory Services repair mode
Logon as .\administrator - you need your DSRM admin password.
Make a copy of C:\Windows\NTDS - just in case.
Run > cmd
c:
cd c:\Windows\NTDS
del *.log
NTDSUTIL
activate instance ntds
files
info
quit
quit
esentutl /p "c:\windows\ntds\ntds.dit"
md C:\Windows\NTDS\Temp
cd C:\Windows\NTDS
NTDSUTIL
activate instance ntds
files
info
compact to "C:\Windows\NTDS\Temp"
quit
quit
cd C:\Windows\NTDS
copy /Y C:\Windows\NTDS\temp\NTDS.dit C:\Windows\NTDS
del *.log
shutdown /r
I'm keeping a copy of this on the server just in case.
still obviously the best approach is to have a 100% functional DC after restoring.
Couldn't agree more - especially as servers normally die at the wrong times and you need to work on your phone in the middle of the night from a different country whilst at a night club!
Last edited by Corbeau (2022-01-13 14:33:58)
How does XSIBackup trigger the shutdown in a warm backup - is it via vmtools? What I really want to know is how safe it is.
Also using --quiesce what happens -- does xsibackup ask vmtools to quiesce the system using "VMware Tools Quiescence"?
From DC manual
--backup-how[=hot|war|cold]
I like the idea of a war backup! :)
Take a look at <install dir>/etc/xsibackup.conf
# When power on/off request is issued, the VM power state is queried every N seconds
power_query_interval=2
# When power on/off request is issued, the VM power state is queried N times
# Thus the power state will be queried a total of power_query_interval*power_query_times seconds
# Should the query_times limit be reached, a plain power off will be issued
power_query_times=10
As explained there (c)XSIBackup will try to perform a controlled shut down as per the above mentioned variables before issuing a plain power-off.
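For anyone wondering how those two variables interact, here is a minimal Python sketch of a "poll, then fall back" loop. It only illustrates the behaviour described above and is not XSIBackup's actual code; the function name and the three callbacks are hypothetical placeholders. With the defaults shown (2 and 10) the guest gets roughly 20 seconds to shut down cleanly before the plain power-off.

```python
import time
from typing import Callable


def shutdown_with_fallback(
    request_shutdown: Callable[[], None],
    is_powered_off: Callable[[], bool],
    force_power_off: Callable[[], None],
    power_query_interval: int = 2,
    power_query_times: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Ask for a guest shutdown, poll the power state, fall back to power-off.

    All three callbacks are hypothetical stand-ins for whatever the backup
    tool really calls; only the two variable names come from xsibackup.conf.
    """
    request_shutdown()
    for _ in range(power_query_times):
        if is_powered_off():
            return "graceful"
        sleep(power_query_interval)
    # Query limit reached: issue a plain power-off.
    force_power_off()
    return "forced"


if __name__ == "__main__":
    # Simulated guest that never shuts down; sleeping is skipped for the demo.
    result = shutdown_with_fallback(
        request_shutdown=lambda: None,
        is_powered_off=lambda: False,
        force_power_off=lambda: None,
        sleep=lambda seconds: None,
    )
    print(result)
```

If a Windows guest routinely needs more than 20 seconds to shut down, raising either value in xsibackup.conf widens that window.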
We like to torture VMs, especially VMs hosting DB servers. We have some CentOS 6.0 / MySQL 5.6 VMs here that we have been excruciatingly powering off in the rudest manner for years and they have never suffered from DB corruption, although that will of course depend on how busy the DB is when you commit the crime.
Yes, --quiesce will issue a quiesce request, thus you can use regular pre-freeze/ post-thaw VMWare Tools scripts to prevent DB corruption.
We already fixed that typo, it will show up in some hours.
Update on this.
Today I tried a boot of 2 replicas. Neither worked.
I was planning on booting and fixing the AD as per previous post.
Neither a normal boot nor a DSRM boot worked on either replica.
I booted via a server iso but it's not possible to fix it this way.
So I have updated my xsibackup config to try a warm backup rather than a hot backup - not something I'm keen on doing but I will give it a try tonight.
Warm backup worked. Server was down for a couple of minutes.
I suspect rebooting a Windows server regularly like this will likely break it at some point.
(I would like to make it very clear to anyone else reading this. The server was Windows Server Essentials. So only one DC.
On testing replicas I discovered 1) it wouldn't boot due to corrupt AD. 2) I couldn't boot into directory services mode to fix things.
So do not use hot backup of a DC )
Last edited by Corbeau (2022-03-07 09:54:10)
Thank you for your feedback.
This is yet another issue having to do with quiescing your FS. We are writing about this all the time, still we have recently updated the main post relative to this topic and added some specific notes.
Every user should make the effort to see these kinds of problems as one broad issue, even though each particular situation may require a slightly different procedure to solve it.
Of course your proposed solution will always work, as you are shutting your server down before taking the backup snapshot. Even though it is switched back on immediately afterwards, the snapshot is taken from a stopped state of the VM, so the chances of your Active Directory DB getting corrupted are practically zero.
If you can afford to stop the VM for some seconds, a warm backup is definitely the simplest solution to this kind of problem. Still, not everybody can afford to stop the DC to back up the AD VMs.
AD information is kept in a DB. That DB could become corrupt, just like any other DB server which is abruptly stopped. The snapshot issue is about the same as a sudden power outage, which before virtualization became popular was the most frequent way to corrupt the AD database.
Whether it actually becomes corrupt is random, and the odds grow with how busy the DB is. You might be lucky and the service might be idle just when you take your snapshot, but you should not count on that.
The DB becoming corrupt does not mean that the whole database goes corrupt. People tend to think in maximalistic terms all the time, which causes terror, doubt and in the end wrong decisions.
Databases become corrupt on power outages or non-quiesced snapshots just because the last pages that are being written get chopped before the end of the page is written to disk. Thus, the system preprocessing routines detect this unfinished write because some page in the DB lacks a footer or closing structure.
Fixing the problem comes down to the same thing in every case: detecting the damaged pages and removing them, which is usually done with the database repair commands. This obviously varies depending on the DB system. In the case of a DB server like MySQL or MS SQL Server, you would just lose the last writes or updates. In the case of AD, the repair would chop off the latest AD-related operations.
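To make the "detect the damaged pages and remove them" idea concrete, here is a toy Python sketch. It is not the real NTDS.DIT / ESE page layout, nor what esentutl does internally; the page size, the CRC footer and all function names are assumptions made purely for illustration. Each page carries a checksum footer, so a write that was cut short leaves a page whose footer no longer matches and which a repair pass can drop.

```python
import zlib
from typing import List, Tuple

PAGE_SIZE = 64    # toy value, real database pages are far larger
FOOTER_SIZE = 4   # CRC32 footer, an assumption of this sketch


def make_page(payload: bytes) -> bytes:
    """Build one fixed-size page: zero-padded payload plus CRC32 footer."""
    if len(payload) > PAGE_SIZE - FOOTER_SIZE:
        raise ValueError("payload too large for one page")
    body = payload.ljust(PAGE_SIZE - FOOTER_SIZE, b"\0")
    return body + zlib.crc32(body).to_bytes(FOOTER_SIZE, "big")


def page_is_intact(page: bytes) -> bool:
    """A page is intact if it is complete and its footer matches its body."""
    if len(page) != PAGE_SIZE:
        return False
    body, footer = page[:-FOOTER_SIZE], page[-FOOTER_SIZE:]
    return zlib.crc32(body).to_bytes(FOOTER_SIZE, "big") == footer


def split_pages(data: bytes) -> List[bytes]:
    return [data[i:i + PAGE_SIZE] for i in range(0, len(data), PAGE_SIZE)]


def repair(data: bytes) -> Tuple[bytes, int]:
    """Drop every damaged page; return the repaired data and the drop count."""
    pages = split_pages(data)
    good = [p for p in pages if page_is_intact(p)]
    return b"".join(good), len(pages) - len(good)


if __name__ == "__main__":
    db = make_page(b"user A") + make_page(b"user B") + make_page(b"user C")
    torn = db[:-10]  # the last page write was cut short
    repaired, dropped = repair(torn)
    print(dropped, len(repaired) // PAGE_SIZE)
```

In this toy model only the last write ("user C") is lost while the two earlier pages survive, which is the point made above: corruption after a non-quiesced snapshot normally affects the most recent writes, not the whole database.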
Active Directory adds an additional problem, which is that the DC depends on the health of the Active Directory DB to boot up. This could be considered an OS design flaw, as it puts you in a technical paradox. The solutions proposed by Microsoft don't seem to work in your case. Still, there should be a fairly easy way to fix that DB: as said, this is an old issue with mature repair procedures, since power outages were a common source of AD-related corruption long before virtualization snapshots overtook them in frequency.
All these issues are prevented the same way: quiescing the FS before actually taking the snapshot. It amounts to about the same as a controlled shutdown of the DB services, but done with the OS running and with normal operations resuming ASAP. It usually takes some seconds at most to quiesce the different DB services in a server.
In the notes on quiescing we describe the procedure to follow in case of DB services in Windows servers.
There are a few services related to quiescing a Windows guest: VSS, VMWare Tools, Virtual Disk and in some cases some additional helper services. Just as long as those services are configured as described in our post and all other related services are installed and configured properly, using a quiesced snapshot should prevent any corruption on the different DB services that may be running in your guest.
Quiescing in a nutshell consists of the (c)ESXi server telling the VMWare Tools service in the guest that a snapshot is about to be taken; the VMWare Tools service should then coordinate the controlled pause of the running DB services.
Still, if you have some host that is not responding to automatic quiescing, you can control the process on your own. How?
(c)VMWare Tools offers a way to run custom pre and post backup scripts, as described in the post. These scripts can handle three events related to snapshots: FREEZE, THAW and FREEZEFAIL.
FREEZE happens right before the snapshot is taken. THAW happens right after the snapshot has been created (please note that some documents on the web wrongly describe THAW as happening when the snapshot is deleted). Finally, FREEZEFAIL is run in the event that some error is triggered.
Quiescing your AD service on your own would consist of adding the necessary AD service stop command to FREEZE and the AD service start command to THAW, as well as to FREEZEFAIL. That way you make sure that the AD service is stopped gracefully before your backup snapshot is taken, preventing any data corruption, and that it is started again once the snapshot has been completed.
It is conceptually the same as running a "warm" backup, except that you don't have to reboot the server. It is indeed the same thing that the coordinated services in the server should do when they are configured the right way.
@echo off
if "%~1" == "" goto USAGE
if %1 == freeze goto FREEZE
if %1 == freezeFail goto FREEZEFAIL
if %1 == thaw goto THAW
:USAGE
echo "Usage: %~nx0 [ freeze | freezeFail | thaw ]"
goto END
:FREEZE
net stop YOUR_AD_INSTANCE_NAME
goto END
:FREEZEFAIL
net start YOUR_AD_INSTANCE_NAME
goto END
:THAW
net start YOUR_AD_INSTANCE_NAME
goto END
:END
Hi, I'm interested in trying this, but how can I get the value for "YOUR_AD_INSTANCE_NAME" ?
The goal of that line in the procedure is to set AD off and then back on once the snapshot has been finally taken.
Active Directory Domain Services usually appears as NTDS in the Services applet, though that may vary depending on your setup and customization level.
Most of the time you will issue:
net stop NTDS
net start NTDS
Please note that this is a straightforward solution that will turn your AD service off for a couple of seconds. The VSS services in your DC should take care of holding NTDS writes while the snapshot is being taken, as long as VMWare Tools is correctly installed and configured. The checklist that works for most of our users is:
Virtual Disk service is started and startup type is Automatic.
VMware snapshot provider service is stopped and disabled.
VMware Tools services are running.
Ensure that Volume Shadow Copy service start up type is Automatic.
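The checklist above can also be written down as data and checked mechanically. The following Python sketch is only an illustration: the short service names (vds, vmvss, VMTools, VSS) are assumptions about how these services are usually registered and should be verified on your own guest, and the observed values have to be collected by you, for example by reading the output of "sc qc" and "sc query" for each service.

```python
from typing import Dict, List, Optional, Tuple

# service name -> (expected start type, expected state).
# None means the checklist does not specify that property.
# The short service names are assumptions, check them on your guest.
CHECKLIST: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "vds": ("auto", "running"),        # Virtual Disk
    "vmvss": ("disabled", "stopped"),  # VMware snapshot provider
    "VMTools": (None, "running"),      # VMware Tools
    "VSS": ("auto", None),             # Volume Shadow Copy
}


def check_services(observed: Dict[str, Tuple[str, str]]) -> List[str]:
    """Compare observed (start type, state) pairs against the checklist.

    Returns a list of human-readable deviations; empty means all good.
    """
    problems: List[str] = []
    for name, (want_start, want_state) in CHECKLIST.items():
        if name not in observed:
            problems.append(f"{name}: service not found")
            continue
        start, state = observed[name]
        if want_start is not None and start != want_start:
            problems.append(
                f"{name}: start type is {start}, expected {want_start}")
        if want_state is not None and state != want_state:
            problems.append(
                f"{name}: state is {state}, expected {want_state}")
    return problems


if __name__ == "__main__":
    # Hand-entered example values, not read from a real system.
    observed = {
        "vds": ("demand", "stopped"),
        "vmvss": ("disabled", "stopped"),
        "VMTools": ("auto", "running"),
        "VSS": ("demand", "stopped"),
    }
    for line in check_services(observed):
        print(line)
```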
Obviously we can't guarantee that your MS Server DC will behave as you expect it to, as that depends on so many other things. That's why we offer this straightforward procedure, which should work in every case.