©XSIBackup-Free: Free Backup Software for ©VMWare ©ESXi

Forum ©XSIBackup: ©VMWare ©ESXi Backup Software



#1 2022-01-12 10:23:07

Corbeau
Member
Registered: 2021-02-07
Posts: 19

backup a domain controller

Hi,

Is there a way to use XSIBackup DC to successfully replicate a MS Windows Domain Controller?
In my case there is only a PDC, no secondaries. So there is no USN rollback to worry about, and with no secondaries I can't transfer roles.

Will quiesce work? Or does a warm/cold backup need to be taken?


At the moment I need to use DSRM to make the server functional. Any replicas will get a 0xc00002e2 BSOD on boot.
(e.g. https://community.spiceworks.com/topic/ … 0xc00002e2 )

Thanks


#2 2022-01-12 14:10:36

admin
Administrator
Registered: 2017-04-21
Posts: 2,032

Re: backup a domain controller

That happens because the AD DB becomes corrupt due to some pending I/O operation.

"This error is an indication that the Active Directory database (NTDS.DIT) is corrupt."
How to fix AD 0xc00002e2 error

It's not difficult to fix; still, the best approach is obviously to have a 100% functional DC after restoring.
You have a number of ways to ensure the integrity of your DC:

1/ The easiest way is through a warm backup, if you can afford to stop the DC for 30 sec. to 1 minute at most.

2/ Review the MS documentation to find out how you must configure your DC to allow the AD DB to be quiesced in coordination with VMWare Tools, namely: make sure that it writes any pending data, just like before taking any snapshot.

3/ Take multiple VSS snapshots during the day and revert to the latest after restoring (not very convenient).

4/ Use pre and post snapshot scripts to stop the AD service or put it in read-only mode before taking the snapshot and start it up or put it back in R/W mode after the snapshot has been taken. This is what the related MS services should do, still you can easily implement it on your own.


#3 2022-01-13 14:29:24

Corbeau
Member
Registered: 2021-02-07
Posts: 19

Re: backup a domain controller

Regarding fixing - I have tried this on a replica. In case anyone else needs this, quick instructions below.
NB only use if you have only one DC


F8 into DSRM (F8 may bring up a blue screen first; if so, choose Boot Normally and keep hitting F8)
Choose Directory Services Repair Mode
Log on as .\administrator - you need your DSRM admin password.
Make a copy of C:\Windows\NTDS - just in case.

Run > cmd

c:
Cd c:\Windows\NTDS
Del *.log
NTDSUTIL
activate instance ntds
files
info
quit
quit
esentutl /p "c:\windows\ntds\ntds.dit"
md C:\Windows\NTDS\Temp
Cd C:\Windows\NTDS
NTDSUTIL
activate instance ntds
files
info
compact to "C:\Windows\NTDS\Temp"
quit
quit
Cd C:\Windows\NTDS
copy /Y C:\Windows\NTDS\Temp\NTDS.dit C:\Windows\NTDS
del *.log
shutdown /r

cross fingers.

I'm keeping a copy of this on the server just in case.

still obviously the best approach is to have a 100% functional DC after restoring.

Couldn't agree more - especially as servers normally die at the wrong times and you need to work on your phone in the middle of the night from a different country whilst at a night club!

Last edited by Corbeau (2022-01-13 14:33:58)


#4 2022-01-13 14:50:24

Corbeau
Member
Registered: 2021-02-07
Posts: 19

Re: backup a domain controller

Some questions

How does XSIBackup trigger the shutdown in a warm backup - is it via vmtools? What I really want to know is how safe it is.

Also using --quiesce what happens -- does xsibackup ask vmtools to quiesce the system using "VMware Tools Quiescence"?

From DC manual
--backup-how[=hot|war|cold] I like the idea of a war backup!:)

Thanks


#5 2022-01-13 19:00:29

admin
Administrator
Registered: 2017-04-21
Posts: 2,032

Re: backup a domain controller

:) Take a look at <install dir>/etc/xsibackup.conf

# When power on/off request is issued, the VM power state is queried every N seconds

power_query_interval=2

# When power on/off request is issued, the VM power state is queried N times
# Thus the power state will be queried a total of power_query_interval*power_query_times seconds
# Should the query_times limit be reached, a plain power off will be issued

power_query_times=10

As explained there, ©XSIBackup will try to perform a controlled shutdown as per the above-mentioned variables before issuing a plain power-off.
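
As a rough illustration of the behaviour those two variables describe (this is not ©XSIBackup's actual code), the wait-then-power-off logic could be sketched like this in Python. The three callables are hypothetical placeholders for whatever queries the power state, requests a guest shutdown and forces a power-off; only the two variable names and their defaults come from xsibackup.conf.

```python
import time
from typing import Callable


def shutdown_with_fallback(
    is_powered_off: Callable[[], bool],          # hypothetical: queries the VM power state
    request_guest_shutdown: Callable[[], None],  # hypothetical: asks the guest to shut down
    hard_power_off: Callable[[], None],          # hypothetical: plain power-off
    power_query_interval: float = 2,             # seconds between queries (from xsibackup.conf)
    power_query_times: int = 10,                 # number of queries (from xsibackup.conf)
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Request a controlled shutdown, poll the power state, force power-off on timeout."""
    request_guest_shutdown()
    for _ in range(power_query_times):
        if is_powered_off():
            return "clean"
        sleep(power_query_interval)
    # The query limit was reached without the VM stopping: fall back to a plain power-off.
    hard_power_off()
    return "forced"
```

With the default values shown above, the guest gets about 2 * 10 = 20 seconds to shut down cleanly before the plain power-off is issued, so a DC that needs longer to stop would presumably need larger values.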

We like to torture VMs, especially VMs hosting DB servers. We have some CentOS 6.0/MySQL 5.6 VMs here that we have been excruciatingly powering off in the rudest manner for years, and they have never suffered from DB corruption, although that will of course depend on how busy the DB is when you commit the crime.

Yes, --quiesce will issue a quiesce request, thus you can use regular pre-freeze/ post-thaw VMWare Tools scripts to prevent DB corruption.

We already fixed that typo, it will show up in some hours.


#6 2022-03-05 12:10:48

Corbeau
Member
Registered: 2021-02-07
Posts: 19

Re: backup a domain controller

Update on this.

Today I tried a boot of 2 replicas. Neither worked.
I was planning on booting and fixing the AD as per previous post.
Neither normal boot nor DSRM boot worked on either replica.
I booted via a server iso but it's not possible to fix it this way.

So I have updated my xsibackup config to try a warm backup rather than a hot backup - not something I'm keen on doing but I will give it a try tonight.


#7 2022-03-07 09:51:25

Corbeau
Member
Registered: 2021-02-07
Posts: 19

Re: backup a domain controller

Further update.

Warm backup worked. The server was down for a couple of minutes.
I suspect rebooting a Windows server regularly like this will break it at some point.
Work ongoing....


(I would like to make it very clear to anyone else reading this. The server was Windows Server Essentials. So only one DC.
On testing replicas I discovered 1) it wouldn't boot due to corrupt AD. 2) I couldn't boot into directory services mode to fix things.
So do not use hot backup of a DC )

Last edited by Corbeau (2022-03-07 09:54:10)


#8 2022-03-07 10:37:41

admin
Administrator
Registered: 2017-04-21
Posts: 2,032

Re: backup a domain controller

Thank you for your feedback.

This is yet another issue having to do with quiescing your FS. We write about this all the time; still, we have recently updated the main post on this topic and added some specific notes.

Every user should make the effort to see this kind of problem as a broad issue, even though each particular situation may require a slightly different procedure to solve it.

Of course your proposed solution will always work, as you are shutting your server down before taking the backup snapshot. Even though the VM is switched on again immediately afterwards, the snapshot is taken from a stopped state of the VM, so the chances of your Active Directory DB getting corrupted are zero.

If you can afford to stop the VM for some seconds, a warm backup is definitely the simplest solution to this kind of problem. Still, not everybody can afford to stop the DC to back up the AD VMs.

Problem description

AD information is kept in a DB. That DB could become corrupt, just like any other DB server which is abruptly stopped. The snapshot issue is about the same as a sudden power outage, which before virtualization became popular was the most frequent way to corrupt the AD database.

Whether it actually becomes corrupt is random, with a probability proportional to how busy it is. You might be lucky and the service might be idle just when you take your snapshot; you should not count on that, though.

The DB becoming corrupt does not mean that the whole database goes corrupt. People tend to think in maximalistic terms all the time, which causes terror, doubt and in the end wrong decisions.

Databases become corrupt on power outages or non-quiesced snapshots just because the last pages that are being written get chopped before the end of the page is written to disk. Thus, the system preprocessing routines detect this unfinished write because some page in the DB lacks a footer or closing structure.

Fixing the problem consists of the same conceptual thing in every case: detecting the bad pages and removing them, which is usually done with the database repair commands. This obviously varies depending on the DB system. In the case of a DB server like MySQL or MS SQL Server, you would just lose the last writes or updates. In the case of AD, the repair would chop off the latest AD-related operations.
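
The idea of spotting pages whose closing structure is missing can be sketched in a few lines of Python. This is a toy layout invented for illustration only (fixed 64-byte pages ending in a CRC32 footer); it is not the on-disk format of NTDS.DIT, MySQL or MS SQL Server, and every name in it is hypothetical.

```python
import zlib
from typing import List

PAGE_SIZE = 64              # toy value; real engines use pages of several KB
PAYLOAD = PAGE_SIZE - 4     # the last 4 bytes of each page hold a CRC32 footer


def make_page(payload: bytes) -> bytes:
    """Build one complete page: padded payload followed by its CRC32 footer."""
    body = payload.ljust(PAYLOAD, b"\0")[:PAYLOAD]
    return body + zlib.crc32(body).to_bytes(4, "little")


def page_is_intact(page: bytes) -> bool:
    """A page is intact if it is complete and its footer matches its payload."""
    if len(page) != PAGE_SIZE:
        return False
    body, footer = page[:PAYLOAD], page[PAYLOAD:]
    return zlib.crc32(body).to_bytes(4, "little") == footer


def find_torn_pages(data: bytes) -> List[int]:
    """Return the indexes of the pages that were not completely written."""
    return [
        offset // PAGE_SIZE
        for offset in range(0, len(data), PAGE_SIZE)
        if not page_is_intact(data[offset:offset + PAGE_SIZE])
    ]


def drop_torn_pages(data: bytes) -> bytes:
    """Crude 'repair': keep only the intact pages, losing the interrupted writes."""
    return b"".join(
        data[offset:offset + PAGE_SIZE]
        for offset in range(0, len(data), PAGE_SIZE)
        if page_is_intact(data[offset:offset + PAGE_SIZE])
    )
```

For example, two complete pages followed by a third one cut off after 40 bytes would make find_torn_pages report page 2, and drop_torn_pages would return just the first two pages. Real repair tools do considerably more than this, but the loss is similarly limited to what was being written at the time.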

Active Directory adds an additional problem: the DC depends on the health of the Active Directory DB to boot up. This could be considered an OS design flaw, as it puts you in a technical paradox. The solutions proposed by Microsoft don't seem to work in your case; still, there should be a fairly easy way to fix that DB. As said, this is an old issue with repair procedures that have been mature for many years: power outages were a common source of AD-related corruption before virtualization snapshots overtook them in frequency.

Quiescing backups

All issues of this kind are prevented the same way: quiescing the FS before actually taking the snapshot. It amounts to about the same as a controlled shutdown for DB services, but done with the OS running and resuming normal operations ASAP. It usually takes some seconds at most to quiesce the different DB services in a server.

In the notes on quiescing we describe the procedure to follow in case of DB services in Windows servers.

There are a few services related to quiescing a Windows guest: VSS, VMWare Tools, Virtual Disk and in some cases some additional helper services. Just as long as those services are configured as described in our post and all other related services are installed and configured properly, using a quiesced snapshot should prevent any corruption on the different DB services that may be running in your guest.

Quiescing in a nutshell consists of the ©ESXi server notifying the VMWare Tools service in the guest that a snapshot is about to be taken; the VMWare Tools service should then coordinate the controlled pause of the running DB services.

Still, if you have some host that does not respond to automatic quiescing, you can control the process on your own. How?

©VMWare Tools offers a way to run custom pre and post backup scripts, as described in the post. These scripts can handle three events related to snapshots: FREEZE, THAW and FREEZEFAIL.

FREEZE happens right before the snapshot is taken; THAW happens right after the snapshot has been created (please note that some documents on the web wrongly describe THAW as happening when the snapshot is deleted); finally, FREEZEFAIL is run in the event that some error is triggered.

Quiescing your AD services on your own would consist of adding the necessary AD service stop command to FREEZE and the AD service start command to THAW, as well as to FREEZEFAIL. That way you make sure that the AD service is stopped gracefully before your backup snapshot is taken, preventing any data corruption, and that it is started again once the snapshot has been completed.

It is conceptually the same as running a "warm" backup, except that you don't have to reboot the server. It is indeed the same thing that the coordinated services in the server should do when they are configured the right way.

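@echo off
rem Custom quiescing script, called with one argument: freeze, freezeFail or thaw.
if "%~1" == "" goto USAGE
if "%~1" == "freeze" goto FREEZE
if "%~1" == "freezeFail" goto FREEZEFAIL
if "%~1" == "thaw" goto THAW
:USAGE
echo "Usage: %~nx0 [ freeze | freezeFail | thaw ]"
goto END
:FREEZE
rem Stop the AD service gracefully right before the snapshot is taken.
rem /y answers the confirmation prompt about dependent services, if any.
net stop YOUR_AD_INSTANCE_NAME /y
goto END
:FREEZEFAIL
rem The snapshot failed: bring the AD service back up anyway.
net start YOUR_AD_INSTANCE_NAME
goto END
:THAW
rem The snapshot has been created: start the AD service again.
rem Verify afterwards that any services depending on it are running too.
net start YOUR_AD_INSTANCE_NAME
goto END
:END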

#9 2022-03-10 11:01:07

sistemi
Member
Registered: 2017-08-29
Posts: 74

Re: backup a domain controller

Hi, I'm interested in trying this, but how can I get the value for "YOUR_AD_INSTANCE_NAME" ?

Thank you.


#10 2022-03-10 11:38:28

admin
Administrator
Registered: 2017-04-21
Posts: 2,032

Re: backup a domain controller

The goal of that line in the procedure is to turn AD off and then back on once the snapshot has been taken.

Active Directory Domain Services usually appears as NTDS in the Services applet, although that may vary depending on your setup and customization level.
Most of the time you will issue:

net stop NTDS
net start NTDS

Please note that this is a straightforward solution that will turn your AD service off for a couple of seconds. VSS services in your DC should take care of holding NTDS writes while the snapshot is being taken, as long as VMWare Tools is correctly installed and configured. The checklist that works for most of our users is:

Virtual Disk service is started and startup type is Automatic.
VMware Snapshot Provider service is stopped and disabled.
VMware Tools services are running.
Volume Shadow Copy service startup type is Automatic.
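
The checklist above can be written down as data and compared against what is actually observed on the guest. The following Python sketch only does the comparison; how the observed states are collected (for instance from the Services applet or from the output of sc query and sc qc) is left out. The short service names used as keys (vds, vmvss, VMTools, VSS) are assumptions about a typical installation and should be checked against your own server.

```python
from typing import Dict, List

# Expected values taken from the checklist; the service key names are assumptions.
EXPECTED: Dict[str, Dict[str, str]] = {
    "vds":     {"state": "RUNNING", "start": "AUTO"},      # Virtual Disk
    "vmvss":   {"state": "STOPPED", "start": "DISABLED"},  # VMware Snapshot Provider
    "VMTools": {"state": "RUNNING"},                       # VMware Tools
    "VSS":     {"start": "AUTO"},                          # Volume Shadow Copy
}


def checklist_problems(
    observed: Dict[str, Dict[str, str]],
    expected: Dict[str, Dict[str, str]] = EXPECTED,
) -> List[str]:
    """Return one message per checklist item that the observed services do not meet."""
    problems = []
    for service, wanted in expected.items():
        actual = observed.get(service)
        if actual is None:
            problems.append(f"{service}: service not found")
            continue
        for key, value in wanted.items():
            if actual.get(key) != value:
                problems.append(f"{service}: {key} is {actual.get(key)}, expected {value}")
    return problems
```

An empty list means the guest matches the checklist; anything else points at the service to look at before relying on a quiesced snapshot.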

We obviously can't guarantee that your MS Server DC will behave as you expect it to, as that depends on so many other things; that's why we offer this straightforward procedure, which should work in every case.

