#1 2019-02-12 22:50:15

Timbo
Member
Registered: 2019-01-23
Posts: 12

No daily backups after a power outage

I configured a nightly backup of our VM server, and for the last week or more I was getting an email each morning with the result.

On Saturday, we had a power outage midway through one of the backups. It lasted only a few hours; power was then restored and the VM restarted. The xsibackup.log ends on the 9th, with the last line showing cloning at 21%. It's been up 3 days now, and the nightly backups have not resumed. That is a very bad sign for a backup solution. Backups need to be resilient and robust.

When I check the jobs in xsibackup, it still lists the job and shows it enabled.

The only options offered are to install cron or remove cron. It would be good to see the current cron status, to tell whether the problem is cron not being installed (which would imply an issue with your install script).

What do I have to do to get backups working again? Reinstall cron?

Thanks

#2 2019-02-13 14:39:49

admin
Administrator
Registered: 2017-04-21
Posts: 1,368

Re: No daily backups after a power outage

You said:

"On Saturday, we had a power outage midway through one of the backups"

And then

"It's been up 3 days now, and the nightly backups have not resumed. That is a very bad sign for a backup solution"

Have you considered the possibility that the outage caused some additional trouble that you need to fix? A power outage is not a trivial incident for a server.

- Make sure that your cron service is running and is loaded just once.
- Check that the ESXi crontab is O.K.

https://33hops.com/xsibackup-cron-troubleshooting.html
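The first check ("running and loaded just once") could be sketched roughly as below. This is only an illustration, not part of XSIBackup: the function names are hypothetical, and it assumes the process listing (for example from `ps` on the ESXi shell) contains the word "crond" on each cron process line.

```shell
#!/bin/sh
# Count crond processes in a process listing read from stdin.
# Assumption: each cron process line contains the word "crond".
# Lines mentioning grep are dropped so the check does not count itself.
count_crond() {
    grep -w 'crond' | grep -vc 'grep' || true
}

# Report whether zero, one, or several crond instances are listed.
crond_status() {
    crond_count=$(count_crond)
    case "$crond_count" in
        0) echo "crond is not running" ;;
        1) echo "crond is running once" ;;
        *) echo "crond is loaded $crond_count times" ;;
    esac
}

# Assumed usage on the host:
#   ps | crond_status
```

If it reports more than one instance, the extra processes would need to be stopped before restarting cron.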

#3 2019-02-13 20:33:59

Timbo
Member
Registered: 2019-01-23
Posts: 12

Re: No daily backups after a power outage

admin wrote:

You said:

"On Saturday, we had a power outage midway through one of the backups"

And then

"It's been up 3 days now, and the nightly backups have not resumed. That is a very bad sign for a backup solution"

Have you considered the possibility that the outage caused some additional trouble that you need to fix? A power outage is not a trivial incident for a server.

- Make sure that your cron service is running and is loaded just once.
- Check that the ESXi crontab is O.K.

https://33hops.com/xsibackup-cron-troubleshooting.html

That's exactly why I'm here. I can't think of another backup solution I've ever used that didn't either resume backups or send some notification that backups were no longer happening. So when it doesn't automatically resume backups, something MUST be wrong. On all my Linux boxes, I've never had cron jobs just stop getting called.

There are no other known issues with the server. The VMs restarted. All storage is accounted for. RAID operation status is OK. No errors in ESXi. This is why I expected backups to resume. I'll work on the troubleshooting later today.

#4 2019-02-13 21:36:10

admin
Administrator
Registered: 2017-04-21
Posts: 1,368

Re: No daily backups after a power outage

We really don't see the point of your argument.
So, you have always used backup software that sent notifications when it was not working?
Your issue is most probably related to your cron service, so it is hard to see how any e-mails could be sent.

By the way, the crond service is ESXi's own cron, and there isn't any reason why it shouldn't work, unless something went wrong during the reboot.

Inspect the /etc/rc.local.d/local.sh file; it should look like this.
If it does not, it may be because the crontab was edited manually and your changes were lost in the reboot.

#!/bin/sh

# local configuration options

# Note: modify at your own risk!  If you do/use anything in this
# script that is not part of a stable API (relying on files to be in
# specific places, specific tools, specific output, etc) there is a
# possibility you will end up with a broken system after patching or
# upgrading.  Changes are not supported unless under direction of
# VMware support.

# Note: This script will not be run when UEFI secure boot is enabled.

"/vmfs/volumes/datastore1/xsibackup-dir/src/cron-init" root
exit 0
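A quick way to test for that line, as a hedged sketch: the function name is hypothetical, and the check only looks for an uncommented line mentioning `src/cron-init`, since the datastore path in front of it will differ per installation.

```shell
#!/bin/sh
# Print "present" if the given startup script has an uncommented line
# that calls a cron-init helper, "missing" otherwise.
# Assumption: the helper path ends in src/cron-init as in the listing above.
check_local_sh() {
    if grep -v '^[[:space:]]*#' "$1" | grep -q 'src/cron-init'; then
        echo "present"
    else
        echo "missing"
    fi
}

# Assumed usage on the host:
#   check_local_sh /etc/rc.local.d/local.sh
```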

#5 2019-02-15 00:35:51

Timbo
Member
Registered: 2019-01-23
Posts: 12

Re: No daily backups after a power outage

admin wrote:

We really don't see the point of your argument.
So, you have always used backup software that sent notifications when it was not working?

Your issue is most probably related to your cron service, so it is hard to see how any e-mails could be sent.

By the way, the crond service is ESXi's own cron, and there isn't any reason why it shouldn't work, unless something went wrong during the reboot.

Inspect the /etc/rc.local.d/local.sh file; it should look like this.
If it does not, it may be because the crontab was edited manually and your changes were lost in the reboot.

#!/bin/sh

# local configuration options

# Note: modify at your own risk!  If you do/use anything in this
# script that is not part of a stable API (relying on files to be in
# specific places, specific tools, specific output, etc) there is a
# possibility you will end up with a broken system after patching or
# upgrading.  Changes are not supported unless under direction of
# VMware support.

# Note: This script will not be run when UEFI secure boot is enabled.

"/vmfs/volumes/datastore1/xsibackup-dir/src/cron-init" root
exit 0

I stated my point exactly: "Backups need to be resilient and robust." Yes, of course I expect emails when ***backups*** fail. I've always received emails for failed backups because there was a problem with the VM or the destinations, not with the programs launching themselves. I've never had a problem with backup software failing to start its services on boot, even across many more power failures than the single one it took to cause a problem here. (Edit: this isn't quite true. CrashPlan will from time to time fail to start on boot, and they usually fix it with a software upgrade. In those cases an email is sent from their cloud because no backups occurred in X days. But I would argue that is a consumer-managed desktop, not a headless ESXi server.) If a program's service consistently failed to start on Windows or Linux, I'd switch to another app.

I've used remote access software on headless and hard-to-reach servers for decades. It always needs to restart itself, reconnect, retry broken connections over and over, etc. It should perform error checks and correct anything that would prevent it from working; it should do everything it can to ensure remote access is always available and prevent a truck roll or onsite visit. My first suggestion was something that indicates the cron installation status, as the GUI only offers to install or remove, with nothing to indicate whether it's currently enabled/installed.

So, getting back to troubleshooting.

No, /etc/rc.local.d/local.sh doesn't contain that line. Blank space where that line is.
No, /var/spool/cron/crontabs/root doesn't contain the xsibackup line to backup job 1.

#min hour day mon dow command
1    1    *   *   *   /sbin/tmpwatch.py
1    *    *   *   *   /sbin/auto-backup.sh
0    *    *   *   *   /usr/lib/vmware/vmksummary/log-heartbeat.py
*/5  *    *   *   *   /bin/hostd-probe.sh ++group=host/vim/vmvisor/hostd-probe/stats/sh
00   1    *   *   *   localcli storage core device purge
cat conf/root-crontab
30 14 * * * "/vmfs/volumes/59a3eb97-be17a947-81b7-0cc47a47ea53/xsi-dir/jobs/1"
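The comparison between the two files above can be sketched as a small shell function. This is only an illustration with a hypothetical function name; it prints every non-empty line of the first file that does not appear verbatim in the second.

```shell
#!/bin/sh
# Print each non-empty line of the source crontab ($1) that is absent,
# as an exact whole-line match, from the live crontab ($2).
missing_cron_lines() {
    while IFS= read -r cron_line || [ -n "$cron_line" ]; do
        [ -z "$cron_line" ] && continue
        grep -qxF -- "$cron_line" "$2" || printf '%s\n' "$cron_line"
    done < "$1"
}

# Assumed usage, run from the XSIBackup install directory:
#   missing_cron_lines conf/root-crontab /var/spool/cron/crontabs/root
```

Empty output would mean every XSIBackup job line made it into the live crontab.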

One thing I noticed when running the job manually, is the line:

cat xsibackup.log |grep crontab
2019-01-21T23:12:14|  Alert: crontab is not installed for user root
2019-01-22T10:50:20|  Alert: crontab is not installed for user root
2019-01-22T11:06:39|  Alert: crontab is not installed for user root
2019-01-22T14:30:08|  Alert: crontab is not installed for user root
2019-01-23T03:15:40|  Alert: crontab is not installed for user root
2019-01-23T14:30:06|  Alert: crontab is not installed for user root
2019-01-24T14:30:07|  Alert: crontab is not installed for user root
2019-01-25T14:30:07|  Alert: crontab is not installed for user root
2019-01-26T14:30:07|  Alert: crontab is not installed for user root
2019-01-27T14:30:07|  Alert: crontab is not installed for user root
2019-01-28T14:30:06|  Alert: crontab is not installed for user root
2019-01-29T14:30:06|  Alert: crontab is not installed for user root
2019-01-30T14:30:06|  Alert: crontab is not installed for user root
2019-01-31T14:30:06|  Alert: crontab is not installed for user root
2019-02-01T14:30:06|  Alert: crontab is not installed for user root
2019-02-02T14:30:07|  Alert: crontab is not installed for user root
2019-02-03T14:30:06|  Alert: crontab is not installed for user root
2019-02-04T14:30:06|  Alert: crontab is not installed for user root
2019-02-05T14:30:06|  Alert: crontab is not installed for user root
2019-02-06T14:30:06|  Alert: crontab is not installed for user root
2019-02-07T14:30:06|  Alert: crontab is not installed for user root
2019-02-08T14:30:06|  Alert: crontab is not installed for user root
2019-02-09T14:30:07|  Alert: crontab is not installed for user root
2019-02-14T11:27:05|  Alert: crontab is not installed for user root

This makes me think if I rebooted the ESXi box before the power failure, the same result would occur.

This appears in the xsibackup.log, but not in the emails. Maybe it should be an ERROR rather than an Alert that the user never sees? I'm not sure how I'm getting daily job executions if crontab is not installed for user root. But I would suggest including this in the emails so the user knows crontab wasn't installed properly. Otherwise, when the job is running daily, the user THINKS the installation went fine when it didn't.
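Until something like that exists, the alert can at least be summarized from the log by hand. A minimal sketch, assuming the log line format shown above (timestamp, a `|`, then the message); the function name is hypothetical.

```shell
#!/bin/sh
# Summarize "crontab is not installed" alerts found in an xsibackup.log.
# Assumption: each line looks like "<timestamp>|  Alert: crontab is not installed ...".
crontab_alert_summary() {
    alert_lines=$(grep -F 'Alert: crontab is not installed' "$1" || true)
    if [ -z "$alert_lines" ]; then
        echo "no crontab alerts"
        return 0
    fi
    alert_count=$(printf '%s\n' "$alert_lines" | wc -l | tr -d ' ')
    alert_last=$(printf '%s\n' "$alert_lines" | tail -n 1 | cut -d'|' -f1)
    echo "$alert_count crontab alerts, most recent at $alert_last"
}

# Assumed usage:
#   crontab_alert_summary xsibackup.log
```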

I have made no manual mods to cron. I used the provided installer and configured one job through GUI.

In the troubleshooting guide, I'm currently stuck on step 3. I get no error when chmod'ing /var/spool/cron/crontabs/root, but vi gives me an error when trying to save.

'/var/spool/cron/crontabs/root' Operation not permitted

So yeah, the xsibackup job isn't being saved to /var/spool/cron/crontabs/root because of some permissions issue.

ls -la /var/spool/cron/crontabs/root
-rwx------    1 root     root           324 Sep 29  2017 /var/spool/cron/crontabs/root

p.s. On the troubleshooting page, you suggest running the commands blindly, but you don't say to adjust for the datastore path xsibackup is installed to, and you incorrectly tell users to tail the log file in conf, which has since been moved to var/logs/xsibackup.log, not conf/xsibackup.log. It would be better to tell the user to adapt the commands than to run them blindly. Running blindly usually means "copy and paste and don't change".

I just googled the "Operation not permitted" error and found an ESXi forum response: https://communities.vmware.com/thread/337354. I cp'd root to root2, added the xsibackup line to it, saved it successfully, then mv'd root2 back to root. I can now edit the root file without a problem. Major *shrug* on that behaviour, but it's been observed going back to ESXi 5, at least.

I added the date test to the root cron file, but it wasn't getting written to. So I ran these commands, and now it is working. So I'm not sure whether the commands I ran in troubleshooting step 2 actually worked.

# kill -HUP $(cat /var/run/crond.pid)
# /usr/lib/vmware/busybox/bin/busybox crond

You can probably add this to step 3 or make a step 4 so people don't go through the hassle of reinstalling ESXi when they don't need to.

When you're copying the root-crontab contents to root, you're checking that it wrote successfully, right? If not, you can try the copy, edit, move plus crond restart to get things working again. That will make it more resilient.
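Roughly what I mean, as a sketch only: this is not how XSIBackup actually does it, the function name is made up, and it assumes the existing crontab ends with a newline. It does the copy, edit, move dance and then verifies that the line really landed in the file.

```shell
#!/bin/sh
# Add one cron entry ($1) to a crontab file ($2) via a temporary copy,
# then verify the entry is present. Returns non-zero on any failure.
install_cron_line() {
    new_line=$1
    target=$2
    scratch="$target.new"
    # Nothing to do if the exact entry is already there.
    grep -qxF -- "$new_line" "$target" && return 0
    cp "$target" "$scratch" || return 1
    printf '%s\n' "$new_line" >> "$scratch" || return 1
    mv "$scratch" "$target" || return 1
    # Confirm the write instead of assuming it worked.
    grep -qxF -- "$new_line" "$target"
}

# Assumed usage (crond still has to be restarted afterwards, as above):
#   install_cron_line '30 14 * * * "/path/to/jobs/1"' /var/spool/cron/crontabs/root
```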

I'll try and find some time this weekend to do a reboot test and see if the cron copy happens successfully this time.

Regards

Last edited by Timbo (2019-02-15 00:38:27)

#6 2019-02-15 09:40:00

admin
Administrator
Registered: 2017-04-21
Posts: 1,368

Re: No daily backups after a power outage

Your problem is a lot easier to understand than going through all the checks in this post would suggest. The key to resolving your problem and using our software appropriately is that you are overlooking some fundamental pieces of information, even though the program's output and this very thread keep insisting on them:

1 - The ESXi crontab is not persistent.
2 - Our software adds that line to the /etc/rc.local.d/local.sh when you install the XSIBackup cron to make sure the XSIBackup crontab contents are re-copied to the ESXi's crontab on every reboot.
3 - The message: 2019-02-05T14:30:06|  Alert: crontab is not installed for user root is almost desperately letting you know that point.

So, yes, you are right. Your backups would never have resumed after a reboot, even if no power outage had occurred.
No software can be "resilient and robust" when you have not installed it.

#7 2019-02-15 23:50:26

Timbo
Member
Registered: 2019-01-23
Posts: 12

Re: No daily backups after a power outage

admin wrote:

Your problem is a lot easier to understand than going through all the checks in this post would suggest. The key to resolving your problem and using our software appropriately is that you are overlooking some fundamental pieces of information, even though the program's output and this very thread keep insisting on them:

1 - The ESXi crontab is not persistent.
2 - Our software adds that line to the /etc/rc.local.d/local.sh when you install the XSIBackup cron to make sure the XSIBackup crontab contents are re-copied to the ESXi's crontab on every reboot.
3 - The message: 2019-02-05T14:30:06|  Alert: crontab is not installed for user root is almost desperately letting you know that point.

So, yes, you are right. Your backups would never have resumed after a reboot, even if no power outage had occurred.
No software can be "resilient and robust" when you have not installed it.

Here's the crux of the issue. I'm (edit: was) arguing I *DID* install cron, and I thought that was confirmed by the daily backups occurring. But it's looking like I didn't, yet I'm still able to configure jobs that run using cron. I just went in and viewed the job and stepped through a few screens without making any changes. Very briefly, it tells me the root crontab has been updated.

Let's say you install on a fresh system and configure a job. If the user doesn't explicitly install the cron service, the expectation is that jobs will fire according to schedule until the next reboot, just not after it, and that's what happened here. I know that's not my use case; maybe it is someone else's, but I can't think of one right now.

Yep, I have to say, it is looking like I didn't install cron. I just went to this screen and it's telling me to "Enter the crontab username" (not 'select'), but there is no place to enter a username. It displays "root" and "Administrator", with only the "r" in root in black and the whole of "Administrator" in black, so it looks like Administrator is selected. Up/down arrows don't work, tab just switches between OK and Cancel, and pressing "r" does not change anything. So I can now say with confidence that I did not install the cron service, because I would certainly have run into this screen. It needs work. Edit: I think I understand now. There is only *one* user, "root", who has the role of "Administrator", not two users, "root" and "Administrator". So if there were multiple users, pressing "r" would have selected root. I can see this making more sense if there were more users to select. It would be an optimization to JUST automatically select the root user when only the root user is displayed.

I would prefer that the installer install the cron service automatically during initial installation (or prompt to install it, defaulting to yes) rather than making the user install it manually. That saves a step and makes for a better default.

So, my feedback:
1. Install the cron-on-boot service by default during installation, or prompt the user with install as the default option
2. In the GUI, on the Cron Install/Remove screen, show the current cron installation status
3. Expose the "Alert: crontab is not installed for user root" to the emailed results
4. Users are idiots. The more you can do to help them not make mistakes, the better the software looks.

Thanks and have a great weekend.

p.s. Please increase the forum login timeout. I'll be logged in when starting a reply, and logged out by the time I go to submit the post. Luckily, browser back still has the post contents to paste back in after relogin.

#8 2019-02-16 17:19:38

admin
Administrator
Registered: 2017-04-21
Posts: 1,368

Re: No daily backups after a power outage

Since installing the cron implies modifying your ESXi system files, you must do that explicitly.
