Monday, June 14, 2010

The tale of the mysterious Certificate Revocation Check failure in SCCM

One of the more fun applications in the Microsoft server set is System Center Configuration Manager, the new version of what was previously called Systems Management Server (SMS). SCCM is a godsend when it works, but it can exhibit some quirky behavior that's hard to diagnose.

On one computer today, I had an issue where I was trying to use my slick PXE automatic OS imaging function to reimage the machine. (I'll probably discuss my SCCM architecture at a later date, as it does some truly awesome things with PXE boot and task sequences for zero-touch image deployment.) I kept seeing that the task sequence was failing, as evidenced by what appeared to be a random reboot before the TS started. So, I dutifully looked in the SMSTS.log file for the source of the problem -- and it turned out to be an issue with certificate revocation list checking, specifically the dreaded WINHTTP_CALLBACK_STATUS_FLAG_CERT_REV_FAILED error. This error's particularly annoying for those of us with working certificate authority infrastructure, as it doesn't say anything about why the revocation check failed. Some of the time, at least in our environment, this is due to the CA randomly failing to publish its delta CRLs -- an issue easily addressed by restarting the CA. After doing this, I found the check was still failing, so I took a look at the CRL publication servers. We publish our CRLs to Active Directory and to two web sites, one internal and one external. All three of these were showing current, valid CRLs.
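If you'd rather not eyeball the log, a few lines of Python can pull out every line that mentions the error. This is only a rough sketch: the error name is the one quoted above, but the function names are mine, and it makes no assumptions about the SMSTS.log line layout beyond plain substring matching.

```python
from typing import Iterable, List, Tuple

# Error name as quoted in the post; the helper names below are illustrative.
CERT_REV_ERROR = "WINHTTP_CALLBACK_STATUS_FLAG_CERT_REV_FAILED"


def find_revocation_failures(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line that mentions the error."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if CERT_REV_ERROR in line
    ]


def scan_log(path: str) -> List[Tuple[int, str]]:
    """Scan a log file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return find_revocation_failures(handle)
```

Point `scan_log` at wherever you copied the log to. Keep in mind that a hit only confirms the revocation check failed; as noted above, the error itself won't tell you why.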

I was about to angrily disable CRL checking when I suddenly realized that the most recent delta CRL from the issuing CA had been published just a couple of hours ago. A quick check of the time on this newly unboxed computer revealed that it was a day and 16 hours off the current time, which would of course make CRL checking fail; as far as the machine's concerned, the CRL it downloaded wasn't yet valid. A quick fix of the time and date in the BIOS, a clearing of the PXE advertisement, and a reboot were all it took to get the task sequence humming again.
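To see why the clock matters, here's a simplified sketch of the validity-window check. A CRL carries a thisUpdate time and (normally) a nextUpdate time, and a client whose clock reads earlier than thisUpdate sees a CRL "from the future." The timestamps and the one-day CRL lifetime below are made up for illustration, the client clock is assumed to be running behind, and a real client may allow some tolerance that this sketch ignores.

```python
from datetime import datetime, timedelta


def crl_status(client_now: datetime, this_update: datetime,
               next_update: datetime) -> str:
    """Classify a CRL against the client's own idea of the current time."""
    if client_now < this_update:
        return "not yet valid"
    if client_now > next_update:
        return "expired"
    return "valid"


# Hypothetical timestamps: the delta CRL was published two hours ago,
# and the one-day lifetime is an assumption, not our actual CA setting.
real_now = datetime(2010, 6, 14, 15, 0)
this_update = real_now - timedelta(hours=2)
next_update = this_update + timedelta(days=1)

# The freshly unboxed machine, assumed to be a day and 16 hours behind.
skewed_clock = real_now - timedelta(days=1, hours=16)

print(crl_status(skewed_clock, this_update, next_update))  # not yet valid
print(crl_status(real_now, this_update, next_update))      # valid
```

The same CRL passes or fails depending only on what the client thinks the time is, which is why all three publication points looked fine from my desk.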

So, the moral of this story is, always check the dates and times of your client machines! Not only is this critical for Kerberos, but it can impact your PKI in ways that aren't immediately obvious.