A few months ago I blogged about a particular situation with cluster creation under Windows Server 2008. Typically, as long as the cluster validation tool passes, cluster creation completes successfully. In our case, however, validation passed but creation still failed, and the issue ended up getting escalated all the way up to the clustering group at Microsoft.
Strange as it might seem, the root cause was that the root of Active Directory had too many individual ACEs (access control entries) assigned. These were inherited by every object further down the AD tree, as long as inheritance wasn't blocked. The ACL on an object has an architectural size limit of roughly 64 KB (65532 bytes, according to the PSS notes below). The cluster creation process needed to add a few more ACEs to the newly created computer object, and that pushed the ACL past the limit.
At the time, we were the 5th case in the world to hit something like this, and the earlier cases were still unresolved. Thanks to the numerous TTT (Time-Travel-Trace) dumps of the cluster creation process taken before, during and after the failure, we were able to nail the root cause with Microsoft PSS.
The deceiving part of all of this is that it was not readily apparent that this "ticking time bomb" of a problem existed. After a certain number of ACL entries (I suspect around 2000), "Active Directory Users and Computers" will not show any additional entries. Only after removing duplicate/unneeded entries would more show up in the console. Using ADSIEDIT.msc directly would show all the entries, but I like to tread lightly at customer sites when I can.
Once the ACL entries were cleaned up, the showstopper issue of "The parameter is incorrect" went away and we could create the cluster.
Months later, either as a fluke or as an emerging issue overall, this happened at another customer site with a different group of engineers within our organization. They already had a case open with Microsoft PSS, but thankfully the fix from the first site let us resolve the error and close the issue before PSS had to dig into it.
The common denominator, software-wise, at both companies? The use of Bindview. It might be a fluke, or it might be a case of "Bindview gone wild" with creation of excess ACLs. Hopefully, someone out there will benefit from this information. If you run into this error, especially with Bindview, I'd like to hear about it.
Here are the notes from PSS on the case, if you are curious:
ISSUE:
- The existing DACL on the computer object is near the size limit of an ACL (65532)
- Cluster Setup adds an ACE to the DACL, which exceeds the size of an ACL but ADSI Security Descriptor objects do not check for this limit.
- Cluster Setup builds the ADSI Security Descriptor (including the new ACE added by Cluster) and then attempts to overwrite the existing ADSI security descriptor of the computer object with the new ADSI Security Descriptor using the PUT method (of the IADS computer object) and passing the "ntSecurityDescriptor" attribute and the variant of new ADSI Security Descriptor.
- The PUT method converts the IADS Security Descriptor and its sub components to native Security Descriptors and Access Control Lists
- The native Windows API for ACL creation checks the requested size limit against the max size limit of 65532 and fails returning STATUS_INVALID_PARAMETER
RESOLUTION:
- Reduce the number of ACEs within the original Security Descriptor protecting the computer object to allow Cluster Setup to add the required ACE and still be within the maximum size of the ACL
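
To put the 65532 limit from the PSS notes into perspective, here is a rough back-of-the-envelope sketch in Python. It is not what Cluster Setup or Windows actually runs; it only does the size arithmetic using the documented binary layouts (8-byte ACL header, 4-byte ACE header, 4-byte access mask, and a SID of 8 bytes plus 4 bytes per sub-authority). The function names are made up for illustration, and the assumption that every trustee is a typical domain SID with 5 sub-authorities is mine, so treat the numbers as estimates.

```python
# Rough size arithmetic for a Windows ACL. Illustrative only: the helper
# names are hypothetical and real DACLs mix ACE types and SID lengths.

ACL_HEADER_SIZE = 8      # revision, padding, AclSize, AceCount, padding
ACE_HEADER_SIZE = 4      # AceType, AceFlags, AceSize
ACCESS_MASK_SIZE = 4     # 32-bit access mask
OBJECT_FLAGS_SIZE = 4    # flags field present in object ACEs only
GUID_SIZE = 16           # ObjectType / InheritedObjectType GUIDs
MAX_ACL_SIZE = 65532     # limit quoted in the PSS notes


def sid_size(sub_authorities=5):
    """Bytes in a binary SID; 5 sub-authorities assumes a domain account."""
    return 8 + 4 * sub_authorities


def simple_ace_size(sub_authorities=5):
    """Size of a plain allow/deny ACE (header + mask + SID)."""
    return ACE_HEADER_SIZE + ACCESS_MASK_SIZE + sid_size(sub_authorities)


def object_ace_size(sub_authorities=5, guid_count=2):
    """Size of an object ACE carrying 0, 1 or 2 GUIDs."""
    return (ACE_HEADER_SIZE + ACCESS_MASK_SIZE + OBJECT_FLAGS_SIZE
            + GUID_SIZE * guid_count + sid_size(sub_authorities))


def acl_size(ace_sizes):
    """Total ACL size for a list of ACE sizes."""
    return ACL_HEADER_SIZE + sum(ace_sizes)


def can_add_ace(existing_ace_sizes, new_ace_size):
    """Mimic the size check that fails with STATUS_INVALID_PARAMETER."""
    return acl_size(existing_ace_sizes) + new_ace_size <= MAX_ACL_SIZE


def max_aces(ace_size):
    """How many ACEs of one size fit under the limit."""
    return (MAX_ACL_SIZE - ACL_HEADER_SIZE) // ace_size


if __name__ == "__main__":
    print("simple ACE bytes:", simple_ace_size())
    print("object ACE bytes:", object_ace_size())
    print("max simple ACEs:", max_aces(simple_ace_size()))
    print("max object ACEs:", max_aces(object_ace_size()))
    # A DACL already packed with object ACEs has no room for one more.
    full = [object_ace_size()] * max_aces(object_ace_size())
    print("room for one more ACE:", can_add_ace(full, simple_ace_size()))
```

Under these assumptions the ceiling works out to somewhere between roughly 900 and 1800 entries depending on the ACE type, which is why a heavily ACL'd domain root can leave a new computer object with no headroom for the ACE that Cluster Setup wants to add.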