ACTIVE DIRECTORY REPLICATION ISSUES AND TROUBLESHOOTING
Ing. Ondřej Ševeček | GOPAS a.s. | MCM: Directory Services | MVP: Enterprise Security | www.sevecek.com
GOPASTECHED 2012
NETWORK SERVICES
Central Database
LDAP – Lightweight Directory Access Protocol
  database query language, similar to SQL
  TCP/UDP 389, SSL TCP 636
Global Catalog (GC) – TCP/UDP 3268, SSL TCP 3269
D/COM Dynamic TCP – Replication
D/COM Dynamic TCP – NSPI
Kerberos UDP/TCP 88
Windows NT 4.0 SAM
  SMB/CIFS TCP 445 (or NetBIOS)
  password resets, SAM queries
SMB/DCOM Dynamic TCP
  NTLM pass-through
  Kerberos PAC validation
Design Considerations
Distributed system
  DCs disconnected for very long times
  several months
Multimaster replication
  with some FSMO roles
Design Considerations
Example: Caribbean cruises, DC/IS/Exchange on board with tens of workstations and users, some staff hired during the journey. No satellite connectivity, or only a poor one. DCs are synced after the ship berths at the main office.
Challenge: Must work independently for long time periods. Different independent cruise liners/DCs can each make changes to user accounts, email addresses and Exchange settings. Cannot afford to lose any of those changes.
Database
Microsoft JET engine
  JET Blue
  common with Microsoft Exchange
  used by DHCP, WINS, COM+, WMI, CA, CS, RDS Broker
%WINDIR%\NTDS\NTDS.DIT
  ESENTUTL
  Opened by LSASS.EXE
Installed services
LSASS
  Security Accounts Manager
    TCP 445, SMB + Named Pipes
  Kerberos Key Distribution Center
    UDP, TCP 88, Kerberos
  Active Directory Domain Services
    UDP, TCP 389, LDAP
    D/COM Dynamic TCP
  NTDS.DIT
Installed services
LSASS hosts SAM, KDC and NTDS
  TCP 445: SMB + Named Pipes
  UDP, TCP 88: Kerberos
  UDP, TCP 389, ...: LDAP
  D/COM Dynamic TCP
Clients
  NT4.0, NTLM Pass-through, PAC validation
  Windows 2000+
  LDAP/ADSI Client
  NTDS Replication, FIM/DRS API Client
  Connect to domain
Uninstallation
DCPROMO
  requires working replication connectivity with other DCs
DCPROMO /forceremoval
  does not access network at all
  can run in DS Restore Mode
NTDSUTIL Metadata Cleanup
  Connections
  Connect to server srv2.idtt.local
  Quit
  Select operation target
  List sites
  Select site 0
  List domains in site
  Select domain 0
  List servers in site
  Select server 0
  Quit
  Remove selected server
Metadata Cleanup
TOPOLOGY
Knowledge Consistency Checker (KCC)
runs 5 minutes after boot
  Repl topology update delay (secs)
runs every 15 minutes periodically
  Repl topology update period (secs)
Intrasite Replication Topology
[Diagram: replication connections among DC1, DC2, DC3 and DC4 within one site]
Originating Updates and Notifications
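[Diagram: DC1, DC2, DC3 and DC4; after an originating update the first partner is notified after 15 sec and each further partner after another 3 sec]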
Notification and Replication
DC1 to DC2: "I have got some changes"
  Kerberos-authenticated DCOM, random TCP port
DC2 to DC1: "Give me your replica"
  Kerberos-authenticated DCOM, random TCP port
Intrasite Replication – 3 Hops max.
[Diagram: seven DCs, DC1 to DC7, in one site; no DC is more than 3 hops from any other]
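The 3-hop figure for seven DCs can be checked with a small sketch. It assumes the DCs form a plain bidirectional ring with no extra shortcut connections; `max_hops_in_ring` is a hypothetical helper for illustration, not part of any AD tooling.

```python
from collections import deque


def max_hops_in_ring(dc_count: int) -> int:
    """Largest shortest-path hop count between any two DCs in an
    assumed bidirectional ring of dc_count members."""
    if dc_count in (0, 1):
        return 0
    worst = 0
    for start in range(dc_count):
        # Breadth-first search from this DC around the ring.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in ((node - 1) % dc_count, (node + 1) % dc_count):
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        worst = max(worst, max(dist.values()))
    return worst


print(max_hops_in_ring(7))
print(max_hops_in_ring(8))
```

Under this assumption a ring of seven stays within 3 hops, while a ring of eight would need 4, so a larger site needs additional connections to keep the limit.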
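Intersite Replication (no Bridgeheads)
[Diagram: DC1 to DC7 spread over several sites, with connections crossing the site boundary from more than one DC; intrasite notifications after 15 sec and 3 sec, intersite replication on schedule]
Intersite Replication with a Bridgehead
[Diagram: the same DC1 to DC7, with intersite replication on schedule passing through a bridgehead; intrasite notifications after 15 sec and 3 sec]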
Intrasite Replication
Uses notifications by default (originating/received)
  300/30 sec on Windows 2000
  15/3 sec on Windows 2003
Occurs every hour as scheduled
  nTDSSiteSettings
  At this frequency KCC detects unavailable partners
HKLM\System\CCS\Services\NTDS\Parameters
  Replicator notify pause after modify (secs)
  Replicator notify pause between DSAs (secs)
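The two pauses combine into a simple timeline. The sketch below assumes the first partner is notified after the "pause after modify" and each further partner after one more "pause between DSAs"; the function name and parameters are hypothetical, and the defaults are the 15/3 sec values quoted above.

```python
def notification_offsets(partner_count: int,
                         pause_after_modify: int = 15,
                         pause_between_dsas: int = 3) -> list[int]:
    """Seconds after an originating change at which each replication
    partner is notified, in notification order."""
    return [pause_after_modify + i * pause_between_dsas
            for i in range(partner_count)]


# Three partners with the Windows 2003 values, then the Windows 2000 values.
print(notification_offsets(3))
print(notification_offsets(3, pause_after_modify=300, pause_between_dsas=30))
```

With three partners the last one hears about the change after 21 sec on Windows 2003 and after 360 sec on Windows 2000.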
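Intrasite Replication
[Diagram: DC1 and DC2; a notification is sent 15 sec after a change and the changes are downloaded over a random TCP port; changes are also downloaded on schedule]
Intersite Replication
[Diagram: DC1 and DC2; changes are downloaded over a random TCP port on schedule only]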
Intersite Replication
Does not use notifications by default
  siteLink: options = USE_NOTIFY (1)
Compression used
  siteLink: options = DISABLE_COMPRESSION (4)
Bridge all site links
Site Link Design
Site Link Design (Better?)
[Diagram: site links among London, Olomouc, Roma, Cyprus, Paris and Berlin]
Site Link Design (Worse?)
[Diagram: an alternative site link layout for Olomouc, Roma, Cyprus, Paris, Berlin and London]
Static TCP for Replication
HKLM\System\CurrentControlSet\Services
  NTDS\Parameters
    TCP/IP Port = DWORD
    Replication + NSPI
  Netlogon\Parameters
    DCTcpipPort = DWORD
    LSASS (Pass-through)
  NTFRS\Parameters
    RPC TCP/IP Port Assignment = DWORD
DFSRDIAG StaticRPC /port:xxx /Member:dc1
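The registry values above can be turned into `reg add` command lines. This is only a sketch: the port numbers in the example are made-up placeholders, `reg_add_commands` is a hypothetical helper, and the commands are printed rather than executed. DFSR is not covered because it is configured through DFSRDIAG.

```python
BASE = r"HKLM\System\CurrentControlSet\Services"

# (subkey, value name) pairs exactly as listed on the slide.
STATIC_PORT_VALUES = {
    "replication": (r"NTDS\Parameters", "TCP/IP Port"),
    "netlogon": (r"Netlogon\Parameters", "DCTcpipPort"),
    "frs": (r"NTFRS\Parameters", "RPC TCP/IP Port Assignment"),
}


def reg_add_commands(ports: dict[str, int]) -> list[str]:
    """Build one 'reg add' command per service for the chosen static ports."""
    commands = []
    for service, port in ports.items():
        subkey, value_name = STATIC_PORT_VALUES[service]
        if port not in range(1024, 65536):
            raise ValueError(f"port {port} is outside 1024-65535")
        commands.append(
            f'reg add "{BASE}\\{subkey}" /v "{value_name}" '
            f"/t REG_DWORD /d {port} /f"
        )
    return commands


# Example port numbers are assumptions chosen for illustration only.
for command in reg_add_commands({"replication": 50000, "netlogon": 50001}):
    print(command)
```

Generating the commands as text keeps the sketch runnable on any machine and lets the result be reviewed before it touches a DC.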
Urgent Replication (Notification)
Intrasite only
  intersite also if notification enabled
Do not wait for delay (15/3 sec)
In the case of
  account lockout
  password and lockout policy
  RID FSMO owner change
  DC password or trust account password change
Immediate Replication (Notification)
Password changes from DCs to PDC
Regardless of site boundaries
PDC downloads only the single user object
  all changed attributes but only single object
From DC/PDC further with normal replication
Example Replication Traffic
Atomic replication of a single object with a one byte attribute change
Notification + replication
  intersite compressed
Overall 7536 B
  30 packets
  ~10 round trips
  50 ms round trip means 500 ms transfer time
  consumption at 120 kbps
Useful data ~80 B
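The figures on this slide follow from simple arithmetic, reproduced in the sketch below. The function and its field names are hypothetical; the inputs are the slide's own numbers.

```python
def replication_cost(total_bytes: int, round_trips: int,
                     rtt_seconds: float, useful_bytes: int) -> dict[str, float]:
    """Derive transfer time, effective throughput and protocol overhead."""
    transfer_seconds = round_trips * rtt_seconds
    return {
        "transfer_seconds": transfer_seconds,
        # bits on the wire divided by the time spent waiting for round trips
        "throughput_kbps": total_bytes * 8 / transfer_seconds / 1000,
        # bytes transferred per byte of useful data
        "overhead_factor": total_bytes / useful_bytes,
    }


cost = replication_cost(total_bytes=7536, round_trips=10,
                        rtt_seconds=0.050, useful_bytes=80)
for name, value in cost.items():
    print(f"{name}: {value:.3f}")
```

Ten round trips at 50 ms give 0.5 s, and 7536 B in 0.5 s is about 120.6 kbps, which matches the 120 kbps on the slide. Roughly 94 bytes cross the wire for every useful byte.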
Bridge All Site Links On
[Diagram: sites Olomouc, London, Prague, Paris, Roma and Cyprus joined by site links; DCs hold partition A or partition B]
site links are transitive
can be disabled on IP transport
Bridge All Site Links Off
[Diagram: the same sites and partitions A and B]
site links are not transitive
Cyprus partition is cut off
GC Replication
[Diagram: sites Olomouc, London, Prague, Paris, Roma and Cyprus holding partitions A and B, with three GCs]
one-way: from the source NC into the nearest GC
two-way: GCs between themselves
GC Replication
[Diagram: the same sites holding partitions A and B, with a single GC]
one-way: from the source NC into the nearest GC
two-way: GCs between themselves
Subnetting in AD (Apps)
[Diagram: subnets 10.10.x.x /16 and 10.10.0.248 /29; DC1 to DC5 and three Exchange servers]
Subnetting in AD (Recovery)
[Diagram: subnet 10.10.x.x /16 and a Recovery Site with subnet 10.10.0.7 /32; DC1 to DC5]
Rebuilding After Failure
Rebuilding After Failure
Inter-site
  IntersiteFailuresAllowed
  MaxFailureTimeForIntersiteLink (secs)
Intra-site (immediate neighbors)
  CriticalLinkFailuresAllowed
  MaxFailureTimeForCriticalLink
Intra-site (optimization for non-critical)
  NonCriticalLinkFailuresAllowed
  MaxFailureTimeForNonCriticalLink
MODIFICATIONS
Modification operations
Create new object
Modify attributes
  change/delete value
  change distinguishedName = rename
Rename container
  all subobjects renamed as well
Replication Metadata
REPADMIN /ShowObjMeta
  all attributes
  when
  originating DC
Replication conflicts
The later action wins
  if neither is later, the winner is arbitrary (USN)
Attribute modified on two DCs “simultaneously”
  only one change wins
Linked multivalue attribute modified
  merged (on 2003+ forest level)
Object/container deleted and object modified
  deleted
Object moved into a deleted container
  CN=LostAndFound
Two objects with the same sAMAccountName, cn or userPrincipalName created
  object renamed, logins duplicated
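The first three rules can be sketched in a few lines. This only mirrors the slide's simplified wording; the real comparison of replication metadata is more detailed. `Change`, `tie_breaker`, `resolve_attribute` and `merge_linked_values` are hypothetical names, and `tie_breaker` merely stands in for the slide's "arbitrary (USN)" tie-break.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    value: str
    timestamp: int    # time of the originating write
    tie_breaker: int  # assumed stand-in for the slide's tie-break


def resolve_attribute(a: Change, b: Change) -> Change:
    """Single-valued attribute: the later change wins, ties fall back
    to the tie-breaker so every DC picks the same winner."""
    return max(a, b, key=lambda c: (c.timestamp, c.tie_breaker))


def merge_linked_values(a: set[str], b: set[str]) -> set[str]:
    """Linked multi-valued attribute: additions from both DCs are kept."""
    return a | b


winner = resolve_attribute(Change("Sales", 1000, 1), Change("Marketing", 1100, 0))
print(winner.value)
print(sorted(merge_linked_values({"Kamil", "Helen"}, {"Kamil", "Judith"})))
```

The point of the deterministic tie-break is convergence: whichever order the two changes arrive in, every DC ends up with the same value.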
Linked Multi-values
Replication Basics
[Diagram sequence, 11:05 to 14:15: DC1 holds the values Kamil (10:00) and Helen (11:00). DC2 replicates from DC1 at 11:30 and receives both values. Judith is added at 12:00 and is present on both DCs after the 12:30 replication. DC3 holds its own value Marie (11:00) and Kamil from DC1; its last replication times, DC1 10:30 and DC2 7:00, advance to DC1 12:30 and DC2 13:30. Each DC records when it last replicated from every partner and which DC each value originated on.]
USN
Each object modification increments USN for that object and for the whole DC
Each DC remembers USNs of its replication partners
repadmin /showutdvec
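A toy simulation makes the bookkeeping concrete. It assumes each DC stores a replicated change under a new local USN and remembers the highest USN it has already pulled from each partner. The class and method names are hypothetical, the starting USNs and object names are illustrative, and the sketch deliberately ignores the mechanism that stops changes from echoing back to their originator.

```python
class DomainController:
    def __init__(self, name: str, start_usn: int):
        self.name = name
        self.usn = start_usn                    # highest local USN
        self.objects: dict[str, int] = {}       # object -> local USN of last write
        self.partner_usn: dict[str, int] = {}   # partner -> highest USN pulled

    def write(self, obj: str) -> None:
        """Originating or replicated write: always takes the next local USN."""
        self.usn += 1
        self.objects[obj] = self.usn

    def pull_from(self, source: "DomainController") -> list[str]:
        """Ask the source only for changes above the remembered USN."""
        seen = self.partner_usn.get(source.name, 0)
        changed = sorted((usn, obj) for obj, usn in source.objects.items()
                         if usn > seen)
        for _, obj in changed:
            self.write(obj)
        self.partner_usn[source.name] = source.usn
        return [obj for _, obj in changed]


dc1, dc2, dc3 = (DomainController("DC1", 1001),
                 DomainController("DC2", 5001),
                 DomainController("DC3", 3001))
dc1.write("Kamil")
dc1.write("John")
print(dc2.pull_from(dc1), dc2.objects)
dc2.write("Maria")
print(dc3.pull_from(dc2), dc3.objects)
```

A second `dc2.pull_from(dc1)` returns nothing, because DC2 already remembers USN 1003 for DC1.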
[Diagram sequence: three DCs with highest USNs 1001 (DC 1), 5001 (DC 2) and 3001 (DC 3); each remembers the highest USN seen from the other two. DC 1 creates Kamil (USN 1002) and John (USN 1003) and notifies DC 2. DC 2 asks for changes above 1001, stores them as its own USNs 5002 and 5003 and records 1003 for DC 1. Maria is then created on DC 2 (USN 5004). DC 3 replicates from DC 2, stores Kamil, John and Maria as 3002, 3003 and 3004 and records 1003 for DC 1 and 5004 for DC 2. Finally DC 2 records 3004 for DC 3. Every object keeps the number of the DC where the change originated.]
REPLICATION PROBLEMS
The Three Problems
Single DC offline for a long time
  not so long as tombstone!
  authentication problem
Tombstone lifetime
  two separate DC zones
  not a “business” consistency problem
USN rollback
  restore from snapshot, image, manual backup
  total inconsistency!
DC Offline for Long Time
[Diagram sequence, months 0 to 3: DC1 is offline. It knows its own password (11) and the passwords of DC2 (21) and DC3 (31) as they were at month 0. DC2 and DC3 change their passwords every month (21, 22, 23 and 31, 32, 33) and each keeps only the current and the previous password. By month 3 the password DC1 remembers for DC2 (21) matches neither DC2's current (23) nor old (22) password, so a Kerberos TGS ticket issued by the KDC on DC1 is not accepted. With the KDC on DC1 disabled, the ticket comes from the KDC on another DC that knows password 23.]
DC Isolated for Long Time
[Diagram sequence, month 3: DC1 has been isolated and has changed its own password to 13, while DC2 and DC3 still hold DC1's password 11, so TGT and service tickets fail. With the KDC on DC1 disabled, NETDOM RESETPWD sets DC1's password to 14 both locally and on DC2 and DC3, after which tickets work again.]
Lingering Objects
When a DC has not replicated within the tombstoneLifetime, it halts replication
Can be restored by Allow Replication with Divergent and Corrupt Partner
  HKLM\System\CCS\Services\NTDS\Parameters
  turn on, replicate, turn off
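The rule can be expressed as a small check. This is a sketch of the decision described above, not of how a DC implements it: `replication_allowed` and its parameters are hypothetical, and the tombstone lifetime must be supplied by the caller because it differs between forests.

```python
from datetime import datetime, timedelta


def replication_allowed(last_success: datetime, now: datetime,
                        tombstone_lifetime_days: int,
                        allow_divergent_partner: bool = False) -> bool:
    """Replication with a partner is refused once the last successful
    replication is older than the tombstone lifetime, unless the
    'allow divergent and corrupt partner' override is switched on."""
    expired = now - last_success > timedelta(days=tombstone_lifetime_days)
    return allow_divergent_partner or not expired


now = datetime(2012, 6, 1)
# 180 days is only an example value for the tombstone lifetime.
print(replication_allowed(datetime(2012, 3, 1), now, 180))
print(replication_allowed(datetime(2011, 9, 1), now, 180))
print(replication_allowed(datetime(2011, 9, 1), now, 180,
                          allow_divergent_partner=True))
```

The override is meant to be temporary: turn it on, let replication run, turn it off again.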
Objects and Tombstones
[Diagram sequence: DC1 to DC4 each hold the objects Frank, Stan and Tania. Stan is deleted; the deletion leaves a tombstone that replicates to every DC.]
Garbage Collection 1/day
[Diagram sequence: the daily garbage collection removes the expired Stan tombstone, first on some DCs and eventually on all four, leaving Frank and Tania.]
Lingering Objects
[Diagram sequence: DCs that did not receive the deletion in time still hold Stan after the other DCs have garbage-collected the tombstone; the surviving copy of Stan is a lingering object.]
Possible Problems
Inconsistent distributed database
Proliferation of partial objects
  after modification of some attributes
Allow Replication with Divergent and Corrupt Partner
  blocks replication after tombstone lifetime
Strict Replication Consistency
  detects partial objects if replication allowed
Lingering Objects
Lingering Objects
Strict Replication Consistency
  HKLM\System\CCS\Services\NTDS\Parameters
  1 – do not replicate
  0 – request full copy from source
By default only on new Windows 2003+ installations
Automatic Repair Philosophy?
Business logic says “deleted already”
  should we investigate?
Metadata cleanup?
  we may need some data from the vessel
Remove lingering objects
Removing Lingering Objects
REPADMIN /RemoveLingeringObjects target sourceGUID DN /advisory_mode
  sourceGUID – healthy DC’s GUID (without {})
  target – suspected DC’s name with lingering objects
  DN – naming context DN
  /advisory_mode just logs the found objects (on the ill DC)
Lingering Object found/deleted
Correct Registry Settings
Long term normal operation
  Strict consistency = 1
  Allow divergent partner = 0
Temporary repair operation
  Strict consistency = 1
  Allow divergent partner = 1
USN Rollback
May or may not be detected
Cannot be repaired
  not always lingering objects!
DC must be demoted/repromoted
  unplug network
  DCPROMO /forceremoval
  NTDSUTIL Roles
  NTDSUTIL Metadata Cleanup
USN Rollback
[Diagram sequence: a snapshot of DC1 is taken at USN 1001. DC1 then creates Kamil (1002), John (1003), Judith (1004), Helen (1005) and Eva (1006); its partner replicates them and records USN 1006 for DC1. The snapshot is restored, which returns DC1 to USN 1001 while the partner still remembers 1006.]
USN Rollback (Detectable)
[Diagram sequence: after the restore DC1 creates Frank (1002) and Stan (1003). The partner remembers 1006 for DC1, which is higher than DC1's current USN, so the rollback can be detected.]
USN Rollback (Non-detect.)
[Diagram sequence: after the restore DC1 creates Frank (1002), Stan (1003), Tania (1004), Mark (1005), Martin (1006), Victor (1007) and Leo (1008) before it replicates. The partner asks only for changes above 1006, receives Victor and Leo and records 1008. Frank, Stan, Tania, Mark and Martin never replicate, and the partner keeps Kamil, John, Judith, Helen and Eva, which DC1 no longer has.]
Restoring VM Snapshots
Restore offline
HKLM\System\CurrentControlSet\Services\NTDS\Parameters
  Database Restored from Backup = DWORD = 1
Restart NTDS service
  changes InvocationID of the database instance
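Why the InvocationID change matters can be sketched from the partner's point of view. The function below is a hypothetical illustration of the detectable and non-detectable cases shown earlier, assuming a partner remembers the highest USN it has received per database instance; it is not the actual detection logic of a DC.

```python
def classify_partner_state(remembered_usn: int, remembered_invocation_id: str,
                           current_usn: int, current_invocation_id: str) -> str:
    """Compare what a partner remembers about a DC with what the DC reports."""
    if current_invocation_id != remembered_invocation_id:
        # A properly restored database presents itself as a new instance,
        # so its changes are requested from the beginning.
        return "new database instance"
    if remembered_usn > current_usn:
        # The DC claims fewer changes than the partner already received.
        return "rollback detected"
    # Includes the dangerous case where a rolled-back DC has already
    # climbed past the remembered USN.
    return "looks normal"


# Numbers follow the USN rollback example: the partner remembers 1006.
print(classify_partner_state(1006, "inv-A", 1003, "inv-A"))
print(classify_partner_state(1006, "inv-A", 1008, "inv-A"))
print(classify_partner_state(1006, "inv-A", 1003, "inv-B"))
```

The second call is the non-detectable case: nothing looks wrong to the partner, yet the changes numbered 1002 to 1006 made after the restore are never requested.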
THANK YOU!