Sunday, June 20, 2010

Exchange 2010: Cluster core resources, the replication service, and Active Manager


Every Exchange 2010 Mailbox server runs a component inside the Microsoft Exchange Replication service known as Active Manager.  Active Manager is responsible for all database mount, dismount, and move operations in Exchange 2010.
When a server is a standalone server, Active Manager is configured as a Standalone Active Manager. 
When a server is a member of a Database Availability Group (DAG), Active Manager is configured as one of the following:
  • PAM – Primary Active Manager
  • SAM – Secondary Active Manager
The Active Manager role of each DAG member is determined by ownership of the cluster core resources.  The node that owns the cluster core resources group is the Primary Active Manager (PAM).  All other nodes that are successfully participating in the cluster are Secondary Active Managers (SAMs).
Let’s take a look at an example database availability group.
DAGName:  DAG
DagMembers:  DAG-1,DAG-2,DAG-3,DAG-4
Running Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl Name,PrimaryActiveManager shows which machine currently owns the cluster core resources and is acting as the PAM.
Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl name,primaryactivemanager
Name                 : DAG
PrimaryActiveManager : DAG-3
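
The PAM/SAM rule described above can be sketched as a small helper. This is a minimal illustration, not part of Exchange: Get-ActiveManagerRole is a hypothetical function name, and the commented usage assumes that the Servers, OperationalServers, and PrimaryActiveManager values returned with -Status each expose a Name property.

```powershell
# Hypothetical helper (not an Exchange cmdlet): classify one DAG member
# according to the rule "owner of the cluster core resources = PAM".
function Get-ActiveManagerRole {
    param(
        [string]   $Member,       # DAG member to classify
        [string]   $Pam,          # node that owns the cluster core resources
        [string[]] $Operational   # members currently participating in the cluster
    )
    if ($Member -eq $Pam)               { return 'PAM' }
    if ($Operational -contains $Member) { return 'SAM' }
    return 'Not participating'
}

# Usage from the Exchange Management Shell (assumes a live DAG named DAG):
# $dag = Get-DatabaseAvailabilityGroup -Identity DAG -Status
# $up  = @($dag.OperationalServers | ForEach-Object { $_.Name })
# $dag.Servers | ForEach-Object {
#     "{0} : {1}" -f $_.Name, (Get-ActiveManagerRole $_.Name $dag.PrimaryActiveManager.Name $up)
# }
```

A member that is not participating in the cluster is reported separately rather than as a SAM, matching the definition above.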

Using cluster.exe we can also confirm the owner of the cluster core resources group:
cluster.exe DAG.domain.com group
Group                Node            Status
-------------------- --------------- ------
cluster group        DAG-3           Online

Using the cluster command line, the cluster core resources can be moved to another DAG member and the PAM will subsequently change.
cluster.exe DAG.domain.com group "cluster group" /moveto:DAG-4
Moving resource group 'cluster group'...
Group                Node            Status
-------------------- --------------- ------
cluster group        DAG-4           Online

Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl name,primaryactivemanager
Name                 : DAG
PrimaryActiveManager : DAG-4

Remember that Active Manager runs inside the Microsoft Exchange Replication service, which is installed on every Exchange 2010 Mailbox Role Server.  This is important: if the replication service is not started on the DAG member that owns the cluster core resources, database mount, dismount, and move operations will fail.
Here is an example…
Currently the cluster core resources are owned by node DAG-4, which is successfully participating in the cluster DAG.  The Microsoft Exchange Replication service on DAG-4 was stopped using the Services control panel.  Using the commands above, we can confirm that DAG-4 is still seen as the PAM.
Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl name,primaryactivemanager
Name                 : DAG
PrimaryActiveManager : DAG-4

cluster dag.domain.com group
Listing status for all available resource groups:

Group                Node            Status
-------------------- --------------- ------
Cluster Group        DAG-4           Online
Available Storage    DAG-1           Offline 

Using Test-ReplicationHealth and Test-ServiceHealth we can see that the replication service on node DAG-4 is unavailable:
Server          Check                      Result     Error
------          -----                      ------     -----
DAG-4           ClusterService             Passed
DAG-4           ReplayService              *FAILED*   The Microsoft Exchange Replication service is not running on s...
DAG-4           DagMembersUp               Passed

Role                    : Mailbox Server Role
RequiredServicesRunning : False
ServicesRunning         : {IISAdmin, MSExchangeADTopology, MSExchangeIS, MSExchangeMailboxAssistants, MSExchangeMailSubmission, MSExchangeRPC, MSExchangeSA, MSExchangeSearch, MSExchangeServiceHost, MSExchangeThrottling, MSExchangeTransportLogSearch, W3Svc, WinRM}
ServicesNotRunning      : {MSExchangeRepl}

At this time a dismount operation was issued against a database using the Dismount-Database command.  An error is returned immediately:
Dismount-Database DAG-DB0
Confirm
Are you sure you want to perform this action?
Dismounting database "DAG-DB0". This may result in reduced availability for mailboxes in the database.
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [?] Help (default is "Y"): y


Couldn't dismount the database that you specified. Specified database: DAG-DB0; Error code: An Active Manager operation
failed. Error: The Microsoft Exchange Replication service may not be running on server DAG-4.domain.com. Specific RPC error message: Error 0x6d9 (There are no more endpoints available from the endpoint mapper) from cli_MountDatabase.
    + CategoryInfo          : InvalidOperation: (DAG-DB0:ADObjectId) [Dismount-Database], InvalidOperationException
    + FullyQualifiedErrorId : D64CA7E2,Microsoft.Exchange.Management.SystemConfigurationTasks.DismountDatabase
This error occurs because the server that is designated as the Primary Active Manager does not have its replication service running (and therefore Active Manager is not running).  Stopping the replication service does not automatically move the Primary Active Manager role to another DAG member.
To fix this error:
  • Start the replication service on the machine that is designated as the Primary Active Manager (preferred).
  • Move the cluster core resources to another DAG member, promoting that server to the Primary Active Manager (least preferred, since it does not address why the replication service is stopped on a running DAG member).
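
The preferred fix can be sketched as follows. This is a minimal sketch under stated assumptions: Start-ServiceIfStopped is a hypothetical helper name, the 60-second timeout is an arbitrary choice, and the commented usage assumes the PrimaryActiveManager value exposes a Name property and that you have rights to control services on the PAM.

```powershell
# Hypothetical helper: start a service only if it is not already running.
# $Service is a System.ServiceProcess.ServiceController, as returned by Get-Service.
function Start-ServiceIfStopped {
    param($Service, [int] $TimeoutSeconds = 60)
    if ($Service.Status -eq 'Running') { return $false }   # nothing to do
    $Service.Start()
    # Block until the service reports Running; throws if the timeout expires.
    $Service.WaitForStatus('Running', [TimeSpan]::FromSeconds($TimeoutSeconds))
    return $true
}

# Usage (assumes a live DAG named DAG):
# $pam = (Get-DatabaseAvailabilityGroup -Identity DAG -Status).PrimaryActiveManager.Name
# $svc = Get-Service -ComputerName $pam -Name MSExchangeRepl
# Start-ServiceIfStopped $svc
```

The helper returns $true only when it actually started the service, which makes it easy to log when a restart was needed.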
It is important that the replication service be monitored on all DAG members to ensure it remains functional.
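
One way to act on this is a scheduled check that looks at the MSExchangeRepl service on every DAG member and treats a stopped service on the PAM as the most serious case, since that is the condition that broke the dismount operation in the example above. A minimal sketch: Get-ReplicationServiceAlert is a hypothetical helper, and the Critical/Warning labels are illustrative, not Exchange terminology.

```powershell
# Hypothetical helper: turn service objects into alert records.
# Input objects need MachineName and Status properties, as the ServiceController
# objects returned by Get-Service -ComputerName have.
function Get-ReplicationServiceAlert {
    param([object[]] $Services, [string] $Pam)
    foreach ($svc in $Services) {
        if ("$($svc.Status)" -ne 'Running') {
            # A stopped service on the PAM blocks Active Manager operations.
            $severity = 'Warning'
            if ($svc.MachineName -eq $Pam) { $severity = 'Critical' }
            New-Object PSObject -Property @{
                Server   = $svc.MachineName
                Status   = "$($svc.Status)"
                Severity = $severity
            }
        }
    }
}

# Usage (assumes a live DAG named DAG; Name properties are an assumption):
# $dag  = Get-DatabaseAvailabilityGroup -Identity DAG -Status
# $svcs = $dag.Servers | ForEach-Object { Get-Service -ComputerName $_.Name -Name MSExchangeRepl }
# Get-ReplicationServiceAlert -Services $svcs -Pam $dag.PrimaryActiveManager.Name
```

No output means the replication service is running on every member that was checked.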
*Updated 5/30/2010: corrected the cmdlet for testing services (Test-ServiceHealth instead of Test-ServerHealth).