Outlook freezing and spiking Exchange Indexing service processes? You probably have some corruption in your mailboxen

Last night, at about 4:50 PM, all of our Outlook clients momentarily “froze.”

I hopped onto one of our mailbox servers (the DAG member where our three DBs were mounted) and noticed significant lag on the connection.

As soon as I connected, I popped open taskmgr and noticed that an msftefd.exe process was spiking the CPU. After some quick research, I associated that process name with the Exchange Search Indexer service. As noted:

MS-Search is actually composed of the core indexer (msftesql.exe) and a sacrificial filter daemon (msftefd.exe) which can be recycled at will. That’s where the protocol handler, filters and word breakers live.

The new Search in Exchange 2007

With visions of Ikon the Verbal Hologram running through my head, I continued searching and came upon the idea that spikes occur for primarily two reasons:

  1. There are corrupted Exchange items in some mailbox or public folder that are causing the crawler to go cahrazy.
  2. There are mail items populating a mailbox so quickly that it is racing ahead of the indexer, and it can never be fully indexed… it just keeps going and going (an NDR loop is a possible cause).

I’ll cover troubleshooting and solving both scenarios.

Stop the CPU spike:

You can stop the CPU spike by disabling indexing on all mailbox databases:

#all DBs (should work https://msdn.microsoft.com/library/ff326162%28v=exchg.150%29.aspx):
Get-MailboxDatabase | Set-MailboxDatabase -IndexEnabled $false
#per DB:
Set-MailboxDatabase "Database name" -IndexEnabled $false

Doing this will cause the following:

  • slow/no search for: Online mailbox clients… like non-cached Exchange mode Outlook, Outlook web access (OWA), and Windows Mobile phone clients.

It may be worth it so that you can solve the problem while your mailbox server keeps doing its job.
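Once the underlying problem is fixed, indexing has to be switched back on. A minimal sketch of how that could look, assuming the database objects expose the same IndexEnabled property that the commands above set. Select-IndexDisabledDatabase is a hypothetical helper name, not an Exchange cmdlet:

```powershell
# Hypothetical helper: pass through only databases whose indexing is off.
# It works on any object with an IndexEnabled property, so it can be tried
# without an Exchange server.
function Select-IndexDisabledDatabase {
    param([Parameter(ValueFromPipeline = $true)] $Database)
    process {
        if (-not $Database.IndexEnabled) { $Database }
    }
}

# In the Exchange Management Shell (not run here):
#   Get-MailboxDatabase | Select-IndexDisabledDatabase | Select-Object Name, IndexEnabled
#   Get-MailboxDatabase | Select-IndexDisabledDatabase |
#       ForEach-Object { Set-MailboxDatabase $_.Name -IndexEnabled $true }
```

Listing the disabled databases first makes it harder to forget one that was turned off during the incident.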

Finding Corrupted Exchange Items in your Mail Databases:

It’s very easy to find corrupted items in your Exchange databases (we will fix them later):

#per database:
New-MailboxRepairRequest -Database "database name" -CorruptionType AggregateCounts,ProvisionedFolder,SearchFolder,FolderView -DetectOnly
#per mailbox:
New-MailboxRepairRequest -mailbox iamanidentity@contoso.com -CorruptionType AggregateCounts,ProvisionedFolder,SearchFolder,FolderView -DetectOnly

Repair requests are limited per server: only one database-level repair request can be active at a time, or up to 100 mailbox-level requests.

This command does not block while the repair runs, and it does not print any results when the repair finishes. There is also no show-mailboxrepairrequest command, but you can look at the Application event log to see the status of the request:

  • 10047 A mailbox-level repair request started.
  • 10059 A database-level repair request started.
  • 10064 A Public Folder repair request started.
  • 10048 The repair request successfully completed.
  • 10050 The mailbox repair request task skipped a mailbox.
  • 10062 Corruption was detected.
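The same events can also be pulled from PowerShell. This is only a sketch, assuming Get-WinEvent is available on the mailbox server. New-RepairEventFilter is a hypothetical helper name, and the three-hour default window is an arbitrary choice:

```powershell
# Event IDs logged for mailbox repair requests, as listed above.
$RepairEventIds = 10047, 10059, 10064, 10048, 10050, 10062

# Hypothetical helper: build a Get-WinEvent filter for repair events
# written to the Application log during the last N hours.
function New-RepairEventFilter {
    param([int]$Hours = 3)
    @{
        LogName   = 'Application'
        Id        = $RepairEventIds
        StartTime = (Get-Date).AddHours(-$Hours)
    }
}

# On the mailbox server (not run here):
#   Get-WinEvent -FilterHashtable (New-RepairEventFilter -Hours 3) |
#       Select-Object TimeCreated, Id, Message
```

Get-WinEvent reports an error rather than returning nothing when no events match, so an error right after submitting a request may just mean nothing has been logged yet.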

Or for those of you who like using Event Log filters:

10047,10059,10064,10048,10050,10062

<QueryList>
  <Query Id="0" Path="Application">
    <Select Path="Application">*[System[(EventID=10047 or EventID=10059 or EventID=10064 or EventID=10048 or EventID=10050 or EventID=10062) and TimeCreated[timediff(@SystemTime) &lt;= 10800000 ]]]</Select>
  </Query>
</QueryList>

Fixing Corrupted Exchange Items in your Mail Databases:

Once you locate corrupted items (by reviewing 10062 events after running `New-MailboxRepairRequest` with `-DetectOnly`), identify the affected mailbox and then decide, based on the type of corruption reported, whether you or the user can afford to “lose” the corrupted email items. Do you have a backup? Then you’re probably fine letting the items be deleted, and if the user complains, you can always restore. I will cover both options:

Option 1: “Remove” corrupted items from a mailbox:

This solution requires no downtime in the mailbox database and allows the mailbox to stay live and accessible by clients.

You should have multiple mailbox databases. If you don’t have another database, create a new one.

Using event 10062 to identify the mailbox with item corruption, move that mailbox to another mailbox database, leaving the corrupted items behind:

New-MoveRequest -Identity iamanidentity@contoso.com -TargetDatabase mbdb3 -BadItemLimit 500 -AcceptLargeDataLoss #sounds scary, but it simply keeps the move from failing when up to 500 corrupted items are skipped

Option 2: Repair the corrupted items from a mailbox:

This solution requires no downtime in the mailbox database, but the mailbox itself will be offline while the repair takes place. Notify the user that their mailbox will be inaccessible for emergency maintenance and perform the following:

New-MailboxRepairRequest -mailbox iamanidentity@contoso.com -CorruptionType SearchFolder,AggregateCounts,ProvisionedFolder,FolderView

Finding fast growing mailboxes:

First, inspect the current item count on all mailboxes:

Get-MailboxStatistics -server nymb1 | Select ItemCount,database,DisplayName,TotalItemSize | Sort-Object TotalItemSize -Descending

Something look fishy already? Good. Check out why that mailbox that’s a meeting room has 40000 items.

Then re-run the command repeatedly and compare the output by eye:

while ($true) { Clear-Host; Get-MailboxStatistics -Server nymb1 | Select-Object ItemCount,DisplayName; Start-Sleep -Seconds 1 }

Or put a lower threshold on ItemCount to reduce what you see:

while ($true) { Clear-Host; Get-MailboxStatistics -Server nymb1 | Where-Object {$_.ItemCount -gt 100000} | Select-Object ItemCount,DisplayName; Start-Sleep -Seconds 3 }

If you see something weird, look into it further, most likely by opening the mailbox and investigating.
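Comparing by eye gets tedious with many mailboxes. A rough sketch of letting PowerShell do the comparison, assuming two snapshots of objects that carry the DisplayName and ItemCount properties used above. Compare-ItemCountSnapshot is a hypothetical helper, and it assumes display names are unique:

```powershell
# Hypothetical helper: report mailboxes whose item count grew between
# two snapshots. Each snapshot is a list of objects with DisplayName
# and ItemCount properties.
function Compare-ItemCountSnapshot {
    param($Before, $After)

    # Index the first snapshot by display name for quick lookup.
    $baseline = @{}
    foreach ($m in $Before) { $baseline[$m.DisplayName] = [long]$m.ItemCount }

    foreach ($m in $After) {
        if ($baseline.ContainsKey($m.DisplayName)) {
            $growth = [long]$m.ItemCount - $baseline[$m.DisplayName]
            if ($growth -gt 0) {
                New-Object PSObject -Property @{
                    DisplayName = $m.DisplayName
                    Growth      = $growth
                }
            }
        }
    }
}

# In the Exchange Management Shell (not run here):
#   $before = Get-MailboxStatistics -Server nymb1 | Select-Object DisplayName, ItemCount
#   Start-Sleep -Seconds 60
#   $after  = Get-MailboxStatistics -Server nymb1 | Select-Object DisplayName, ItemCount
#   Compare-ItemCountSnapshot -Before $before -After $after | Sort-Object Growth -Descending
```

A mailbox caught in an NDR loop should float to the top of the sorted output after a snapshot or two.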

Further investigation:

One of the suggestions I found was to set CPU preference for msftefd.exe and msftesql.exe. I actually find this quite stupid, as multiple instances of these processes might spawn to assist with indexing, and they need CPU to function properly. Instead, try to determine whether you have a performance problem. Using VMs? You may have a resource contention problem, so check for host CPU spikes during that time. You may also want to set policies that keep specific VMs on separate hosts (DRS in VMware).

Post-event monitoring:

Exchange performance monitoring is a huge topic, and there are probably better resources to refer to than what I can write here. I will focus on our “problem,” the search indexer and the processes that it spawns.

I set up a performance monitor data collector with the following counters to run for 48 hours:

  • MSExchange Search Indexer\Average Batch Latency
  • MSExchange Search Indexer\Number of Databases Being Crawled
  • MSExchange Search Indices\Indexing Slow: for all instances
  • MSExchange Search Indices\Number of Mailboxes Left to Crawl: for all instances
  • Process\% Processor Time: for Microsoft.Exchange.Search.ExSearch
  • Processor Information\% Processor Time: total

I also set a performance monitor data collector with the following performance counter alerts:

  • Process\% Processor Time: for msftefd when above 24% (because I have four cores and the process is single threaded)
  • Process\% Processor Time: for msftesql when above 24% (because I have four cores and the process is single threaded)

The alert action will log an entry in the Application event log.

The alert task is as follows:

run:

powershell.exe

Task arguments:

-Command "& {Send-MailMessage -SmtpServer cas1 -To monitoring@contoso.com -From nymb1@contoso.com -Subject 'msftefd or msftesql violated CPU tolerance'}"

This allows me to know if and when the problem occurs again.

Still having trouble?

I still had some mysterious CPU spikes, so I figured why not try another non-invasive tool: the Content Index Troubleshooter.

run:

. "$env:programfiles\Microsoft\Exchange Server\V14\Scripts\Troubleshoot-CI.ps1" -Server nymb1

Watch the Applications and Services Logs for Microsoft-Exchange-Troubleshooters\Operational.

Still still having trouble?

The above troubleshooter returned no problems, yet we still were suffering from CPU spikes. The most invasive move is to delete the Content Indexes and have the Indexer reindex them.

. "$env:programfiles\Microsoft\Exchange Server\V14\Scripts\ResetSearchIndex.ps1" -force -all

You can simply run this command and it will delete the existing indexes and restart the indexer. When the indexer finds no indexes for the databases, it rebuilds them. It took our VM about 12 hours to reindex and crawl our databases (concurrently). To learn more about expected times, refer to this short series of blog posts by the Exchange Team.

During a reindex, I warned users:

  • Email will function without issue.
  • Search results may be unavailable or incomplete during the maintenance.
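To see how far along the rebuild is, the content index state of each database copy can be checked. This is a sketch, assuming Get-MailboxDatabaseCopyStatus reports a ContentIndexState of Healthy once a rebuild has finished. Select-UnhealthyContentIndex is a hypothetical helper name:

```powershell
# Hypothetical helper: pass through database copies whose content index
# state is anything other than Healthy (for example, still crawling).
function Select-UnhealthyContentIndex {
    param([Parameter(ValueFromPipeline = $true)] $Copy)
    process {
        if ("$($Copy.ContentIndexState)" -ne 'Healthy') { $Copy }
    }
}

# In the Exchange Management Shell (not run here):
#   Get-MailboxDatabaseCopyStatus -Server nymb1 |
#       Select-UnhealthyContentIndex |
#       Select-Object Name, ContentIndexState
```

When the filter returns nothing, every copy on the server reports a healthy index and users can be told that search is back.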

You should keep an eye on the following event IDs from the MSExchange Search Indexer source:

  • 109: Beginning of crawl.
  • 110: End of crawl.

Or, as an Event Log filter:

<QueryList>
  <Query Id="0" Path="Application">
    <Select Path="Application">*[System[Provider[@Name='MSExchange Search Indexer']]]</Select>
  </Query>
</QueryList>

You can also keep an eye on the following performance counters:

\MSExchange Search Indexer\Number of Databases Being Crawled
\MSExchange Search Indexer\Number of Databases Being Indexed
\MSExchange Search Indices(*)\Document Indexing Rate
\MSExchange Search Indices(*)\Full Crawl Mode Status
\MSExchange Search Indices(*)\Number of Mailboxes Left to Crawl
\MSExchange Search Indices(*)\Number of Outstanding Batches
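These counters can also be sampled from a PowerShell prompt instead of the Performance Monitor UI. A sketch, assuming Get-Counter can reach the server remotely. Get-SearchCounterPath is a hypothetical helper that only prefixes the paths above with a server name:

```powershell
# Counter paths from the list above.
$SearchCounters = @(
    '\MSExchange Search Indexer\Number of Databases Being Crawled',
    '\MSExchange Search Indexer\Number of Databases Being Indexed',
    '\MSExchange Search Indices(*)\Document Indexing Rate',
    '\MSExchange Search Indices(*)\Full Crawl Mode Status',
    '\MSExchange Search Indices(*)\Number of Mailboxes Left to Crawl',
    '\MSExchange Search Indices(*)\Number of Outstanding Batches'
)

# Hypothetical helper: turn the paths into \\server\object\counter form.
function Get-SearchCounterPath {
    param([string]$Server)
    foreach ($c in $SearchCounters) { "\\$Server$c" }
}

# From a Windows admin workstation or the server itself (not run here):
#   Get-Counter -Counter (Get-SearchCounterPath -Server nymb1) -SampleInterval 60 -MaxSamples 10
```

Sampling once a minute is an arbitrary choice here; a longer interval is probably enough for a reindex that runs for hours.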

 

 
