BES Errors – Why is my server restarting?

Just a quick one on BlackBerry / BES. I ran into this over the holidays and it was driving me crazy.

The client I was working with had BES troubles: it would crash, restart itself, and try to restart the nBES task

even though it was already running (and crashing the Domino server) … it was ugly.

The issues:

Part of the problem was that the BES had a lot of “garbage” on it and was producing a lot of errors (wrong

file paths for personal address books, inactive users, missing mail files, etc.), and cleaning all of those

errors up made a big difference.

The first tech also suggested that we change the ‘WaitToRestartAgentOnHung’ registry entry to [20]

so that the BlackBerry Controller service waits longer for unresponsive tasks to clear themselves before it tries

to restart services. The default seems to be 6; we changed it to 20, installed BES MR2, and upgraded the

Domino FP on the server as well.

Why is that server down again????

What we ended up with was a server that would shut itself down every night between 1 AM and 2 AM and STAY DOWN.

Actually, the Controller task would shut down the Domino server but never successfully restart it – not a fun

thing to wake up to in the morning.

I had been uploading log files like crazy and this week a different tech took charge of the ticket and brought

the following Problem to my attention:

The BlackBerry Enterprise Server restarts repeatedly due to unresponsive threads processing


Development Task 766644

Development Task 799826

BlackBerry Controller does not restart IBM Lotus Domino after it was stopped when hung thread wait count

threshold was reached

Development Task 764171

Here are some of the errors from the BES logs:

***** CONTROLLER LOG *****

(01/04 00:00:02.597):{0xD60} [CFG] Controller will wait for WaitCount = 20 to restart Domino & agent on hung


(01/04 00:00:02.628):{0xD60} [CFG] Controller will wait 30 minutes for NSD to complete after agent crash

(01/04 01:18:45.915):{0xD60} Hung agent threads detected. WaitCount = 20

(01/04 01:18:45.915):{0xD60} Requesting Domino restart

(01/04 01:48:59.243):{0xD60} Controller is stopping Domino


(01/04 01:18:45.852):{0x24A4} Thread: *** No Response *** Thread Id=0x21CC, Handle=0xA70, WaitCount=20,

LastActivityTime=01/03 21:49:51, Activities:

(01/04 01:18:45.852):{0x24A4} StartTime=01/03 21:49:51, Activity=CalendarControl::DoSyncRequest

(01/04 01:18:45.852):{0x24A4} StartTime=01/03 21:49:51, Activity=Processing NEW_MESSAGE

(01/04 01:18:45.852):{0x24A4} StartTime=01/03 21:49:51, Activity=Processing work for Grethel


(01/04 01:18:45.852):{0x24A4} StartTime=01/03 21:49:51, Activity=Processing work for Id=90

(01/04 01:18:45.852):{0x24A4} Thread 21CC, utilization=0.0208%, failed health check 20 times

(01/04 01:48:59.415):{0x18DC} Starting shutdown of BlackBerry Enterprise Server

(01/04 01:49:28.666):{0x18DC} Shutting down BlackBerry Server

(01/04 01:49:32.026):{0x18DC} BlackBerry Mailbox Agent for Lotus Domino shutdown complete
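
If you have a lot of log to wade through, a small script can pull out the "No Response" entries for you. The following is only a rough Python sketch based on the lines shown above, not anything RIM ships. It assumes each "No Response" entry sits on a single line in the real log file (it is wrapped in this post), and since the log timestamps carry no year, the `year` parameter and the function names are my own placeholders.

```python
import re
from datetime import datetime, timedelta

# Matches the "*** No Response ***" lines from the log excerpt above.
HUNG_RE = re.compile(
    r"\((?P<stamp>\d{2}/\d{2} \d{2}:\d{2}:\d{2})\.\d{3}\):\{0x[0-9A-Fa-f]+\} "
    r"Thread: \*\*\* No Response \*\*\* Thread Id=(?P<tid>0x[0-9A-Fa-f]+), .*?"
    r"WaitCount=(?P<wait>\d+), "
    r"LastActivityTime=(?P<last>\d{2}/\d{2} \d{2}:\d{2}:\d{2})"
)


def parse_stamp(text, year):
    """Turn 'MM/DD HH:MM:SS' into a datetime; the log itself has no year."""
    return datetime.strptime(f"{year}/{text}", "%Y/%m/%d %H:%M:%S")


def find_hung_threads(lines, year=2000):
    """Return one dict per hung thread: id, wait count, and idle time."""
    hung = []
    for line in lines:
        m = HUNG_RE.search(line)
        if not m:
            continue
        seen = parse_stamp(m["stamp"], year)
        last = parse_stamp(m["last"], year)
        if last > seen:
            # Last activity was before New Year, log entry after it.
            last = last.replace(year=year - 1)
        hung.append({
            "thread_id": m["tid"],
            "wait_count": int(m["wait"]),
            "idle": seen - last,  # timedelta since the last activity
        })
    return hung


if __name__ == "__main__":
    sample = [
        "(01/04 01:18:45.852):{0x24A4} Thread: *** No Response *** "
        "Thread Id=0x21CC, Handle=0xA70, WaitCount=20, "
        "LastActivityTime=01/03 21:49:51, Activities:",
    ]
    for entry in find_hung_threads(sample):
        print(entry["thread_id"], entry["wait_count"], entry["idle"])
```

Run against the sample line, this shows that thread 0x21CC had been idle for roughly three and a half hours when the Controller gave up on it.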

The known issue:

These are known issues and there is currently no resolution time frame. I take that to mean RIM is probably

working on it, but will not commit publicly to a date by which this will be fixed.

How to Circumvent:

The simplest workaround is to change the ‘WaitToRestartAgentOnHung’ registry entry to [0] to keep

the controller from trying to restart the server. If you still have hung threads then you will need to recycle

your BES on a regular basis and eliminate the reasons for hung threads (other than the calendar synch request