IBM Connections: Metrics / Cognos and the HTTP timeout . . . .


Let me preface this post with a statement:

I really, really don’t like Cognos. Metrics is a pain as well.
And .. in case it lacked sufficient emphasis … I really, R E A L L Y don’t like Cognos.

 

This is an interesting one that I have been battling for quite some time at my current client. We had been running into Cognos reports that never finished, and the only errors we saw were in the SystemOut.log files, where HTTP sessions were suddenly being reset:

[3/5/13 11:51:05:341 EST] 000000b8 CognosBIReque 3 com.ibm.connections.metrics.reportgeneration.cognos.CognosBIRequestProcessor processCognosBIRequest post jobTemplateSearchPath=/content/folder[@name='IBMConnectionsMetrics']/package[@name='Metrics']/folder[@name='static']/jobDefinition[@name='jobtemplate5']

[3/5/13 11:56:05:532 EST] 000000b8 SystemErr R java.net.SocketException: Connection reset
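
The interval between these two log lines is worth working out, because a suspiciously round gap usually means a timeout somewhere. Below is a minimal Python sketch that does the arithmetic, assuming the timestamp layout shown in the excerpt above. The function names are made up for illustration and the first log line is shortened with "...".

```python
from datetime import datetime

# Timestamp layout as it appears in the SystemOut.log excerpt,
# e.g. "3/5/13 11:51:05:341 EST". The time zone token is dropped before
# parsing, which is fine here because both lines share the same zone.
LOG_TIME_FORMAT = "%m/%d/%y %H:%M:%S:%f"


def parse_log_time(line):
    """Return the datetime found in the leading [...] of a log line."""
    stamp = line[line.index("[") + 1 : line.index("]")]
    date_part, time_part, _zone = stamp.split()
    return datetime.strptime(f"{date_part} {time_part}", LOG_TIME_FORMAT)


def seconds_between(request_line, error_line):
    """Elapsed seconds from the request line to the error line."""
    delta = parse_log_time(error_line) - parse_log_time(request_line)
    return delta.total_seconds()


# The request line is abbreviated; only the timestamp matters here.
request = "[3/5/13 11:51:05:341 EST] 000000b8 CognosBIReque 3 ..."
error = ("[3/5/13 11:56:05:532 EST] 000000b8 SystemErr R "
         "java.net.SocketException: Connection reset")

elapsed = seconds_between(request, error)
print(f"{elapsed:.1f} seconds")
```

For the two lines above the gap works out to almost exactly 300 seconds, i.e. five minutes between the request going out and the connection being reset.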

Metrics sends Cognos 5 HTTP requests for each report time range; these correspond to the Jobtemplate1 to Jobtemplate5 reports that are called and executed in Cognos. The requests are synchronous, so each connection has to stay open and wait until its Jobtemplate call has finished and Metrics can update the process. For every successful call you will see an HTTP status 200 result, and that is exactly what you want. We were seeing the resets above on the Jobtemplate4 and Jobtemplate5 calls, and it was KILLING ME.

Metrics was not at fault: its timeout setting (secsPerRequest) lives in the metrics-config.xml file and was set to 3600, so it was off the list of culprits.

We reset the timeout (ServerIOTimeout) in the HTTP server's plug-in configuration file (plugin-cfg.xml), first to 400 seconds and then to 600, and saw no change.

We then ran a test: we changed the interService href in the LotusConnections-config.xml file as follows. By the way, this only works because we have a single Cognos server, not a clustered pair:

<sloc:serviceReference bootstrapHost="" bootstrapPort="" clusterName="admin_replace" enabled="true" serviceName="cognos" ssl_enabled="true">
    <sloc:href>
        <sloc:hrefPathPrefix>/cognos</sloc:hrefPathPrefix>
        <sloc:static href="http://connect.domain.com" ssl_href="https://connect.domain.com"/>
        <sloc:interService href="https://cognosserverFQHN.domain.com:9443"/>
    </sloc:href>
</sloc:serviceReference>

Drum-roll ….. Here we go: it fixed the issue, but now the progress display (“xxx% complete”) on the metrics page was permanently stuck at 0%. What this did do was point us to the problem. Because the interService requests now went straight to the Cognos server, they bypassed the HTTP servers and everything in front of them, and that left one suspect …. the F5 load balancer that we have in front of the dual HTTP servers. It had a permanent 5 minute HTTP thread timeout set and was killing ANY thread that ran over 5 minutes.

 

The Takeaway:

Metrics/Cognos spawns exactly 110 jobs for each Community metrics update request. Many of these requests will run for more than 5 minutes, so you should check that every device/server in the request path has an HTTP timeout setting high enough to let them finish.
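
One way to picture the takeaway: the request survives only as long as the shortest timeout anywhere on its path. The Python sketch below lists the three timeouts mentioned in this post and picks out the weakest hop. The hop labels are informal descriptions, and the 900 second "required" value is purely an assumed example of a long-running report, not a measured figure.

```python
# Timeouts (in seconds) along the request path, using the values from this
# post. The labels are informal descriptions for illustration only.
hops = [
    ("Metrics secsPerRequest (metrics-config.xml)", 3600),
    ("F5 load balancer HTTP timeout", 300),
    ("HTTP server plug-in ServerIOTimeout", 600),
]


def weakest_hop(hops):
    """Return the (name, timeout) pair with the smallest timeout."""
    return min(hops, key=lambda hop: hop[1])


def hops_below(hops, required_seconds):
    """Return the names of all hops that would cut off a request this long."""
    return [name for name, timeout in hops if timeout < required_seconds]


name, timeout = weakest_hop(hops)
print(f"Effective limit: {timeout} s, set by: {name}")

# Assumed example: a report that needs 15 minutes to finish.
required = 900
for culprit in hops_below(hops, required):
    print(f"Too low for a {required} s request: {culprit}")
```

With these numbers the load balancer is the effective limit, and raising only the plug-in timeout (as we did at first) changes nothing, because a lower timeout still sits in front of it.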