IBM Connections: Metrics / Cognos and the HTTP timeout . . . .


Let me preface this post with a statement:

I really, really don’t like Cognos. Metrics is a pain as well.
And .. in case it lacked sufficient emphasis … I really, R E A L L Y don’t like  Cognos.

 

This is an interesting one that I have been battling with for quite some time at my current client. We had been running into errors with Cognos reports not finishing, and the only errors we saw were in the SystemOut.log files, where HTTP sessions were suddenly being reset:

[3/5/13 11:51:05:341 EST] 000000b8 CognosBIReque 3 com.ibm.connections.metrics.reportgeneration.cognos.CognosBIRequestProcessor processCognosBIRequest post jobTemplateSearchPath=/content/folder[@name='IBMConnectionsMetrics']/package[@name='Metrics']/folder[@name='static']/jobDefinition[@name='jobtemplate5']

[3/5/13 11:56:05:532 EST] 000000b8 SystemErr R java.net.SocketException: Connection reset

Metrics sends Cognos 5 HTTP requests for each report time range – these correspond to the Jobtemplate1 – Jobtemplate5 reports in Cognos that are called and executed. These HTTP requests are synchronous calls, so they have to stay connected and wait until the Jobtemplate call is finished so that Metrics can update the progress. For all successful calls you will see HTTP status 200 results, and that is exactly what you want. We were seeing the above resets for the Jobtemplate4 and Jobtemplate5 calls – it was KILLING ME.

Metrics was not at fault – it has its timeout setting in the metrics-config.xml file (secsPerRequest) and that was set to 3600, so it was off the list of culprits.

We reset the timeout setting (ServerIOTimeout) in the HTTP server’s plugin-cfg.xml, first to 400 seconds and then to 600, and we saw no change.

We then did a test – we changed the interService href in the LotusConnections-config.xml file as follows. By the way, that only works because we have a single Cognos server, not a clustered pair:

<sloc:serviceReference bootstrapHost="" bootstrapPort="" clusterName="admin_replace" enabled="true" serviceName="cognos" ssl_enabled="true">
  <sloc:href>
    <sloc:hrefPathPrefix>/cognos</sloc:hrefPathPrefix>
    <sloc:static href="http://connect.domain.com" ssl_href="https://connect.domain.com"/>
    <sloc:interService href="https://cognosserverFQHN.domain.com:9443"/>
  </sloc:href>
</sloc:serviceReference>

Drum-roll ….. Here we go: it fixed the issue, but it also left the progress display (“xxx% complete”) on the Metrics page permanently stuck at 0%. What this did do was point us to the problem …. the F5 load balancer that sat in front of the dual HTTP servers. It had a fixed 5-minute HTTP timeout set and was killing ANY request that ran longer than 5 minutes.
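In hindsight the log already hinted at the 5 minute limit. Here is a small sketch (not part of the original troubleshooting) that measures the gap between the two SystemOut.log entries quoted above. It assumes the timestamps are in US month/day/year order and that the last field is milliseconds; the function and constant names are made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the two SystemOut.log entries quoted in this post.
# The "EST" label is dropped because both entries are in the same time zone.
REQUEST_SENT = "3/5/13 11:51:05:341"
CONNECTION_RESET = "3/5/13 11:56:05:532"

# Assumption: month/day/year order, with milliseconds after a third colon.
WAS_FORMAT = "%m/%d/%y %H:%M:%S:%f"


def seconds_between(start: str, end: str) -> float:
    """Return the elapsed seconds between two WebSphere log timestamps."""
    began = datetime.strptime(start, WAS_FORMAT)
    ended = datetime.strptime(end, WAS_FORMAT)
    return (ended - began).total_seconds()


if __name__ == "__main__":
    gap = seconds_between(REQUEST_SENT, CONNECTION_RESET)
    print(f"Connection reset {gap:.3f} seconds after the request was sent")
```

A gap that lands almost exactly on a round number such as 300 seconds is a good hint that some device in the path is enforcing a fixed timeout.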

 

The Takeaway:

Metrics/Cognos spawns exactly 110 jobs for each Community metrics update request, and many of these requests will run for more than 5 minutes. Check every device/server in the network path between Metrics and Cognos (HTTP servers, load balancers, proxies) and make sure each one has an HTTP timeout setting higher than that.
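To make the point concrete, here is a minimal sketch that treats the request path as a chain and looks for the weakest link. The numbers are the ones mentioned in this post; the hop labels and function names are my own illustration, not anything that Metrics or Cognos provides.

```python
# Timeouts (in seconds) for each hop between Metrics and Cognos.
# Values come from this post; the hop labels are only descriptive.
TIMEOUTS = {
    "Metrics secsPerRequest": 3600,
    "IHS plug-in ServerIOTimeout": 600,
    "F5 load balancer": 300,
}


def weakest_hop(timeouts):
    """Return (name, seconds) of the hop with the lowest timeout."""
    name = min(timeouts, key=timeouts.get)
    return name, timeouts[name]


def hops_below(timeouts, required_seconds):
    """List every hop whose timeout is shorter than the longest expected job."""
    return [name for name, secs in timeouts.items() if secs < required_seconds]


if __name__ == "__main__":
    name, secs = weakest_hop(TIMEOUTS)
    print(f"Effective timeout is {secs}s, set by: {name}")
    print("Too short for a 10 minute job:", hops_below(TIMEOUTS, 600))
```

The effective timeout of the whole chain is the lowest value in it, which is why raising secsPerRequest and ServerIOTimeout alone changed nothing.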

 

WebSphere: wasadmin – how to recover a lost password


I recently had a case where I was working on a WebSphere 7 environment that was being uncooperative. The previous creator had not taken any backups and the documentation was rather thin . . . one of my most favorite scenarios to walk into.

I found myself with three sets of documents that all stated a different wasadmin password and none of them worked and the client had never bothered to set up any secondary WBM console admins. A dilemma I had to solve.

Encoding vs Encrypting

You might know that all sensitive security information is stored in the security.xml document, which can be found in the [$WAS_HOME]/profiles/[profile name]/config/cells/[cell name] folder. In Windows this might equate to:

C:\IBM\Websphere\AppServer\profiles\Dmgr01\config\cells\cell01\security.xml

Linux/AIX would likely be something like:

/usr/IBM/WebSphere/AppServer/profiles/Dmgr01/config/cells/cell01/security.xml

This document contains the name and password information for the primary admin account for the WebSphere cell – in most cases that will be the default account [wasadmin]. The password is, however, not encrypted but rather encoded. Encryption would use a key to protect the password, and without that key you would not be able to retrieve it. Encoding, however, is a whole other deal – the coding/decoding information is integral to WebSphere itself and is the same for any install anywhere in the world. That means if you encode the same password anywhere, the resulting encoded string will be exactly the same no matter which server you do it on, and anyone who can read the string can reverse it.

Now, this is not great security in and of itself and I will not go into details on this – other than to say it is really important to lock down physical access to any WebSphere server you are in charge of, all the way down to file rights …. or you might regret it at some later time.

How to Decode:

I am not the first blogger out there to write about this, but nobody ever wrote it out for Windows servers, so I am going to concentrate on that OS right now. Most of the blog entries out there are also for older versions, and the process has changed since. Here are some of the authors whose articles I have read over the last few years: Robert Farstad, Robert Maldon, and a few more . . . . google the content here and you will find them.

Here are some basic details:

  • WebSphere Version: 7.0.0.21 (the process is the same for any V 7.x server)
  • $WAS_HOME=C:\IBM\WebSphere\AppServer

Step 1: find the wasadmin information

Open the security.xml and find the entry for the encoded password. It always starts with {xor}; in my case it is:

userId="wasadmin" password="{xor}LDo8LTor"

Step 2: Find your WAS Version Specific Java Plug-in Folder:

In my case it was:

C:\IBM\WebSphere\AppServer\deploytool\itp\plugins\com.ibm.websphere.v7_7.0.2.v20110524_2321\

Step 3: Find your java home and open a command prompt

In my case this equates to

C:\IBM\WebSphere\AppServer\java\bin\

Change to this folder in the command prompt you opened.

Step 4: Run the Password Encoder/Decoder:

This is where you need the folder location and the encoded password you looked up in the previous steps.

In C:\IBM\WebSphere\AppServer\java\bin\ run the following command

java -Djava.ext.dirs=C:\IBM\WebSphere\AppServer\deploytool\itp\plugins\com.ibm.websphere.v7_7.0.2.v20110524_2321\wasJars\ -cp securityimpl.jar:iwsorb.jar com.ibm.ws.security.util.PasswordDecoder {xor}LDo8LTor

The above command is one long command string (it might wrap depending on your screen) and it will create the following output in the command prompt:

encoded password == “{xor}LDo8LTor”, decoded password == “secret”
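To show why this counts as encoding rather than encryption, here is a minimal sketch of the scheme in Python. This is not IBM's code. It assumes, as is widely reported, that the {xor} format is simply each byte XORed with the underscore character and then Base64 encoded; the function names are my own.

```python
import base64

# Assumption: the {xor} scheme XORs every byte with "_" (0x5F) and then
# Base64-encodes the result. The key is the same on every install.
XOR_KEY = ord("_")
PREFIX = "{xor}"


def decode_xor(value: str) -> str:
    """Turn a {xor}-prefixed string back into the clear-text password."""
    if not value.startswith(PREFIX):
        raise ValueError("expected a value starting with {xor}")
    raw = base64.b64decode(value[len(PREFIX):])
    return bytes(b ^ XOR_KEY for b in raw).decode("utf-8")


def encode_xor(password: str) -> str:
    """Apply the same masking in the other direction."""
    masked = bytes(b ^ XOR_KEY for b in password.encode("utf-8"))
    return PREFIX + base64.b64encode(masked).decode("ascii")


if __name__ == "__main__":
    print(decode_xor("{xor}LDo8LTor"))
```

Run against the value from Step 1, this gives the same result as the PasswordDecoder output above. No key file and no server is involved, which is exactly why read access to security.xml has to be treated as access to the password.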

The process for Linux/AIX is basically the same; however, the folder structure will be different. The commands are about the same, but depending on which version of Linux you are running the Java switches might need some fiddling – though the base does not change.

Security!

Besides being helpful for retrieving lost passwords, this article hopefully also shows you just how important good physical security for your WebSphere servers is – don’t think that just because you have a log-on or a locked-down root that you are safe.

WebSphere: Errors installing Plug-in fix pack on IHS V7.x


This was a new one, not even the IBMers that I consulted with had run into this before.

I have been working at a client site on a large IBM Connections project since last year – V3.0.0, upgrade to 3.0.1, upgrade to 3.0.1.1 … now multiple code drops for V4 beta installs and preparation to get gold code V4 up as soon as possible (once it is released). In the course of the last year I have probably installed and upgraded more WAS and IHS servers than I have in several years previously – loving every moment of it!

Problem:

Today I had a new V 7.0 IHS on AIX to set up and we were running into issues installing the Plug-in fix pack for 7.0.0.21. The IHS fix pack went in without a problem, but the plug-in one did not work. Errors, errors, errors:

java.lang.NullPointerException
        at com.ibm.ws.install.ni.framework.simplugins.SimVerifyFilePermissionsPlugin$ValidateFilePermissions.checkFilePermissions(SimVerifyFilePermissionsPlugin.java:245)
        at com.ibm.ws.install.ni.framework.simplugins.SimVerifyFilePermissionsPlugin$ValidateFilePermissions.checkFilePermissions(SimVerifyFilePermissionsPlugin.java:317)
        at com.ibm.ws.install.ni.framework.simplugins.SimVerifyFilePermissionsPlugin$ValidateFilePermissions.run(SimVerifyFilePermissionsPlugin.java:139)
java.lang.NullPointerException
        at com.ibm.ws.install.ni.framework.simplugins.SimVerifyFilePermissionsPlugin$ValidateFilePermissions.checkFilePermissions(SimVerifyFilePermissionsPlugin.java:245)
        at com.ibm.ws.install.ni.framework.simplugins.SimVerifyFilePermissionsPlugin$ValidateFilePermissions.checkFilePermissions(SimVerifyFilePermissionsPlugin.java:317)
        at com.ibm.ws.install.ni.framework.simplugins.SimVerifyFilePermissionsPlugin$ValidateFilePermissions.run(SimVerifyFilePermissionsPlugin.java:139)

Nothing was working … I looked at file permissions, ownership etc. – no change. Root or no root – it failed. I did some searching and after a while came across this tech note: swg21408430.

The errors were close enough. I was already using the latest version of the IBM UpdateInstaller (at this time 7.0.0.23), but I decided to wipe out the UpdateInstaller and install it fresh into a DIFFERENT folder (cue the drum roll) …. that made the difference.

So, sometimes it is not the tool being installed, but rather the tool doing the install that is at fault. AND – UNIX file permissions are not always at fault either. Poor little Unix – there’s a good boy!!

WebSphere – The backupConfig script is your friend


This is just a short post, but the built-in utility you get with the backupConfig script is worth looking into for everybody!

If you have worked with WebSphere for any significant time you have come across the built-in backup and restore utility that each WebSphere server has by default: [backupConfig.bat] or on Unix/Linux [backupConfig.sh] and the corresponding restore scripts [restoreConfig.bat] and [restoreConfig.sh].

At my current client we are working on application customizations and testing them on new servers. This is where the backupConfig comes in handy as it does not just back-up your application server(s) but the deployment manager and all the node(s) configuration as well – so you can replay a whole server configuration along with the installed applications and any application specific configuration. backupConfig can also be used to migrate servers from one piece of hardware to another (or VM server or, …. any combination is possible).

The process is simple: find the scripts in the [/bin] folder of either the Dmgr or the node and execute the script. It will check your server’s configuration, stop all server instances including nodeagents and then create a zip file of ALL files necessary for the back-up – and all of this wonderfulness is unencumbered by the human thought process … 🙂

Whenever I am about to install a new application, install any fixes or make configuration changes to a WebSphere  server I run the backupConfig script once first and keep a copy of the zip file it created on a local machine – just to be safe.

Where to run:

Depending on your architecture you can run it on the Deployment Manager and/or all nodes. When you run it on the Deployment manager it will grab all the configuration for the Dmgr, nodes and application servers in one go. This is essential if you have to restore an environment. On a managed node (separate HW) it will capture the configuration and applications installed on that physical node – so you might need to run it on each physical WebSphere server in your environment once to get a total base back-up. Once I have that I usually just run the scripts on the deployment manager as most of the work happens there anyway and all changes are synchronized out to the nodes.

Restores:

On Windows it is simple – just run the restoreConfig script and tell it which zip file to use … and presto. On Unix/Linux you have to think a bit more. The backupConfig script does not keep any file rights or ownership information. When restoring, it basically sets the file ownership to the account being used to run the script – so make sure you are using the same account and have read/write rights to the folders involved.
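Since the zip file does not carry ownership or rights, one way to protect yourself on Unix/Linux is to record them yourself before the backup. The following is only a sketch of that idea and not part of backupConfig; the function names and the demo file are made up for illustration, and in practice you would point it at your own profile config folder.

```python
import os
import stat
import tempfile
from pathlib import Path


def snapshot_permissions(root):
    """Map each path under root (relative to root) to its uid, gid and mode."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        info = path.lstat()
        manifest[str(path.relative_to(root))] = {
            "uid": info.st_uid,
            "gid": info.st_gid,
            "mode": stat.S_IMODE(info.st_mode),
        }
    return manifest


def apply_permissions(root, manifest):
    """Reapply recorded ownership and mode to files that still exist."""
    root = Path(root)
    for relative, entry in manifest.items():
        target = root / relative
        if target.is_symlink() or not target.exists():
            continue  # skip links and anything the restore did not bring back
        os.chown(target, entry["uid"], entry["gid"])  # may need root
        os.chmod(target, entry["mode"])


if __name__ == "__main__":
    # Demo on a throwaway folder instead of a real WebSphere profile.
    with tempfile.TemporaryDirectory() as demo:
        sample = Path(demo) / "security.xml"
        sample.write_text("<placeholder/>")
        os.chmod(sample, 0o640)
        before = snapshot_permissions(demo)
        os.chmod(sample, 0o600)          # simulate rights lost on restore
        apply_permissions(demo, before)  # put them back
        print(oct(stat.S_IMODE(sample.stat().st_mode)))
```

You would keep the manifest (for example dumped to a JSON file) next to the backup zip and reapply it after restoreConfig has finished.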

Here is the link to the documentation in the WebSphere Infocenter – I hope you find it useful and make a back-up of your servers soon!

Get Free Training! New Complimentary IBM WebSphere Education Courses Available from the Global WebSphere Community


I think everybody following me knows that I am telling everybody to learn WebSphere – it will come your way like it or not. Here are some free training courses/videos that you can look at – all you have to do is become a member of the Global WebSphere Community – and PRESTO – free training for you. I will be looking at this later tonight …

Dear WebSpherians,

Last year, we polled our Global WebSphere Community members asking you what WebSphere topics you were most interested in having additional training information on from the GWC. The results came back showing a strong interest in WebSphere Application Server and WebSphere MQ. The Global WebSphere Community is pleased to announce the availability of these complimentary IBM WebSphere Education courses to our GWC members.

We are pleased to offer the following courses to our members:

  • WebSphere MQ v7 Installation and Configuration
  • WebSphere MQ V7 High Availability Considerations
  • Using WebSphere MQ V7 Traces, Error Logs, and Failure Data Capture Files
  • WebSphere MQ V7 Clustering
  • WebSphere Application Server V8 Architecture
  • WebSphere Application Server v8 Workload Management

Click here for more information or to view one of the courses

Sincerely

The Global WebSphere Community Management Team


End of Support for IBM WebSphere Application Server V6.1 is 30 Sept 2012


This came across my desk the other day – if you are still running WebSphere servers in the 6.1 code stream you will need to upgrade soon. I don’t think many of those servers are still out there unless you are still running a few applications that require older versions of WAS to run.

 

This is fun …. it’s upgrade time again!!!!

Java: it’s your fault! Connections on AIX


Just a quick one during my lunch hour …. ran into an issue yesterday at my current client that shows once more that when you do not work with a specific OS for a while, you really lose your touch for the small details.

The Saga of WAS, AIX and the damn Java Cache

We installed iFixes yesterday and that all went well. However the syncing of the nodes (kicked off from the Dmgr console) took forever, and then one of the app clusters on one of the nodes would not restart (it eventually did after 4 hours).

To clean the system and get rid of any old temp files we:

  • Stopped all WebSphere servers:  /WAS_Profile/bin/stopServer.sh xxx -user xxx -password ****
  • Stopped the NodeAgent:  /WAS_Profile/bin/stopNode.sh -username xxxx -password ****
  • Cleaned all temp files /WAS_Profile/temp  and /wstemp (everything inside of both folders)
  • Ran  /WAS_Profile/bin/osgiCfgInit.sh
  • Ran  /WAS_Profile/bin/clearClassCache.sh

Note: you can also use the command “./stopNode.sh -stopservers -username xxxx -password ****” to shut down the node agent AND the servers at the same time. We wanted to see the individual servers come down as we had issues with one of them.

We then tried to restart the node agent ….. and it failed. We found this in the startServer.log for the node agent:

ADMU3011E: Server launched but failed initialization

Damn, nothing worked … re-cleaned, checked, cursed, cried ….. and then opened a Sev 1 ticket with IBM support online. (had a REALLY fast response – thanks guys!)

The Cavalry to the Rescue …

The Connections support guy had a look at the logs and brought in a WAS support specialist, who had me repeat the clean-up steps above AND clean this location as well (everything in this folder, but without deleting the folder itself):

/tmp/javasharedresources

The IBM tech thinks we had a corrupted system level java cache that was causing the issue.  After that a ./startNode.sh worked like a charm and the servers started fine as well.

Total Clean-up

Incidentally, we ended up shutting each AIX WAS server (including the Dmgr) down one by one, so that we would not have a service outage, and ran the above maintenance once more. On the nodes we also ran a “./syncNode.sh” with the node agent turned off – just to eliminate any possibility of the nodes being out of sync (thanks for the idea Stuart).

We will also be going through our automated scripts to test adding some more items to them (email notifications when individual steps are done, adding “/tmp/javasharedresources” to the list of folders to be cleaned, etc.).
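For the folder clean-up part, something along these lines could go into such a script. This is only a sketch and not our actual automation; the function name is my own, and the demo runs against a throwaway folder rather than the real /tmp/javasharedresources. As in the steps above, all WebSphere processes should be stopped before the real folder is touched.

```python
import shutil
import tempfile
from pathlib import Path


def clear_directory_contents(directory):
    """Delete everything inside a folder but keep the folder itself.

    Returns the number of top-level entries that were removed.
    """
    folder = Path(directory)
    removed = 0
    for entry in folder.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # real sub-folder: remove it recursively
        else:
            entry.unlink()         # plain file or symlink
        removed += 1
    return removed


if __name__ == "__main__":
    # Demo on a temporary folder; swap in the real path only after all
    # servers and node agents on the machine have been stopped.
    with tempfile.TemporaryDirectory() as demo:
        (Path(demo) / "cache_file").write_text("stale")
        (Path(demo) / "subfolder").mkdir()
        print(clear_directory_contents(demo), "entries removed")
        print("folder still exists:", Path(demo).is_dir())
```

Keeping the folder itself in place follows the instruction we got from support: empty it, but do not delete it.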

Lessons to be learned:

  • When you don’t work with an OS for a while you forget the important SMALL stuff (/tmp/javasharedresources) – I had run into this very issue a few years ago and totally forgot about it. I actually did not remember it until this morning, the day after.
  • When in doubt – call support RIGHT AWAY, if for no other reason than to validate your thought process is correct and you are not barking up the wrong tree. We did not wait very long to call, but sometimes even 5 minutes can mean the difference between failure and success.