Missing Shutdowns

A Weblogic managed node for a development cluster failed to shut down when our shutdown script requested it. The last managed node to shut down becomes the JMS node and triggers a rewrite of the config.xml. We have scripts in place that check for config.xml changing and alert us. Since I am on call this week, I received the page.
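For anyone without such a check, here is a minimal sketch of the idea, not our actual script. The function name config_changed and the baseline file are hypothetical; it simply compares a cksum of config.xml against a saved copy of the previous checksum.

```shell
#!/bin/sh
# Hypothetical helper: report whether config.xml differs from a saved checksum.
# Usage: config_changed CONFIG_FILE BASELINE_FILE
# Returns 0 if changed, 1 if unchanged (or baseline just created), 2 if unreadable.
config_changed() {
    config_file=$1
    baseline_file=$2
    # cksum reading stdin prints "<crc> <bytes>" with no filename attached
    current=$(cksum < "$config_file") || return 2
    if [ ! -f "$baseline_file" ]; then
        # First run: record the baseline and report "unchanged"
        printf '%s\n' "$current" > "$baseline_file"
        return 1
    fi
    previous=$(cat "$baseline_file")
    [ "$current" != "$previous" ]
}
```

Run from cron, something like "config_changed $WL_DOMAIN/config.xml /var/tmp/config.cksum && send the page" would do, where the baseline path and the paging command are whatever you already use.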

I thought it would be good enough to copy the config.xml into place. Since it would be restarted that night by the usual shutdown script, the cluster would pick up the new config.xml and all would be well again. Ha! Normally a node has an entry in WebCTServer.99999999999.log stating something like “Server shutdown has been requested by system.” for each shutdown, and then the log ends. These entries are completely missing. That intrigued me, as I expected them to be present along with some reason why the shutdown failed. Instead, it was as if no request had been sent.
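A quick way to check a log for those entries is a grep count. This is only a sketch: the function name is hypothetical, and the message text is the approximate wording quoted above, so adjust the pattern to what your logs actually say.

```shell
#!/bin/sh
# Hypothetical helper: count shutdown-request entries in a server log.
# Usage: count_shutdown_requests LOG_FILE
# Prints the number of matching lines. Note grep exits non-zero when the count is 0.
count_shutdown_requests() {
    log_file=$1
    # Pattern is an assumption based on the approximate message quoted in the post
    grep -c 'Server shutdown has been requested' "$log_file"
}
```

A count of 0 on the log from the night in question is what "no request was sent" looks like.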

The shutdown script log showed this error where it ought to show that the node was shut down successfully. (Actual names replaced with CLUSTER_NAME and NODE.)

Error:CLUSTER_NAME:Name=NODE,Location=NODE,Type=ServerConfig.

The shutdown script calls Blackboard’s stopWebCTServer.sh, which just calls another script that takes various inputs and ultimately calls:

java -classpath $CLASSPATH weblogic.Admin -url t3://$HOSTNAME:$PORT -username $WL_USER -password $WL_PASS SHUTDOWN

CLASSPATH= can be found in the start scripts and has multiple entries in setEnv.sh. Source it with “. ./setEnv.sh” to set this for your session (running “sh setEnv.sh” would only set it in a child shell that exits right away).
HOSTNAME= server hostname
PORT= HTTP port where Weblogic listens
WL_USER= Weblogic user
WL_PASS= Weblogic user’s password

Instead of shutting down the node, the command just gave an irrelevant status. This one node gave a Weblogic command-line response which seems to mean a botched connection, but one where something was listening. By contrast, the error when nothing is listening or the address is wrong is:

Failed to connect to t3://$HOSTNAME:$PORT: Destination unreachable; nested exception is: java.net.ConnectException: Connection refused; No available router to destination

Yay, another case to figure out how to handle correctly.
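As a starting point for handling it, here is a rough sketch that sorts the command's output into the two failure cases seen above. The function name and the category labels are made up for illustration. The two patterns come straight from the messages quoted in this post, and I am deliberately not guessing at what a success message looks like, so anything else falls through as unrecognized.

```shell
#!/bin/sh
# Hypothetical helper: classify the text returned by the shutdown command.
# Usage: classify_shutdown_output "OUTPUT_TEXT"
classify_shutdown_output() {
    case $1 in
        # Nothing listening, or wrong host/port
        *"Failed to connect to"*) echo "unreachable" ;;
        # Something answered, but with an error status instead of shutting down
        *"Error:"*)               echo "error-status" ;;
        # Anything else needs a human (or another case added here)
        *)                        echo "unrecognized" ;;
    esac
}
```

The shutdown script could capture the output of the weblogic.Admin call (stdout and stderr together), pass it to this function, and only fall back to killing the process on the "error-status" case.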

I killed the process. For good measure I did a “touch REFRESH” in $WL_DOMAIN so the node would dump anything it had cached and download fresh copies. Since it started up this morning in the normal restart, I think it is fixed.

Other than JMS migrating and failing to do so, I don’t think this caused any problems for users. Just so very odd.

P.S. weblogic.Admin is deprecated in Weblogic 9, so it is interesting Blackboard still makes use of it.

Another Way to Verify Cookie Domain

Just finished an Oracle WebLogic Server 11g: Administration Essentials class today. So there are lots of things floating about in my head I want to try. (Thankfully we have lots of development clusters for me to break beyond repair. Kidding. Sorta.)

One of the common support questions Blackboard asks those of us CE/Vista clients running a cluster is whether we have changed the cookie domain in weblogic.xml. This setting specifies where the JSESSIONIDVISTA cookie is valid. By default the value in the weblogic.xml file is set to .webct.com, which is not valid anywhere (not even Blackboard.com). For a cluster, one of the install steps is to go to the Weblogic domain directory on the administrator node, run some commands to extract the weblogic.xml, edit it, then run some commands to add it back to the WAR file. Placing an empty “REFRESH” file on all the managed nodes deletes the staged and cached copies of the WAR.

No big deal and easy.

Except when it isn’t?

Occasionally someone will distrust your work and want you to verify the right setting is there. Normally they say to extract the weblogic.xml again and verify it is correct there. I had a thought. Why not verify that each managed node’s cache has the correct value?

It is easier than it sounds. In the Weblogic domain directory (where setEnv.sh is located), change directories to

$WL_DOMAIN/servers/node_name/tmp/_WL_user/webct

(NOTE: Names like node_name, node_name_regex and .domain.edu are placeholders. They are custom to you and not something I can anticipate what you would use there.)

Here I just used these greps to look for my domain. If I get results for the first one, then all is well. If I don’t get results for the first, then the second one should confirm the world is falling because we are missing the cookie domain.

grep ".domain.edu" */war/WEB-INF/weblogic.xml
grep ".webct.com" */war/WEB-INF/weblogic.xml

Since we use dsh for a lot of this kind of thing, I would use our regex for the node name and add on the path pieces in common. I have not yet studied the pieces between webct and war to know for certain how they are derived, except to say they appear to be 6 characters long and sufficiently random as to not repeat. Any [ejw]ar exploded into the cache appears to get a unique one. So this might work?

grep ".domain.edu" $WL_DOMAIN/servers/node_name_regex/tmp/_WL_user/webct/??????/war/WEB-INF/weblogic.xml

If not, then try:

cd $WL_DOMAIN/servers/node_name_regex/tmp/_WL_user/webct/ && pwd && grep ".domain.edu" */war/WEB-INF/weblogic.xml
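If several nodes share one filesystem, or you would rather have one script to push out with dsh, the same check can be wrapped in a loop. This is only a sketch under assumptions: the function name check_cookie_domain is made up, the directory layout is the one shown above, and the domain is matched as a fixed string rather than a regex.

```shell
#!/bin/sh
# Hypothetical helper: check every cached weblogic.xml under a domain directory.
# Usage: check_cookie_domain WL_DOMAIN_DIR EXPECTED_DOMAIN
# Returns 0 if every cached copy contains the domain, 1 if any copy lacks it,
# 2 if no cached weblogic.xml was found at all.
check_cookie_domain() {
    domain_dir=$1
    expected=$2
    found=0
    status=0
    for xml in "$domain_dir"/servers/*/tmp/_WL_user/webct/*/war/WEB-INF/weblogic.xml; do
        [ -f "$xml" ] || continue      # glob matched nothing
        found=1
        # -F treats the domain as a literal string, so "." is not a wildcard
        if grep -qF "$expected" "$xml"; then
            echo "OK      $xml"
        else
            echo "MISSING $xml"
            status=1
        fi
    done
    [ "$found" -eq 1 ] || return 2
    return $status
}
```

Calling it as check_cookie_domain "$WL_DOMAIN" ".domain.edu" prints one line per cached copy, so a node still carrying .webct.com stands out right away.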

I’m envisioning using this method to verify a number of different things on the nodes. Above all, it confirms that the managed node received what I expected, rather than only that the admin node has the correct version.