Rants, Raves, and Rhetoric v4

Missing Shutdowns

A Weblogic managed node in a development cluster failed to shut down when our shutdown script requested it. The last managed node to shut down becomes the JMS node and triggers a rewrite of the config.xml. We have scripts in place that check for changes to the config.xml and alert us. Since I am on call this week, I received the page.

I thought it would be good enough to copy the config.xml into place. Since it would be restarted that night by the usual shutdown script, the cluster would pick up the new config.xml and all would be well again. Ha! Normally a node has an entry in WebCTServer.99999999999.log stating something like “Server shutdown has been requested by system.” for each shutdown, and then the log ends. These entries were completely missing. That intrigued me, as I expected them to be present along with some reason for why the shutdown failed. Instead, it was like no request was ever sent.
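The log check described above can be sketched as a small shell helper. This is only a rough sketch: has_shutdown_request is a made-up name, the search text is the approximate message quoted above (the real wording may differ), and the log path is whatever your install uses.

```shell
# Hypothetical helper: succeed if the given node log contains a
# shutdown request entry. The search text is the approximate message
# quoted in the post, so adjust it to match your own logs.
has_shutdown_request() {
    log_file="$1"
    # -q: no output, exit status only; -F: match the text literally.
    grep -qF "Server shutdown has been requested" "$log_file"
}

# Example use (the path is a placeholder, not a real location):
#   has_shutdown_request /path/to/WebCTServer.99999999999.log \
#       || echo "no shutdown request was logged"
```

A missing entry does not say why the shutdown failed, only that the request never reached the point of being logged.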

The shutdown script log showed this error where it ought to have shown that the node was shut down successfully. (Actual names replaced with CLUSTER_NAME and NODE.)

Error:CLUSTER_NAME:Name=NODE,Location=NODE,Type=ServerConfig.

The shutdown script calls Blackboard’s stopWebCTServer.sh, which just calls another script that takes various inputs and ultimately calls:

java -classpath $CLASSPATH weblogic.Admin -url t3://$HOSTNAME:$PORT -username $WL_USER -password $WL_PASS SHUTDOWN

CLASSPATH= can be found in the start scripts and has multiple entries in setEnv.sh. Source the file (“. ./setEnv.sh”) to set this for your session; running “sh setEnv.sh” only sets it in a child shell that exits right away.
HOSTNAME= server hostname
PORT= HTTP port where Weblogic listens
WL_USER= Weblogic user
WL_PASS= Weblogic user’s password

Instead of shutting down the node, the command just returned an irrelevant status. This one node gave a Weblogic command-line response that seems to mean a botched connection, but one where something was listening. For comparison, the error when nothing is listening or the address is wrong is:

Failed to connect to t3://$HOSTNAME:$PORT: Destination unreachable; nested exception is: java.net.ConnectException: Connection refused; No available router to destination.

Yay, another case to figure out how to handle correctly.
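One way a shutdown script might tell these cases apart is to look at the text the weblogic.Admin call prints. The sketch below is an assumption about how that could be done, not what Blackboard's scripts do: classify_shutdown_output is a made-up name, and it only recognizes the two messages quoted in this post.

```shell
# Hypothetical helper: classify the text printed by the
# weblogic.Admin SHUTDOWN call. Prints unreachable, error, or unknown.
classify_shutdown_output() {
    case "$1" in
        *"Failed to connect to"*|*"Connection refused"*)
            # Nothing listening, or wrong host or port.
            echo "unreachable" ;;
        *"Error:"*)
            # Something answered, but the node did not shut down.
            echo "error" ;;
        *)
            # The success text is not quoted in the post, so it is
            # deliberately not guessed at here.
            echo "unknown" ;;
    esac
}
```

A wrapper could capture the command's output into a variable, pass it to this function, and page someone (or fall back to killing the process) on "error" instead of logging the line and moving on.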

I killed the process. For good measure I did a “touch REFRESH” in $WL_DOMAIN so the node would dump anything it had cached and download fresh copies. Since it started up this morning in the normal restart, I think it is fixed.

Other than JMS attempting to migrate and failing to do so, I don’t think this caused any problems for users. Just so very odd.

P.S. weblogic.Admin is deprecated in Weblogic 9, so it is interesting Blackboard still makes use of it.
