Rants, Raves, and Rhetoric v4

Chat Connection Resets

Sorry, this was originally supposed to be published a few weeks ago. I’m just getting around to posting it. – Ezra

Tests run in development after the applet patch reported chat failures. Chat uses an applet, so I was concerned and investigated the problem. The usual culprits all checked out:

  1. setEnv.sh had WEBCT_CONFIG_OPTIONS set to start chat on correct nodes.
  2. customconfig/ChatServer-config.xml had SSL key in correct location.
  3. SSL key was the correct key.
  4. SocketServer logs were not rolling every minute.
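The fourth check can be approximated from the shell. This is only a minimal sketch: it assumes that "rolling every minute" would show up as a new SocketServer.*.log file roughly once a minute, and the function name and the 10 minute default are my own, not anything WebCT provides.

```shell
#!/bin/sh
# Count SocketServer logs under DIR that were modified in the last MINUTES minutes.
# Hypothetical helper: a count close to MINUTES would suggest the log is
# rolling about once a minute; a count of 1 or 2 looks healthy.
recent_socketserver_logs() {
    dir=${1:-.}
    minutes=${2:-10}
    # -mmin is supported by GNU and BSD find, but is not in POSIX
    find "$dir" -type f -name 'SocketServer.*.log' -mmin "-$minutes" |
        wc -l |
        tr -d ' '
}
```

Run from the logs directory as `recent_socketserver_logs . 10`.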

So chat looked correctly set up on our end.

On the web server nodes where chat runs, I changed into the logs directory. Looking through the SocketServer logs for a web node running chat, I noticed some lines had “user” in them. I guessed the value after “user” was the id in the person table for the user who experienced the error. It was easy enough to look for that id in the database and confirm my guess.

The first grep of the SocketServer log isolates the sixth colon-delimited field, which is that person id, and lists each id once. The second grep reports the whole line, including the time stamp.

grep "[0-9]-user" SocketServer.??????????.log | awk -F: '{print $6}' | sort | uniq
grep "[0-9]-user" SocketServer.??????????.log
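The unique ids from the first grep can also be joined straight into a comma-delimited list. This is a minimal sketch, not part of the original commands: the function name is mine, and the field number is a parameter because grep prefixes each line with the file name when more than one file matches, which shifts the colon count by one. Here -h suppresses that prefix, so adjust the field if 6 does not land on the id in your logs.

```shell
#!/bin/sh
# Print the unique person ids found in SocketServer logs as one comma-delimited line.
# Usage: chat_user_ids FIELD FILE...
# FIELD is the colon-separated field holding the id (6 in the post above).
chat_user_ids() {
    field=$1
    shift
    grep -h '[0-9]-user' "$@" |
        awk -F: -v f="$field" '{print $f}' |
        sort -u |
        paste -sd, -
}
```

For example, `chat_user_ids 6 SocketServer.??????????.log` prints a single line of ids separated by commas.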

In the database I looked for these ids with the SQL below. Replace the example ids in the in (...) list with a comma-delimited list of the ids from the first grep above. They turned out to be the ids of the users experiencing the problems. I recognized the accounts as people who had been testing.

select lc.name, p.webct_id, p.id from person p, learning_context lc
where p.learning_context_id = lc.id and p.id in (99999,999999)
order by lc.name, p.webct_id
/

Since I was researching just one environment, and a development one at that, I was curious whether the two kinds of errors showed up elsewhere:
  1. Connection reset
  2. Session aborted by remote peer.
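A rough count of each error makes it easier to compare environments. This is a minimal sketch, assuming the two messages appear verbatim in the SocketServer logs; the function name is my own.

```shell
#!/bin/sh
# Print a total count for each of the two chat errors across the given log files.
count_chat_errors() {
    for msg in 'Connection reset' 'Session aborted by remote peer'; do
        # cat first so grep -c reports one total instead of one count per file;
        # "|| true" keeps a zero count from looking like a failure
        n=$(cat "$@" | grep -cF -- "$msg" || true)
        printf '%s\t%s\n' "$n" "$msg"
    done
}
```

Run it as `count_chat_errors SocketServer.??????????.log` on each environment and compare the totals.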

It looked like production still had some of the same errors. In the one session I profiled, the chat errors appeared to start about 2 hours after the user's last action in webserver.log. That is when the TCP profile cuts off the chat connection.

The errors were more frequent in development because chat there turned out to have a 5 minute TCP profile instead of the correct 2 hour one. Now every environment consistently shows this problem when users let their sessions expire.

Hopefully they are just leaving their windows open on personal computers and not public spaces.

