Security Inside Out #USGRockEagle13

Eddie Carter and Orrin Char, Oracle

    • Identity management, security, and access management.
    • Eddie wore a UGA shirt. The guy in front of me made fun of him for obviously not wanting to sell to Georgia Tech. Turns out he’s from Kennesaw. The GT-UGA rivalry knows no bounds. Love it!
    • Handout: a database firewall is more about auditing and ACLs than enterprise firewalls, which just control access to many hosts.
    • 67% of breached records came from servers. 76% of breaches used weak or stolen credentials. Breaches were typically discovered by an external party. 97% were preventable with basic controls. Source: 2013 Data Breach Investigations Report.
    • Pre-1997: security issues were mistakes. 1998-2007: privilege abuse, curiosity, leakage. 2008-2009: malicious attacks, social engineering, sophisticated attacks, business data theft, loss of reputation.
    • Organizations can be fined and may have to buy services for the people affected by the breach.
    • DBAs are the targets; phishing is used to get their credentials.
    • Change is where gaps open. Being more available means more highly privileged users. Consultants and vendors claim they need DBA-level access.
    • 80% of IT security programs do not address database security; they address the perimeter, such as with firewalls. More and more attacks exploit legitimate application access and user credentials.
    • Supports SQL Server and MySQL, not just Oracle.
    • Preventative
      • encryption: If data is stolen in encrypted form, then the breach may not have to be reported. The application should not even know the data is encrypted. Network encryption is now free to us; it autonegotiates with the destination, requires no application changes, and has little overhead. Integrated with Oracle technologies. Key management has two layers: the master key lives in a hardware module or in a wallet (the wallet can be tied to hardware and accessed at restart); data is encrypted with a table or column key, and the table and column keys are encrypted with the master key. (A rough sketch of this follows the list.)
      • redaction: Uses ACLs to determine who can see what. Replaces text such as credit card numbers and SSNs, so a user sees the full value, a partial value, or a fixed value.
      • data masking for nonproduction use: A copy of production data lands in test, and test is less secure. Masking means the data is no longer valuable. It finds sensitive columns through templates and converts the data so it is meaningless: shuffle salaries, randomize ID numbers (even partially), randomize all but the first two characters of a last name. Can be two-way, so data is changed before being sent to a partner for processing and reverted when it comes back.
      • privileged user controls: Compartmentalization of commands. Prevents consultants from querying certain tables. Creates protective zones around schema objects.
    • Detective
      • activity monitoring:
      • database firewall: Sits on the network and parses SQL to determine its intent. Uses a whitelist, a blacklist, and an exception list; if a statement matches none of them, it alerts security and the statement is potentially added to a list. Has learning and blocking modes. Can return an empty result set to a hacker, who then thinks there are no records.
      • auditing and reporting: Analyzes audit-event data. A central audit repository keeps the hacker unaware of the auditing. Default and custom reports.
      • conditional auditing framework: if-this-then-that rules.
    • Administrative
      • privilege analysis: A privilege-capture mode reports on which privileges and roles are actually used, so unnecessary ones can be revoked.
      • sensitive data discovery: Scans Oracle databases for sensitive fields and data definitions.
      • configuration management: Discovers and classifies databases; scans for secure configuration.
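
The two-layer key scheme under the encryption bullet above, as a hedged Oracle TDE sketch. This is roughly the 12c syntax from memory (11g used ALTER SYSTEM SET ENCRYPTION WALLET OPEN instead), and the password and table are placeholders:

-- Open the wallet (keystore) that holds the master key.
ADMINISTER KEY MANAGEMENT SET KEYSTORE OPEN IDENTIFIED BY "wallet_password";

-- Column-level TDE: ssn is encrypted with a column key, and that
-- column key is itself encrypted with the master key in the wallet.
CREATE TABLE employees_demo (
  id   NUMBER PRIMARY KEY,
  name VARCHAR2(100),
  ssn  VARCHAR2(11) ENCRYPT USING 'AES256'
);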

Minimal Downtime #USGRockEagle13

Marc Pare, Oracle

    • Why work on weekends or late at night? (For us it is director policy: accidents from daytime changes would be untenable for clients. Also, this talk targeted Banner, not GeorgiaVIEW.)
    • WebLogic High Availability Topology (Maximum Availability Architecture)
      • Client > Apache Server or OHS > WLS Cluster (> Admin Server) > RAC Cluster (> Oracle DB)
      • Why is this not being used?
    • Impressed that someone from Oracle is so in touch with the problems of Banner on Oracle and the solutions.
    • Use Oracle Enterprise Manager to monitor the middle tier. It uses MBeans and exposes most items, except some provisioning.
    • Cloning. Template builder. Add new nodes or set up test instances.
    • Rolling upgrades: bring the first node down, patch it, bring it back up; then the second, and so on. If the process encounters a problem, it rolls back the patch and brings the node back up. (A generic sketch of the loop follows the list.)
    • When experimenting, bring the node back up with end-user access blocked, then test and verify. Only when happy, enable end-user access. We should consider giving campus administrators access to our direct nodes for testing so they can verify before release. Or, even better, really knock out the automated testing.
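
The loop itself is simple enough to sketch in shell. Every helper script here (stop_node.sh, apply_patch.sh, rollback_patch.sh, start_node.sh) is a hypothetical stand-in; the real work is done by Oracle's tooling:

#!/bin/sh
# One node at a time: stop, patch, restart; roll back on failure.
for node in node1 node2; do
  stop_node.sh "$node"
  if apply_patch.sh "$node"; then
    start_node.sh "$node"
  else
    rollback_patch.sh "$node"   # undo the patch, then bring the node back
    start_node.sh "$node"
    exit 1
  fi
done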

Content Migration Progress Tracking

When moving hundreds of thousands of courses between WebCT Vista and Desire2Learn, keeping track of what made it through each stage seems like an obvious thing to do in hindsight. I say hindsight because the things falling between the cracks started to pile up before we noticed. The basic process…

    1. Through Oracle SQL we populate a middleware database with those courses which meet acceptable criteria.
      CRACK: With the wrong criteria, courses are not even selected.
    2. Through Oracle SQL we generate XML listings of the courses in sets of 50.
      CRACK: Only a subset of the data loaded into the database may be extracted.
    3. A shell script automates running the WebCT command-line backup process to create a package for each course.
      CRACK: The command-line backup fails on some courses.
    4. Desire2Learn scripts pick up the files and convert the WebCT-formatted packages to the Desire2Learn format.
      CRACKS: Packages that are too big fail. Paths that are too long fail. This step can also fail to create the CSV files for the next step.
    5. Converted packages are placed in a queue to be imported into Desire2Learn.
      CRACKS: WebCT Vista course names can run to 1,000 characters, and D2L fails on more than 50. A course named the same as a previously loaded one, but with a different file name, loads both into the same course.

So, there are apparently five different stages and eight potential failures to track, and no end-to-end tracking to even know what is missing. Which means inventing something to check the logs for errors.

First thing, I tried writing SQL to create an ordered list of the courses that are available.
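
The query depends entirely on the middleware schema, so this is only a hedged sketch with hypothetical table and column names, spooled so the output can be diffed against the file-derived list:

set pagesize 0
set feedback off
spool d2l_courses.txt
select course_name from migration_courses order by course_name;
spool off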

The WebCT backups were a little tougher to convert into a useful list. The file names follow the format Course_Parent_1234567890.bak. They were also spread across various directories, so I ended up doing something like this to get a list of the files, strip off the parents and time stamps, strip off the directories, and order it.

ls */*.bak | awk -F_ '{print $1}' | awk -F\/ '{print $2}' | sort

So I have two ordered lists. Anything in the D2L one and not in the WebCT one ought to be the work of clients making their own stuff. Anything in the WebCT one and not in the D2L one ought to be my missing ones. Only, almost every line was a mismatch.

Visually comparing them, I realized the same courses had, in effect, different names: all the spaces were converted to underscores. So I ended up adding a sed to convert spaces to underscores after the sort.
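
I read that as normalizing the SQL-derived list so both sides use underscores; a hedged version of the added step (file name hypothetical):

sort d2l_courses.txt | sed 's/ /_/g'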

Then I wrote some more scripts.

    • Go through the logs from #4 and #5 and display the errors. With it I was able to compare my list of missing courses against the errors and confirm why they did not come through.
    • Some of the cracks can be addressed by making new import files. So I wrote a script to add those lines to a redo.csv file. Touch up the problems and go. (A rough sketch of both follows the list.)
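
Roughly, those scripts amounted to the following. The log locations, error patterns, and file names are hypothetical placeholders, not the real Desire2Learn paths:

#!/bin/sh
# Pull error lines out of the conversion (#4) and import (#5) logs.
grep -ih 'error\|fail' /path/to/convert_logs/*.log /path/to/import_logs/*.log \
  | sort | uniq -c | sort -rn

# For courses that can simply be resubmitted, copy their import lines
# from the original CSV into redo.csv for touch-up and another pass.
while read course; do
  grep "$course" course_list.csv >> redo.csv
done < failed_courses.txt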

Basically, at this point it only covers steps 3-5. At some point I am going to have to check steps 1-5 end to end.

Another Way to Verify Cookie Domain

Just finished an Oracle WebLogic Server 11g: Administration Essentials class today. So there are lots of things floating about in my head I want to try. (Thankfully, we have lots of development clusters for me to break beyond repair. Kidding. Sorta.)

One of the common support questions Blackboard asks those of us running CE/Vista in a cluster is whether we have changed the cookie domain in weblogic.xml. This controls where the JSESSIONIDVISTA cookie is valid. By default the value in weblogic.xml is set to .webct.com, which is not valid anywhere (not even Blackboard.com). One of the install steps for a cluster is, in the WebLogic domain directory on the administrator node, to run some commands to extract weblogic.xml from the WAR file, edit it, then run some commands to add it back. Placing an empty file named REFRESH on all the managed nodes deletes the staged and cached copies of the WAR.
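
For reference, the extract/edit/repack step looks something like this with the JDK's jar tool; the WAR name and location are from memory, so treat them as placeholders:

cd $WL_DOMAIN                          # wherever the deployed webct.war lives
jar xf webct.war WEB-INF/weblogic.xml  # pull the descriptor out of the WAR
vi WEB-INF/weblogic.xml                # change .webct.com to your cookie domain
jar uf webct.war WEB-INF/weblogic.xml  # put the edited copy back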

No big deal and easy.

Except when it isn’t?

Occasionally someone will distrust your work and want you to verify the right setting is in place. Normally they say to extract weblogic.xml again and verify it is correct there. I had a thought: why not verify that each managed node’s cache has the correct value?

It is easier than it sounds. In the Weblogic domain directory (where setEnv.sh is located), change directories to

$WL_DOMAIN/servers/node_name/tmp/_WL_user/webct

(NOTE: Anything I put in bold means it is custom to you and not something I can anticipate what you would use there.)

Here I just used these greps to look for my domain. If I get results for the first one, then all is well. If I don’t get results for the first, then the second one should confirm the world is falling because we are missing the cookie domain.

grep ".domain.edu" */war/WEB-INF/weblogic.xml
grep ".webct.com" */war/WEB-INF/weblogic.xml

Since we use dsh for a lot of this kind of thing, I would use our regex for the node names and add on the common path pieces. I have not yet studied the piece between webct and war to know for certain how it is derived, except to say it appears to be 6 characters long and sufficiently random as to not repeat. Any [ejw]ar exploded into the cache appears to get a unique one. So this might work?

grep ".domain.edu" $WL_DOMAIN/servers/node_name_regex/tmp/_WL_user/webct/??????/war/WEB-INF/weblogic.xml

If not, then try:

cd $WL_DOMAIN/servers/node_name_regex/tmp/_WL_user/webct/ \
  && pwd && grep ".domain.edu" */war/WEB-INF/weblogic.xml

I’m envisioning using this method to verify a number of different things on the nodes. Especially, it confirms the managed node received what I expected, not just that the admin node has the correct something.

Weblogic Diagnostics

I noticed one of the nodes in a development cluster was down, so I started it again. That second start failed, so I ended up looking at logs to figure out why. The error in WebCTServer.000000000.log said:

weblogic.diagnostics.lifecycle.DiagnosticComponentLifecycleException: weblogic.store.PersistentStoreException: java.io.IOException: [Store:280036]Missing the file store file "WLS_DIAGNOSTICS000001.DAT" in the directory "$VISTAHOME/./servers/$NODENAME/data/store/diagnostics"

So I looked to see if the file was there. It wasn’t.

I tried touching a file at the right location and starting the node. Another failed start, with a new error:

There was an error while reading from the log file.

So I tried copying WLS_DIAGNOSTICS000002.DAT to WLS_DIAGNOSTICS000001.DAT and starting again. This got me a successful startup. Examining the WLS files revealed that the 0 and 1 files have updated time stamps while the 2 file hasn’t changed since the first occurrence of the error.
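
In shell terms, the recovery came down to something like this, using the paths from the error message above:

cd $VISTAHOME/servers/$NODENAME/data/store/diagnostics
# An empty file was not enough; reusing an existing store file under
# the missing name gave WebLogic something valid to open.
cp WLS_DIAGNOSTICS000002.DAT WLS_DIAGNOSTICS000001.DAT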

That suggests to me Weblogic is unaware of the 2 file and only aware of the 0 and 1 files. Weird.

At least I tricked the software into running again.

Some interesting discussion about these files turned up the following points.

  1. Apparently I could have just renamed the files. CONFIRMED
  2. The files capture JDBC diagnostic data. Maybe I need to look at the JDBC pool settings. DONE (See comment below)
  3. Apparently these files grow, and a new file is added when one reaches 2GB. Sounds to me like we should purge these files like we do logs. CONFIRMED (a hedged sketch follows the list)
  4. There was a bug in a similar version causing these to be on by default.
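
The purge could be as simple as the following. This is my assumption, not anything from the discussion, and given the missing-file error above it is presumably only safe while the node is down; renaming instead of deleting may be safer still:

# Remove diagnostic store files older than 30 days (node stopped first).
find $VISTAHOME/servers/$NODENAME/data/store/diagnostics \
  -name 'WLS_DIAGNOSTICS*.DAT' -mtime +30 -delete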

Guess that gives me some work for tomorrow.
🙁

Preserving CE/Vista Settings

I’ve been asked for notes about this a few times. So here’s a blog post instead.
🙂

A coworker is working on scripting our updates. We lost the Luminis Message Adapter settings in applying a patch to the environment we provide our clients. Fortunately, those settings are maintained by us, not our clients, so I pushed those settings back very easily. Unfortunately, it points to the need to capture the settings so we can restore them later.

In Oracle databases, this is pretty easy. As the schema user, run the following. It does some intentional things. First, we have multiple institutions, so the breaks make identifying each institution easier. Second, the same label on multiple forms gets confusing, so I am sorting by settings description id, on the theory that these ids are generated when the page is created, so settings from the same tools will float together. (The last-modified time stamp is probably unnecessary; I used it in an earlier version and left it in just in case Vista, for whatever reason, added a new setting for the same label instead of modifying the existing one.) This can be spooled both before and after the upgrade. Use diff or WinMerge to compare the versions. Anything lost from the before version should be evaluated for adding back to the settings.

col lc_name format a50
col setting_value format a80
col label format a80
col lock format 999
col child format 999

clear breaks computes
break on lc_name skip 1

select learning_context.name lc_name, settings_description.label, settings.setting_value,
settings.locked_flag "lock", settings_description.inheritable_flag "child"
from learning_context, settings, settings_description
where settings.settings_desc_id = settings_description.id
and settings.learning_context_id = learning_context.id
and learning_context.type_code in ('Server','Domain','Institution','Campus','Group')
order by learning_context.name, settings.settings_desc_id
/
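
A hedged sketch of the before/after capture, assuming the query above is saved as settings_report.sql (script name and credentials are placeholders):

sqlplus -s schema_user/password @settings_report.sql > settings_before.txt
# ... apply the patch ...
sqlplus -s schema_user/password @settings_report.sql > settings_after.txt
diff settings_before.txt settings_after.txt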

An example of the multiple forms issue is external authentication. CE/Vista provides an LDAP (A) and an LDAP (B). The settings_description.label for both is contextmgt.settings.ldap.source. The settings_description.name for both is source. It looks like each of the two identical labels has a different settings.settings_desc_id value depending on whether it is A or B. To me it seems lame to use the same label for two different ids.

The most vulnerable parts of the application to lose settings during an update are the System Integration settings. A mismatched Jar on a node will wipe all the settings associated with that Jar.

However, I can see using this to capture the settings as a backup, just in case an administrator or instructor wipes out settings by mistake. Yes, this is scope creep. Create a backup of the settings table to actually preserve the settings:

create table settings_backup_pre_sp2hf1 tablespace WEBCT_DATA as select * from settings;

Contexts: As a server admin, I maintain certain settings and push those down. Each client has control over some other settings and may push those down from the institution context. Maybe some are creating division and group admins? Maybe some instructors are changing things at the course or section levels. I may end up capturing everything?

Restoration: The whole purpose of preserving the settings is to restore them later. There are a couple methods in theory:

  1. Providing the settings to a human to re-enter. The labelling issue makes me question the sanity of trying to explain this to someone.
  2. Updating the database directly would just need settings.id to ensure the value lands in the right place. Maybe dump out the settings in the format of an update command, with labels on each to explain the context? Ugh. (A hedged restore sketch follows the list.)
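
Method 2 might look something like the following, restoring a single lost value from the backup table by id; the bind variable is a placeholder:

-- Restore one setting from the pre-patch backup by id.
update settings s
   set s.setting_value = (select b.setting_value
                            from settings_backup_pre_sp2hf1 b
                           where b.id = s.id)
 where s.id = :lost_setting_id;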

If settings were not so easily lost, then this would be so much easier.

View: Another object of interest is the settings_v view. (Redundant?) The only reason I don’t like this view is that it reports the values for every learning context, which makes reporting off it take much, much longer. For example, the encryption key for a powerlink is listed in 8 places in settings/settings_description and 18,769 places in settings_v.

IMS Import Error When Node Is Down

This is what I got when a node was down while I attempted to do an IMS import in Blackboard CE/Vista.

Failed to upload files, exiting.
Cause could include invalid permission on file/directory,
invalid file/directory or
repository related problems

The keywords permission, file, and directory in this message would have sent me anywhere but the right place. The keyword repository made me suspicious the node had a worse issue than just bad permissions. So I looked for the most recent WebCTServer log and found it to be a week old. Verifying the last messages in the log confirmed the node had been down for a week.
🙁

Something in the log output questioning whether or not the node was running would have saved me lots of time this morning.

So I added a couple lines to my .bashrc to provide a visual indicator of how many are running.

JAVA_RUNNING=`ps -ef | grep '[j]ava' | grep -c '[v]ista'`
echo "  -- No. Vista processes running = $JAVA_RUNNING"

Better might even be to have it evaluate whether fewer than one or more than two (or three) are running, and if so, print something obvious that the world is falling. Maybe later. It took me just a couple minutes to write and test what I have; the rest will come after I decide what I really want. 🙂
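
That check might look something like this; the thresholds are the assumption here:

JAVA_RUNNING=$(ps -ef | grep '[j]ava' | grep -c '[v]ista')
if [ "$JAVA_RUNNING" -lt 1 ] || [ "$JAVA_RUNNING" -gt 2 ]; then
  echo "*** WARNING: unexpected number of Vista processes: $JAVA_RUNNING ***"
fi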

Also, the node wasn’t running because a coworker had run into a situation where the fifth node would not start. She thought maybe the number of connections Oracle would accept was not high enough. I suggested a simple test: shut down a node and see if the problem one suddenly works. I happened to be working with the one she shut down for the test. It happens she had just started a script to bring them back up when I asked.

User Interface v. SQL Reports and Tracking

Blackboard Vista tracks student activity, and this tracking data is viewed as a critical feature of Vista. Our instructors depended on the information until we revoked their ability to run reports themselves due to performance issues. Campus administrators can still generate reports (though some still fail). We doubt the solution is Blackboard improving the queries behind the reports; we favor deleting tracking data (after preserving it outside of Vista) to resolve the performance issues.

We developed SQL reports to look at the tracking data where the user in question was not a student. Yes, the data is limited, but knowing when and where a user was active helps determine where to look in the logs. When we hit the performance issues, we started using these reports where the user-interface reports failed to generate.

My understanding was that the user-interface and SQL reports on tracking were the same: both looked at the same data, and the user-interface reports were just sexier, wrapped in HTML and icons. Then I compared a user-interface report to a SQL report. Just prior to doing this, I was thinking WebCT was stupid for not tracking when students look at the list of assessments. Turns out “Assessment list viewed” was tracked in the user interface all along but was missing from our sqlplus queries. WTF?

The data has to be there. The problem has to be that our approach in sqlplus is inadvertently excluding the information from the reports. Because these reports must be accurate, I’ll crack this nut… or become nuts myself.

CRACKED THE NUT: Part of the data WebCT collected was the names of pages. There is a page-name table which was inner joined to the user-action table, so actions on pages without a name were not reported. George suggested an outer join. I placed it on the page-name table, which now lets us see the formerly missing tracked actions. For the specific case where I found this, I now get all the missing actions.

I am considering a Blackboard (it’s their problem now) feature request to ensure every page in the application has a title. I consider it developer laziness (someone else said worthlessness) that some pages might lack something so core and simple.

ANOTHER TRICK: Oracle’s NVL function displays a piece of text instead of a null value. Awesome for the above. A sketch combining the two follows.
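
The shape of the fix, with hypothetical table and column names standing in for the real Vista schema:

-- The (+) outer join keeps actions whose page has no name;
-- NVL labels those rows instead of dropping them.
select ua.action_time,
       ua.action_type,
       nvl(pn.page_name, '(unnamed page)') page_name
  from user_action ua,
       page_name pn
 where ua.page_name_id = pn.id (+)
 order by ua.action_time;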
