Saturday, September 7, 2013

ErrorDB - a Case for a Robust Error-Handling Platform

While people love to hate Oracle, there are a few things Oracle does exceedingly well. Or, to be more specific, there are a few things that I personally love in the Oracle core RDBMS.

One of them is its standard error messages. The messages are highly standardized, and there is a very good scheme for coding the messages that come from different layers of Oracle. Let us take a look at the different codes Oracle uses to report errors to its users. Oracle has more than 50 such categories; below are only those that DBAs typically see in everyday life:
  • ORA-00000 to ORA-62001 - Oracle core RDBMS problems
  • EXP-00000 to EXP-00113 - Errors related to Exp (data export utility)
  • IMP-00000 to IMP-00401 - Errors related to Imp (data import utility)
  • SQL*Loader-00100 to SQL*Loader-03120 - Errors related to SQLLDR (data loader utility)
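The codes above share a simple shape: a facility prefix, a hyphen and a zero-padded number. The Python sketch below splits such a code into its parts and checks whether it falls inside one of the listed ranges. The regular expression and the helper names `parse_error_code` and `in_range` are assumptions drawn only from the examples in this list, not from any Oracle specification.

```python
import re

# Assumed shape, based only on the examples above: a facility prefix
# (letters, optionally with '*', e.g. "SQL*Loader"), a hyphen, and digits.
CODE_PATTERN = re.compile(r"^(?P<facility>[A-Za-z*]+)-(?P<number>\d+)$")


def parse_error_code(code: str) -> tuple:
    """Split a code such as 'ORA-01555' into ('ORA', 1555)."""
    match = CODE_PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a prefixed error code: {code!r}")
    return match.group("facility"), int(match.group("number"))


def in_range(code: str, low: str, high: str) -> bool:
    """True if code has the same facility as low/high and lies between them."""
    facility, number = parse_error_code(code)
    low_fac, low_num = parse_error_code(low)
    high_fac, high_num = parse_error_code(high)
    return facility == low_fac == high_fac and low_num <= number <= high_num


print(parse_error_code("ORA-01555"))                    # ('ORA', 1555)
print(parse_error_code("SQL*Loader-00100"))             # ('SQL*Loader', 100)
print(in_range("IMP-00017", "IMP-00000", "IMP-00401"))  # True
```

A bare "-1", by contrast, does not even parse: it carries no facility to tell you which product raised it.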

Why does Oracle use these error codes? Simply because Oracle wants everyone to understand and speak a common language. So when Satoshi Ikoma, a DBA in Tokyo, Japan, who doesn't understand English as well as Jim Corbett, his company's DBA in San Francisco, California, gets the error “ORA-01555: snapshot too old: rollback segment number 007 name "UNDOTBS-04" too small”, he understands it exactly as Oracle intends, even though his English is weaker. The error means the same thing to both Jim and Satoshi-san, despite their different levels of competence in English.

If you whisper ORA-00600 (Oracle internal errors, for the uninitiated) into the ear of a DBA in deep slumber, perhaps after an unmentionable number of beers, he will immediately jump up, sober up (enough to pass any sobriety test) and almost hyperventilate.


ORA-00600 Message

Why can’t we use a similar error-reporting mechanism for our applications? Error reporting in some applications is fairly mature, but most applications have very “intuitive” error messages, like 0, 1 or 2.

Consider what happens when a service engineer or operations-center duty person receives an alert stating “EDW job 4403 errored out with error -1”. Either the engineer starts digging through knowledge-base documents to see whether someone has been kind enough to explain what -1 means, or he starts praying that God will miraculously grant him the wisdom to decipher the error code.

As I said above, some applications are far ahead of others on this curve. They not only report very informative error messages but also assign distinct codes to different errors.

How can we get all applications to follow a very loose framework that also becomes a platform?

Creating an error-reporting platform could be our answer.

Imagine if we had something like ErrorDB. It could provide a framework with which applications register themselves. Once registered, an application could use metadata tables within ErrorDB to insert a row for each error it can throw. ErrorDB could perhaps be like RolesDB, nodereg, CM3 or any other platform that all applications in an organization can use.

Each application error could have the following:

  • Application error code
  • Description of the error
  • Suggested resolution
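As a minimal sketch, the registration and metadata tables described above might look like the following, using Python's built-in sqlite3 module as a stand-in database. The table names (`application`, `app_error`), their columns and the sample "EDW" row are hypothetical illustrations, not part of any existing ErrorDB.

```python
import sqlite3

# Hypothetical ErrorDB schema: one row per registered application,
# one row per error code that application can raise.
SCHEMA = """
CREATE TABLE application (
    prefix      TEXT PRIMARY KEY,   -- short code such as 'EDW', analogous to 'ORA'
    name        TEXT NOT NULL
);
CREATE TABLE app_error (
    prefix      TEXT NOT NULL REFERENCES application(prefix),
    code        INTEGER NOT NULL,   -- application error code
    description TEXT NOT NULL,      -- description of the error
    resolution  TEXT NOT NULL,      -- suggested resolution
    PRIMARY KEY (prefix, code)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Registration step: the application claims its prefix (sample data only).
conn.execute("INSERT INTO application VALUES (?, ?)", ("EDW", "EDW batch jobs"))

# Metadata step: one row per error the application can throw.
conn.execute(
    "INSERT INTO app_error VALUES (?, ?, ?, ?)",
    ("EDW", 1, "Source file missing", "Check that the upstream extract has landed."),
)
conn.commit()

row = conn.execute(
    "SELECT description, resolution FROM app_error WHERE prefix = ? AND code = ?",
    ("EDW", 1),
).fetchone()
print(row)
```

The composite primary key on (prefix, code) keeps two applications from colliding on the same number, which is exactly what the ORA/EXP/IMP prefixes do for Oracle.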

The ErrorDB application could provide APIs to register, set and get error codes, which applications could then use to present errors as better, more user-friendly messages.
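A minimal in-memory sketch of such register, set and get APIs follows. The class name `ErrorRegistry`, its method names and the Oracle-style `EDW-00001` formatting are assumptions made for illustration; a real ErrorDB would persist entries in its metadata tables rather than in a dictionary.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ErrorEntry:
    code: int
    description: str
    resolution: str


class ErrorRegistry:
    """Hypothetical ErrorDB client API: register, set and get error codes."""

    def __init__(self) -> None:
        self._apps: Dict[str, Dict[int, ErrorEntry]] = {}

    def register(self, prefix: str) -> None:
        # Register an application under a unique prefix, e.g. "EDW".
        if prefix in self._apps:
            raise ValueError(f"prefix already registered: {prefix}")
        self._apps[prefix] = {}

    def set(self, prefix: str, code: int, description: str, resolution: str) -> None:
        # Add or update the metadata for one error code of a registered app.
        if prefix not in self._apps:
            raise KeyError(f"unregistered application: {prefix}")
        self._apps[prefix][code] = ErrorEntry(code, description, resolution)

    def get(self, prefix: str, code: int) -> ErrorEntry:
        # Raises KeyError if the application or the code is unknown.
        return self._apps[prefix][code]

    def message(self, prefix: str, code: int) -> str:
        # Render an Oracle-style message with a zero-padded code.
        entry = self.get(prefix, code)
        return f"{prefix}-{code:05d}: {entry.description} (Resolution: {entry.resolution})"


registry = ErrorRegistry()
registry.register("EDW")
registry.set("EDW", 1, "Source file missing", "Check that the upstream extract has landed.")
print(registry.message("EDW", 1))
# EDW-00001: Source file missing (Resolution: Check that the upstream extract has landed.)
```

With an API like this, the alert from the EDW example could carry a self-describing message instead of a bare number.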

ErrorDB could also have a CLI for service engineers, who tend to be geekier than normal users.
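Such a CLI could be a thin wrapper over the same lookup. The sketch below uses Python's argparse; the program name `errordb`, the `lookup` subcommand and the sample entry in `ERRORS` are hypothetical, and the hard-coded dictionary stands in for a query against ErrorDB.

```python
import argparse
import re
from typing import Optional, Sequence

# Stand-in for ErrorDB's metadata tables (hypothetical sample data).
ERRORS = {
    ("EDW", 1): ("Source file missing", "Check that the upstream extract has landed."),
}


def lookup(code: str) -> str:
    # Accept codes shaped like "EDW-00001"; anything else is rejected.
    match = re.fullmatch(r"([A-Za-z*]+)-(\d+)", code)
    if match is None:
        return f"{code}: not a valid error code"
    key = (match.group(1).upper(), int(match.group(2)))
    if key not in ERRORS:
        return f"{code}: no entry registered in ErrorDB"
    description, resolution = ERRORS[key]
    return f"{code}: {description}\nResolution: {resolution}"


def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(prog="errordb", description="Look up application error codes.")
    sub = parser.add_subparsers(dest="command", required=True)
    lookup_cmd = sub.add_parser("lookup", help="describe an error code, e.g. EDW-00001")
    lookup_cmd.add_argument("code")
    args = parser.parse_args(argv)
    if args.command == "lookup":
        print(lookup(args.code))
    return 0
```

Calling `main(["lookup", "EDW-00001"])` prints the description and the suggested resolution; a packaged tool would call `sys.exit(main())` from an `if __name__ == "__main__":` guard or a console entry point.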

A mature ErrorDB would let the people who support an application understand and learn it almost instantly. No “black box” would be left.

Troubleshooting would become much easier. We wouldn’t need “exorcists” to come and wave a wand to figure out the issue. In other words, diagnostics become easy and simple.

It would also improve the operability of the application. An application that remains an enigma, not only to support engineers but often to the very developers who built it, may satisfy someone’s vanity, but it wastes human cycles.
Extending the application also becomes much easier when its error codes are simple, decipherable and paired with remediation steps.
