Sunday, June 16, 2013

Data Service Engineers - the new hybrid!

Until recently (or at least until 3-4 years ago), there were two distinct skill sets in the IT market: Database Administrators (DBAs) and Service Engineers (SEs). In some companies, Service Engineers are also referred to as Site Reliability Engineers (SREs). Service Engineers are the essential operations folks, with many duties including pushing code to production, bringing up services, deploying new hosts, triaging incidents and monitoring alert dashboards. In short, they handle the whole nine yards of operations.

Every company that has online operations, and IT staff maintaining those operations, needs SEs or SREs. For the purposes of this post, I will use the single term SE rather than both SE and SRE.

DBAs in any company's IT shop are responsible for managing databases. Typically these are relational databases like Oracle, MySQL, MS SQL Server or DB2.
Both groups, DBAs and SEs, work closely together as a compact team, yet their duties are largely segregated: SEs maintain the applications and their front ends, while DBAs tend to the backend databases.

However, in the past few years or so (the timing differs by company, since some are further along in this journey and in maturity), we have been seeing the lines between the two skill sets blur. The blurring is largely caused by the advent of non-relational data stores. Examples include NoSQL stores (key-value, wide-column and document stores) such as Cassandra, DynamoDB and MongoDB; columnar databases like SenSage, Sybase IQ, C-Store, EXASOL, MonetDB and Vertica; and file-based distributed systems like Hadoop.

The wholesale adoption of these data stores by applications has largely broken down the silos that DBAs maintained to keep data integral. To protect a relational database, DBAs always had to keep access controls and similar safeguards very rigid. The more of a "black box" a relational database was, the higher and more rigid the silo walls. This is why MySQL databases are often administered by SEs, whereas Oracle and DB2 databases, and for that matter even MS SQL Server and Sybase databases, are always managed by DBAs.

Because non-relational data stores (and, to a certain extent, MySQL) are not as rigidly structured as Oracle, managing them did not require deep knowledge of the data store's internals. As a result, small and nimble IT teams began saving money on dedicated DBA hires and had SEs manage not only the application and front-end parts of the system but also the backend systems built on NoSQL and/or MySQL.

This has led to the mushrooming of a new breed of SEs, which I often refer to as Data SEs or DSEs. They will be the operations folks who manage the Big Data stack in companies. DSEs will not only have deep knowledge of front-end tools and technologies like Apache, Tomcat, Perl, PHP, Python, GitHub, HTML5, CSS, Jenkins and myriad other languages and tools, but will also be highly skilled at managing and working with technologies like Hadoop, HBase, Hive, Storm, Spark, Shark, Cassandra, MongoDB and MySQL.
As more and more applications adopt MySQL and NoSQL data stores in favor of heavyweight, very expensive relational stores like Oracle, the lines between DBAs and SEs will vanish, and the emergence of Data SEs will become an inevitable reality.