I enjoy sharing the lessons I have learned over the years with my colleagues and mentoring junior programmers.
Much of my work has been customer-facing, and my aim is to meet customers’ needs with working prototypes and documentation, iterating towards an acceptable if not outstanding solution.
After graduating in Computer Science and Maths from Melbourne, I started in a small software business called STR. My first task was working in a team that was rescuing a software project — we refactored a C++/MFC ETL tool that was only 6 months into development and eventually got this new product to market. The ETL tool was primarily designed to load text data into STR’s DBMS.
After a year with STR, I went on to lead a team of four in new development commissioned by the US Department of Commerce to run analysis of the 2000 Census data. The work affected all parts of the DBMS, and most people in the company were involved in some way. My team had to develop a replacement ETL tool for STR, but this time its purpose was to load data into the next-generation DBMS, and from more sources than just the text format the original ETL tool accepted. (Today, the same system is being used in the 2010 US census.)
We designed the new ETL tool around JDBC for three reasons: we could load from many different DBMSes on both Windows and Unix; JDBC was much simpler than ODBC; and the JDBC drivers were often "wire protocol", so they worked with little configuration and at high speed. As text loads were still the most used method for loading data, we also developed a JDBC driver for various text file formats.
As well as sourcing data from JDBC, we wrote a JDBC driver for STR’s new DBMS. The driver supported both "INSERT INTO" and "SELECT FROM", so we could also load data from the new DBMS into any other database. The DBMS did not directly support SQL, so we wrote what we needed. In effect, we over-engineered the software but met all deadlines. When we demo’ed the new ETL tool, some prospects saw the potential to load from any DBMS to any other DBMS: the ABS wanted to use it to load data from Oracle to SAS. The design therefore laid the groundwork for a tool that could be sold in its own right as well as meeting customer requirements.
Later for STR, I travelled the world in pre- and post-sales performing ETL for potential customers in the government sector such as statistics bureaus around the world, from diverse databases. On top of complex ETL tasks, I implemented OLAP reports and ad-hoc queries that the analytical DBMS specialized in. One interesting client was the FBI: I sanitized their data and showed them some BI queries that could improve their efficiency. I used SAS, Perl, C/C++ and Python and our newly built ETL tool for the majority of my work.
After a break from work at the end of 2000, I started work at Ericsson on a load-balancing web server; that was, until the dot-com crash eventually caught up with our project. In my brief time there, I optimized an HTTP routing system by moving the routing from user space into the kernel, writing a kernel module that used the IPFilter kernel code as a framework.
Then I returned to STR for more OLAP support work. Later, I researched alternate technologies for STR’s next steps using SQL, CORBA, C++ and Python.
For the next year or two, health problems prevented me from working.
To ease me into working life again, I wrote a system for Bosch for their fuel injection testing rigs: data acquisition and analysis. Then I contracted to Intrepid Travel to audit their software development, checking the health of a development project that continued to miss deadlines. I then went to Boeing as an ETL specialist where I spent a year implementing Maximo (an OTS J2EE system).
In 2007/2008 I spent a few months contracting for MySQL, porting MySQL Cluster to Windows. ’Cluster’ is very different from what most people think ’MySQL’ is. Cluster was used by telcos before MySQL acquired the project. So the Cluster DBMS was plugged into MySQL so you can query it as though it was just another MySQL engine. But Cluster is a "several-nines" high availability DBMS and it can scale out to many ’data’ nodes. Porting this to Windows was a lengthy task. During my time with MySQL AB, I became an employee then saw the company acquired by Sun and subsequently Oracle. The last thing I did while at Oracle was to write a Windows installer for MySQL Cluster.
My last job was a short contract with First Derivatives for about a month. Here I was involved in pre-sales, preparing for a demo to a Singapore-based prospect. First Derivatives use an amazing database called kdb+ and I hope I can use more of this technology in the future.
My latest home project is to make an easier-to-use lex/yacc system. Lex and yacc need two separate files (one each) but why have two files that are maintained separately when you can have one file that combines the two and eliminates double maintenance? (see ivorykite.com/ly.html)
I am writing a driver using OCI (Oracle Call Interface) in C and using the kdb+ foreign function interface to pass resultsets back to that environment.
I plan to put a preview of this driver out in the first week of December.
Cluster was born in Ericsson, and so it was born with some telco bias. Today, the large majority of subscribed customers are telcos. It’s strange how a general-purpose system can end up tied to its original customer profile.
My task in the contracting phase of my employment was to port the Management Server to Windows. This involved improving MySQL’s common code (mysys) to handle joinable (not detached) threads on Windows, and adding code to handle a Windows Service similarly to Unix daemons. I quoted six weeks to have the management server running on Windows, and at the end of the six weeks, I demo’ed the server passing all the system tests. Little did I know that ’making it work’ was a minor portion of the task.
MySQL has a rigorous review process where a developer must break down work into a series of incremental and internally consistent patches for it to be accepted and pushed up to the revision control system (bzr). After I learned this discipline, I discovered a good way to get a thorough review of my patches: make the code look a bit different. If I formatted my code “unconventionally” (e.g. braces on the same line instead of a new line), people became more suspicious of the code and I got some valuable feedback.
It is important to mention that the core team of Cluster is based in Stockholm (near Ericsson). Only one other person working on Cluster, Stewart, was in Melbourne (or even within six time zones of it). Stewart was an invaluable source of advice and witticisms as I learnt the MySQL way.
At the end of my contracting period, the company brought me on permanently in preparation for the acquisition by Sun.
I continued my work with MySQL Cluster on Windows after joining Sun.
A highlight was an "All Company Meeting". In 2008, we met in Riga, Latvia (which was just a ferry trip for most employees). I was able to demonstrate the whole of Cluster running on Windows and passing all the system tests. Well, actually, it passed all system tests except the ones where we knew the cause of the problem but had not decided on a fix. This was to be the last "All MySQL Meeting", and people started to leave MySQL as they felt the MySQL community was coming apart. Even more were to leave after the acquisition by Oracle.
In this role, I also helped improve the Perl testing framework, as I found limitations when running it on Windows. The porting process uncovered bugs in Cluster, so I was also maintaining the system.
In 2010, Oracle acquired Sun (and hence MySQL) and I worked for Oracle in the same position as at Sun, porting MySQL Cluster to Windows. The new tasks in this ongoing role were to produce a Windows Installer package (.msi) and write a GUI for the installer in C#. This required talking to the loosely called "Windows Team" and finding out about the new "bundled installer" that they planned to use for all MySQL products. In the end, they had nothing concrete I could use, so I simply made a .msi using WiX and a GUI in C# that called the .msi to install and remove files.
This seemed like a good time to leave Oracle: I had some work with First Derivatives lined up, and the task was solved.
I developed reports in SQL and Actuate and integrated systems using MEA and XML over HTTP.
The main tools I used were Oracle PL/SQL (with TOAD), shell scripts, Excel, Access, Actuate and Python. See ivorykite for an example chart.
I travelled to the US frequently to support STR’s product, performing ETL and OLAP at the Department of Commerce, Bureau of Census. I also did pre-sales work in Canada and Poland, and post-sales work in Canberra for government clients such as the Australian Taxation Office. Here, I taught myself Perl.
I optimised STR’s analytical database server for RS/6000 on AIX.
I also investigated new directions for the product suite, promoting a replacement for STR’s proprietary database system.
I experienced the dot.com crash here.
I supported the product in Europe and US by loading customer data into STR’s database using Python, Java and our new ETL tool.
I helped the developers of the database server to get their code compiling on AIX’s xlC compiler. Eventually, this required close cooperation with the IBM compiler development group to debug their C++ compiler, and idebug debugger.
"Only short programs have any hope of being correct." (Arthur Whitney) I can elaborate: Arthur is saying that the shorter programs can be, the more likely they are to be correct. Many people take the attitude that simple solutions are simplistic, hence they don’t look for the simple solution. Of course, some problems are hard, but it’s always possible to be more concise. I think Arthur said it better!
Here are the details of my 6 years with Python in the commercial world:
Jan 1999 — Nov 2000
I led a team of four to build a tool designed to import data
into a proprietary database engine from various data
sources. One of the requirements was that it run on AIX,
Solaris and Windows. The importer had to extract data from
the mainstream DBMSes and pull in data from peculiar text
formats.
To solve this problem, JDBC was the obvious choice, as all the DBMSes provided a Java driver and most were wire-protocol, which avoided the need for user configuration of ODBC, particularly on Unix. So we wrote a JDBC driver for the text format and for our proprietary DBMS. This left Python to glue everything together or, more precisely, Jython was used as the glue.
A GUI was written in Jython for the user to choose the tables and columns to be imported and specify any data cleansing actions. I wrote the engine that did all the work. This engine had to run as efficiently as possible because customers routinely had gigabytes of data. How to do this in Jython? Well, I wrote Python code that produced a Java program, which was then compiled and executed. This provided the fastest possible implementation to move data from a JDBC source to another JDBC target.
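The code-generation idea can be illustrated with a minimal sketch. It is not the original engine: the template, the `generate_copier` function and the generated class are hypothetical, the generated Java uses modern try-with-resources syntax that was not available at the time, and data cleansing is left out. It only shows Python emitting a table-specific Java program that copies rows from one JDBC connection to another.

```python
from string import Template

# Hypothetical template for the generated Java program. The real generator's
# output is not described in detail, so this layout is an assumption.
JAVA_TEMPLATE = Template('''\
import java.sql.*;

public class $class_name {
    public static void main(String[] args) throws SQLException {
        // args[0] is the source JDBC URL, args[1] the target JDBC URL.
        try (Connection src = DriverManager.getConnection(args[0]);
             Connection dst = DriverManager.getConnection(args[1]);
             Statement query = src.createStatement();
             ResultSet rows = query.executeQuery("SELECT $columns FROM $table");
             PreparedStatement insert = dst.prepareStatement(
                 "INSERT INTO $table ($columns) VALUES ($placeholders)")) {
            while (rows.next()) {
$copy_lines
                insert.addBatch();
            }
            insert.executeBatch();
        }
    }
}
''')


def generate_copier(class_name, table, columns):
    """Return Java source that copies `columns` of `table` between two JDBC URLs."""
    # One unrolled copy statement per column, so the generated program
    # contains no per-row loop over column metadata.
    copy_lines = "\n".join(
        f"                insert.setObject({i}, rows.getObject({i}));"
        for i in range(1, len(columns) + 1)
    )
    return JAVA_TEMPLATE.substitute(
        class_name=class_name,
        table=table,
        columns=", ".join(columns),
        placeholders=", ".join("?" for _ in columns),
        copy_lines=copy_lines,
    )
```

Calling `generate_copier("CopyCustomers", "customers", ["id", "name"])` returns the Java source as a string, which would then be written to a file, compiled and run. The point of generating code rather than interpreting the user's choices at run time is that the per-row work is fixed when the program is generated.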
Jun 2001 — Oct 2002
I travelled extensively supporting the application described
above. Most of the work involved using Python to extract
data from disparate sources, describing them so that the
JDBC driver that we wrote for text could provide the data to
the application.
Jan 2005 — Jan 2006
In the first half of 2005, I wrote a CORBA server in Python
using omniORBpy. This server generated SQL as directed by
the client that was sent to an ODBC connection and the
results collated.
In the second half of 2005, my brother was working at Bosch in Stuttgart and he told me about the work his laboratory was doing which sounded repetitive. I was able to automate their work by interfacing with their oscilloscopes, analysing the data to isolate ’points of interest’ and charting the result.
I developed the solutions in Python on Linux and deployed on Windows. The publication quality charts and graphs were made with matplotlib. Optimisation of the analysis process was written in C/C++ and communication with the ’scope was done via RS232 and pySerial. The GUI was developed in wxWindows.
Most communication was through email and all features were implemented within a few days of them being requested.
Jul 2006 — Aug 2006
Originally, Intrepid wanted a LAMP+Python programmer to
assist and bring a project to conclusion. On the same day I
was interviewed, the project was suspended after missing
milestone after milestone. I proposed that I be contracted
to assess the health of the development and the team and
report my findings. The report was not favourable,
primarily because they had accumulated 17MB of source code
(that’s the size of the text in four bibles) for a
straightforward forms-based web application.
Aug 2006 — Aug 2007
I was contracted to Boeing to migrate data from their existing
asset management systems to Maximo, an IBM product.
My primary tool was Python and ODBC.
I studied Computer Science and Maths at University of Melbourne. I completed a third year subject in Software Engineering during which I worked in a small team and our task was to add HTML TABLE support to the Lynx web browser. We were one of only two groups who actually made it work and we got 24/25 for the project.
Then, my first job at STR (a BI/OLAP tool maker) was maintaining an ETL tool in MFC and developing a converter from their old database format to the new format. The converter I wrote in C++ and designed it using the patterns used in the containers of the STL. I developed data access routines that considered a table to be a container of rows so a table had an iterator.
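The "table as a container of rows" design can be sketched briefly. The original was written in C++ following the STL container patterns; the version below is only a Python analogue, and the `Table` class and `convert` function are hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple


class Table:
    """A table treated as a container of rows, so it can be iterated directly."""

    def __init__(self, columns: Sequence[str], rows: Iterable[Sequence]):
        self.columns = list(columns)
        self._rows: List[Tuple] = [tuple(row) for row in rows]

    def __len__(self) -> int:
        return len(self._rows)

    def __iter__(self) -> Iterator[Tuple]:
        # Handing out an iterator lets generic code walk the rows without
        # knowing how the table stores them.
        return iter(self._rows)


def convert(source: Iterable[Tuple], to_new_format: Callable) -> list:
    """Generic conversion step: works on anything that yields rows."""
    return [to_new_format(row) for row in source]
```

For example, `convert(Table(["id", "name"], [(1, "a")]), lambda row: {"id": row[0], "name": row[1]})` yields a list of dictionaries. Because the converter only depends on iteration, it does not need to know which storage format sits behind the table.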
I also maintained the ETL tool, which required a great deal of refactoring. I described working on this software as hacking through dense scrub, going in circles and finding that the path I had cut had already grown back. It wasn’t pleasant, but I learnt, through other people’s mistakes, how software can go bad, even before it is released. This led on to my role as Team Leader for the redevelopment of this tool.
As lead on the new ETL tool, I made the decision to choose JDBC for the extraction and load processes. This decision fell out of the requirement for a cross-platform product: it had to run on Windows, Solaris and AIX. ODBC was too Windows-centric and Java promised the same (or substantially the same) behaviour on all the targeted OSes. Java’s performance at the time was considered acceptable, and it only got faster by the time the tool was released. Most of the time spent on the CPU was in the database core written in C++.
My team’s task was to build a basic SQL layer on top of the C++ core (which had no query language, as such). I directed and directly contributed to this work in C++. This component was the most time consuming and complex task on our path to release.
The GUI was written in MFC in Visual Studio to look best on Windows, and CORBA was chosen to mediate between the GUI and the engine. I developed this interface along with the code to glue the GUI onto the CORBA interface all in C++. Meanwhile, the ’Server Team’ (those developing the OLAP engine) were running into trouble…
The Server Team were developing using Visual Studio and porting to Solaris and AIX was a task assigned to one person. It was decided to use gcc on all Unix platforms to minimise the trouble of multiple compiler targets. While this worked well on Solaris, at the time, the port of gcc to AIX was unreliable. Each week, I would see the ’porting guy’ flounder around trying to partition source files into chunks that the compiler could manage without running out of memory, and then when it did compile, it would dump core. To add to the problem, gdb would fail while trying to isolate the bug.
I told the development manager that this needed urgent attention, and that I had some spare time as the ETL tool was running ahead of schedule. I took hold of this problem and, after a number of conference calls with the gcc guy on AIX (who was in IBM’s employ) and with the xlC (Visual Age) compiler support team, it became clear that gcc was not going to do the job, and IBM pledged resources to get the code compiling and running on AIX with xlC. We received the latest xlC by CD, and I rapidly started raising PMRs (IBM’s problem management request) against xlC. Initially, I had to port STLport to xlC, and a lot of the code I ported found bugs in the compiler, which were promptly fixed by the team in Toronto. Once I had STLport working, I had to address the myriad ways that the server used templates (and ways that the compiler didn’t like). Eventually, a compiler support person from IBM in Toronto came to Melbourne, and he helped me isolate compiler bugs and get a speedy response from his colleagues. Then, AIX’s new debugger (idebug) wouldn’t work… To cut a long story short, I ported the server to AIX and I learnt that C++ is a complicated language that needs to be treated with some respect (if not suspicion), particularly in relation to templates.
In 2001, I left STR for Ericsson and the Lodbroker project. Lodbroker was a load-balancing web server: a mediator that decided which backend server should receive an HTTP request. At the time I joined, they were worried about the product’s performance. I attacked this issue by looking at writing a kernel module that did the routing work in-kernel. I based my work on IPFilter and initial results showed a 2x speedup. Unfortunately, Lodbroker was wound up in May 2001.
After taking time out to renovate our house, I went back to STR to help them on a research project to evaluate the potential to use their products as an OLAP UI with Oracle or other RDBMS at the backend. I wrote this in Python, optimizing in C++ and using CORBA to glue the UI onto ODBC on a Windows platform.
I developed a tool to help Bosch analyse results from oscilloscopes. Again, I used Python with C, interfacing with the hardware over RS232.
My work at MySQL gave me three more years of C/C++ experience. I ported MySQL Cluster to Windows and the code base was entirely C/C++.