Q&A: Philip Newcomb on MUMPS and VistA refactoring

Rarely is agency use of a particular programming language the subject of discussion at senior levels of agency leadership or during congressional hearings, but the Veterans Affairs Department's usage of MUMPS has, for better or worse, gained that distinction.

Most recently, VA announced selection of a "custodial agent" that will oversee an open source ecosystem of developers seeking, among other projects, to refactor modules of the VistA electronic health record system with modern languages.

This isn't the first time that the VA has looked into replacing MUMPS, however; it ran a pilot project in 2005 to look at the automatic conversion of MUMPS into J2EE Java.

Philip Newcomb, chief executive officer of the Kirkland, Wash.-based The Software Revolution Inc., which performed the pilot, recently wrote a case study about it in "Information Systems Transformation," a 2010 book he co-wrote with William Ulrich, president and founder of the Tactical Strategy Group, of Soquel, Calif.

We're excerpting the case study here but we also recently talked with Newcomb about MUMPS and the pilot he helped execute.

FGIT: What are the limitations of MUMPS as a programming language, as you see them?

Newcomb: There are a number of things about MUMPS that are quite limiting as a language for the development of major systems. It's a 30-year-old language; it predates many of the types of approaches that are available in modern programming languages. In fact, it defies a lot of the conventions of modern languages entirely.

It has no concept of data visibility, in that it treats all data as global. That is very, very bad from an information systems perspective and also from a software assurance perspective, because it results in very brittle code that has vast amounts of inter-connectivity between the various modules. By treating all data as global, it does not make use of some of the more modern approaches to the distinction between local and global variables. It also stores virtually all global data in a persistent fashion, whether it needs to be or not.
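To illustrate the coupling problem Newcomb describes, here is a minimal Python sketch (not MUMPS, and not taken from VistA). The shared `STATE` dictionary, the counter name `"I"`, and the patient data are all hypothetical; the assumption is simply that two routines reuse the same globally visible scratch variable.

```python
# Hypothetical shared scratch state, standing in for "all data is global".
STATE = {}

def count_allergies(allergies):
    # Reuses the shared counter "I" as its own loop variable.
    STATE["I"] = 0
    for _ in allergies:
        STATE["I"] += 1
    return STATE["I"]

def summarize_shared(patients):
    """Brittle version: the callee silently overwrites the caller's counter."""
    visited = []
    STATE["I"] = 0
    while STATE["I"] < len(patients):
        name, allergies = patients[STATE["I"]]
        visited.append((name, count_allergies(allergies)))
        STATE["I"] += 1
    return visited

def summarize_local(patients):
    """Same task with local variables only: no hidden coupling."""
    return [(name, len(allergies)) for name, allergies in patients]

# Hypothetical sample data, chosen so the brittle loop still terminates.
PATIENTS = [("A", ["x", "y"]), ("B", []), ("C", ["z"])]
shared_result = summarize_shared(PATIENTS)
local_result = summarize_local(PATIENTS)
```

With this sample data the shared-state version visits only the first patient, because the helper moves the caller's counter past the rest of the list, while the local-variable version visits all three. With other data the shared version could just as easily loop forever, which is the kind of brittleness at issue.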

Programmers tend to write many executable statements per line of code. The code itself is almost indecipherable--it essentially is indecipherable. It looks like Greek to anyone who is not a MUMPS programmer, which means that only a small handful of people are able to program in the MUMPS language. It creates a huge barrier to entry for people who are coming out of universities. Most of them have no interest whatsoever in even touching MUMPS. The quality of programmers who are willing to work with the language is relatively low. The higher quality people coming out of universities with an understanding of modern programming concepts aren't interested in it.

As a consequence, its maintenance costs are astronomical. TRICARE has spent over $1.1 billion maintaining roughly 2 million lines of MUMPS code since 1994. That's just the software costs. The VA costs have been equally astronomical.

We also have multiple baselines, so instead of being able to create systems that have shared components, we have a proliferation of silos. There's AHLTA--that's the newer name for CHCS--and it and VistA share a common heritage and have a large amount of software in common between them, yet they're maintained by completely different maintenance staffs, which are, without knowing it, duplicating a lot of effort between the two systems. The VA and TRICARE have been directed to find a means by which the information that is in the two systems can be interchanged between them so that the active duty military can move into the VHA hospital system without loss of information. The problem is the way in which each of these two systems maintains its electronic patient records: they use the MUMPS database, which is a very odd kind of database.

It doesn't have any of the characteristics of a modern DB. It's highly dynamic, allowing programmers to construct data structures of virtually any kind, on the fly. Reconciliation between those VistA and CHCS system electronic records is made extremely difficult by the degree to which the data structures that store those records have taken on many different dimensions over the years.
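MUMPS stores persistent data in "globals," which are sparse, hierarchical arrays addressed by a list of subscripts. The Python sketch below imitates that idea to show why two systems can drift apart. The `SparseGlobal` class and both record layouts are hypothetical illustrations, not the actual VistA or CHCS structures.

```python
class SparseGlobal:
    """Toy stand-in for a MUMPS-style global: values keyed by subscript tuples."""

    def __init__(self):
        self._nodes = {}

    def set(self, subscripts, value):
        # Any subscript path can be created on the fly; there is no schema.
        self._nodes[tuple(subscripts)] = value

    def get(self, subscripts, default=""):
        return self._nodes.get(tuple(subscripts), default)

    def children(self, subscripts):
        """Distinct next-level subscripts under a prefix, sorted as strings."""
        prefix = tuple(subscripts)
        n = len(prefix)
        found = {k[n] for k in self._nodes if len(k) > n and k[:n] == prefix}
        return sorted(found, key=str)

# Hypothetical layout A: one node per allergy, keyed by a numeric patient ID.
site_a = SparseGlobal()
site_a.set(("PATIENT", 1, "NAME"), "DOE,JOHN")
site_a.set(("PATIENT", 1, "ALLERGY", 1), "PENICILLIN")
site_a.set(("PATIENT", 1, "ALLERGY", 2), "SULFA")

# Hypothetical layout B: same facts, keyed by name, packed into one string.
site_b = SparseGlobal()
site_b.set(("PT", "DOE,JOHN", "ALG"), "PENICILLIN^SULFA")

allergies_a = [site_a.get(("PATIENT", 1, "ALLERGY", i))
               for i in site_a.children(("PATIENT", 1, "ALLERGY"))]
allergies_b = site_b.get(("PT", "DOE,JOHN", "ALG")).split("^")
```

Both stores hold the same allergy list, but nothing in the storage layer says so. Any reconciliation has to be written by hand for each pair of layouts, which is one way to read the difficulty Newcomb describes.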

The goal is to have a common electronic healthcare record that is shared between both those systems, and that's a difficult thing to obtain.

FGIT: Proponents of MUMPS talk about the inherent limitations of relational databases for medical information. What do you say to that?

Newcomb: I think there's a lot of truth to that. Over time, though, there have evolved a number of new concepts in which data can be stored in relational DBs, in particular the object-relational model and the concept of Enterprise JavaBeans, which are containers--the design pattern for the storage of data, which allows you to take on the multiplicity of forms that MUMPS programmers enjoy today.

Modern database technologies, such as Coherence and Hibernate, have solved these problems of the limitations of the relational DB. Intrinsically, relational DBs are not limited in their ability to represent the vast diversity of information. They are slightly less flexible, but the newer technologies that provide for object-relational mapping and Enterprise JavaBeans overcome those kinds of limitations. Along with overcoming those, we can now move into modern repositories, which provide much higher data assurance. They bring along with them very powerful processing platforms that allow these systems to persist data, store it in multiple locations simultaneously, and are indefinitely scalable, so that the scaling of the systems--especially putting them into the cloud--is vastly facilitated.

FGIT: In the chapter we're excerpting, you discuss a pilot project in which MUMPS code was automatically converted to Java. You also discuss automatic capture of business rules--can you discuss how you were able to do that?

Newcomb: What we did went way beyond just capturing business rules. The concept of business rules has a lot of different kinds of definitions. We have an operative definition for business rules, which is based on the ability of our technologies, in the course of the transformation of a system, to extract the design of the system. Amongst the different types of design formalisms that we support, one is a formalism called the "state transition table," which comes out of Shlaer-Mellor object-oriented analysis and design. We automatically defined the means by which a procedural representation of software can be transformed into a production rule representation, which is used by business rule engines.
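As a rough illustration of the general idea, and not of The Software Revolution's actual tooling, the Python sketch below shows the same logic written twice: once as nested procedural branches, and once as a state transition table that a simple rule matcher can evaluate. The order states, events, and actions are hypothetical.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    state: str       # condition: current state
    event: str       # condition: incoming event
    next_state: str  # consequence: new state
    action: str      # consequence: side effect to trigger

def procedural_next(state, event):
    """Hypothetical legacy-style logic buried in nested branches."""
    if state == "ordered":
        if event == "verify":
            return "verified", "notify pharmacy"
        if event == "cancel":
            return "cancelled", "notify prescriber"
    elif state == "verified":
        if event == "dispense":
            return "dispensed", "update record"
    return state, "no-op"

# The same behavior as a state transition table: one row per rule.
RULES = [
    Rule("ordered", "verify", "verified", "notify pharmacy"),
    Rule("ordered", "cancel", "cancelled", "notify prescriber"),
    Rule("verified", "dispense", "dispensed", "update record"),
]

def rule_next(state, event, rules=RULES):
    """Minimal rule matcher: fire the first rule whose conditions hold."""
    for rule in rules:
        if rule.state == state and rule.event == event:
            return rule.next_state, rule.action
    return state, "no-op"

STATES = ["ordered", "verified", "dispensed", "cancelled"]
EVENTS = ["verify", "cancel", "dispense"]
# True when both forms agree on every state/event pair.
equivalent = all(procedural_next(s, e) == rule_next(s, e)
                 for s in STATES for e in EVENTS)
```

Once the logic is in table form, each row can be reviewed, documented, or loaded into a rule engine on its own, which is harder to do when the same decisions are scattered through procedural code.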

FGIT: How did the converted code work?

Newcomb: The pilot achieved operability for a module of the VA system. We didn't continue with that particular pilot, but the intention had been there. There was a change in the management at the VA and also some budget issues. The funding that we would have been a part of, as a follow-on to the pilot, was a very large new development budget that the VA had sought but did not obtain. We had been put into a budget that Congress was unable to provide to VA because it got swept away. It was in 2006, and Iraq pretty much swept up everything.

We didn't put it into production; it was done as a demonstration. But we decided we didn't want to leave this lying where it was. We decided on our own to take the approach we had and apply it to the open source version of the VistA system. We also decided that we would convert it not into Java alone, but into a higher-level language, which is easier to maintain than Java. We converted the entirety of two open source VistA systems--WorldVistA and OpenVista--into EGL, as well as Java.

This provided a means of demonstrating that the automated process could scale and accommodate millions of lines of code, which it did, and that the conversion could be worked into a modern programming language. The second conversion, into EGL, took MUMPS to an even higher level, where we had a much higher-level language than Java. Essentially, that high-order language would generate Java from it. So we can go to more conceptual, easier-to-understand code in the course of the transformation, and generate the Java from that code. We've done that for two variants of VistA that are currently open source, and we intend to do it as soon as the custodial agent makes the VistA code available.

We published versions of both on our website. There's close to 150 GB of documentation that's currently available that anyone has free access to. It's free--no cost.

FGIT: How long do you believe it would take to fully complete a process of conversion from the MUMPS base to a modern language?

Newcomb: If the funding were available, it would be possible to complete the transformation to a fully operational state within a one-year period. It could be done for under $10 million.

FGIT: The VA has said that MUMPS will be around for a long time...

Newcomb: They are unaware of this technology. There's musical chairs all the time within these agencies. The history and knowledge of what's come before is lost, because agencies are constantly having new senior leadership, because of the way that the government rotates its leadership. It's a problem with all the agencies--they don't have any memories. They keep making new mistakes.
