My goal in this paper is to point out that software development today is a labour-intensive cottage industry and that our approach to developing software needs to be industrialised. Indeed, I propose it can be mechanised. I believe that code generation from software requirements is possible for a large class of applications. Rather than model the implementation in UML, a better approach is to model requirements that are consumed by an extensible code generation platform to produce the final application.
Yes, it’s a hard problem and I don’t have the solution worked out. Rather than solve this for the general case of any application, I believe it can be solved readily for the class of business information systems that mostly read and write data to a database while invoking a thin layer of business logic and process. To be sensational, writing code is stupid for this type of application. From my experience as a Microsoft Consultant, there are a great many of these kinds of systems in corporations.
For those in a hurry, in the rest of this paper I’ll be elaborating on the following key points:
- The current method of writing many enterprise applications is a mess.
- Developing a large entangled code base is the fundamental problem.
- Software development is a labour-intensive industry moving offshore.
- For most business applications, software development is not an art.
- Methodologies don’t address this problem.
- Current trends just try to reduce the code base.
- Tools are emerging that can generate entire applications.
- We need to model requirements rather than implementations.
- The implementation can be derived and generated by the computer.
- You don’t need objects or components.
- This may be a future paradigm shift in software development.
It’s a Mess
Software development in an enterprise can be characterised as co-ordinating a small army of generally under-skilled developers who churn out a mass of ill-specified code through a poorly managed process. I’m not jaded; it’s the truth!
Writing software is difficult but it can be done well. Unfortunately, in my experience, it’s quite rare that it’s done well. There are many reasons for this and there are whole books on the topic so I won’t dive into any kind of analysis here. I believe much of it is due to poor process, lack of skills and inadequate sponsorship by the business. A common statistic batted around the industry is that 80% of projects fail to meet their success criteria.
But still, there are even more fundamental problems involved.
The Code Base Problem and Entanglement
The approach today involves handcrafting a large code base and tightly mixing the logic needed by the application platform with the business logic. The domain logic gets entangled and obscured by the platform logic.
Eventually, the code base grows to a very large size and slowly decays as it gets maintained and enhanced by many developers. At some point developers become afraid to make changes because it becomes next to impossible to understand how it all works. Clever developers leave and avoid working on such systems, leaving only those who really value job security and technical obsolescence.
The business is either forced to repeat the process through an expensive rewrite of the system or accept the constraints on their business and slowly work around it. This is the state you will find most enterprises in.
A Cottage Sweatshop Industry under Threat
If you step back for a minute, writing software must be the world’s largest cottage industry. Many IT departments seem like open plan sweatshops with legions of developers mesmerised by technical complexities and working to unreasonable deadlines. My point is “Where is the industrialisation of software development?” We use computers to help us write the code but why aren’t computers used to write the code itself? It seems odd that we can automate other industries but we can’t automate our own. Visual Studio is fundamentally just a text editor.
As an aside, there is now a very real trend towards using cheap offshore development in order to cut costs. Software development is following the exodus of other labour-intensive industries to countries that can provide cheap labour. India is the prime beneficiary.
It’s Not Art
Some developers like to promote the notion of the “Art of Programming” with the implication that they are artists. I agree that you can find elegance and art in software programs but it’s an exception. Artists like to follow a nebulous artistic process. I agree ground-breaking software needs this approach.
The majority of enterprise software development is not art. Most enterprise applications are CRUD information systems with a small amount of domain logic. Many developers want to find creativity and art in their development since there is joy in the craft. This is misguided: enterprise software doesn’t need it, nor does a business value it.
Methodologies Don’t Help
The popular development methodologies in use are based on some iterative cycle between specification and coding. They largely vary based on the specification artefacts and duration of the cycle time.
Extreme programming is the most artistic process and is great for small teams dealing with many unknowns and learning on-the-fly. The code is the main artefact. Model-based methods (RUP et al.) scale better and try to keep the design separate from the implementation. However, these projects often struggle to keep the model and implementation synchronised, and the model often gets abandoned. With a lot of skill and discipline, these model-based methods can work but I have not witnessed it. Most enterprises bumble along with something in-between these extremes. The process is often loosely based on a traditional waterfall method but with more iterations.
But at the end of the day, you still wind up with a massive code base of entangled logic that gradually decays. Current methodologies won’t solve this problem; they just help you get there in a more consistent fashion.
Reducing the Code Base
There are a few trends occurring that help alleviate the pain described so far.
1) Increased Application Model Abstraction
2) Declarative Programming
3) Aspect-oriented Programming
As always, there is an evolution in software development to work at higher and higher levels of abstraction. It is a slow process when the application model must accommodate a broad range of applications. There are numerous examples of this such as going from Win32 to .NET or the rise of Java and J2EE. The next horizon is Longhorn. A very interesting extreme is the use of Executable UML for embedded systems. Raising the abstraction of the programming model is the main approach to increasing developer productivity.
Declarative programming tries to acquire behaviour by stating what is needed rather than forcing a programmer to call a procedural API. The application platform then takes responsibility for providing it. Gradually, the application platform is becoming more extensible so that domain-specific declarations can be made. The attribute class in .NET is a good example of this mechanism.
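As a rough illustration of the declarative idea, here is a minimal sketch in Python rather than .NET. The Customer entity, the "required" and "max_length" metadata keys and the validate function are all hypothetical names of my own; the point is only that the entity states what it needs and a separate piece of platform code decides how to enforce it.

```python
from dataclasses import dataclass, field, fields

@dataclass
class Customer:
    # Declarations: the metadata states what is required, not how to check it.
    # The metadata keys are hypothetical, invented for this sketch.
    name: str = field(metadata={"required": True, "max_length": 40})
    email: str = field(default="", metadata={"required": False, "max_length": 80})

def validate(record):
    """Hypothetical 'platform' code that interprets the declarations."""
    errors = []
    for f in fields(record):
        value = getattr(record, f.name)
        if f.metadata.get("required") and not value:
            errors.append(f"{f.name} is required")
        limit = f.metadata.get("max_length")
        if limit is not None and len(value) > limit:
            errors.append(f"{f.name} exceeds {limit} characters")
    return errors
```

Calling validate(Customer(name="")) reports that the name is required, yet the Customer class itself contains no checking logic at all.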
Aspect-oriented programming tries to disentangle platform logic from the domain logic at a programming language level in order to preserve the original intentions of the developer. It’s still very much a research topic but shows promise. Intentional programming is another related approach that tries to preserve the original design intentions rather than encode them into a text-based language. Neither approach is in mainstream use yet.
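A Python decorator is not full aspect-oriented programming, but it gives a feel for the disentangling involved. In this sketch the audited decorator, the audit_log list and the apply_discount function are hypothetical; the cross-cutting concern lives in one place and the domain function stays free of it.

```python
import functools

audit_log = []  # hypothetical sink standing in for platform logging

def audited(func):
    """Cross-cutting 'aspect': records each call without touching domain code."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        audit_log.append(f"call {func.__name__}{args}")
        result = func(*args, **kwargs)
        audit_log.append(f"return {result}")
        return result
    return wrapper

@audited
def apply_discount(price, percent):
    # Pure domain logic: no logging, transactions or security checks mixed in.
    return price * (100 - percent) / 100

discounted = apply_discount(200, 10)
```

The platform concern (auditing) can be changed or removed without editing apply_discount, which is the separation that aspect-oriented tools aim to provide at the language level.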
All these approaches essentially address the problem by expressing the solution using less code. The marketing slogan for Visual Studio .NET is “Do more; write less”.
Dump the Code Base
A more radical approach is to get rid of the code base entirely: capture requirements and generate the code instead.
This approach is being championed by a growing number of tools on the market. They allow you to generate either all or part of an application:
http://www.togethersoft.com
http://www.clientsoft.com
http://www.wildetechnologies.com
http://www.ironspeed.com
http://www.alachisoft.com
http://www.deklarit.com
http://www.compilex.com
This approach is not new. In the 1970s it was referred to as “Automatic Programming”. In the 1980s this became “Computer-Aided Software Engineering” or CASE. It currently goes by the name of “Model Integrated Computing” (MIC) or Automated Software Engineering (http://www.ase-conference.org/).
Meanwhile, the Object Management Group (OMG) is pushing a concept called Model Driven Architecture™ which involves creating a platform independent application model and then using a model compiler to create an instance of the application for a particular platform. While this is seductive and has merit as a general approach, I think it’s a mistake to model the implementation detail in UML.
Why can’t a computer derive much of the implementation?
An Optimisation Problem
As an analogy, consider SQL queries. A SQL query essentially describes the data you want and the query engine in the database figures out how to fetch it. You don’t specify indexes, files, sectors, join algorithms or any low level details. The database derives a query plan (program) by looking at the execution costs, decides on the best approach and then runs it. We can do the same thing at a higher abstraction level.
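To make the analogy concrete, the sketch below uses Python’s built-in sqlite3 module and SQLite’s EXPLAIN QUERY PLAN statement. The orders table and the index name are hypothetical. The query says nothing about indexes, yet the engine decides for itself how to fetch the rows; the exact wording of the plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

# The query only describes the data wanted; it names no index or algorithm.
query = "SELECT total FROM orders WHERE customer = ?"

# Ask the engine which plan it derived. The last column holds the description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("acme",)).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

On the SQLite versions I would expect, the printed plan mentions idx_orders_customer even though the query never did.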
Suspend your disbelief for the moment and assume you had a complete description of the database (e.g. Entity-Relationship Model or an Object-Relational Model) and a description of all the HTML forms for the user interface. Given no middle-tier logic, I propose that it would be straightforward to generate an optimal middle-tier for such an application. As proof, look no further than the tools I listed in the previous section.
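A toy version of this idea fits in a few lines. The sketch below assumes a hypothetical entity model (a dictionary of table names and columns) and emits Python source for create and read functions, which is then loaded and run against an in-memory SQLite database. It is nowhere near an optimal middle-tier, and none of the names come from the tools listed earlier; it only shows that the data access layer can be derived from the model.

```python
import sqlite3

# Hypothetical entity model: table name -> column names (first is the key).
MODEL = {"customer": ["id", "name", "email"]}

def generate_middle_tier(model):
    """Emit Python source for create/read functions, one pair per entity."""
    lines = []
    for table, cols in model.items():
        key, col_list = cols[0], ", ".join(cols)
        marks = ", ".join("?" for _ in cols)
        lines += [
            f"def create_{table}(conn, {col_list}):",
            f"    conn.execute('INSERT INTO {table} ({col_list}) VALUES ({marks})', ({col_list},))",
            "",
            f"def read_{table}(conn, {key}):",
            f"    row = conn.execute('SELECT {col_list} FROM {table} WHERE {key} = ?', ({key},)).fetchone()",
            f"    return dict(zip({cols!r}, row)) if row else None",
            "",
        ]
    return "\n".join(lines)

source = generate_middle_tier(MODEL)
namespace = {}
exec(source, namespace)  # load the generated functions

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
namespace["create_customer"](conn, 1, "Ada", "ada@example.com")
customer = namespace["read_customer"](conn, 1)
```

Adding a column to MODEL and regenerating changes every function that touches it, with no hand editing of the generated source.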
However, the application we just generated will be too simplistic. For real applications, we’re going to need to add other models into the mix such as:
- Validation or Business-Entity integrity model
- Navigation Model
- Business Process Model
But rather than modelling implementations in UML, let’s capture the requirements of the model and then extend our code generator to consume the new model and generate appropriate code. You can argue whether we’re actually capturing requirements or just creating a very abstract design model. The point is to model the highest level abstraction that we can code generate from. This cycle of model and generate becomes the primary activity of software development.
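The model and generate cycle might look something like the following sketch for a validation model. The VALIDATION_MODEL structure, its "min" and "max" rule names and the generated validate_order function are assumptions made up for illustration; a real requirements model would be far richer.

```python
# Hypothetical validation model: field -> rules, kept separate from any implementation.
VALIDATION_MODEL = {
    "order": {
        "quantity": {"min": 1, "max": 100},
        "discount": {"min": 0, "max": 50},
    }
}

def generate_validators(model):
    """Emit Python source for one validate_<entity> function per entity."""
    out = []
    for entity, rules in model.items():
        out.append(f"def validate_{entity}(record):")
        out.append("    errors = []")
        for name, rule in rules.items():
            out.append(f"    if not ({rule['min']} <= record[{name!r}] <= {rule['max']}):")
            out.append(f"        errors.append({name!r} + ' out of range')")
        out.append("    return errors")
        out.append("")
    return "\n".join(out)

generated = {}
exec(generate_validators(VALIDATION_MODEL), generated)  # load the generated code
validate_order = generated["validate_order"]
```

Changing a limit means editing one entry in the model and regenerating; nobody opens the generated function to patch it.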
For some requirements, this model and generate paradigm might not be cost effective and it would be better to invoke binary components or external web services. The trick then is to specify the conditions under which the external service must get called so the code generator can weave in calls to it at the right points. Again the key is how to capture the requirements for invoking it.
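One possible shape for this is sketched below, with the caveat that everything in it is hypothetical: the HOOKS model, the credit_check service name and the place_order function are invented for the example. The model states when the external service must be called, and the generator weaves the call into the generated function at that point.

```python
# Hypothetical hook model: when the condition holds, call the named external service.
HOOKS = [{"after": "place_order", "when": "order['total'] > 1000", "call": "credit_check"}]

def generate_place_order(hooks):
    """Emit source for place_order with service calls woven in at the declared point."""
    lines = ["def place_order(order, services):", "    order['status'] = 'placed'"]
    for hook in hooks:
        if hook["after"] == "place_order":
            lines.append(f"    if {hook['when']}:")
            lines.append(f"        services[{hook['call']!r}](order)")
    lines.append("    return order")
    return "\n".join(lines)

woven = {}
exec(generate_place_order(HOOKS), woven)  # load the generated function

# A stand-in for the external component or web service.
calls = []
services = {"credit_check": lambda order: calls.append(order["total"])}

woven["place_order"]({"total": 5000}, services)  # condition holds, service is called
woven["place_order"]({"total": 10}, services)    # condition fails, service is skipped
```

The hand-written component stays outside the generated code; only the condition for invoking it is captured in the model.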
Who Needs Objects?
Why bother with objects and components at all? The purpose of objects is to provide encapsulation and implementation reuse. With code generation, we reuse the requirement or design and not the code. It doesn’t matter if code gets duplicated and spread about because a human won’t be maintaining it. Objects and components are a human artefact for versioning and managing work among developers. They exist to help us humans deal with software complexity. Let’s use computers to deal with software complexity instead.
Object-oriented middle-tiers are generally terrible because they normalise your access to the data you want and create massive inefficiencies. For example, you may only want to access one property of an object but most systems will make you retrieve all the properties of the object in order to avoid a combinatorial explosion of access methods. This creates a massive waste of data access and memory allocations.
Using a computer to search and calculate all the data interdependencies and optimise data access means we can get radically better performance than a hand-crafted object-oriented middle-tier. Worrying about object design is like worrying about which register gets used on a microprocessor!
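As a small sketch of what deriving the data access could mean, the code below assumes a hypothetical form model in which each screen lists only the fields it displays. The query is derived from that list, so a screen showing one property fetches one column instead of a whole object.

```python
import sqlite3

# Hypothetical form model: each screen lists only the fields it displays.
FORMS = {"order_summary": {"table": "customer", "fields": ["name"]}}

def query_for(form):
    """Derive a SELECT that fetches exactly the columns the form needs."""
    spec = FORMS[form]
    return f"SELECT {', '.join(spec['fields'])} FROM {spec['table']} WHERE id = ?"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, email TEXT, notes TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ada', 'ada@example.com', 'long text')")

# Only the name column is read; email and notes are never retrieved.
row = conn.execute(query_for("order_summary"), (1,)).fetchone()
```

A hand-written object layer would typically load the full customer here; a generator that knows every form can emit one tailored query per screen without anyone having to maintain them.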
Solve the Real Problem
I admit the problem is obviously hard or it would have been solved by now. How can you model software requirements so that you can derive and generate the code for it? Another way to think of it is how do you create a set of extensible requirement models and then extend a code generation platform to create your application? I don’t know but I do think it’s solvable.
There are two barriers I would like to acknowledge. Today’s application platforms are geared for humans to work against. A programming model geared for use by code-generating tools would probably look quite different. I also recognise that debugging will be difficult because of the huge semantic gap between the generated code and the requirements model it has to be traced back to.
A Paradigm Shift?
So why hasn’t this been done already? I don’t know; I’m still in the process of researching the problem. I believe we might be rapidly approaching an inflection point in software development where the paradigm shifts over to code generation. Some of the contributing factors include:
- Broad adoption of implementation standards around HTML, XML, SQL and a consolidation of application platforms: .NET, Java, J2EE.
- A greater emphasis on modelling applications and the rise of UML.
- Massive computing power available at the desktop and better programming tools.
- Programming models are getting more declarative, more abstract and easier to code generate against.
Conclusion
So let’s restate how we got here. The current real-world state of writing enterprise applications is grossly inadequate for numerous reasons. However, it is fundamentally flawed due to the creation of large code bases in which the domain logic gets entangled and obscured by the platform logic. Current technology trends try to address this by implementing the same functionality in less code against a higher level application model.
In this paper, I propose that we need to industrialise and mechanise the software development process to a much greater degree. The process should follow this pattern:
1) Iteratively gather requirements and evolve a way to capture them in models.
2) Extend a code generation platform to derive and generate the application code for a specific application platform.
3) Where it isn’t cost effective to model and generate, write components that get woven into the generated code and called at the appropriate points.
I believe some of the benefits of this approach are:
- Requirements are explicitly captured and kept separate from implementation making it easier to inspect them and ascertain correctness.
- The application could be rapidly changed and regenerated.
- Application development gets more rapid as requirements models and code generators get developed.
- The application could be retargeted at a new application platform or to take advantage of new platform features more quickly than evolving a code-base.
- The application could be developed with an extreme-programming-like process, enabling rapid customer feedback.
- Quality of code should be increased due to mechanisation and consistency.
This approach would finally bring software development into the industrial age by creating a true software factory.
Addendum: I've copied this article into my Wiki if you are interested in arguing specific points or evolving it.