The current trajectory of the OpenStack community runs the risk of creating a vertically siloed solution that will actually inhibit true commoditization of cloud computing. If we continue to ignore the need for formal definitions and standards in this space, cloud computing will flounder. In today’s browser wars, the winner is the user of the “open web.” In today’s stack wars, there is no clear winner, and the whole cloud compute space is losing because innovation is hampered.
This post is a follow-up to a previous post in which I sarcastically ranted about the lack of commoditization in the open source IaaS space. Surprisingly, it got a good amount of attention, so I felt the need to give a more substantial argument rather than a rant. You may wish to read that post for context.
Let me first introduce myself. I’m Darren Shepherd. I’m a Cloud Architect. I have been writing, running, and using IaaS systems for about 6 years. I absolutely love this space and am really quite passionate about IaaS. I come from an almost neutral perspective and have no favorite open source IaaS stack. To be quite honest, I really don’t like any of them.
For full disclosure, I should say that I’ve actually written and run (with the help of my team) my own IaaS stack. From the outside, one could easily view me as just a bitter engineer who likes his own widget best. It’s really not that. Over the last three years my current employer has given me a unique opportunity not afforded to many. We started with CloudStack but have since abandoned that platform. My employer trusted me enough to let me go off on my own to architect and build my own solution. I’ve paid very close attention to the open source communities and have learned from their successes and failures. Since I only have one production deployment of my stack to worry about, I’ve been able to iterate really quickly on different architecture and design ideas. So I’ve run production clouds, and I’ve also been given a playground of a couple million dollars in hardware for load and scale testing. I’ve gotten to do quite a bit. I honestly believe these IaaS systems are really simple to build, but I’ll explain that later.
Why do I pick on OpenStack?
This is really a commentary on the entire open source IaaS space, but I single out OpenStack for a specific reason. There is a growing trend in which “open cloud” has become synonymous with OpenStack. If one were to draw the “open cloud” ecosystem, many would put OpenStack in the center with everything else revolving around it. Everybody seems to quickly forget that OpenStack is a single implementation of X. But what is X?
As I previously mentioned, OpenStack’s original goal was to commoditize the cloud compute space. So what does that mean? Commoditization of software is somewhat tricky to define, because the term really comes from the business and economics world and describes an idealistic state that is rarely attained. Google commoditization and commodity and you’ll see all those definitions. When people talk about commoditization, it usually takes less of a dictionary definition and more of an interpretation of how it applies to a specific market. I hate to get hung up on semantics, so I’ll explain the way I see commoditization of software. If you don’t agree, at least you’ll understand the premise I’m operating on.
One aspect of a commoditized piece of software is that it is a well understood and well defined unit that can be obtained from multiple sources (such as vendors or communities). You say you want X, and you can obtain A, B, or C from different sources, and they are all basically the same in that they provide the functionality of X. Say I want a web server. I can use Apache, nginx, or lighttpd. They all provide the basic functionality of a web server. Obviously they still have differentiating features, but if you don’t care too much and just need a web server, any of them will do.
Once software has been commoditized, typically the industry will shift focus “up the stack” to add more differentiated features. Those features eventually get commoditized and the cycle continues.
So cloud compute is commoditized?
Following my terse explanation, the argument goes as follows. I need an IaaS system. I can use OpenStack, CloudStack, Eucalyptus, OpenNebula, etc. Therefore IaaS has been successfully commoditized. Maybe you’re not interested in building your own cloud, in which case the argument goes like this. I need a public IaaS. I can use AWS, GCE, RackSpace, HP, etc. Therefore public cloud has been commoditized.
For many people this level of abstraction is fine and those assertions are true. Many users of IaaS systems have simple requirements. So maybe for some people or some simple use cases we’ve succeeded. But remember, as a somewhat tangential point, CloudStack, Eucalyptus, OpenNebula, AWS, and RackSpace all existed before OpenStack was conceived. So maybe we need to look for a better form of commoditization.
Cloud compute is not commoditized
You need to go one level deeper than the simple use case of running a VM. The simple cloud user may not care. But if you work in the industry, building, extending, and running these systems, you see the vast differences that exist between the current open source stacks and the proprietary public clouds. Frankly, many companies out there specialize in nothing more than bridging these gaps. Since everyone is so keen on saying OpenStack is the new Linux, let’s draw that parallel. The current landscape of IaaS systems is more like saying we have Windows and Linux. Sure, they are both operating systems, but they are so different that developing solutions across the two becomes a great impediment. I know it’s possible, but imagine instead that the current landscape were more like Linux, FreeBSD, and Solaris. With that batch, cross-platform solutions are much easier. Why? Because they are all UNIX-like. So in the current landscape we can say we have many IaaS systems, but to get to a more useful level of commoditization we need to further define IaaS. To put it simply, if we are calling IaaS a Cloud Operating System, then where is my Cloud POSIX? This is truly what we lack today.
At this point you may be thinking, wow, that’s a long-winded way of saying the APIs don’t match. It goes much, much deeper than just needing matching APIs. Randy Bias recently blogged on a similar topic, pointing out that “APIs usually reflect the underlying architectural assumptions of a given system.” So we need to look beyond APIs, at architectural and conceptual definitions as well. Additionally, since IaaS is inherently a composition of many distributed technologies, we need to define architectural building blocks that provide useful boundaries at which vendors can plug in and layer their technologies. I have plenty of ideas in this area, which I address later in this post. For the moment I just want to say that we have not reached a useful level of commoditization. IaaS has yet to be “well defined” in a formal manner, and as a result, moving “up the stack” to produce further innovation is severely hampered.
The dangers of OpenStack’s success
I’ve always been told that I should use one of the stacks out there. Frankly, I’m quite happy with what I have, and I’ve always taken the stance that it shouldn’t really matter. In 2 to 3 years this should all be commoditized, so it really shouldn’t matter what stack I run. If in 2 to 3 years there is a better one out there, I’ll switch to it. Right now mine is better, faster, stronger, etc., for what I need, and its TCO is lower than picking up a different one. Lately the argument for why I should use a different stack (specifically OpenStack) has changed. It goes as follows:
“You need to switch to OpenStack because the community will continue to layer on more and more value that you can not keep up with. So even if your system to manage the basics of compute, networking, and storage is better, it does not matter because OpenStack will create many XaaS components that you will lose out on. So you need to give up your framework and use nova, cinder, quantum so that you can continue to leverage the rest of the community work like Trove (Red Dwarf) and Heat. Any complaints you have about OpenStack, you should just fix in nova, cinder, and quantum.”
When I hear this argument, the architect inside me cringes. Somewhere an architectural fairy dies every time this argument is made. It was frustration with this argument that made me write my original rant. This argument assumes and asserts that OpenStack is a vertical silo. It basically assumes that the unit we are trying to commoditize is AWS as a whole, so you can pick up a commoditized AWS from either OpenStack or Apache CloudStack. Why can’t I use an EC2 equivalent from CloudStack and an RDS equivalent from OpenStack? Why is it an all-or-nothing proposition? I argue it’s the lack of proper formalization in architecture and standards that leads to monolithic reimplementations of AWS. AWS is composed of about 25 different services. We should be creating a landscape in which I can get implementations of those services from various sources that are truly architecturally independent of each other.
As OpenStack continues to grow, if it continues the way it is going, it is just creating a very large locked-in solution. While it may be open source, it is not far off from buying a proprietary solution from VMware. I know people will say that all components in OpenStack have their own databases and APIs and are decoupled, so theoretically you could replace any one of them. You could pick up Heat, and as long as you implement the nova, cinder, and quantum APIs, it will work. That is a very selfish and self-serving approach. It is quite different from attempting to formalize an API and proper architectural layers. No other IaaS has the same delineation as nova, cinder, and quantum. For AWS, that is all EC2; CloudStack really only has one API. That is a trivial example, but the point is that you are taking your specific view of how things should be done and expecting everyone to conform to it. Other implementations may not map so easily to the OpenStack API.
The entire concept of an OpenStack ecosystem reinforces the problem. As I mentioned before, there is this growing notion that Open Cloud is synonymous with OpenStack. Wouldn’t it be better if we had an Open Cloud ecosystem?
So why did I equate OpenStack to IE6? Well, for one, it definitely provokes an emotional response, but I also believe there are some real parallels. IE was well known for charging forward on its market dominance with complete disregard for standards. I don’t see how having one ubiquitous implementation that conforms to no standards can actually be a beneficial thing. I feel like we are on the path to creating the thing that gets 90% market share, and then we spend the next 15 years trying to get rid of that nightmare. Everyone will equate OpenStack to Linux and say Linux is ubiquitous and therefore it’s not so bad. But that’s not an apples-to-apples comparison. There are plenty of standards and abstraction layers around Linux. Linux couldn’t have succeeded so quickly if it weren’t for GNU, POSIX, and the history of UNIX in general. So if you are working on Linux, unless you are a kernel programmer, you’re probably doing something that is easily ported to a different operating system. Additionally, Linux is just a kernel (not that a kernel is easy to create), whereas OpenStack represents a large composition of distributed technologies.
I feel this statement is going to get A LOT of criticism, and I honestly don’t understand why more people don’t see the danger of OpenStack dominance. If you look at today’s browser wars, I don’t think anybody (except Microsoft and Google) would really like to see one browser get 90% of the market. Competition breeds better browsers, and standards create a better web. As OpenStack continues to gain momentum, it impedes the ability of other stacks to enter the market, and we are left with a single implementation on which to innovate. I think this actually gives the upper hand to proprietary clouds. AWS is still light years ahead of OpenStack. GCE will become a dominant player in the market and basically catch up to AWS as fast as they choose (if that’s the route they go). RackSpace is really so-so, but they make up for it with their “fanatical support.” Look at DigitalOcean as an example. They have entered the market and are succeeding really quickly. I think they have a simple, proven formula and will be quite successful. They are not on OpenStack. So basically we’ve created a landscape in which it’s difficult for an open source IaaS stack to enter the market, yet still quite simple for a proprietary one.
For a user of XaaS, the dangers are smaller. Client-side libraries often provide a simple enough abstraction layer. The real danger is to the ability to layer on more XaaS services. As OpenStack layers on more and more XaaS services that only work on OpenStack, it just digs a deeper and deeper hole.
Everyone in the IaaS space should be concerned about this. Companies that do not wish to be in the OpenStack Foundation will continue to fight an uphill battle as OpenStack fights for a monopoly. Companies involved with OpenStack will see their business become more and more tied to a single entity. As a storage or networking vendor, do you really care about OpenStack itself? No, you just need a platform on which to sell your product. But as your product is inherently “cloud” oriented, you need a larger orchestration platform to take full advantage of it. Do you really want to be the Zynga to OpenStack’s Facebook? Whether you come from the altruistic standpoint of making the world a better place or from the more selfish business perspective, this is not a healthy environment. I will single out IBM. Of all the companies in the OpenStack Foundation, IBM should know that we are not creating a healthy ecosystem.
It’s not as if we are in a terrible place at this moment. If we correct course a bit and realize there is a greater good than OpenStack itself, we can easily get to a healthy point. If we ignore the need to define an “open cloud” and naively think OpenStack is it, we will certainly head down a painful path.
On to constructive criticism
This mostly ends the opinion portion of this post. I’d like to now present a couple ideas on how I think we could make the world a better place. Don’t take this as a comprehensive solution but just some random thoughts.
But standards are hard and often fail
It is often difficult to agree on a standard, and there have definitely been attempts to standardize the cloud in the past. I would argue that most of those standards came too early or were ivory-tower-ish or too enterprise-y. I think at this point we could create a practical standard for cloud compute. If you look at AWS, there really haven’t been any groundbreaking features since the release of VPC. GCE has entered the market with a slightly different spin, but still nothing mind-blowing. If you look across AWS, GCE, OpenStack, CloudStack, and OpenNebula, you can find a common base set of functionality. We don’t need to tackle everything immediately; even standardizing some simple things would be a good start.
Decomposing an IaaS system
I really believe writing an IaaS system is quite simple. To understand my point, I think I should break down the system a little further. At the lowest level you have functionality that is on par with EC2 and a little bit of VPC. I’ll call this the kernel. All other functionality, like ELB, RDS, CloudFormation, etc, can be layered on top of that fundamental base. If you break the kernel down further you basically have an API, light orchestration, compute, storage, and networking. Anything above the kernel we will say is user space. I personally care the most about getting the kernel to be rock solid and well defined.
Compute, Storage, Networking
Inside the kernel of a cloud system are the fundamental units of compute, storage, and networking. Don’t think about this as nova, cinder, and quantum. Those are user level abstractions. I honestly don’t think delineating those services to the user is really productive. I feel storage and networking were broken out of nova as more of an implementation woe, and the current path of quantum has blurred what I feel to be the proper layers in the architecture. Instead think of compute, storage, and networking in terms of the driver interfaces that exist in nova, cinder, and quantum.
The technologies involved in assembling a cloud system have been around for a long time. Take AJAX as an example. The basic building blocks of AJAX existed for a while before the term was coined and the paradigm revolutionized the web. IaaS is similar. IaaS is just built on top of things like KVM, iptables, VHDs, haproxy, dnsmasq, etc. When we started writing these systems a couple of years back, you had to piece together a lot of different technologies to form the compute, storage, and networking subsystems, so orchestrating these bits into a cloud system was somewhat complicated. Let’s say the storage code was 10,000 lines. As time has progressed, vendors and other frameworks have come into the space and started creating holistic storage and networking solutions that fit the cloud paradigm. So when you look at the 10 KLOC of storage code you had before and then integrate something like SolidFire, you realize 9 KLOC of it aren’t needed. What is really core to the “kernel” for storage is about 1 KLOC. These days you can pick up a storage vendor and it does most of the work, or, for example, libvirt handles most aspects. So what is actually in the core IaaS is really not much: just some orchestration and metadata management.
The need for a Service Provider Interface for compute, networking, and storage
Once you recognize this trend, in which frameworks and systems external to the core IaaS do the heavy lifting, you begin to see the need for a service provider interface (SPI). An SPI is just an API that something implements, very much like a driver interface. The key point, though, is that the SPI should be REST or some other form of external interface. A Python or Java API is not useful because it ties the SPI to a specific implementation language or framework.
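To make the “external interface” point concrete, here is a minimal sketch, assuming a hypothetical JSON-over-HTTP contract (the paths `/instances` and `/instances/<id>/start|stop`, and the class names, are made up for illustration). The IaaS side only knows a base URL and the JSON shape, so the provider behind it could just as well be written in Go, Java, or anything else.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Provider side. Hypothetical in-memory state; a real provider would drive
# Xen, KVM, Hyper-V, VMware, etc., and could be written in any language.
STATE = {"vm-1": "stopped"}


class ComputeSPIHandler(BaseHTTPRequestHandler):
    """Serves a hypothetical SPI: GET /instances, POST /instances/<id>/(start|stop)."""

    def _reply(self, code, body):
        data = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        if self.path == "/instances":
            self._reply(200, [{"id": k, "state": v} for k, v in STATE.items()])
        else:
            self._reply(404, {"error": "not found"})

    def do_POST(self):
        parts = self.path.strip("/").split("/")  # ["instances", "<id>", "<action>"]
        if (len(parts) == 3 and parts[0] == "instances"
                and parts[1] in STATE and parts[2] in ("start", "stop")):
            STATE[parts[1]] = "running" if parts[2] == "start" else "stopped"
            self._reply(200, {"id": parts[1], "state": STATE[parts[1]]})
        else:
            self._reply(404, {"error": "not found"})

    def log_message(self, format, *args):  # silence per-request logging
        pass


# IaaS side. It depends only on HTTP and JSON, never on the provider's code.
class RemoteCompute:
    def __init__(self, base_url):
        self.base_url = base_url
        # Ignore any proxy settings from the environment for this local demo.
        self._opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    def _call(self, method, path):
        req = urllib.request.Request(self.base_url + path, method=method)
        with self._opener.open(req) as resp:
            return json.loads(resp.read())

    def list(self):
        return self._call("GET", "/instances")

    def start(self, instance_id):
        return self._call("POST", f"/instances/{instance_id}/start")

    def stop(self, instance_id):
        return self._call("POST", f"/instances/{instance_id}/stop")


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), ComputeSPIHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    compute = RemoteCompute(f"http://127.0.0.1:{server.server_port}")
    print(compute.start("vm-1"))  # {'id': 'vm-1', 'state': 'running'}
    print(compute.list())         # [{'id': 'vm-1', 'state': 'running'}]
    server.shutdown()
    server.server_close()
```

Swapping providers in this sketch is just a matter of pointing `RemoteCompute` at a different URL; nothing on the IaaS side imports provider code.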
One of the big selling points of an IaaS system is what it supports: KVM, Xen, Hyper-V, VMware, Nicira, Midokura, SolidFire, etc. CloudStack or OpenStack may tout better support for technology X. This landscape really isn’t useful, for two reasons.
First, it puts more burden on storage, networking, and compute vendors. They create their product and its native API, and then they need to figure out how to write the glue code to plug into every IaaS system. If they weren’t careful when they created their native API, they may find this is not so simple: the assumptions they made may not match the way the internals of IaaS X work. Additionally, vendors now need to be experts in OpenStack and CloudStack, and less popular frameworks like OpenNebula (sorry guys) may not get the same attention because it’s not economically feasible. A lot of time in the community is spent just integrating technology X.
Second, it impedes the ability of other IaaS systems to enter the market. As I keep saying, it’s really simple to create one of these systems. I personally can create a system that scales much further than the current stacks and is faster, more stable, and easier to operationalize. From experience, it takes me about 6 months to create a truly production-ready system. Here’s the rub, though. Even though I can create an awesome and super flexible system, I don’t have time to integrate all the vendors. So my system will really only have a driver for Xen, because that’s all I use. It takes me 6 months to create an awesome IaaS system and then 2 years (probably an exaggeration) to support all the various storage, networking, and compute vendors. And those 2 years are spent just writing glue code that is specific to my framework. You can see how much time the community wastes trying to integrate technology X into each IaaS.
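The waste compounds across the whole community. Here is a back-of-the-envelope model, assuming every stack wants every vendor and that each pairing costs one unit of glue code (a deliberate oversimplification). The function name and the lists are illustrative only.

```python
def glue_integrations(num_stacks: int, num_vendors: int, shared_spi: bool) -> int:
    """Rough count of stack-to-vendor integrations someone must write and maintain.

    Without a shared SPI, every (stack, vendor) pair needs its own glue code.
    With one, each vendor implements the SPI once and each stack consumes it once.
    """
    return num_stacks + num_vendors if shared_spi else num_stacks * num_vendors


if __name__ == "__main__":
    stacks = ["OpenStack", "CloudStack", "Eucalyptus", "OpenNebula", "a newcomer"]
    vendors = ["KVM", "Xen", "Hyper-V", "VMware", "Nicira", "Midokura", "SolidFire"]
    print(glue_integrations(len(stacks), len(vendors), shared_spi=False))  # 35
    print(glue_integrations(len(stacks), len(vendors), shared_spi=True))   # 12
```

Under this model the newcomer stack alone pays for 7 integrations without a shared SPI and 1 with it, which is exactly the barrier to entry described above.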
Let’s take compute as an example. A compute driver in its simplest form would have start(), stop(), and list(). Imagine we had a simple, well-defined REST interface for that. I’ll ignore what the arguments to the methods are for the moment. One could then implement that simple interface for Xen, KVM, Hyper-V, and VMware, and all the IaaS systems could use it and have immediate access to multiple implementations. This is very close to what libvirt is, but I think we need a somewhat simpler, IaaS-oriented interface. If you then had the concept of “capabilities,” you could group and define more functionality, like attach/detachVolume() or attach/detachNic(). I know I’m presenting an overly simplified view of things, but trust me that I know the technical details required to get to a real interface.
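A minimal in-process sketch of that contract, written as a Python abstract base class purely for brevity (in the proposal above it would be exposed over REST). The class and method names, the `"volumes"` capability string, and the toy drivers are hypothetical. The point is that the core checks advertised capabilities rather than which driver it is talking to.

```python
from abc import ABC, abstractmethod


class ComputeDriver(ABC):
    """Hypothetical minimal compute SPI: start/stop/list are mandatory,
    anything else is advertised through `capabilities`."""

    capabilities: frozenset = frozenset()

    @abstractmethod
    def start(self, instance_id: str) -> None: ...

    @abstractmethod
    def stop(self, instance_id: str) -> None: ...

    @abstractmethod
    def list(self) -> dict: ...

    # Optional operations, only valid if the matching capability is advertised.
    def attach_volume(self, instance_id: str, volume_id: str) -> None:
        raise NotImplementedError("driver lacks the 'volumes' capability")

    def detach_volume(self, instance_id: str, volume_id: str) -> None:
        raise NotImplementedError("driver lacks the 'volumes' capability")


class InMemoryDriver(ComputeDriver):
    """Toy stand-in for a real Xen/KVM/Hyper-V/VMware implementation."""

    capabilities = frozenset({"volumes"})

    def __init__(self):
        self.instances = {}
        self.volumes = {}

    def start(self, instance_id):
        self.instances[instance_id] = "running"

    def stop(self, instance_id):
        self.instances[instance_id] = "stopped"

    def list(self):
        return dict(self.instances)

    def attach_volume(self, instance_id, volume_id):
        self.volumes.setdefault(instance_id, set()).add(volume_id)

    def detach_volume(self, instance_id, volume_id):
        self.volumes.get(instance_id, set()).discard(volume_id)


class BareDriver(ComputeDriver):
    """A driver that implements only the mandatory core."""

    def __init__(self):
        self.instances = {}

    def start(self, instance_id):
        self.instances[instance_id] = "running"

    def stop(self, instance_id):
        self.instances[instance_id] = "stopped"

    def list(self):
        return dict(self.instances)


def attach_if_supported(driver: ComputeDriver, instance_id: str, volume_id: str) -> bool:
    """What an IaaS core might do: consult capabilities, not the driver's type."""
    if "volumes" not in driver.capabilities:
        return False
    driver.attach_volume(instance_id, volume_id)
    return True


if __name__ == "__main__":
    xen = InMemoryDriver()
    xen.start("vm-1")
    print(xen.list(), attach_if_supported(xen, "vm-1", "vol-1"))
    print(attach_if_supported(BareDriver(), "vm-1", "vol-1"))
```

New capability groups (NICs, snapshots, and so on) can then be added without breaking drivers that only implement the core.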
Compute and storage are pretty easy to define an interface for. Networking is a little more complicated, but still very doable. For networking, I really think the cloud kernel should only know how to create and use L2 segments and then attach them to an arbitrary L3 segment. All other network services, like DHCP, NAT, firewall, gateway, load balancing, etc., can then live in “user space.” How this can work deserves a blog article of its own, but believe me, I’ve already done this to a certain degree.
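One way to picture that split is the minimal sketch below, assuming a hypothetical model in which an L3 attachment is just an IPv4 CIDR and the gateway takes the first usable address. The kernel only creates L2 segments, plugs NICs into them, and attaches them to L3; DHCP sits in “user space” and only reads kernel state.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class L2Segment:
    """Kernel object: an isolated broadcast domain (hypothetical model)."""
    segment_id: str
    nics: list = field(default_factory=list)  # MAC addresses plugged into it
    l3_network: Optional[ipaddress.IPv4Network] = None  # IPv4 only in this sketch


class NetworkKernel:
    """All the kernel knows: create L2 segments, plug NICs in, attach to L3."""

    def __init__(self):
        self.segments = {}

    def create_segment(self, segment_id):
        self.segments[segment_id] = L2Segment(segment_id)
        return self.segments[segment_id]

    def plug_nic(self, segment_id, mac):
        self.segments[segment_id].nics.append(mac)

    def attach_l3(self, segment_id, cidr):
        self.segments[segment_id].l3_network = ipaddress.ip_network(cidr)


class DhcpService:
    """A "user space" service built only on the kernel's view of segments.
    It could be swapped for another implementation without touching the kernel."""

    def __init__(self, kernel):
        self.kernel = kernel

    def allocate(self, segment_id):
        seg = self.kernel.segments[segment_id]
        if seg.l3_network is None:
            raise ValueError(f"segment {segment_id} has no L3 attachment")
        hosts = iter(seg.l3_network.hosts())  # usable addresses, in order
        gateway = next(hosts)  # assumption: first usable address is the gateway
        leases = {mac: str(ip) for mac, ip in zip(seg.nics, hosts)}
        return str(gateway), leases


if __name__ == "__main__":
    kernel = NetworkKernel()
    kernel.create_segment("seg-1")
    kernel.plug_nic("seg-1", "02:00:00:00:00:01")
    kernel.plug_nic("seg-1", "02:00:00:00:00:02")
    kernel.attach_l3("seg-1", "10.0.0.0/24")
    print(DhcpService(kernel).allocate("seg-1"))
```

NAT, firewalling, or load balancing would follow the same pattern: separate services consuming the kernel’s segments rather than features baked into it.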
API and well defined model
For all this to work, we do need a consistent kernel API and, more importantly, a well-defined model. A model means standardizing on certain terms, their definitions, and their relationships. So we create definitions for instance, nic, volume, network, image, etc., and specify exactly what each one is, its assumptions, and its properties. For example, the definition of network in CloudStack and in OpenStack really isn’t the same, and that fundamentally changes the way you can layer more network services on top. I don’t think this is that difficult at this point in time. The concept of what an IaaS is is fairly mature; it just isn’t defined in a manner that everyone can agree on. I’m aware of things like the DMTF specs, but I haven’t found them too useful. I really think it’s worthwhile to get the leaders of the different stacks together and start from scratch on a consistent API and model.
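A model is mostly a matter of agreement, but its relationships can be made checkable. A minimal sketch, assuming one possible (hypothetical) set of definitions: a NIC belongs to exactly one network and at most one instance, a volume is attached to at most one instance, and a network is an L3 range. Another stack might define these differently, which is precisely why they need to be agreed on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Image:
    id: str
    disk_format: str  # e.g. "qcow2" or "vhd"


@dataclass(frozen=True)
class Network:
    id: str
    cidr: str  # assumption: a network is an L3 range


@dataclass(frozen=True)
class Nic:
    id: str
    network_id: str  # a NIC belongs to exactly one network


@dataclass(frozen=True)
class Volume:
    id: str
    size_gb: int
    attached_to: Optional[str] = None  # at most one instance at a time


@dataclass(frozen=True)
class Instance:
    id: str
    image_id: str
    nic_ids: tuple = ()  # a NIC is owned by at most one instance


def validate(images, networks, nics, volumes, instances):
    """Check the relationships the model promises; return readable violations."""
    errors = []
    nic_owner = {}
    for inst in instances.values():
        if inst.image_id not in images:
            errors.append(f"instance {inst.id}: unknown image {inst.image_id}")
        for nic_id in inst.nic_ids:
            if nic_id not in nics:
                errors.append(f"instance {inst.id}: unknown nic {nic_id}")
            elif nic_id in nic_owner:
                errors.append(f"nic {nic_id}: shared by {nic_owner[nic_id]} and {inst.id}")
            else:
                nic_owner[nic_id] = inst.id
    for nic in nics.values():
        if nic.network_id not in networks:
            errors.append(f"nic {nic.id}: unknown network {nic.network_id}")
    for vol in volumes.values():
        if vol.attached_to is not None and vol.attached_to not in instances:
            errors.append(f"volume {vol.id}: attached to unknown instance {vol.attached_to}")
    return errors
```

Any stack, or any user-space service layered on top, could run the same checks against the same definitions.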
Lifecycle Event Handlers
In addition to an API and model, well-defined lifecycle events and the concept of event handlers are very useful. When you do integrations in an enterprise, you often need to integrate with many third-party systems. Often these systems only need to plug into the lifecycle of a VM or some other entity so that they can read the metadata from the IaaS system and then do something in the third-party system. The ability to register blocking and non-blocking event handlers in an IaaS system makes integrations far easier. This implies that every entity needs a well-defined lifecycle. These event handlers can also be used to integrate compute and networking vendors. For example, let’s say you subscribe to the “attaching” phase of the IP attach function. You could then set up custom routes in your switch and signal back that you are done once the route is applied.
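The IP-attach example could look roughly like this minimal sketch, assuming a hypothetical event bus where a blocking handler must return `True` to signal “done” before the entity moves on, and non-blocking handlers get a snapshot of the metadata in the background. The phase names, `LifecycleBus`, and `attach_ip` are illustrative, not any existing stack’s API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class LifecycleBus:
    """Hypothetical event bus for entity lifecycle phases.

    Blocking handlers run inline and must return True ("done") before the
    entity may move on; non-blocking handlers are notified in the background.
    """

    def __init__(self):
        self._blocking = defaultdict(list)
        self._non_blocking = defaultdict(list)
        self._pool = ThreadPoolExecutor(max_workers=4)

    def subscribe(self, phase, handler, blocking=False):
        target = self._blocking if blocking else self._non_blocking
        target[phase].append(handler)

    def fire(self, phase, entity):
        for handler in self._blocking[phase]:
            if not handler(entity):
                return False  # a blocking handler vetoed or failed
        for handler in self._non_blocking[phase]:
            self._pool.submit(handler, dict(entity))  # pass a snapshot of the metadata
        return True

    def close(self):
        self._pool.shutdown(wait=True)  # wait for background handlers to finish


def attach_ip(bus, ip):
    """Hypothetical IP-attach flow: attaching -> attached (or error)."""
    ip["state"] = "attaching"
    if not bus.fire("attaching", ip):
        ip["state"] = "error"
        return False
    ip["state"] = "attached"
    bus.fire("attached", ip)
    return True


if __name__ == "__main__":
    switch_routes = []  # stands in for a physical switch's routing table
    cmdb = []           # stands in for some third-party inventory system

    def program_route(ip):
        switch_routes.append((ip["address"], ip["instance"]))
        return True  # signal back: route applied, the attach may proceed

    def record_in_cmdb(ip):
        cmdb.append(ip["address"])

    bus = LifecycleBus()
    bus.subscribe("attaching", program_route, blocking=True)
    bus.subscribe("attached", record_in_cmdb)
    ip = {"address": "203.0.113.10", "instance": "vm-1", "state": "new"}
    attach_ip(bus, ip)
    bus.close()
    print(ip["state"], switch_routes, cmdb)
```

In this sketch exceptions raised by non-blocking handlers stay inside their futures and are ignored; a real system would need to decide how to surface them.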
And much much more
I really have a lot more to say in this realm. In my perfect, idealistic world I’d love to work on defining a real architectural standard for IaaS systems and develop some reference implementations. I really want this space to be commoditized. I want some caffeine-crazed hacker to be able to create a new XaaS, or even the kernel itself, in a weekend, post it on Hacker News, and make a splash in the market. Look at docker.io; it was really neat to see that take off. The barrier to entry in the web MVC/JS market is ridiculously low at the moment. Yet open source IaaS is getting to a point where you are almost seen as stupid for trying to enter that market. I know JS has the “framework of the week” problem, but a lot of innovation is happening really fast regardless.
Hit me up on LinkedIn if you want to talk seriously about this. Really, people, let’s make the world a better place.