Tuesday, December 30, 2014

To Share or Not To Share

Sharing, with respect to software, exists on many different levels. Much has been written about the sharing of source code and this continues to be an interesting topic. However, I wish to take a look at a different level of sharing with an eye on a trend in "recent history" towards less sharing, or partial sharing, or no sharing at all. Primarily this will concern containers and more specifically the much hyped Docker framework around containers as well as the recently introduced Rocket tool-set. Thus, I will be looking at sharing on the binary level.

Let's start out by taking a ride on the way-back train to when the computing world was very different. Way back when, sharing at runtime basically did not exist. Binaries were compiled code and everything was pretty much one big blob. Every application carried its own set of standard functionality. Translated into today's terminology, an Independent Software Vendor (ISV) effectively shipped an appliance. The biggest issues with this are fairly obvious. Everyone ships, more or less, code that they don't want to ship, i.e. standard functionality. If there is a security issue the ISV has to rebuild and ship the big ugly blob again and again and again. The ISV is responsible for much more than is desirable, from the ISV's point of view.

Now fast forward to an intermediate point between way back then and today, and the common use of shared libraries. Sharing at runtime allows ISV application blobs to be significantly smaller. Standard functionality is pulled in at runtime from a library that resides on the system where the application is installed. Not only does the ISV need to ship less stuff, but the ISV also has to worry a lot less about code and associated issues outside the ISV's field of expertise. The responsibility to worry about security issues in standard functionality moved from the ISV to the customer. The customer is responsible for maintaining the system. This division of responsibility is especially common for problems that have no real solution. For security issues the real solution would be to have no vulnerabilities. However, this is not possible and thus the responsibility for worrying about security issues is divided. ISVs worry about the code in their application and the customer worries about the system where the application is installed. The introduction of sharing thus provided benefits to the ISV in that some responsibility moved to the customer. The customer gained the benefit of control, in that the customer does not need to sit around and wait for a new application release to get a security issue in standard functionality fixed. The customer also addresses issues in many applications with only one update. Overall a win-win for ISVs and customers.

One pain point introduced with sharing is an increase in the difficulty of moving an application from system A to system B, where system A and system B may have the same OS but different patch levels. The effect may be that the application runs on system A but not on system B.

With this as the background, fast forward to the IT landscape as it exists today. We now live in a world where it is likely that system A runs distribution A and system B runs distribution B, making the portability problem a bit more complicated. Additionally, probably as much code delivered to customers today is written in a scripting language, and sharing for dynamic languages takes on a different but similar set of problems. For binaries, a partial solution to the portability problem is symbol versioning. Symbol versioning resolves the basic underlying problem of "same name but different behavior", but obviously if a given system does not ship the needed version then the ISV is once again left holding the bag, i.e. the ISV cannot support the system that does not deliver the proper symbol. This also implies that as an ISV one has to take great care about the support matrix and about picking the build system. Generally, an ISV compiling on the oldest distribution customers ask for provides binaries that work on more modern distributions as well. I am aware that I am papering over many details, but I do not want to get too far away from the topic at hand, sharing. Thus, we have arrived in a world where application portability has pitfalls and complications with respect to managing dependencies. However, these issues are mostly well understood and ISVs deal with them on a more or less routine basis.
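To make the symbol versioning point a bit more concrete, here is a small sketch of how one might check which versioned symbols a binary actually requires, which is essentially the support matrix question in script form. This is purely my own illustration, not anything from the discussion above: it assumes objdump from binutils is installed and simply parses its dynamic symbol table output, so treat it as a starting point rather than a robust tool.

    #!/usr/bin/env python3
    """Sketch: list the versioned symbols an ELF binary requires.

    Shells out to objdump (from binutils, assumed to be installed) and
    parses its dynamic symbol table output. A rough illustration of the
    support matrix question, not a robust checker.
    """

    import re
    import subprocess
    import sys

    def required_symbol_versions(binary_path):
        """Return a set of (library, version) pairs, e.g. ('GLIBC', '2.14')."""
        # 'objdump -T' prints the dynamic symbol table, including the symbol
        # version each undefined (i.e. imported) symbol was linked against.
        output = subprocess.run(
            ["objdump", "-T", binary_path],
            check=True, capture_output=True, text=True).stdout

        versions = set()
        for line in output.splitlines():
            if "*UND*" not in line:     # only symbols the binary imports
                continue
            # Version tags look like GLIBC_2.2.5 or GCC_3.0.
            match = re.search(r"\b([A-Z][A-Z0-9]*)_([0-9][0-9.]*)\b", line)
            if match:
                versions.add((match.group(1), match.group(2)))
        return versions

    if __name__ == "__main__":
        for lib, ver in sorted(required_symbol_versions(sys.argv[1])):
            print(lib, ver)

Running something like this against an application binary and comparing the highest symbol versions it reports against what the oldest supported distribution ships is, roughly, the exercise behind the "compile on the oldest distribution customers ask for" rule of thumb.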

Enter appliances, in the form of VMs or containers, or other form factors. An appliance allows an ISV to solve the portability issue by shipping along the tested runtime environment. The crux is that for appliances in a VM the ISV also ends up shipping the OS, with many moving parts. Once the ISV ships it there is an implied responsibility not just for the application in the appliance, but for everything delivered with the appliance. As previously discussed, ISVs are not really interested in having this responsibility.

Containers on the other hand, at least according to the proclaimed idea of implementation and use, require an ISV to only package runtime dependencies. This would be a lot less than an appliance that is delivered as a VM. However, it is still very much comparable to the pre-sharing days, where an ISV shipped a lot of code that is outside the ISV's core competency.

Rightfully so, container proponents claim that an application installed in a container can be moved from one container host to the next and it will work just the same. The container encapsulates all the necessary runtime libraries for the application and thus is an independent unit that can be moved about. Effectively the container replaces what used to be a big ugly binary blob provided by the ISV in the pre-sharing days with a system that has the concept of sharing at runtime on the inside. For the ISV this implies that the application is still as small as in the shared world, and for the customer it means that applications can be isolated from each other with relative ease and mobility.

So far so good. However, while this is very neat, easy to talk about, and looks good on paper, none of the problems were really solved. As a matter of fact it is just a reversion to the problems that existed prior to using shared standard functionality on a system. If an ISV ships a container, the ISV takes implicit responsibility for the content of the container, which goes beyond the ISV's core competency, the application. While the footprint of responsibility is smaller for an ISV shipping a container than for one shipping a full blown VM, the footprint is still way bigger than the ISV would like. The solution would be for an ISV to deliver a container to a customer and take responsibility only for the application in the container. But this goes against human nature. When one buys a car one expects the dealer to take responsibility for the whole car and not just certain parts of it. If the dealer came up with the proposition that the starter is made by some other manufacturer, is also shared with other brands and models, and therefore the car owner has to deal with the starter manufacturer if something goes wrong with it, we'd all have a hissy fit and would tell the dealer to go.....

Thus, containers, while solving the portability issue in a less resource intensive way than VMs, still suffer from the same issue as VMs when it comes to getting a portable unit with an application delivered by an ISV.

A logical step is for customers to build their own containers. Building a container is significantly less effort than building a full blown VM, as only the runtime dependencies for an application need to be considered. However, this creates the next problem. Each container has its own set of runtime libraries which suit the application that is inside the container. This clearly creates a version tracking problem. With no system in place to track the content of containers and relate this back to potential security issues in each version, this certainly very quickly becomes a systems management nightmare.
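To illustrate what even a crude form of such tracking involves, here is a small sketch that inventories package versions across a handful of container images and flags packages that show up in more than one version. It is my own illustration, not a tool referenced here: the image names are made up, and it assumes each image still carries an rpm or dpkg database it can query via the docker CLI, which is exactly the kind of assumption that breaks down for minimal containers.

    #!/usr/bin/env python3
    """Sketch: inventory package versions across a set of container images.

    Illustrates the bookkeeping problem only; image names are hypothetical
    and each image is assumed to contain an rpm or dpkg database.
    """

    import subprocess
    from collections import defaultdict

    # Hypothetical image names, purely for illustration.
    IMAGES = ["internal/app-a:1.2", "internal/app-b:3.4"]

    # Ask the package database inside the image for "name version" lines.
    # Whether rpm or dpkg exists inside a given image is an assumption.
    QUERY = ("rpm -qa --qf '%{NAME} %{VERSION}\\n' 2>/dev/null"
             " || dpkg-query -W -f '${Package} ${Version}\\n'")

    def list_packages(image):
        """Return 'name version' lines reported from inside the image."""
        result = subprocess.run(
            ["docker", "run", "--rm", image, "sh", "-c", QUERY],
            capture_output=True, text=True, check=True)
        return result.stdout.splitlines()

    if __name__ == "__main__":
        versions = defaultdict(set)  # package name -> versions seen anywhere
        for image in IMAGES:
            for line in list_packages(image):
                name, _, version = line.partition(" ")
                if name and version:
                    versions[name].add(version)

        # Packages present in more than one version are the ones that turn
        # a single library fix into many container rebuilds.
        for name, seen in sorted(versions.items()):
            if len(seen) > 1:
                print(f"{name}: {', '.join(sorted(seen))}")

Even this toy version shows the shape of the problem: somebody has to maintain the list of images, the images have to be introspectable at all, and the output still has to be correlated with security advisories by hand.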

One solution would be to enforce that libraries that are used at runtime by applications and end up in containers all come from the same pool, i.e. the OpenSSL library that ends up in multiple containers is of the same version in all containers. This re-introduces the portability issue for applications. If application A wants OpenSSL version X and application B wants OpenSSL version Y, the "same version in all containers" policy will get tested. Yes, the container itself is still portable, but from an ISV perspective nothing is gained, and from the customer perspective a more complicated and difficult systems management layer is introduced.

With no introspection system available, and correlation of versions to security issues being of utmost importance, the best option to get at least a partial handle on version proliferation is to include an update stack in each container. By update stack I mean a stack that manages updates via the tried and true package mechanism used by all Linux distributions. This of course is a major departure from one of the primary benefits of containers, advertised as "runtime dependencies only." With the inclusion of the update stack the container just grew significantly, which makes it much less likely that an ISV would want to take on the responsibility of distributing such a container.

Last but not least, if there exists a group of ISVs that do ship containers to deliver their applications and a security vulnerability is discovered in a runtime library in the container, the customer has to wait until the ISV ships a new container. Unless of course the container contains an update stack, in which case the customer may have a chance to fix the vulnerability. In any event the customer can easily get into a "run vulnerable or turn a critical app off" situation.

Thus, the situation overall is not pretty. For customers using containers that are delivered by ISVs the choice may well be to either turn an important container off when a security vulnerability is disclosed or to run in a vulnerable state. That is of course if the customer is even aware of the issue. Without an update stack or other introspection system customers have almost no chance of knowing when they are exposed. For those containers built by the customer the situation is not much different, due to the library proliferation problem. At least with careful selection of the library pool from which the container builder is allowed to draw, the customer has some chance of knowing about vulnerabilities and getting on top of them. The choice between "run vulnerable or turn it off" is mitigated, as the customer has control of the container content and the containers affected can be rebuilt with a version of a fixed library in relatively short order. But, as indicated previously, this does not really solve the portability problem for ISVs. The ISV still ends up having to test with a number of incarnations of standard libraries.

There are, of course, use cases where containers shine. However, touting application portability as one of the primary use cases and proclaiming that containers can solve the application portability problem is, from my point of view, misleading. There are too many issues, as outlined above, that have no solution, or whose solution leads us right back to where we are, or have been in the past. Being able to increase the application density on a machine, as opposed to using VMs, while achieving isolation of those applications is a much more compelling use case for containers, from my point of view. Yes, containers are portable, but they do not solve the application portability problem, and they create another set of systems management problems unless each container also contains an update stack, or new introspection and management capabilities are developed.

Looking at the status quo that brought us the "application portability problem" one has to conclude that we are actually not in bad shape. On a system where things are shared, a fix of one vulnerable library fixes many applications. While ISVs have an uneasy relationship with the "a library changed underneath me" concept, ISVs have come to the conclusion that it is a necessary and unavoidable mode of operation for customers. Shipping an application in a container resolves the "a library changed underneath me" situation, but creates a plethora of other issues at the customer level. I claim that these new problems are much worse. Solving the new problems with the logical choices leads us back to the "a library changed underneath me" problem for an ISV.

Containers are not the only solution proposed to the "application portability problem." The proposition of linked systems exhibits the same basic problems as outlined for containers. It is far too easy for a customer to end up in a situation where a critical system would either have to be turned off or be left running in a vulnerable state. In a linked systems approach, where an application prescribes a certain tree of the linked system, the customer has no option to swap the tree for a new tree with a fixed library unless the ISV provides a new application that accepts the new linked system tree that includes the fixed library. Same result, different approach to getting there.

For both cases, containers and linked systems, the basic problems go away if all applications are open and can be built and delivered at the speed of the disclosure of vulnerabilities. This maintenance can then be performed by customers themselves or by dedicated companies. However, the prospect that all applications a business would ever want and need are open is way off in the future, if realistic at all. Thus, any system proclaiming to solve the "application portability problem" has to take into account that the world does not fit into a neat bucket. Reality is that ISVs are not in a position to chase after every vulnerability in every library they may depend on. With applications not being open source, this implies that the ideas of creating a new container or linked system in a hurry have serious issues and thus the problem is not really solved.

Sharing and versioning problems also exist at the language level. Take node.js with the npm management framework as an example. Each application pulls the versions specified by the application developer into a directory structure used by that application only. This basically creates the same management nightmare as discussed. Python and Ruby also have certain issues, although there is sharing, as opposed to npm where there is no sharing. I could reminisce about these issues; however, this would certainly detract from my primary topic, which was to look at the sharing issue by focusing on containers and the much hyped Docker solution to containers.
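As an aside, the scale of that per-application duplication is easy to make visible. The following is a quick sketch of my own, not anything referenced above: it walks one or more node_modules trees, reads the standard name and version fields that npm writes into every installed package's package.json, and reports packages present in more than one version. The paths in the usage comment are made up.

    #!/usr/bin/env python3
    """Sketch: find npm packages installed in more than one version.

    Walks directory trees, inspects package.json files below node_modules
    directories, and relies only on the standard 'name' and 'version'
    fields. An illustration, not a dependency management tool.
    """

    import json
    import os
    import sys
    from collections import defaultdict

    def collect_versions(root):
        """Map package name -> set of versions found under 'root'."""
        versions = defaultdict(set)
        for dirpath, _dirnames, filenames in os.walk(root):
            # Every package npm installs gets its own directory, with its
            # own package.json, somewhere below a node_modules directory.
            if "node_modules" not in dirpath or "package.json" not in filenames:
                continue
            try:
                with open(os.path.join(dirpath, "package.json")) as handle:
                    meta = json.load(handle)
            except (OSError, ValueError):
                continue
            if "name" in meta and "version" in meta:
                versions[meta["name"]].add(meta["version"])
        return versions

    if __name__ == "__main__":
        # Usage (hypothetical paths): npm_dupes.py /srv/app-a /srv/app-b
        combined = defaultdict(set)
        for root in sys.argv[1:]:
            for name, seen in collect_versions(root).items():
                combined[name] |= seen
        for name, seen in sorted(combined.items()):
            if len(seen) > 1:
                print(f"{name}: {', '.join(sorted(seen))}")

Pointed at a couple of typical node.js applications, a listing like this tends to get long, which is the language-level version of the container proliferation problem discussed above.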

In summary, sharing is an important concept that solves a number of problems and provides control to customers when managing issues. One can argue that on some level containers solve the application portability problem that arises with the concept of sharing. After all, a container can be moved about many hosts with no problems. However, as shown, the portability problem is not really solved. For the application the same issues exist inside a container as they do on the outside. Any library used by an application has the potential of requiring an updated/fixed version for security issues. Therefore, from the ISV perspective little changes. In cases where the application is under complete control of the container builder the basic premise of containers can work. Rebuilding the container for a security fix in an application runtime library is relatively easy and the container can then be quickly pushed around to testing, staging, and production. However, even in this case great care must be taken concerning library version proliferation and the understanding and management of the content of containers. Indiscriminately rebuilding all containers one might operate for every security fix is certainly not a feasible solution.

Thursday, April 3, 2014

LSB Reloaded

For probably 10 years or more I have been involved with the LSB Working Group. When I initially got involved I was working on the application side of the world, and in the ISV world we needed to answer the "do you support distribution X" question on a more or less frequent basis. Often the answer was "no" and sometimes the answer was "how big of a check are you willing to write?" At that time penetration of Linux into the commercial world, what we now call the "Enterprise market", was a fraction of what it is today. Enterprise distributions existed but were not as entrenched as they are today. Support of Linux for our application suite was maybe 2 or 3 years old, at best. Wow, that's a long time ago, I am getting old..... well at least I am in good company with that problem. Anyway, from an ISV point of view the LSB at the time offered a great value proposition. Build once, run on many distributions, use these tools for application checking, and there are also some build tools you can use. If practicable, this would of course drastically change our answer to the question, "do you support distribution X?" and would help our application with some customers. Thus, the idea of certifying against the LSB and simply stating that all LSB certified distributions are supported was just the ticket. I had the support of Development management to pursue the goal. Putting the LSB tools to work on the application made the tools fall over relatively quickly. Dealing with symbols in hundreds of shared libraries with a large surface area of interfaces, and analyzing dependencies and extracting the external dependencies, is not necessarily something that can be considered easy. Tooling was eventually fixed and what was left was a list of interfaces that were not in the LSB. Therefore, the original idea was not tenable with the LSB release that was current at the time. This was probably in the LSB 3.0 time frame; I remember buying the LSB book that was published at the time.

Fast forward: I am still involved with the LSB Working Group, but now looking at the world from the distribution side of things. The LSB in principle still holds the same value proposition for ISVs as it did 10 years ago, and for those ISVs that use the LSB it works really well.

The world around us has changed tremendously and the LSB itself has grown significantly. The number of interfaces covered by LSB 5.0 is very large compared to the number of interfaces covered when I first got involved. At the same time the Enterprise distributions delivered by SUSE and Red Hat have established themselves as the distributions to use in a commercial environment. With a large certified application catalog, which is extremely important to end customers, it is difficult for other distributions to make a dent. The establishment of two primary vendors in the Enterprise world has led many ISVs to simply test on both distributions, call them supported, and rarely consider other distributions. Taking a peek into the crystal ball as to why this may be the case produces some insights:
  • There is a very large number of applications that have migrated from UNIX to Linux, and thus it is not surprising that ISVs brought along their "tried and true" mental model. In the UNIX world ISVs used to build for IRIX, Tru64, HP-UX, AIX, and Solaris, and thus a new world where you build on one Linux distribution and test on two is a tremendous improvement and saves a chunk of money.
  • The "tried and true" thinking stands in the way of the paradigm shift at the heart of the LSB value proposition; build on one distribution, test on an LSB reference implementation, and then support all LSB certified distributions. Although on the surface this is a money saver as well, such a change in approach makes people feel extremely uneasy. In a way this discomfort is understandable. Support costs are in general high and claiming support for a distribution that was not actually tested, only indirectly via reference platform, just sounds like a crazy idea. Having had many conversations about this topic over the last 10 years I have come to the conclusion that this is partially a case of support phobia, without looking at actual data, and partially there exists justifiable fear.
  • Even if the "test against a reference platform" idea does not make the hair on the back of ISVs' necks stand up, there is still cost associated with having machines with many Linux distributions available just in case something does go wrong and bug chasing is required. Thus, some of the savings the LSB provides get eaten up by indirect costs.

I could ponder this some more but that's not actually the topic; I do want to get to the reloaded LSB soonish, and I will, I promise.

Another issue for ISVs is of course that the LSB can never have enough libraries and interfaces defined. Basically the problem that I had way back when still exists despite us adding thousands of interfaces over the years. There is always this one missing interface for ISV X.

Therefore, as we enter the endgame for LSB 5.0 the next steps were not immediately obvious. There is always the option of "continue what you are doing and hope for the best." Quite frankly this was, from my point of view, the worst possible direction. Because the LSB has so many interfaces and the target, Linux distributions, is extremely fast moving, it is more than difficult for the Working Group to produce a specification that is not behind when it is released. For example, LSB 5.0 contains Qt4 while both Enterprise distributions that are expected to release this year will have Qt5 available. The Enterprise distributions are of course the slow moving part of the field. One potential solution to this problem would be to find more fingers for keyboards, but the contribution model is rather tricky and, let's face it, "formal standards" work is not very sexy. Basically, even with a better documented and easier way to contribute we would probably still get little attention from those needed fingers. In an effort to be more responsive the LSB could certainly dump a large number of interfaces from the specification, but this would again increase the "cannot certify because interface Y is missing" problem, and therefore go in the wrong direction. Over the years distributions have moved closer together in the surface area that the LSB covers today. The times where a particular distro releases its own compiler that happens to be incompatible with what everyone else releases are behind us. I claim that today, for the most part, distributions are closer together than they have ever been before, in the areas that matter for applications.

During the LSB working group annual face to face meeting at the Linux Foundation Collaboration Summit we took a long look at all the various angles, those described above and a few more, and came to the conclusion that getting LSB 5.0 out the door is top priority. That of course should not be a surprise and is not newsworthy. Nor would this decision in and of itself get the world's most infrequent blogger, me, to write a blog. Getting LSB 5.0 out the door ASAP will allow the current crop of distributions to certify and provide a long transition period into a world where the LSB as we know it today will no longer exist. The leading Enterprise distributions are both set to release in 2014, and at the current pace new releases can probably be expected in 3 to 4 years. This constitutes the transition period into a world without a formal LSB specification and certification. LSB 5.0 will enter maintenance mode once the release is out the door. This implies bugs will be accepted and we'll try to get them fixed in a reasonable amount of time. Accumulation of fixes may result in LSB 5.0.1, 5.0.2, and other minor releases. What we will not do is make grand plans for LSB 5.1 or 6.0 specifications. Instead the LSB working group will change its focus. That's how we left it at the end of the day after basically spending all day on just this topic.

Going forward the Working Group wants to focus on real world problems that make life difficult for ISVs and system administrators. A big part of Linux penetration into the commercial world is that for the most part ISVs and administrators can treat Linux as one platform, no matter who the distribution vendor is. Yes, especially on the admin side there are some differences, but in the overall picture they are rather minor, and that's a good thing. We at the LSB working group would certainly like to think that with the work we have done over the years we have contributed to the current state of the art. As a working group on "neutral ground" we would like to become the place where distributions can work together to resolve cross distribution issues. In order to facilitate this conversation and problem resolution we started to create new infrastructure on GitHub. So far we have started by defining a work-flow, providing an explanation of what's going on and some guidelines about how we see the contribution process working. There is also a first "Problem Statement" that can probably use some additional polish and examples, hint hint. Things that people think the LSB working group should tackle should be filed as issues on GitHub. Bugs in the existing standards should still be filed in the existing Bugzilla.

There is much to be worked out of course. The LSB as we know it today has valuable parts; there are thousands of tests locked up in the current LSB certification framework that should be preserved and made easily consumable by distributions and/or upstream projects. There are valuable tools such as the app-checker and some accompanying backend database stuff that is really helpful to ISVs. We also have to see how we can "transport" ISVs that answer the "do you support distribution X" question with "use an LSB certified distro" into a post certification world. There's no clear-cut answer about the meaning of a Linux platform when there is no formal specification that promises at least a certain set of interfaces, nor are we certain whether this actually matters at this point in the progression of Linux as a platform. As fingers diligently scrape away the remaining lumps on LSB 5.0 we are gearing up to step into a different role, and I personally haven't been as excited about the LSB in a few years as I am right now.

So, get involved. Let's make the LSB a happening place again. Bring your cross distribution issues to the fore, add them to the GitHub issue tracker. If we can all keep pulling in the same direction and keep the core distribution bits close together, ISVs and admins can continue to treat Linux as one platform, which will grow the cake for everyone involved.