Building an Application Platform: Goals and Strategies

Organizations in the business of delivering software products can often benefit from developing an application platform.  An application platform has the potential to enable significant reuse of business logic across multiple products, it can provide an innovation sandbox for both internal and external developer communities, and it can enable the rapid addition of new features into existing products.  However, without a clear definition of the business goals, the right architecture strategy and an effective organizational structure, efforts to create an application platform may result in little more than a technology transition, and can negatively impact software performance or diminish interoperability.

The software industry does not have a common concept of an application platform, so there is no consensus interpretation on which to rely, and application platforms are often confused with other reuse concepts, such as software frameworks and toolkits. There are a number of software “infrastructure” companies that market their products as “enterprise application platforms”, but when an infrastructure company creates a platform, they are typically providing a combination of frameworks, toolkits and run-time environments — Oracle and Red Hat are examples.  However, when an application company, such as Facebook or Amazon, creates an application platform, they are usually concerned with enabling reuse of common business logic across their application portfolio, such as Facebook’s social networking applications and Amazon’s shopping applications.

This article addresses the platform concerns that are relevant to an application company.  The topics discussed below are not intended to provide a definitive recipe for creating an application platform, but rather to articulate common challenges that organizations are likely to encounter, and architecture strategies and organizational tradeoffs they may choose to consider.

The sections below outline technical efforts that a platform “organization” may need to undertake to achieve success.  For the purposes of this article, a platform organization may be anything from an entire company, wishing to develop an enterprise-wide platform, down to a single domain team, seeking to improve their productivity.  The strategies discussed below must be tailored for the business goals and scope of each organization.

Application Platform Goals

“You’ve got to be very careful if you don’t know where you’re going, because you might not get there.” –Yogi Berra

There isn’t a single motivation for creating an application platform, so different architecture strategies may have varying appeal, depending on the business goals.  Two common motivations for an application platform are described below, but even within these two categories, different value may be placed on different objectives.  Before making significant architecture decisions and investments, organizations would be well served by spending time characterizing their interests in the application platform concept, and the organizational tradeoffs that may be necessary to achieve their goals.

Building an External Developer Community

Some businesses, like Twitter, Facebook and Amazon, are able to bring additional users to their applications by creating a platform for external developers, which is typically utilized through an externally facing API.  These businesses benefit by enabling external developers to find innovative ways to bring their products to new user communities.  The external API may be a subset of the internally facing API, but for reasons such as performance or security, it may be entirely separate.

For this purpose, an application platform is a means to expose business logic in ways that enable unique composition models to promote innovation. Businesses that are primarily interested in developing a platform to support external developers will likely place a high value on architecture strategies that enable interoperability with disparate client programming languages, platforms and technologies.  There will also likely be requirements to prevent abuse of the external API through security, rate limiting and administrative control mechanisms.

Improving Internal Productivity

For any business that exists within a competitive marketplace, survival is inextricably linked to the ability to bring new products to market quickly, or to rapidly add new features into existing products.  In keeping with lessons learned from The Mythical Man-Month, “men and months are not interchangeable,” so development schedules cannot be infinitely compressed by stacking more engineers into cubes, and even if that were possible, it wouldn’t be cost effective.  Businesses also face tough choices about where to invest their limited resources to maximize business discriminators, so they cannot further constrain those resources by developing the same features in multiple products.  Instead, businesses must build products faster, with fewer engineers, and with less redundancy in feature development.

For this purpose, an application platform is intended to expose business logic for internal reuse.  When the objectives are internally focused, organizations may have more flexibility in selecting architecture strategies that enable more extensive reuse.  Under the previous scenario, the organization owns the single business logic implementation and exposes access to external communities.  Under this scenario, the reuse strategy could include re-hosting the common implementation itself into multiple applications using a standard component framework.  Issues such as interoperability across languages and technologies may not be as much of a concern because of the ability to mitigate technology diversity within the organization, although flexibility in making technology decisions will still likely be valued.

When the business motivations are primarily concerned with productivity and time-to-market, organizations typically have additional technical objectives, such as developing repeatable design patterns and creating composable presentation widget libraries, but these are in addition to, or in support of, the primary objective of reusing business logic.

Although the architecture quality attributes may be valued differently, the common theme for both of these business motivations is to reuse business logic.  The choice of architecture strategies is largely dependent upon whether the goal is to expose business logic for external developer communities, over which businesses have little control, or to facilitate reuse of business logic for internal developers, which may open up a broader set of options.

Scoping the Application Platform Strategy

“If you know neither the enemy nor yourself, you will succumb in every battle.” —Sun Tzu

The software industry is riddled with overloaded terms – just ask any “Architect” to define their role and the diversity of opinions will become obvious.  Similarly, there are widely varying interpretations of the application platform concept, but any organization attempting to create an application platform should at least define what it means to them, and they should know how they measure its efficacy against their business goals.  The business goals should be articulated in terms of the value to the business (e.g. 10% reduction in development days per feature), and the technical scope of the application platform should be defined so that the relationship to the business goals is clear.  To ensure that application platform efforts remain focused, metrics and collection methods should be designed to track the progress of the platform strategies towards achieving the business goals.

The section above identified the reuse of business logic as a key objective of an application platform.  The sections below describe the technical scope of several activities that are commonly associated with application platform strategies to support the reuse of business logic and to facilitate productivity improvements.  There are many other technical issues that organizations must address to enable enterprise software development, such as security protocols, system management and logging.  However, these topics are not unique to application platform efforts, so they are not addressed here.  This is also not an article on the merits of different SOA methodologies (e.g. REST vs. SOAP) and industry standards, which are also subjects that organizations must address when they pursue a SOA strategy.

Figure 1. A notional illustration of applications composed of reusable business logic from an application platform.

Defining the Architecture

Most organizations choose to pursue an application platform as a strategy for exposing and reusing business logic, which is based on the premise that many of a business’ products can be built by reusing the implementation of common application features (e.g. user registration, geo-filtering, map integration, etc). Figure 1 notionally illustrates this concept, where applications are integrated by reusing business logic deployed as services, which are logically exposed as part of the application platform.  The figure includes (simplistic) depictions of internally-developed applications composed of presentation logic and reusable business logic, and 3rd party applications utilizing the exposed business logic, which are outside of the organization’s control.

An application platform is unlikely to be successful without a good understanding of the basic feature building blocks that constitute the business’ products, and an understanding of how features need to vary across products (commonly known as variation points).  This is typically the purview of Product Managers (aka Systems Engineers), and is often a deficiency in many organizations because of the unique combination of technical and social skills and market awareness required to be effective in these roles.  If Product Managers cannot identify significant commonality across products, and the variation points within the common Use Cases and User Stories, then software architects are unlikely to be able to design an effective application platform.  It is also worth noting that, although understanding the platform requirements is an essential step, it is equally important that the information be effectively communicated to engineering teams.  Organizations that want to benefit from an application platform must invest in a product management team that not only can distill the right business information, but also has the tools and skills to communicate that information to their architecture and engineering counterparts.

Once requirements are understood and architecture quality attributes (extensibility, security, availability, usability, etc.) are evaluated, architects can decompose the software architecture into its constituent elements – e.g. services, components, databases, middleware, etc. – and define the supporting deployment configuration.  The importance of this process in achieving the application platform goals cannot be overstated.  The decomposition of the architecture, along with the definition of the interfaces, establishes the model for reuse. Therefore it is essential that organizations creating an application platform invest in an architecture staff that has experience employing architecture strategies that enable reuse and that satisfy the business’ quality priorities.  Architects must also have the ability to set expectations with their product management counterparts for the types of requirements needed to address extensibility aspects of the architecture design.

Frameworks

Software frameworks provide their own reuse model, but the reuse is of generic code, not business logic, which is inherently specialized to a business domain.  In addition to providing for reuse of generic code, frameworks often provide a development and deployment model for building component-based business logic on top of the generic code provided by the framework.  So, not only do frameworks improve productivity, they also provide the foundation for component-level reuse within the business tier of a software architecture.  If an organization chooses to adopt a component-level reuse strategy, they will almost certainly need to adopt a common (3rd party) framework technology.

Even if the goal is not to reuse business logic at the component level, organizations creating an application platform would be well served by including an effort to “templatize” development of certain layers of the software architecture.  For example, an organization may want to utilize a presentation framework to allow developers to reliably and repeatedly create standard web pages with the same look and feel and with consistent behaviors and performance.   Similarly, there is significant productivity value in creating conventions for developing business logic (e.g. components and services) to increase the efficiency of software engineers and to raise the quality of the software products through standardization.  Technology frameworks, tooling and established design conventions can all support these productivity and quality goals, so an organization may want to consider them to be part of an application platform strategy.

Business Logic Frameworks – Architecture strategies, like service orientation, require significant commitment to languages, standards and technologies to implement solutions for object marshaling, exposing endpoints, lifecycle management and cross-cutting concerns (e.g. security, logging).  In addition to service orientation concerns, component-level reuse of business logic is only possible if organizations adopt a common component technology (e.g. Spring Beans, Enterprise Java Beans, .NET).  Most component-based business logic frameworks now support extensions for Web Services standards and other service orientation implementation concerns, so the two architecture strategies can typically be addressed with a single framework technology if so desired.  Once a framework technology is selected, many organizations will need to perform some in-house tailoring to integrate cross-cutting concerns (e.g. security, logging) to accommodate the business’ operational constraints.
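
As a minimal sketch of what framework-hosted business logic might look like, assuming a Spring-based component framework (the GeoFilterComponent and RegionRepository names are hypothetical, invented only for illustration): the container supplies wiring and the transactional cross-cutting concern, so the component itself contains only domain logic.

```java
import java.util.List;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// Hypothetical collaborator; in practice this would be another managed bean.
interface RegionRepository {
    List<String> findRegionsContaining(double lat, double lon);
}

// Business logic packaged as a framework-managed component. The framework
// handles instantiation, dependency wiring and transaction demarcation.
@Service
public class GeoFilterComponent {

    private final RegionRepository regions;

    public GeoFilterComponent(RegionRepository regions) {
        this.regions = regions; // constructor injection supplied by the container
    }

    @Transactional(readOnly = true)
    public boolean isInServiceArea(double lat, double lon) {
        return !regions.findRegionsContaining(lat, lon).isEmpty();
    }
}
```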

Presentation Frameworks – Although not a business logic concern, organizations can enable teams to rapidly create user experience products by employing presentation technology frameworks (e.g. SEAM, Spring Web Flow), and by creating reusable presentation widget libraries built on top of those frameworks.

Integration

Different Use Cases will dictate different integration patterns between software architecture elements, such as asynchronous messaging (e.g. Publish-Subscribe, Point-to-Point) or synchronous RPC-type interactions.  To eliminate reuse barriers, organizations adopting an application platform will likely want to identify specific solution templates for specific integration patterns as part of a platform execution strategy.  For example, an organization might standardize around JMS as the messaging API, select a specific broker implementation that meets performance requirements, establish a broker deployment configuration that meets quality of service (QoS) requirements and create design conventions for dividing message traffic into queues and topics around QoS and performance constraints.  In addition to providing a common integration substrate for reuse purposes, standardization efforts such as these provide an architecture and development template that all engineering teams can follow, which reduces development time and improves software quality across teams.
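
As an illustrative sketch, assuming an organization standardizes on the JMS 2.0 API and a destination-naming convention that partitions traffic by domain, message type and QoS class (the "orders.created.standard" name and the publisher class below are hypothetical), a publishing team's code looks the same regardless of which broker implementation is selected.

```java
import javax.jms.ConnectionFactory;
import javax.jms.DeliveryMode;
import javax.jms.JMSContext;
import javax.jms.Topic;

public class OrderEventPublisher {

    private final ConnectionFactory connectionFactory;

    public OrderEventPublisher(ConnectionFactory connectionFactory) {
        this.connectionFactory = connectionFactory; // broker-specific factory is injected
    }

    public void publishOrderCreated(String orderJson) {
        // Destination name follows an assumed org-wide convention:
        // <domain>.<message-type>.<qos-class>
        try (JMSContext context = connectionFactory.createContext()) {
            Topic topic = context.createTopic("orders.created.standard");
            context.createProducer()
                   .setDeliveryMode(DeliveryMode.PERSISTENT) // durability per QoS convention
                   .send(topic, orderJson);
        }
    }
}
```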

In addition to selecting integration patterns and the supporting standards and technologies, most organizations choose to standardize some of the semantics and conventions of the service interfaces in their platform.  As an example, consider the consequences if each of Twitter’s publicly exposed REST resources used different representations for user ids, screen names and status updates.  Their platform API would probably be the subject of derision due to the additional burden placed on the developer community to reconcile the inconsistencies.  The same would be true if their APIs randomly switched between REST with JSON payload formats and SOAP with XML formats.  It is not in Twitter’s business interests to allow such diversity to emerge in the name of agility.  Successful platform efforts will include some standardization of the interface object models and design conventions to reduce the burden placed on service consumers.

Organizations may choose to dictate integration technologies and prescriptive patterns across the board, or they may choose to allow teams autonomy within their own applications or domains and only mandate enterprise level (i.e. cross-domain or cross-technology) integration solutions.  However, as more technical diversity arises within teams and products, some reuse opportunities may be lost because of the integration challenges.  The significance of this risk will be unique to each organization.

Orchestration

To ensure that development teams are achieving reuse objectives, organizations may want to standardize an approach to orchestrating business processes.  Separating the implementation of business process flows from business logic is necessary to preserve composability and to maximize reuse of business logic across multiple flows.  When workflow orchestration is intertwined with business logic, specific Use Cases become hard coded into the business logic, making it difficult to compose and reuse services for new Use Cases.  There is also value in keeping complex orchestration out of the presentation tier so that different application development teams can reuse common flows, which means that orchestration services are typically deployed at the top of the architecture’s business logic tier.  Orchestration services are sometimes known as application services because application teams use them directly to integrate complex business processes into applications.

Summary

Some organizations will want to carefully control the introduction of languages, standards, technologies and design conventions into the development and deployment environment to facilitate reuse as part of an application platform strategy, while others may prefer to promote creativity and innovation by letting a thousand flowers bloom.  Every organization has to decide the level of oversight they will exert based on their objectives and values.  However, it should be acknowledged that significant heterogeneity in an organization’s technology stacks can cause interoperability and compatibility challenges, making reuse difficult.  It can also make it difficult to migrate engineering staff between projects, and it can create an unmanageable number of dependencies on the lifecycles of 3rd-party products and SDKs.  If heterogeneous framework products need to coexist on a single deployment platform, each new version upgrade of each product can have consequences on the other products, which can limit an organization’s flexibility and their ability to rapidly evolve their systems.  So, if left unmanaged, the proliferation of disparate technologies can have the opposite effect from what decentralized organizations intended.  If decentralized technical decision-making is important to an organization, then the software architectures and deployment configurations must be designed to minimize dependencies and potential conflicts between products developed by autonomous teams.

Allowing small teams to move rapidly by decentralizing technical decision-making is an appealing organizational model.  However, organizations often find that, rather than innovating, these autonomous teams spend considerable time solving the same technical challenges already addressed by other teams (or other companies), and they can be susceptible to a “not invented here” mentality, preferring to build rather than reuse.  Enabling rapid product development often requires a more nuanced organizational solution, including some prescriptive technologies and design patterns, an effort to disseminate knowledge about reuse opportunities across engineering teams and a streamlined process to enable deviation for truly unique technical challenges.   Most of all, it requires architects and engineering managers who understand the value of applying repeatable design patterns and common industry solutions as part of a broader platform strategy.

Platform Oversight

“A camel is a horse designed by a committee.” —Unknown

As mentioned above, many software organizations choose to decentralize their technical decision-making process so that teams close to the product development activities have the freedom to innovate and move rapidly to make choices that are best suited for the circumstances on the ground.  This decentralized organizational model can pose some challenges to developing a platform, which is inherently horizontal in that it cuts across multiple products.  Some of the challenges follow below.

  • When a decentralized organization attempts to create a horizontal solution, the technical decision-making process can degenerate into a consensus-driven democracy across product teams, which is a poor model for creating high quality, innovative software products on aggressive schedules.
  • Vertical product teams can be prone to parochialism when they become responsible for building software that is intended to be reusable beyond the products for which they are responsible, because their incentives have historically been driven by product successes, not platform successes.
  • The success of a platform is highly dependent upon the correct decomposition and allocation of requirements to the specific layers and software elements of the architecture.  In some decentralized models, there may be no independent architect to evaluate requirements and business goals to decompose the software into its constituent parts.  If all architects are decentralized to the product teams, then everyone in the organization already represents some particular organizational interests, so the architecture may become biased to reflect the org structure rather than business goals.

The significance of these challenges largely depends on the reach of the platform efforts.  In some organizations, collections of related products could be readily aggregated under a few major domain-centric teams, allowing most or all of the platform efforts to be localized to each team.  In effect, each team would be building its own platform supporting its own products, perhaps with a few services exposed to the broader enterprise.  Under these circumstances, since most of the reuse objectives are targeted at the products managed by each team, there may be no need for a broader organizational solution.

In cases where there is a need to reconcile platform efforts across multiple decentralized teams, one possible approach is to divide teams differently, where some teams are focused on creating applications and other teams are focused on the “platform as an internal product.”  Teams that are focused on building true end-user applications would create the user experience (i.e. presentation), create business logic that is unique to the application (i.e. not part of the platform) and integrate platform business logic (e.g. services, components, etc) into the application.  Teams responsible for the platform would be disassociated from any specific application ownership and instead work to satisfy requirements that are applicable to multiple application teams, as well as employing architecture strategies to enable reuse and rapid extensibility.  It is worth noting that, although this approach would likely produce better results for the platform, it does diminish the appeal of decentralized teams, which are intended to be self contained with minimal external dependencies.

Regardless of the organizational strategy, it is important for businesses to be mindful of Conway’s Law, which is roughly interpreted to mean that a software product will reflect the organization that created it in the way the architecture is modularized and in the quality of the interfaces.  With that in mind, if creating an application platform is a high priority, it is probably better to organize teams around the desired architecture structure rather than hope the desired architecture will emerge from the existing organizational structure.

Org structures are the subject of holy wars, and the pros and cons of platform organizational strategies are a topic worthy of a separate article, but a couple of conclusions are abundantly clear.  First, any org structure that creates ambiguity about roles and responsibilities, and creates confusion around who has authority to make decisions, is a bad model.  Second, every org structure model needs to be tailored for the people that fill its ranks.  It is tempting to try to copy an organizational model from a highly successful company, but there are differences in the talent pools available to different companies.  Not all companies can be staffed like Google.  Companies like Google have access to a large pool of top talent, so it is difficult to distinguish the role played by the organizational model versus the benefits of “individual heroism” that come with having an abundance of experienced, highly-qualified staff.  When organizations have less access to top talent, they may achieve better results from centralizing some critical technical decisions and providing more top-down guidance to implementation teams, possibly at the expense of some agility.  Achieving agility at the expense of quality is a self-defeating recipe for becoming overwhelmed by technical debt.

Governance

SOA Governance, an extension of traditional architecture and IT governance, is a frequent topic of discussion in the SOA literature.  Design-time governance provides a mechanism to ensure consistency in the decomposition of the architecture into its constituent elements (services, components, databases, etc) and oversight of the design and evolution of contracts and interfaces.  Run-time governance provides a mechanism for ensuring compliance with enterprise policies and SLAs (security, performance, etc) as new services are deployed into an operational environment, or as new service consumers are added.

Organizations working to enable the reuse of business logic across products and domains will have to institute some degree of architecture governance to ensure that architecture elements are designed and evolved to meet the needs of multiple product teams.  This typically requires some prescriptive architecture decisions with respect to the topics discussed above (e.g. layering, control flow models, message exchange patterns, standards and technologies).  Organizations that fail to govern the mapping of requirements and other architecture quality attributes to a coherent set of services often end up with a proliferation of fine-grained services, each providing only a parochial interpretation of a narrow set of requirements, which defeats the purpose of service orientation for broad reuse.

A prerequisite of architecture governance is an agreement on the common platform Use Cases and the way the implementation will be exposed to consumers for reuse.  For example, if a business wants to develop a shopping cart service for reuse across multiple products, each product team must accept the common shopping cart workflow and the separation of features into architecture layers and elements.  Although it sounds easy, this can be a bitter pill to swallow for teams wanting to retain full autonomy over the interpretation of their requirements.

Inherent within the governance topic is the need for a discovery solution.  Large, distributed, and potentially decentralized organizations need a way to communicate a great deal of information about the software asset inventory and its lifecycle so that teams can understand what software assets are available for reuse, retrieve contract and interface specifications, and acquire the specific service endpoints needed for integration with the consumer’s application.  Organizations whose primary motivation is internal reuse will have to decide how much effort they will invest in creating and maintaining enough documentation to enable completely self-service reuse, versus just enough public documentation to put internal application teams in touch with the internal service provider.  The former approach is always the ideal, but the latter approach is often the default solution because of the burden of maintaining documentation.

Automated tools can be used to satisfy many of the policy compliance governance concerns, but for much of design governance, there is no substitute for some degree of architecture oversight.  An organization building an application platform will have to make decisions about their governance structure with their eyes open about the relationship between governance and the application platform goals.  If related applications are grouped together under a single domain or product team, then it may be sufficient to structure governance within each of these product groups rather than centralizing governance across all teams.  This could strike a reasonable balance between reuse and decentralization, where most reuse is expected to be achieved within each decentralized domain team.

Application platforms come with an inherent dependency management risk, where some applications may be slower to market because the development schedule for a required platform service is held up for another, slower-moving application with poorly defined requirements.  This dependency risk is significant, but it can be managed with incremental releases and careful governance to ensure backwards compatibility.  Regardless of the organizational strategy used to develop the application platform, it should include specific governance mechanisms to address dependency management.

Metrics

A cottage industry revolves around the notion of software process improvement through measurement.  Without measurement, process efforts have a tendency to take on a life of their own, detached from any business value.  The same can often be said about technology efforts.  Service orientation initiatives frequently result in a new architecture with a great many services, but no new business benefits.  To avoid this fate, the business goals must be clear and universally understood, and the enabling technical decisions and tradeoffs should be evaluated based on the degree to which they achieve the business goals.

There is considerable artistry involved in taking a high-level business goal, like reducing the time to market, and decomposing it into measurable objectives, such as increasing the number of SLOC written (or features produced) per development day and reducing the testing and deployment time.  Once those objectives are defined, they must be measured, tracked over time and used to tailor the application platform approach.  Measurements for objectives like engineering productivity are never perfect.  There is always some variance across teams and projects due to differences in the technical scope of the work or the skills of the staff, so it is important that metrics be carefully collected and analyzed over a number of projects to avoid making decisions based on anomalies.

Figure 2. (A) illustrates a common reuse and scalability approach for service-oriented architectures, where services are deployed independent of a specific application. (B) illustrates the reuse of components across applications, which are then deployed and scaled in their entirety.

Platform Architecture Strategies

“When one has finished building one’s house, one suddenly realizes that in the process one has learned something that one really needed to know in the worst way – before one began.” —Friedrich Nietzsche

Modularity, encapsulation and decoupling are key architecture tactics for developing reusable, extensible software and increasing developer productivity.  If business logic is appropriately modularized, application developers can compose it in unique ways to build new products or to enable the extension of existing products.  If business logic is modularized too coarsely, then some business functions will be unreachable and incapable of being used or modified independently (failure to separate concerns).  If business logic is modularized too finely, the programming model can become overly complex or performance may suffer because of the verbosity of the software interfaces.

Encapsulation is, in part, a mechanism for hiding complexity from consumers of reusable business logic, which is critical for improving developer productivity.  Architects strive to hide the complexity of a module behind its public interface so that consumers are only required to understand how to use a software module, but they are not required to understand how it is implemented.

Decoupling is a means of isolating changes in a system.  For large systems, introducing reusable business logic can lead to a significant number of dependencies.  If multiple applications depend on one reusable software module, any changes to that module can drive volatility into every consuming application.  However, there are architecture strategies and design patterns that can help decouple application consumers from the business logic implementation, which, in many cases, can enable the reusable business logic to evolve at a different rate than the consuming application.
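
A minimal, hypothetical sketch of these two tactics together (all names invented for illustration): consuming applications compile only against a stable public interface, while the implementation behind it remains free to change.

```java
// Public interface: the only thing consuming applications depend on.
public interface RegistrationService {
    RegistrationResult register(String email, String displayName);
}

// Simple result type exposed through the interface (Java 16+ record syntax).
record RegistrationResult(String userId, boolean requiresVerification) {}

// Implementation detail hidden behind the interface. It can switch persistence
// technology, add caching, or be redeployed without touching consumers,
// provided the interface contract is preserved.
class DefaultRegistrationService implements RegistrationService {

    @Override
    public RegistrationResult register(String email, String displayName) {
        // ... validation, persistence, notification ...
        String newId = java.util.UUID.randomUUID().toString();
        return new RegistrationResult(newId, true);
    }
}
```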

The sections below discuss some of the key aspects of modularity, encapsulation and decoupling with respect to the roles of service orientation and component architectures in creating a reuse strategy for an application platform.

Services and Components

Service orientation has received a lot of attention over the last few years, but its purpose is often misunderstood and the architecture concepts are frequently applied too broadly.  When organizations discover service orientation, many embrace it as their single architecture strategy for reusable software, only to be disillusioned when goals are not achieved.    Service orientation provides specific tactics for modularization, encapsulation and decoupling, and it can be an effective architecture strategy for achieving reuse of business logic, but it should not be viewed as a replacement for traditional component architecture strategies.

There are many differing opinions about the distinctions between services and components, such as coarse versus fine-grained interfaces.  However, most of these characteristics do not do much to eliminate ambiguity, since, for example, there is nothing to preclude a coarse-grained component or a fine-grained service (other than good sense!).  The most distinct differences have to do with the application developer’s perspective relative to components and services and how they are used at development-time and run-time.

Components are tangible units of code developed using a language-specific framework (e.g. Spring Beans, Enterprise Java Beans, .NET), and they are typically compiled and integrated into an application as a native binary library or file.  Component frameworks provide a convention for structuring a component’s interface, and they provide standardized solutions for common enterprise requirements (e.g. persistence, security, management, etc).  A consumer of a component typically must be aware of the component framework technology, and the consuming application is usually written in the same language as the component.  Therefore a component entails a development model.  Alternatively, services are system interface boundaries, but their underlying implementation and deployment architecture is hidden from the consumer by the contract and interface, so a service is strictly a run-time model – it’s there at runtime when the consumer hits its endpoint, but its physical location is irrelevant to the consumer.  A developer has to care whether a component is implemented in Java or C# and has to understand the component’s deployment relationship to the application.  Conversely, developers do not have to be concerned with a service’s implementation technology or its deployment relationship.  It is invoked through an interface over a network.

Since services are invoked over a network, their interfaces must be designed to exchange “chunky” messages to avoid a “chatty” network interaction.  Those chunky messages typically violate object-oriented best practices related to cohesion, which is a tradeoff made to mitigate performance penalties. The same tradeoff would not be made for a locally bound component, which would instead be designed with a cohesive object model exchanged across the intra-process interface boundary.
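
A hypothetical contrast to make the tradeoff concrete (interface names are invented): the first interface below is natural for a locally bound component but would require a separate network round trip per call if exposed remotely, while the second exchanges a single, coarse-grained message at the cost of a less cohesive object model.

```java
// Fine-grained, "chatty" interface: reasonable for in-process calls,
// but each method would become its own network round trip if made remote.
interface UserProfileComponent {
    String getDisplayName(String userId);
    String getAvatarUrl(String userId);
    String getLocation(String userId);
    int getFollowerCount(String userId);
}

// Coarse-grained, "chunky" service interface: one round trip returns a
// denormalized snapshot, trading object-oriented cohesion for fewer messages.
interface UserProfileService {
    UserProfileSnapshot getProfile(String userId);
}

record UserProfileSnapshot(
        String userId,
        String displayName,
        String avatarUrl,
        String location,
        int followerCount) {}
```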

The line between services and components can still be blurred, especially with remote component technologies (e.g. Enterprise Java Beans with RMI), which can be used to achieve some degree of decoupling between the component’s consumer and the component deployment architecture.  In the sections below, various characteristics and use cases are explored to help distinguish the appropriate role of services and components in an application platform architecture strategy.  Figure 2 illustrates two opposite ends of an architecture spectrum, where an application platform is implemented with a service orientation strategy (a), and implemented with a component architecture strategy (b).  With the service-oriented platform approach, a single implementation of the business logic is exposed for reuse by applications, and services are deployed and scaled independently from any specific application.  With the component-based approach, copies of components are reused within applications, but then entire applications are integrated and deployed together, and they are scaled through load balancing at the application level.

Interoperability

Service orientation can be an effective strategy for supporting a wide variety of different consumers using different languages and platforms. To eliminate platform and language dependencies, services are deployed independently from their consumers and invoked over a network using technology-agnostic protocols and formats.  Web Services standards have formed around HTTP, XML and JSON to meet this interoperability need.  Components, on the other hand, are language-specific and they are invoked through native language interfaces.  Cross-language component interface technologies, like CORBA and JNI, have not been widely adopted because of a variety of limitations.

There are limits to the interoperability that can be achieved with Web Services technologies, which are most suitable for synchronous request-response or request-acknowledge integration patterns.  Although there are Web Services standards efforts to accommodate event-driven patterns (WS-Eventing, WS-Notification), the standards landscape is fractured in this area, and none are widely adopted.  Other technologies, such as the Java Message Service (JMS), are often necessary to support asynchronous, event-driven requirements (inside an enterprise) and the corresponding quality of service constraints.  However, using these technologies often comes at the expense of interoperability and the additional complexity of bridging disparate, language-specific solutions.

Change Isolation

One of the most significant differences between components and services is the deployment model.  In most scenarios, consumers include a binary copy of a reusable component in their own application deployment, which tightly couples the consumer to a specific version of the component implementation.  If the component provider changes the implementation, but does not change the interface (e.g. bug fix), the component must still be redeployed to consumers, who must repackage and redeploy their own applications.  Conversely, service consumers have no dependency on the service deployment, which is a characteristic often referred to as location transparency.  If a service implementation changes without affecting the interface, consumers should be unaware of the change, which means it is incumbent upon the architect to make design choices that minimize the changes that must be propagated through the service interface to the consumers.  Careful separation of a service’s interface from its implementation is a key tenet of service oriented design.

There are component architectures that support remote binding, such as Enterprise Java Beans, which can provide location transparency.  However, such component architectures require a significant, enterprise-wide commitment to a set of framework technologies (e.g. JEE App Server, JNDI) and a programming model. Historically, remote binding in component architectures has not been very successful, in part because remote invocation requires explicit thinking about the component’s interface granularity and handling of the inherent unreliability of distributed systems. Since component designers are defining interfaces in a native language, they generally create their components with local binding in mind, which means they create chatty interfaces that don’t perform well when invoked remotely. Most architects choose to scale component-based architectures by bundling complete applications together with locally bound components (Figure 2b), and then they load balance by replicating instances of the applications across multiple hardware nodes, as opposed to the alternative of deploying remote pools of components and integrating applications through remote component binding.

Because of the decoupled deployment model, service orientation can be an effective architecture strategy when a business places a high value on change isolation.  This decoupling enables the evolution of implementation technologies and lifecycle strategies in different parts of a system without driving volatility into the consumers.  However, architects often fail to achieve this objective because they make service design choices that do not emphasize decoupling, such as decisions that tie a service’s internal domain model to its external interface, which couples the clients to the internal design of the services and unnecessarily propagates changes that should be isolated.  These types of decoupling abstractions do have an impact on performance (e.g. object transformations), although the cost is often negligible if implemented carefully.  An organization that places more value on extreme performance optimization (beware of the Cult of Performance!) than on change isolation should probably focus on modularity and reuse through locally bound component architectures rather than through service orientation.
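
One common way to avoid tying the internal domain model to the external contract is an explicit translation layer at the service boundary. A rough sketch, with all names hypothetical: the internal Order model can be refactored freely, while the DTO returned to consumers stays stable.

```java
// Internal domain model: free to evolve with the service implementation.
class Order {
    long id;
    java.time.Instant createdAt;
    java.util.List<OrderLine> lines;
}

class OrderLine {
    String sku;
    int quantity;
}

// External interface representation: a stable, versioned contract object.
record OrderSummaryDto(String orderId, String createdAt, int lineCount) {}

// Boundary translation keeps internal refactoring from leaking into consumers.
// The mapping cost is usually negligible relative to the network call itself.
class OrderAssembler {
    OrderSummaryDto toDto(Order order) {
        return new OrderSummaryDto(
                String.valueOf(order.id),
                order.createdAt.toString(),
                order.lines.size());
    }
}
```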

Business Complexity

Aside from the decoupling and interoperability considerations discussed above, services can be used effectively to expose complex business functionality across business domain boundaries, or in cases where a business wants to enable the orchestration of complex business processes from services in multiple domains.  Under these circumstances, the complexity and scope of the business logic and the orchestration process would make it difficult to deploy components that could be compiled into consuming applications.  In short, the model depicted in Figure 2(b) is difficult to sustain when business logic becomes large, complex and spans domains.

Figure 3. Services and components are both part of the application platform.

Summary – Components vs. Service Orientation

Service orientation and component architectures are not mutually exclusive reuse strategies.  They will likely coexist within the same architecture, and, at a minimum, services are often implemented with components, providing an opportunity for additional reuse if the organization has standardized around a component framework.  Service orientation is a valuable architecture strategy to achieve interoperability when consumers use heterogeneous technologies.  It can also provide a mechanism to isolate the impact of change from consumers, and to abstract away the complexity of business processes.  However, service orientation comes at the expense of some latency across the interface boundary and some additional performance load on the service implementation.  This is in part due to the remote deployment, in part due to the technology agnostic messaging used to achieve interoperability and the marshaling required to serialize the interface object model, and, to a small degree, due to the layering used in the service design to provide the decoupling of the interface and the implementation.  It should be clear at this point that a service is not simply a remote component wrapped with HTTP and XML.  That line of thinking is a recipe for failure.  Architects must create a comprehensive service-oriented architecture decomposition that minimizes the impact of the performance penalties while maximizing the reuse benefits.

Aside from situations where organizations are highly concerned with optimizing performance, there are still many cases where modularity is desired, but the inherent chattiness of the interaction across the interface boundary lends itself to a locally bound, fine-grained component interface, such as within the development of a service implementation.  In short, that means it is unwise to attempt to expose all potentially reusable business logic as services.  Service-orientation alone only provides for reuse of major business functions that are amenable to coarse-grained service interfaces with limited back-and-forth interaction across the network boundary. If the organization’s Use Case analysis reveals that significant component-level reuse could also be achieved, then such a strategy may be worth the effort of establishing a more homogeneous technology environment to enable component sharing across products or domains.  In general, there should be many more components in a system architecture than there are services, but most of the components will be used within service implementations (Figure 3).  Based on a business’ particular Use Cases and goals, the reuse strategy may be strictly at the service level, strictly at the component level (Figure 2b), or a combination of both.

Other Architecture Considerations

In addition to the way business logic is structured to achieve the platform goals, there are other efforts that can improve reuse and increase productivity: promoting consistency in software design, considering the need for extensibility up front and recognizing the impact of design complexity on development productivity.

Common Representations

A domain model represents an organization’s common business lexicon, and it heavily influences software design and implementation (Object & Data Models).  Domain models provide the foundation for consistency in an organization’s software structures, making software easier to read and understand, which increases developer productivity.  Investing in the creation of a domain model provides the basis for common software implementations, but it cannot be adequately defined without a good understanding of product Use Cases through a partnership with product management teams.

Common Object Models – Much has been written about the utility of structuring software so that it reflects the business domain.  Creating common object model representations of domain entities and relationships (e.g. user, address, common measurement units, etc) allows developers to reuse object models across applications and can improve performance in some cases by eliminating run-time transformations.  Significant commonality in software structures across business logic implementations makes the developer experience more consistent across projects, which can improve productivity.
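
For example, a shared, immutable value type for a common domain concept (a hypothetical GeoCoordinate is used here purely for illustration) lets every component and service interface exchange the same representation, rather than maintaining several near-identical definitions that require run-time transformation.

```java
// A shared value object reused across components and service interface models.
public record GeoCoordinate(double latitude, double longitude) {

    // Compact constructor centralizes validation for every consumer.
    public GeoCoordinate {
        if (latitude < -90.0 || latitude > 90.0) {
            throw new IllegalArgumentException("latitude out of range: " + latitude);
        }
        if (longitude < -180.0 || longitude > 180.0) {
            throw new IllegalArgumentException("longitude out of range: " + longitude);
        }
    }
}
```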

Common Data – It can be difficult to reuse business logic across applications if the underlying business data is fractured across different data models and databases.  Unifying business logic into common services or components will almost certainly entail significant alignment of data models.

Although efforts to develop common implementations of common domain entities can have significant productivity benefits, it is also worth noting that organizations can take this too far by overloading common implementations with many orthogonal application concerns.  This often manifests as bloated database table structures or object models with poor cohesion, which can result in tight coupling that pushes the impact of software changes to an unnecessarily wide audience.  There is no blanket rule that can substitute for skillful software design.

Extensibility

Part of the art of designing software is to anticipate the different dimensions of potential future change, which are often referred to as variation points.  These are the areas where development teams may need the ability to tailor features or behaviors for off-nominal application workflows, regional differences or future scope growth.  Architects enable rapid extensibility through strategies for modularity and decoupling that build variation points into the software architecture.  Extensibility considerations play a significant role in determining the decomposition of a software architecture into its constituent services and components.
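
A variation point can often be expressed as an interface with pluggable implementations chosen per product or region. A minimal hypothetical sketch (shipping cost calculation is an invented example): the shared checkout logic depends only on the variation point, so a new product plugs in a policy instead of forking the business logic.

```java
import java.math.BigDecimal;

// Variation point: products differ in how shipping cost is calculated.
interface ShippingCostPolicy {
    BigDecimal costFor(BigDecimal orderTotal);
}

// One product uses a flat rate...
class FlatRatePolicy implements ShippingCostPolicy {
    public BigDecimal costFor(BigDecimal orderTotal) {
        return new BigDecimal("4.99");
    }
}

// ...another offers free shipping above a threshold.
class FreeAboveThresholdPolicy implements ShippingCostPolicy {
    public BigDecimal costFor(BigDecimal orderTotal) {
        return orderTotal.compareTo(new BigDecimal("50.00")) >= 0
                ? BigDecimal.ZERO
                : new BigDecimal("4.99");
    }
}

// Shared business logic, reused unchanged across products.
class CheckoutService {
    private final ShippingCostPolicy shippingPolicy;

    CheckoutService(ShippingCostPolicy shippingPolicy) {
        this.shippingPolicy = shippingPolicy;
    }

    BigDecimal totalWithShipping(BigDecimal orderTotal) {
        return orderTotal.add(shippingPolicy.costFor(orderTotal));
    }
}
```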

Simplicity

In the pursuit of elegance, architects occasionally lose sight of the cost that design complexity has on the development, test and deployment processes.  If an architecture strategy or design pattern brings only minimal benefit at the cost of significant complexity, then it is probably not a good solution.  In all that architects do to realize an application platform, making the job of developing software easier should be at the forefront of their minds, and finding ways to abstract away complexity should be a driving goal.  In design patterns, tooling, documentation, etc., simplicity should be a key objective.

Conclusions and Recommendations

“Wise men don’t need advice. Fools won’t take it.” —Benjamin Franklin

The previous sections provided an overview of topics related to application platform business motivations, reviewed the potential technical scope of an organization’s platform strategy and discussed architecture approaches for creating reusable business logic and improving productivity.  The following is a distilled set of recommendations for organizations that want to create an application platform.

  • Businesses must clearly articulate their goals along with the metrics they’ll use to evaluate the success of the application platform in achieving those goals.  The goals should be expressed in terms of adding business value, not technology achievements.  The metrics measure the efficacy of the platform technology achievements in providing value to the business.
  • The desired architecture outcome is more likely to be achieved if the organizational model is biased toward the architecture.  However, any potential organizational change must be evaluated with respect to the business’ overall priorities and the available talent pool.
  • Invest in creating a Product Management organization capable of defining platform Use Cases and variation points.  Ensure that the staff has the skills needed to communicate to engineers in a parlance they understand.
  • Invest in software engineering personnel with the skills needed to correctly employ service orientation and component architectures to enable broad reuse and rapid extensibility without sacrificing other architecture quality attributes.
  • Determine the level in the organization where common technical approaches are important to achieving the platform goals, and then enable developer productivity and promote reuse at that level through prescriptive selection of languages, framework technologies and industry standards.
  • Define repeatable design conventions for common integration and orchestration requirements.
  • Invest in experienced architects and engineering managers who can promote the reuse of successful technical solutions across the organization, and streamline a process to empower fast-moving teams to innovate when necessary to solve unique problems.

References

Service Design Patterns: Fundamental Design Solutions for SOAP/WSDL and RESTful Web Services

Applied SOA: Service-Oriented Architecture and Design Strategies

Why Services Are Not Components, and Vice Versa

Services vs. Components

The Rise and Fall of CORBA

Decomposing a Service Oriented Architecture

One of the questions I’m most frequently asked is how to go about decomposing a system into services to develop a service-oriented architecture (SOA).  To understand how to divide a system into specific services, we first need to think about what we are trying to achieve with our software architecture.  In most large development projects, experience has taught us that we rarely ever fully understand the problem we are designing for at the outset of the project.  So, first and foremost, we want to design for change, because we’ll need to adapt our software design over the life of the project. Much of the software industry has embraced Agile development processes because they enable software teams to quickly adapt to their evolving understanding of requirements and design objectives.  Similarly, we have found it beneficial to embrace architecture practices that enable flexibility throughout the design lifecycle, which is a major motivation for SOAs.

SOA design is in many ways an extension of object-oriented (OO) design, so we can apply much of what we already know to SOA design.  Let’s start with the concepts of coupling and cohesion.

Coupling

It is commonly accepted in the software engineering community that loosely-coupled classes make for better OO designs, because, in general, loose coupling can minimize the scope of changes that must be made to components within the system over time and can enable component reuse. Over the past several years, new schools of thought have emerged that expand on this concept to separate some aspects of a software application’s control flow from the application’s business logic. This Inversion of Control concept (aka Dependency Injection) has been embodied in the Java language (through annotations) and is supported by some application component frameworks (e.g. Spring and EJB containers). The same loose-coupling concept should be applied to services, and one key way to limit coupling is to separate business process workflows from business logic. Services should provide business logic, but we don’t want to bake workflows into the service logic because to do so would reduce the flexibility of the architecture. We want the option of changing the way business process workflows unfold, we want to promote reuse of business logic across multiple business processes and we want to enable our system to expand to meet new business demands, all without the need to significantly change the services.  Extracting the workflow logic into a separate “orchestration” layer enables us to do just that.
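
Before moving on to orchestration, here is a minimal, hypothetical sketch of the Inversion of Control idea using JSR-330 annotations. The SpamFilter and TweetStore interfaces are invented for illustration; the container (Spring, a Java EE server, Guice, etc.) decides which implementations to supply, so the business logic never constructs or looks up its collaborators.

```java
import javax.inject.Inject;

// Hypothetical collaborator interfaces, bound to implementations by the container.
interface SpamFilter { boolean isSpam(String text); }
interface TweetStore { void save(String userId, String text); }

// Business logic declares what it needs; control over construction and wiring
// is inverted to the container.
public class TweetPublisher {

    private final SpamFilter spamFilter;
    private final TweetStore tweetStore;

    @Inject
    public TweetPublisher(SpamFilter spamFilter, TweetStore tweetStore) {
        this.spamFilter = spamFilter;
        this.tweetStore = tweetStore;
    }

    public void publish(String userId, String text) {
        if (spamFilter.isSpam(text)) {
            return; // pure business rule; no knowledge of concrete collaborators
        }
        tweetStore.save(userId, text);
    }
}
```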

Figure 1. An example of a business process implementation where the workflow orchestration is embedded within the service logic.

Figure 1 illustrates the idea of embedding workflow orchestration into services.  Service A is the entry point for the start of a business process — assume it is invoked by a client application — and Service A then calls Services B and C in order, blocking for each response.  Service C then represents a subordinate business process, which calls and blocks for other services (D & E).  The point to make from this simple diagram is that the business logic becomes intertwined with workflow if we don’t take steps to explicitly separate it.  Service A has explicit knowledge of Services B and C, and its business logic is wrapped around invocations of those services, which locks that business logic into a specific business process.  We can do better by extracting the business process orchestration into a layer above the business logic.

Figure 2. An example of a business process implementation where the workflow is orchestrated by separate, specialized services.

Figure 2 illustrates an alternative approach, where we utilize separate orchestration services to create the workflow across the services, which are designed to behave independently.  I’ve omitted any notion of ordering from the diagram to avoid comparison with the original workflow in Figure 1.  When workflow is omitted from the services, we must design them to operate independently around a cohesive set of business functions, which could change the invocation strategy altogether to avoid the need to manage state across transaction boundaries.  So, although there would certainly be an invocation order for the architecture illustrated in Figure 2, it would not necessarily look like the original workflow in Figure 1, and the important point is that the services should have no idea of the order in which they are invoked or of the other services in the business process.

Workflow languages and notations such as BPEL and BPMN, along with supporting orchestration engines (e.g. ODE, Maestro), are commonly used to implement orchestration services such as those illustrated in Figure 2.

Cohesion

Just like we desire high cohesion in OO designs, we also want it in the design of services.  Each service should encapsulate a collection of strongly-related functionality, which will promote modularity, enable reuse and reduce coupling with other services.  Before we talk about how to group functions into services, it is useful to talk a bit about the different types of services that we can expect to develop in an enterprise system.  My favorite conceptual reference on this subject is the book Applied SOA [Rosen, et al.].  The authors describe a variety of different services, but the most significant are illustrated in Figure 3 below.

Figure 3. Different classes of services have different levels of interface granularity.

The most significant services — and probably the most challenging to interpret — are the Business and Domain services.  The authors of Applied SOA define these service types in terms of granularity and accessibility.  Business services are the coarsest (encompassing the most scope) and have the broadest accessibility (enterprise wide), followed by Domain services, which are medium-grained and are limited in accessibility to support Business services inside the enterprise.

To understand the role of a Business service, it is illustrative to examine the public service interfaces of prominent systems, such as the Twitter REST API.  Twitter’s REST API constitutes the system’s publicly-exposed (enterprise) service interfaces, categorized into their major business functional areas (e.g. Tweets, Timelines, Users, Local Trends, Friends & Followers, etc), so these interfaces can be considered examples of Business services, which invoke Domain and Utility services that are not publicly exposed.  For example, the basic service for updating a user’s status (aka posting a tweet) involves functions such as user validation, geolocation, URI shortening, linking references to other users and hashtags, conversation threading, etc.  Some of these functions are satisfied by Domain and Utility services that are not publicly accessible.

The scope of Twitter’s Business services does not limit the range of potential business processes.  The independent and functional nature of Twitter’s public interfaces enables external applications to retrieve and correlate data using multiple interfaces in many different ways, which might not be possible if the services were conflated with business process logic.  The multitude of third-party Twitter clients and data analysis tools is a testament to the value of decoupling business logic from business processes.

Utility services and Integration services are more straightforward to understand.  Utility services in our Twitter example might be used to perform simple, reusable functions, like sorting a list of tweets or returning a human-readable city name from a lat/long.  Integration services are used to integrate legacy applications and generally provide a mediation facility between the interface technologies used in the new system and the interfaces of the legacy application.

Business Process Definition

So, how do we go about decomposing our system into a loosely-coupled, cohesive set of Business and Domain services?  It is important to understand that we really have two goals.  The first is to develop an understanding of our system’s business processes so that we can understand the behaviors and workflows the software must support.  Understanding the business processes is what will allow architects to identify cohesive sets of behaviors that should be aggregated into individual services.  However, our second goal is to avoid “overfitting” our architecture to our known set of business processes.  We want to continuously think about the ways in which our system might need to be extended to meet evolving business objectives.  Remember what I said at the beginning of this article — we rarely ever understand the full requirements for the end product at the outset of the project, so we need to design for change.

The first step in decomposing our services is to understand our known business processes, and there are established Business Process Modeling techniques that can help, such as Use Case Analysis.  Whether you call them System Engineers, Business Analysts or Product Managers, there is likely a role within your organization with responsibility for defining business processes.  Architects are key stakeholders in the business process development activity, because identifying and modeling business processes allows architects to identify cohesive sets of behaviors and to define entry points (i.e. Business services) into those behaviors to support business process orchestration.

Domain Modeling — described in detail in Eric Evans’s excellent book, Domain-Driven Design — is another useful analysis technique for defining service scope.  Domain Modeling can be used successfully at different levels of abstraction. At the lowest level, Domain Modeling can be used by software engineers to define an object-oriented class structure that represents the entities and relationships in the business domains, and it can be used to inform the creation of a data model (e.g. XML messages, database tables) representing those same entities and relationships.  At a more abstract level, architects can use Domain Modeling to define a system’s major conceptual entities (e.g. Customer, Account, Invoice, etc) and understand their behavioral lifecycle through the various business processes, which can help to identify cohesive sets of domain entities and their behaviors that we will encapsulate within Domain services.
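As a trivial, hypothetical illustration (the entities below are generic examples, not drawn from any particular system), a first pass at a low-level domain model in Java might look like this, with the relationships and behaviors discovered during modeling expressed directly in the class structure:

import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Conceptual domain entities identified during Domain Modeling.
class Customer {
    String customerId;
    List<Account> accounts = new ArrayList<Account>();
}

class Account {
    String accountNumber;
    List<Invoice> invoices = new ArrayList<Invoice>();
}

class Invoice {
    String invoiceNumber;
    BigDecimal amountDue;
    boolean paid;

    // A behavior discovered by walking the entity through its business processes.
    void applyPayment(BigDecimal amount) {
        amountDue = amountDue.subtract(amount);
        paid = amountDue.signum() <= 0;
    }
}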

Related Topics

There are a few other related topics that are worth briefly mentioning.

Data Encapsulation — In my experience, it is not uncommon for software engineers to want to share data between services through a common database.  In my view, this is a bad idea and conflicts with our loose coupling goal.  Services should encapsulate their data, and access to that data should only happen through the service interface.  Software engineers will argue that this can lead to redundant data being stored in the system.  To me, redundant data suggests one of two things: either the services are decomposed incorrectly, perhaps because the business process or domain model was misunderstood, or the services should not have been decomposed at all, because the behaviors spread across the two services that need to share data are cohesive enough that they belong together.

REST vs. SOAP — Much has been written about the relative virtues of RESTful web services and SOAP-based (WSDL) web services, so I won’t repeat all of that here.  However, in my experience, SOAP-based services are more complex to develop than RESTful services.  The creation of WSDLs, designing complex XML schemas, working with a web service stack (e.g. Metro) and integrating the language-specific object-marshaling technologies (e.g. JAXB) can be challenging.  There are simply more moving parts with SOAP-based services than with RESTful services.  So, in my opinion, it is better to default to RESTful services unless there is a clear and compelling need for some of the advanced capabilities offered through the web service specifications.  However, it is worth verifying that any given WS-* specification is supported in a variety of cross-language web service stacks (many are not), or you could be trading portability for capability.

I have also come to prefer RESTful web services because I feel that they abstract away more of the software architecture details from the consumers.  SOAP-based services are often described as RPC services, because consumers reference specific methods on specific services through the WSDL.   With RESTful services, architects design interfaces around the concept of a resource and use standard HTTP methods to manage operations on those resources, which can fit together nicely with Domain Modeling to help identify and organize resources.
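To illustrate the resource-oriented style, here is a minimal, hypothetical JAX-RS sketch (the resource, paths and payloads are my own examples, not from any real system), showing an interface organized around a domain resource and the standard HTTP verbs rather than around named remote procedures:

import javax.ws.rs.Consumes;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Response;

// A resource-oriented interface: consumers manipulate the invoice resource with
// standard HTTP methods instead of calling service-specific remote procedures.
@Path("/invoices")
public class InvoiceResource {

    @GET
    @Path("/{id}")
    @Produces("application/xml")
    public String getInvoice(@PathParam("id") String id) {
        // Delegate to business logic; the consumer only knows the resource and the verb.
        return "<invoice id=\"" + id + "\"/>";
    }

    @POST
    @Consumes("application/xml")
    public Response createInvoice(String invoiceXml) {
        // A hypothetical persistence call would go here.
        return Response.status(Response.Status.CREATED).build();
    }
}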

Conclusions

SOA is a set of architecture concepts, not a set of enabling technologies.  It is entirely possible to adopt the technologies without realizing any of the SOA benefits (beware of systems composed primarily of legacy software that has been SOA-ized with XML wrappers).  Above all, the way we go about decomposing our system into services is motivated by a desire to instill extensibility into our software architecture and to enable reuse so that we can reduce the cost of adapting to changing business needs.  We realize these benefits by investing in an understanding of a system’s business processes to identify cohesive behaviors, by enforcing a separation of business process workflows from business logic, and by driving our development teams to think about and design software in terms of the system’s Domain Model.

Building a Glassfish Monitoring Client Using REST

The GlassFish 3.1 Application Server provides a REST web service interface that enables service consumers to access administrative and monitoring features.  Although GlassFish’s web-based admin console is generally regarded as one of the better app server consoles, in my experience with previous versions of GlassFish, there are some cases where the web console can be difficult to use.  For example, during clustered server failover scenarios, the web console will freeze until the network connection times out, which can take a minute or two.   To allow more control over the client-side response to server admin events, I’ve recently started a side project to develop a client-side Java API to the GlassFish REST admin interface.  The API is decoupled from a user interface and is intended to be utilized by a UI developer, but could also be used by any consumer application that needs GlassFish server status.

The scope of the GlassFish REST interface can be explored through the WADL file available from the GlassFish server.  There are two WADL files, one for the Management resources and another for the Monitoring resources.  The URIs for the WADL files follow below.

Management REST resources: http://serverURL:4848/management/application.wadl
Monitoring REST resources: http://serverURL:4848/monitoring/application.wadl

Using the URIs above, just point your browser to your GlassFish installation and the WADL files will be returned.  The Management WADL file is large, so it may take a while to load.  For a detailed description of the WADL schema, check out the W3C specification.

Despite what the names of the REST interfaces might lead you to believe, I’ve developed all of the features of the monitoring API library using the Management REST resources.  The Monitoring resources seem to be undeveloped, unless I am not interpreting the sparse WADL file correctly.  Either way, the Management resources provide all of the necessary capability.  Listing 1 below illustrates a small portion of the Management WADL file.

Listing 1: Partial WADL file detailing the available Management REST resources.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<application xmlns="http://research.sun.com/wadl/2006/10">
    <doc xmlns:jersey="http://jersey.dev.java.net/" jersey:generatedBy="Jersey: 1.5 01/14/2011 12:36 PM"/>
    <resources base="http://localhost:4848/management/">

<!-- snip snip -->

      <resource path="clusters/">

<!-- snip snip -->

          <resource path="list-clusters/">
              <method id="processGet" name="GET">
                  <response>
                      <representation mediaType="text/html;qs=2"/>
                      <representation mediaType="application/json"/>
                      <representation mediaType="application/xml"/>
                      <representation mediaType="application/x-www-form-urlencoded"/>
                  </response>
              </method>
              <method id="options" name="OPTIONS">
                  <response>
                      <representation mediaType="text/html;qs=2"/>
                      <representation mediaType="application/json"/>
                      <representation mediaType="application/xml"/>
                  </response>
              </method>
           </resource>
      </resource>

<!-- snip snip -->

  </resources>
</application>

As you can see from Listing 1, the list-clusters resource is accessed through the HTTP GET method at the URI http://localhost:4848/management/domain/clusters/list-clusters and supports four different response types.  I have used the Jersey REST client-side API for all of the REST calls in the GlassFish Monitoring API library, which conveniently abstracts away the details of forming valid HTTP calls.

Before I begin to describe the GlassFish Monitoring API software, let me be clear that this is a very early prototype, largely intended to flesh out the concepts.  I’ve started small with code to monitor clusters and their member instances, and the status data is simply written to the log — no GUI yet.  It is reasonable to conclude that there will be some restructuring of the software as I tie the API into a user interface and expand the scope to monitor more aspects of the GlassFish server.

Figure 1 below illustrates the top-level package structure for the GlassFish Monitoring API.  The client package contains a sample client that starts up the monitoring and status threads (more on that below).  As you might expect, the cluster and instance packages contain the classes for monitoring clusters and their instances.  The status package contains interfaces that are implemented by classes representing a status monitor, a server status message and a DOM parser for XML status data.  The util package contains a few odds and ends, primarily for pretty-printing XML responses from the server and for managing a configuration properties map that is built from an XML config file.

Figure 1. Top-level package structure for GlassFish Monitor Client API.

Figure 1 also depicts two interfaces — Subject and Observer.  As the names imply, these interfaces are implemented by classes that utilize the GoF Observer pattern, which is central to the organization of this API.
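The actual interfaces in the library may differ slightly, but conceptually they look something like the following sketch (the method names here are my assumptions, not copied from the source):

import java.util.List;

// Subject side of the GoF Observer pattern: the status monitor that observers register with.
interface Subject {
    void registerObserver(Observer observer);
    void removeObserver(Observer observer);
    void notifyObservers(List<Status> statusList);
}

// Observer side: invoked by the subject whenever new status is available.
interface Observer {
    void update(List<Status> statusList);
}

// Placeholder for the status message type passed between monitor and observer.
interface Status {
    String getName();
    String getState();
}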

Figure 2 illustrates the class implementation (within the cluster package) of the ClusterListMonitor (subject) and ClusterListStatusObserver (observer).  Status monitoring and status reporting are separated into two different threads.  Monitoring classes, which implement both the StatusMonitor and Subject interfaces, run in their own thread and are responsible for polling the GlassFish server for the status of a specific item of interest (e.g. clusters).  Status observer classes implement the Observer interface, and they also run in their own thread.  Status observer classes are responsible for providing a response to the status reported by the monitor.  Responses could include updating a graphical user interface, writing to a log file or executing some predefined rule.

Figure 2. Class implementation to monitor and report cluster status data.

I had some concern about the overhead of using two threads for each unique monitored item, but ultimately thought the benefits justified the overhead.  One of those benefits is the ability to establish a wait time for the status observer thread, which decouples it from the blocking status request of the monitoring thread.  If the status monitoring thread is blocking for an unresponsive server, the status observer thread can abort the wait for a status update and take some alternative action, such as updating a user interface component for a monitored item to denote the “unresponsive” status.  The ability to have this type of control over blocking status requests was one of my primary motives for developing this API.  With previous versions of GlassFish, one could only watch impatiently as the admin web console froze until the network connection to a failed server timed out.  I expect the monitoring and observer threads to manage fairly coarse-grained server items, such as the set of all clusters monitored by the classes in Figure 2, so I don’t envision more than a dozen or so unique monitored items, which would keep the number of threads acceptably low.  The instance monitor and observer classes, in the instance package, are structured similarly to the cluster classes in Figure 2.
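One way to get that decoupling — offered here as a sketch rather than a description of the actual implementation — is to hand status updates from the monitor thread to the observer thread through a bounded wait, for example with a java.util.concurrent.BlockingQueue:

import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// S is whatever status type the monitor produces (e.g. cluster status messages).
public class ObserverLoop<S> implements Runnable {

    private final BlockingQueue<List<S>> updates = new LinkedBlockingQueue<List<S>>();
    private final long responseWaitTime;   // e.g. the responsewaittime value from config.xml

    public ObserverLoop(long responseWaitTime) {
        this.responseWaitTime = responseWaitTime;
    }

    // Called by the monitor (subject) thread whenever it has fresh status.
    public void submit(List<S> statusList) {
        updates.offer(statusList);
    }

    public void run() {
        while (true) {
            try {
                // Wait for an update, but never longer than the configured wait time.
                List<S> statusList = updates.poll(responseWaitTime, TimeUnit.MILLISECONDS);
                if (statusList == null) {
                    markUnresponsive();      // the monitor is still blocked on the server
                } else {
                    report(statusList);      // update a UI, write a log entry, fire a rule, etc.
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    private void markUnresponsive() { /* hypothetical UI/logging hook */ }
    private void report(List<S> statusList) { /* hypothetical UI/logging hook */ }
}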

Figure 3 depicts the interaction between the Monitor and Observer threads described above.

Figure 3. The monitor (subject) and observer run in separate threads.

The thread loop (run method) and the REST operation (queryStatusOfClusters method) for the ClusterListMonitor class are depicted in Listing 2 below.  The run method is simple — query the GlassFish server for status and notify the observers, then sleep for the user-specified time.  The REST operation is performed using the Jersey API to construct the GET operation and specify the response type, then the operation is executed on the GlassFish server and the XML response is passed back to be parsed.  See the full listing for all of the gory details.

Listing 2: Some salient portions of the ClusterListMonitor class.

//snip snip

public class ClusterListMonitor implements StatusMonitor, Subject, Runnable {

//snip snip

public void run() {
   while(true) {
      //query the GF rest interface for the cluster status
      clusterStatusList = this.queryGFStatus();
      //notify the observers of the new status
      notifyObservers(clusterStatusList);
      //sleeping for the user-specified polling rate
      try {
         Thread.sleep(pollingRate);
      } catch(InterruptedException ie) {
          logger.log(Level.INFO, "Cluster Monitor Thread Problem: {0}", new Object[]{ie});
      }
   }
}

//snip snip

/**
 * Poll the server for status and parse the XML response
 *
 * @return The status reported by the GF server
 */
public ArrayList<Status> queryGFStatus() {
    ArrayList<Status> statusList = null;

    if(restClient != null) {
        String response = queryStatusOfClusters(restClient);
        if(response != null)
          statusList = parseClusterStatusResponse(response);
    }

    return statusList;
}

/**
 * Performs the GET operation on http://baseURL/management/domain/clusters/list-clusters
 *
 * @param client The Jersey rest client
 *
 * @return The XML text
 */
private String queryStatusOfClusters(Client client) {

   //Construct the resource and perform the GET operation
   WebResource webResource = client.resource(restURL);
   ClientResponse response = webResource.accept("application/xml").get(ClientResponse.class);

   int status = response.getStatus();
   logger.log(Level.FINEST, "list-clusters Status =  {0}", new Object[]{status});

   String textEntity = response.getEntity(String.class);
   if(textEntity != null) {
      logger.log(Level.FINEST, "list-clusters Response =  {0}", new Object[]{textEntity});
      StatusUtilities.writeResponseToFile(textEntity, "cluster-status.xml");
   }
   else
      logger.log(Level.FINEST, "list-clusters Response =  No Clusters");

   return textEntity;
}

//snip snip

}

Listing 3 below illustrates a sample config.xml file, which allows the user to configure the base URL for the GlassFish REST interface, the status polling rate and the observer wait time. The config.xml file resides at the root of the JAR file.

Listing 3: XML config file allows users to specify configuration properties.

<?xml version="1.0" encoding="UTF-8"?>

<config>
  <baseURL>http://localhost:4848</baseURL>
  <pollingrate>5000</pollingrate>
  <responsewaittime>10000</responsewaittime>
</config>

Once I’ve expanded the scope of the API library, I expect to provide individual configuration properties for each monitored item.  So, for example, the polling rate could be set differently for each monitor thread.
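For reference, here is a minimal sketch of how a file like Listing 3 could be read from the root of the JAR into a name/value map.  This is an assumption about the approach, not the library’s actual parser:

import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Loads config.xml from the classpath root and flattens its child elements into a map.
public class ConfigLoader {

    public static Map<String, String> load() throws Exception {
        Map<String, String> props = new HashMap<String, String>();
        InputStream in = ConfigLoader.class.getResourceAsStream("/config.xml");
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                props.put(node.getNodeName(), node.getTextContent().trim());
            }
        }
        return props;
    }
}

A consumer would then read, for example, Long.parseLong(ConfigLoader.load().get("pollingrate")) to configure a monitor thread.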

Conclusion

As I mentioned above, this is an early prototype of the API software, and there are still several design concepts that I’m thinking through, but I wanted to get it out there so I could return to my primary project — so caveat emptor.  I’ll continue to provide updates to this library over time, but for now it will be moving to the back burner.  Feel free to contact me if you have any questions.  Have fun!

Resources

  • The source code for the GlassFish Monitoring API library is available at GitHub.  The source is released under the Apache License, Version 2.0.
  • The GlassFish Monitoring API library JavaDocs (still a bit rough).
  • The GlassFish REST Admin API documentation.
  • Here and here are a couple of blog entries that provide some introductory information about the GlassFish REST Admin API.

EJB, JPA and SimpleDB in “The Cloud”

As part of my personal project, I’ve been working on some basic plumbing web services to support data access.  The first of these two web services — the Registration Service — allows a user to authorize my application to retrieve protected resources on behalf of the user.  The second of the services — the Resource Manager Service — simply uses the access granted by the user to retrieve a user’s protected resource.  Together, these two services provide the initial foundation I’ll be using to enable my data mining software to access a user’s social networking data and mine for patterns.  Initially, I’ve developed these services to work with Twitter — for no reason other than that I was more familiar with Twitter’s REST API — but I will soon add support for Facebook and LinkedIn.

Each of the services follows the same simple design approach. The RESTful resource is exposed using Sun’s Jersey API through a servlet adaptor, and the business logic is provided by a backing stateless EJB.  The EJB’s business logic for both services interacts with Twitter’s REST API (using Scribe) to exercise the OAuth workflow and interacts with Amazon’s SimpleDB REST API to persist and retrieve user credential data in Amazon’s cloud.

Registration Service

The sequence diagram below illustrates the basic control flow of the Registration Service.

Figure 1. Registration Service Control Flow


Step 1: The interaction starts when a client accesses the service with a RESTful web service request (GET): http://base-url/register?screenname=twitterScreenName

Step 2: The Registration Service utilizes the Scribe OAuth library to get a request token from Twitter, using the keys that have been previously established for the requesting Twitter client.

Steps 3 & 4: Once the request token has been retrieved from Twitter, it is associated with the Twitter screen name in a user credential object and stored in Amazon’s SimpleDB.

Steps 5 & 6:  The Twitter authorization URL is created using the OAuth request token, which is then returned to the invoking client (out-of-band method).

Steps 7, 8 & 9: The user must independently (hence the out-of-band) use the authorization URL to retrieve the OAuth verification token from Twitter, which is then provided back to the Registration Service using the following RESTful invocation (POST):  http://base-url/register/settoken?screenname=twitterScreenName&token=verificationToken
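Before looking at the service code, here is a rough sketch of the Scribe calls behind steps 2 through 9, assuming the Scribe 1.x API; the consumer key, secret and verification token are placeholders, and the variable names are mine:

import org.scribe.builder.ServiceBuilder;
import org.scribe.builder.api.TwitterApi;
import org.scribe.model.Token;
import org.scribe.model.Verifier;
import org.scribe.oauth.OAuthService;

public class TwitterOAuthSketch {

    public static void main(String[] args) {
        // Build the Twitter OAuth service with the keys registered for this client application.
        OAuthService service = new ServiceBuilder()
            .provider(TwitterApi.class)
            .apiKey("your-consumer-key")        // placeholder
            .apiSecret("your-consumer-secret")  // placeholder
            .build();

        // Step 2: obtain the request token from Twitter.
        Token requestToken = service.getRequestToken();

        // Steps 5 & 6: build the authorization URL handed back to the client out-of-band.
        String authorizationUrl = service.getAuthorizationUrl(requestToken);
        System.out.println("Authorize at: " + authorizationUrl);

        // Steps 7-9 happen later, once the user supplies the verification token via /settoken.
        Verifier verifier = new Verifier("verification-token-from-user");   // placeholder
        Token accessToken = service.getAccessToken(requestToken, verifier);
        System.out.println("Access token: " + accessToken.getToken());
    }
}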

Listing 1 below illustrates the client’s entry point into the Registration Service, which is a resource class hosted in the web container with a servlet adaptor. For this service, the resource class exposes two methods via RESTful web service interfaces. The getAuthorizationURL method provides the entry point for the user to begin the registration process, and the setVerificationToken method allows the user to provide the verification token back to the service as part of the OAuth process.

Listing 1 (org.engineeringnotebook.registrationservice.resource.RegistrationResource):

//snip snip
@Produces("text/plain")
@Path("/register")
public class RegistrationResource {

//snip snip

    @GET
    @Produces("text/plain")
    public String getAuthorizationURL(@QueryParam("screenname") String twitterScreenName) {
       logger.log(Level.FINE, "Registration resource: getAuthorizationURL");
       String result = null;
       if(twitterScreenName != null){
          connectEJB();
          result = registrationServiceEJB.requestRegistration(twitterScreenName);
       }
       return result;
    }

//snip snip

    @POST
    @Consumes("text/plain")   
    @Path("/settoken")
    public Response setVerificationToken(@QueryParam("screenname") String twitterScreenName, @QueryParam("token") String verificationToken) {
        logger.log(Level.FINE, "Registration resource: setVerificationToken");
        
        URI uri =  uriInfo.getAbsolutePath();
         
        if(twitterScreenName != null){
            connectEJB();
            registrationServiceEJB.setVerificationToken(twitterScreenName, verificationToken);
        }  
        //need to figure out how to set the response if input parms are invalid
        return Response.created(uri).build();
    } 
}

Line 3 is a JAX-RS annotation (implemented by Jersey) that defines the base path for the service. Line 8 associates the HTTP GET operation with the specific method, and line 9 defines the content MIME type produced by the method. The @QueryParam annotation on line 10 associates a query parameter from the URL string with a specific method parameter. Line 15 invokes the backing EJB, which provides the business logic. I prefer to cleanly separate the web service interface from the business logic, and I feel that hosting the interface in the web container separate from the business logic is a good way to enforce that pattern.

JPA and SimpleDB

As part of this project, I wanted to utilize the Java Persistence API (JPA) to manage data in Amazon’s SimpleDB, since JPA is the standard Java mechanism for object-relational mapping of POJOs.  SimpleJPA is the only JPA implementation for Amazon’s SimpleDB that I could find with any significant user base, so that’s what I’ve used in my two services to persist the user credential POJOs.  There were a few challenges that I had to overcome with SimpleJPA.  The first challenge was my own ignorance about how JPA works, specifically the need for corresponding getter and setter methods in the entity classes to persist and instantiate objects.  I burned a few cycles trying to figure out why my objects weren’t being instantiated, only to find out I was missing some corresponding setter methods.  Don’t do that! :)
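To make the point concrete, here is a stripped-down sketch of what a UserCredential entity might look like.  The fields are my guesses at what such a class would hold, not the actual class; the important thing is the matching getter and setter for every persistent field:

import javax.persistence.Entity;
import javax.persistence.Id;

// Every persistent field needs both a getter and a setter, or the JPA provider
// cannot fully rehydrate the object when it is read back.
@Entity
public class UserCredential {

    private String screenName;          // used as the id; SimpleDB item names are strings
    private String requestToken;
    private String requestTokenSecret;
    private String verificationToken;

    @Id
    public String getScreenName() { return screenName; }
    public void setScreenName(String screenName) { this.screenName = screenName; }

    public String getRequestToken() { return requestToken; }
    public void setRequestToken(String requestToken) { this.requestToken = requestToken; }

    public String getRequestTokenSecret() { return requestTokenSecret; }
    public void setRequestTokenSecret(String requestTokenSecret) { this.requestTokenSecret = requestTokenSecret; }

    public String getVerificationToken() { return verificationToken; }
    public void setVerificationToken(String verificationToken) { this.verificationToken = verificationToken; }
}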

The biggest challenge I had with SimpleJPA was overcoming the way in which it finds entities to manage. According to the JPA2 spec, one should be able to define the persistence unit and specify the fully qualified entity class names in the persistence.xml file, which would then be picked up by the JPA implementation at run-time.  However, in all of my tests, SimpleJPA seemed to ignore everything specified in this file.  In Listing 2 below you can see where I identified the UserCredential class on line 5 of my persistence.xml file, which had no effect on SimpleJPA.

Listing 2 (persistence.xml):

<?xml version="1.0" encoding="UTF-8"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence" version="2.0">
  <persistence-unit name="RGSPersistenceUnit">
    <provider>com.spaceprogram.simplejpa.PersistenceProviderImpl</provider>
    <class>org.engineeringnotebook.snrdm.entity.UserCredential</class>
  </persistence-unit>
</persistence>

So, instead of relying on the persistence.xml file to define the persistence unit, I had to use the constructor of SimpleJPA’s EntityManagerFactoryImpl to specify the entity classes.  With this approach, SimpleJPA requires the absolute path for the entity classes, so I wrote some code to construct the absolute path, which you can find here.  For the record, I feel this is a kludge when writing code for a JEE container-managed environment.  It is generally considered bad form to reach outside of the container’s sandbox — one should instead rely on the relative paths internal to the JAR file — but after trying every permutation I could conceive of to get SimpleJPA to search relative paths, the absolute path approach was the only option that worked.  I question the portability of this method, and I’ve only tested in GlassFish 3.1, so other app servers may complain about the absolute path references.  If anyone has found a more EJB-friendly way to use SimpleJPA, please share your insights.

Listing 3 illustrates the SimpleJPA setup process. Lines 12 and 13 rely on properties that have been injected from the EJB descriptor file (ejb-jar.xml) to provide SimpleJPA with the keys to access an Amazon account. You’ll have to set those up for yourself and add the values to the descriptor file. Line 9 defines the list of entity classes that we want SimpleJPA to manage, and line 15 creates the absolute path for those classes, which is passed into the SimpleJPA’s EntityManagerFactoryImpl constructor along with the keys and the name of my persistence unit.

Listing 3 (from org.engineeringnotebook.registrationservice.ejb.RegistrationServiceBeanImpl):

   /**
     * Sets up the simplejpa entity manager
     */
    private void initializeSimpleJPA() {

        List classList = new ArrayList();
        ClassPathBuilder cpBuilder = new ClassPathBuilder();

        classList.add(UserCredential.class);

        Map props = new HashMap();
        props.put("accessKey",amazonAccessKeyValue);
        props.put("secretKey",amazonSecretKeyValue);

        Set libPaths = cpBuilder.getScanPaths(classList);

        factory = new EntityManagerFactoryImpl("RGSPersistenceUnit", props, libPaths); 

    }
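Once the factory is constructed, usage follows the standard JPA pattern.  The following is only a sketch; the method, query and field names are my assumptions, and it presumes SimpleJPA’s JPQL subset covers a simple lookup like this one:

// A sketch only: continues from Listing 3, where 'factory' (an EntityManagerFactoryImpl)
// was created. EntityManager here is the standard javax.persistence.EntityManager.
private UserCredential storeAndFetch(String screenName, String requestToken) {
    EntityManager em = factory.createEntityManager();
    try {
        UserCredential credential = new UserCredential();
        credential.setScreenName(screenName);
        credential.setRequestToken(requestToken);
        em.persist(credential);

        // Look the credential back up through SimpleJPA's JPQL support.
        return (UserCredential) em.createQuery(
                "SELECT u FROM UserCredential u WHERE u.screenName = :name")
                .setParameter("name", screenName)
                .getSingleResult();
    } finally {
        em.close();
    }
}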

Resource Manager Service

The Resource Manager Service is very simple, and the basic structure of the code is the same as the Registration Service. The Resource Manager Service allows an application to retrieve a protected resource once a user has authorized the application using the Registration Service. Caveat Emptor!! You would never want to expose this service to the world. Anyone invoking this service would have access to a user’s protected resources. This service is intended to be used within the security environment of the application the user has authorized.

The sequence diagram below illustrates the control flow of the Resource Manager Service.

Figure 2. Resource Manager Service Control Flow

Step 1: The interaction starts when a client accesses the service with a RESTful web service request (GET): http://base-url/retrieve?screenname=twitterScreenName&twitterurl=protectedResource

Steps 2 & 3: The service retrieves the user credential from Amazon’s SimpleDB.

Steps 4, 5 & 6: The service uses the Scribe OAuth library to sign the request and retrieve the protected resource from Twitter.

Step 7: The service returns the protected resource to the client.
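The corresponding Scribe calls for steps 4 through 6 are roughly as follows.  Again, this is a sketch assuming the Scribe 1.x API; the method and parameter names are placeholders of my own:

import org.scribe.model.OAuthRequest;
import org.scribe.model.Response;
import org.scribe.model.Token;
import org.scribe.model.Verb;
import org.scribe.oauth.OAuthService;

public class ProtectedResourceSketch {

    // Signs a request for a protected Twitter resource with the user's stored access token.
    public static String fetch(OAuthService service, Token accessToken, String twitterUrl) {
        OAuthRequest request = new OAuthRequest(Verb.GET, twitterUrl);
        service.signRequest(accessToken, request);   // attaches the OAuth signature
        Response response = request.send();
        return response.getBody();                   // the protected resource payload
    }
}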

Conclusion

That’s it! I’ll post more information soon about the build and deployment processes for these two services, since there are some Maven dependency issues. Now that I’ve finished the basic data access plumbing, I’m moving on to the data mining work — the fun part. My next step is to review Amazon’s Elastic MapReduce service (built on Hadoop) for its applicability to the data mining algorithm that I plan to use initially.

The Code

All of my code is available at GitHub and is released under the Apache 2.0 open source license. You can browse through the POM files to get an understanding of how to build the projects, but I’ll write more about that soon.

Registration Service

Resource Manager Service

SNRDM (common dependency)

SimpleJPA is a bit of a hassle to build because it doesn’t package up all of its dependencies, so you’ll have to manage them yourself. I believe some of them can be found in public Maven repositories, but I can’t remember which ones. Listing 4 below is cut from one of my POM files and includes all of SimpleJPA’s dependencies. When you try to build, whichever of these dependencies is not found in the public Maven repositories will have to be manually added to your local Maven repo. The SimpleJPA wiki has more information on the dependencies, but lacks the version number information. You’ll need to use the version numbers specified in my POM files.

Listing 4 (from the pom.xml file for the Registration Service EJB):

<dependencies>
    <dependency>
        <groupId>asm</groupId>
        <artifactId>asm</artifactId>
        <version>3.3.1</version>
    </dependency>
    <dependency>
        <groupId>aws</groupId>
        <artifactId>aws-java-sdk</artifactId>
        <version>1.1.1</version>
    </dependency>
    <dependency>
        <groupId>cglib</groupId>
        <artifactId>cglib-nodep</artifactId>
        <version>2.2</version>
    </dependency>
    <dependency>
        <groupId>commons</groupId>
        <artifactId>commons-codec</artifactId>
        <version>1.4</version>
    </dependency>
    <dependency>
        <groupId>commons</groupId>
        <artifactId>commons-collections</artifactId>
        <version>3.2.1</version>
    </dependency>
    <dependency>
        <groupId>commons</groupId>
        <artifactId>commons-lang</artifactId>
        <version>2.5</version>
    </dependency>
    <dependency>
        <groupId>commons</groupId>
        <artifactId>commons-logging</artifactId>
        <version>1.1.1</version>
    </dependency>
    <dependency>
        <groupId>commons</groupId>
        <artifactId>commons-logging-adapators</artifactId>
        <version>1.1.1</version>
    </dependency>
    <dependency>
        <groupId>commons</groupId>
        <artifactId>commons-logging-api</artifactId>
        <version>1.1.1</version>
    </dependency>
    <dependency>
        <groupId>ehcache</groupId>
        <artifactId>ehcache-core</artifactId>
        <version>2.3.1</version>
    </dependency>
    <dependency>
        <groupId>ehcache</groupId>
        <artifactId>ehcache-terracotta</artifactId>
        <version>2.3.1</version>
    </dependency>
    <dependency>
        <groupId>ejb3-persistence</groupId>
        <artifactId>ejb3-persistence</artifactId>
        <version>1.0.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>3.1</version>
    </dependency>
    <dependency>
        <groupId>kitty-cache</groupId>
        <artifactId>kitty-cache</artifactId>
        <version>1.1</version>
    </dependency>
    <dependency>
        <groupId>scannotation</groupId>
        <artifactId>scannotation</artifactId>
        <version>1.0.2</version>
    </dependency>
    <dependency>
        <groupId>slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>1.5.11</version>
    </dependency>
    <dependency>
        <groupId>slf4j</groupId>
        <artifactId>slf4j-jdk14</artifactId>
        <version>1.5.11</version>
    </dependency>
    <dependency>
        <groupId>javassist</groupId>
        <artifactId>javassist</artifactId>
        <version>3.14.0.GA</version>
    </dependency>
</dependencies>
Feel free to contact me if you have questions about the code.

Background Reading

A good overview of SimpleDB and SimpleJPA.

Amazon’s SimpleDB site.

Jersey API — Oracle’s site for Jersey.

Scribe — An awesome OAuth library.

Development Tools

REST Client — You’ll need a REST client to test any REST interactions other than GET operations. This one has worked well for me.

T-437 — A SimpleDB GUI Client, which you’ll want to use to verify that your data is stored as you expected.

SmartGit — A nice client GUI to help navigate Git, which is especially useful if you’re a newbie to Git like me.

To Do List

The UserCredential entity class is crudely structured to simplify my JPA testing and should be broken apart, after which key relationships between the classes will have to be defined for persistence purposes.

I’m using the old school JNDI method for retrieving a reference to my EJBs. I need to update to the Dependency Injection method.

It is likely that my software design isn’t as faithful to the REST concepts as it could be, so I’ll be taking a fresh look at ways to be more RESTful as I continue along with my project.

You are welcome to add your suggestions to the list.

5 Ways To Improve Netflix

I am an avid user of Netflix’s “Watch Instantly” feature, but I believe Netflix is a market leader in danger of becoming a follower.  For a cautionary tale, revisit the history of digital music and the role of Napster.  Digital music is now ubiquitous, but Napster, the pioneer who initiated the online distribution trend and wrested the distribution monopoly away from the brick and mortar music industry, no longer has a significant role in the distribution medium it created.  We’re now on the brink of a similar revolution in digital video distribution.  All of the technology barriers to delivering content directly over the Internet to our personal devices are gone.  Services such as Netflix and Hulu already provide the distribution capability and there are many competing set-top boxes, like the Playstation, Xbox, Apple TV, Google TV, Boxee, etc., to bridge the last ten feet to the television.  The set-top box market is already crowded and each of the vendors has its own content delivery network (CDN), which means they are poised to compete with Netflix, even though most currently enable access to Netflix via their set-top boxes and networks.  Furthermore, the content-producing dinosaurs, who want to keep their choke hold on content distribution, aren’t going down without a fight.  They have a lot at stake.  There are undoubtedly legions of us who can’t wait to stop paying $100+ per month for a cable package just to get the small percentage of content we actually watch. To stay on top of the industry, Netflix must continue to innovate and be aggressive about making the Netflix brand synonymous with online content delivery.

The music industry lost its grip on the distribution chain when Napster and other peer-to-peer services set consumers’ expectations beyond the traditional model of buying CDs at brick and mortar stores.  Consumers were always willing to pay for the music they wanted, but they began to expect to be able to listen to that music on any device at any time, and they were no longer willing to be forced into a pricing model that was unfairly biased in favor of the music industry because of their control over the distribution mechanism.  In the end, the music industry was not able to hold back the tide of a la carte digital content delivery because consumers demonstrated that they were willing to steal the music en masse rather than acquiesce to the music industry’s demand to maintain the status quo (try as they did to maintain it).  The same outcome is inevitable for digital video distribution.  It’s here to stay and the content producers will ultimately have to get fully on board with an online distribution model.

Netflix, and Akamai, their content delivery network provider, have made some significant technical achievements to provide a quality experience for their subscribers.  They’ve built a vast network capable of staging content close to consumers and can dynamically adjust playback quality based on available bandwidth.  Netflix has also recently jumped into cloud computing with both feet, which Computer Scientists and IT managers find interesting, but it is not really a discriminator in terms of user experience.  I don’t want to diminish any of Netflix’ technical advances, but it does not appear that it would be difficult for set-top box vendors, with their own CDNs, to replicate Netflix’ business model.  Apple has already demonstrated an ability to do just that with its music and movie delivery through the iTunes store.  So, here are five suggested improvements where I believe Netflix has an opportunity to step up its game.

1) Add Linux Support

Netflix should support video streaming for Linux platforms (feel free to sign the petition).  The lack of Linux support has to do with the Microsoft DRM technology Netflix uses to protect their movie content from unauthorized use.  You can read all about it here.  Clearly Netflix has chosen to address the most widely-used platforms (Windows and Mac), but I believe that disregarding Linux support will prove to be a strategic mistake.  True, it is unlikely that the loss of revenue from Linux users is noticeable on Netflix’ bottom line, but it does damage their brand.  Linux users tend to be technophiles, and if they’re shut out of Netflix’ service, they’ll gravitate to an alternative that does support Linux, and where the technophiles go, the masses will follow.  The trickle of Linux users moving to an alternative service could lead to a flood of departures from Netflix, so I believe it is in their best interest to provide a Linux video streaming solution in the very near future.

2) Downloadable Movie Rentals

One of the reasons I own an iPad is so that I can take media content with me when I travel.  Obviously airplanes are completely without Internet connectivity, and even though airports and hotels usually provide WiFi access points, the bandwidth is often so limited as to be nearly useless for accessing streaming media content, which means that Netflix is of no value to the traveler who wants to watch movies to help the time pass.  Conversely, Apple’s iTunes store supports a downloadable rental feature that allows consumers to watch movies offline within 30 days of downloading the movie or to finish watching within 24 hours of first beginning the playback.  Netflix shouldn’t leave this discriminating feature to Apple and give an important segment of the user community a reason to build loyalty to a competitor.  Netflix should add the capability to download content to the local device for offline viewing (and they should probably start selling digital movies, just like Apple).

3) Improve Search Features

Hulu‘s search/browse experience is richer and more accessible than what Netflix provides.  With Hulu, subscribers can use a gallery view to filter movies by key words, two levels of genre, studio and production date, in addition to a few other less-interesting filters. Conversely, Netflix subscribers can only filter a gallery view by two levels of movie genre.  After some unintuitive site navigation, users can get to a sortable list (no gallery view) that supports additional filters by year, maturity and rating.  Access to the “sortable list” feature doesn’t appear until you’ve clicked through a couple of levels of pages, which means that many will not find it at all.  However, even with the sortable list, there is still no way to filter key word searches on the Netflix site.  Shouldn’t subscribers at least be able to constrain key word searches by movie genre?  I get that most people just want the simplest possible search feature, but there should at least be an advanced feature supporting multiple constraint options.  Netflix could learn a lot from IMDB‘s advanced search feature.

4) Make The Netflix Site A Portal For Movie Aficionados

I sometimes have the impression that there aren’t many true movie fanatics working at Netflix.  By making the Netflix site a portal for movie fans, Netflix can draw in potential new subscribers.  Some might argue that Netflix shouldn’t try to replicate what IMDB already does well, but I disagree.  Movie fans are a lot like music fans, who obsess over every compositional element of a song or the poetry of lyrics.  They want to explore all of the technical aspects of a musical composition and know the story behind the music and the recording artist.  Movie fans are the same, and today, IMDB is their source of information for all things movie related — for past, present and future films. Netflix should replicate the IMDB model of maintaining the current pulse of the entertainment industry (movies and TV) and provide a comprehensive database of all movies, the people associated with them (cast, crew, production team), production details, technical details and extensive biographies and linkage associations between people, movies and events (e.g. premiers, awards ceremonies, etc).  By drawing in hardcore movie fans, others will follow, which will increase the prominence of the Netflix site in both the movie industry and the fan base, ultimately resulting in more subscribers for Netflix.

5) Build A Social Network Around Netflix Subscribers

I find it remarkable that Netflix is not integrated with social networking sites like Facebook and Twitter.  It appears that Netflix started down this path with Facebook integration in early 2009, but seems to have abandoned the effort.  The link has disappeared from Netflix’ main page, although the “Facebook Connect” page is still accessible with a direct link.  However, once enabled, Facebook integration appears to be broken, as no Netflix data shows up on the Facebook wall.  Virtually every news website in the world lets readers share articles they find interesting via Facebook and Twitter.  Amazon lets shoppers share products via Facebook and Twitter.  Yet, when I rate a movie or write a review on the Netflix site, there is no way for me to easily share that information to my Facebook and Twitter accounts.  Netflix is losing a huge marketing opportunity with this omission.  It is well understood that our social networks tend to be comprised of like-minded people, so when a friend recommends a movie, it is many times more influential than the recommendation of a professional movie critic.  Aside from marketing movies, allowing Netflix subscribers to post information about the movies they’ve seen to Facebook and Twitter would help to grow the Netflix brand.  When a subscriber posts a movie rating to a social networking site, they’re not just endorsing a movie, they’re also endorsing the Netflix service.  I can only assume that Netflix is already way ahead of me on this, and a team is about to roll out their new and improved social networking integration any day now.

In addition to integrating with existing social networking sites, Netflix should build their own social network around their content, including the enhancements described in recommendation #4.  To be fair, Netflix already tried this but ultimately shuttered the site’s “Community” features to devote engineering resources elsewhere.  It also seems that Netflix’s social networking features may not have provided sufficient privacy controls to allow users to filter what they share about their viewing habits, which could have made Netflix’ social networking features unpopular.  Facebook has been slow to address the same need for granular privacy features — we don’t necessarily want to share the same things with our mom and our drinking buddies — but that doesn’t mean social networking should be abandoned.  Yes, first and foremost Netflix must flawlessly deliver content, but their competitors can do that too.  Netflix needs discriminators, and building a social network around their service could be a big one.  Today, the Netflix site is very basic.  You can search for a movie, add it to a queue and either receive it through the mail or stream it instantly.  Social networking could make the Netflix site much more engaging, which would bring in new subscribers and put them in touch with more movies they want to watch.

At a minimum, the Netflix site should allow subscribers to create a personal profile that can be shared with others.  Profiles could enable users to create and share lists of movies around genres, topics, studios, actors and directors, to subscribe to profiles for studios, actors and directors, to receive change notifications, to aggregate movie reviews and to moderate discussion forums, just to name a few potential features.  The best way to find movies we like is to find people like ourselves who like movies.  The Netflix site does support a simple subscriber profile today, which appears to serve the exclusive purpose of associating movie reviews to a subscriber, but there is no networking capability.  A crude, unaffiliated “Netflix Community” site has emerged to fill the space that Netflix is leaving unoccupied.

Apple‘s Ping social network for music is an example that Netflix could follow.  Apple understands the value of the social network — consumers buy what their friends buy — so they’ve developed Ping, which is integrated with the iTunes music store.  Netflix should also learn from what Apple has done wrong with Ping.  Households typically have one account for online storefronts, with multiple members of the household buying products through the single account.  With iTunes and Ping, this causes a confusing social networking experience.  Adults probably don’t want their children’s music purchases associated with their social network profile, and the same would be true for movie preferences.  This also tends to be a problem with recommendation systems.  If you’re sharing an account, the recommendations become polluted because each different user’s orthogonal preferences are combined into a single profile.  Netflix has an opportunity to do something new and discriminating by combining online media distribution, retail and social networking into a high quality user experience.

Conclusion

Each of these recommendations is meant to address improvements that will add discriminators to the Netflix service in an increasingly competitive market.  It goes without saying that offering current content that consumers want to watch is the single biggest discriminator for Netflix, but that is largely a business consideration to be negotiated with content producers and outside the scope of an engineer’s purview.  Netflix already offers a great service and is the clear leader in the industry, but if they are to continue to dominate, Netflix must continue to innovate in ways that enrich the subscriber experience.

Getting Started with Amazon’s Elastic Compute Cloud

I’ve recently started working with Amazon’s Elastic Compute Cloud (EC2) for my personal project and decided to capture the answers to some of the beginner questions that I worked through. Amazon provides a great deal of documentation, but I believe the organization of their documentation could use some work.  Before you begin working with EC2, you should start by reading Amazon’s Getting Started Guide.  Here are the answers to some basic start-up questions:

1) How do I launch an instance of an Amazon Machine Image (AMI)?

You can find the answer here.  Launching an instance of an AMI gives you a running “machine” that conforms to the configuration of the specific AMI, on which you can then deploy your applications.  You can find a list of available AMIs here.

2) How do I connect to my machine instance?

You’ve started your instance, now you want to open a terminal session to it from your local workstation.  You’ll need to use SSH with the key file you generated when you launched your instance.  You can find a detailed explanation here.  The SSH command will look like this:

ssh -i keyfile.pem username@hostname

The root user login has been disabled, so instead each image has a default username that you must use to SSH in to the instance.  I have found that the default user name sometimes pops up in a dialogue box when you launch the instance, but more often than not it doesn’t and you have to hunt for it.  For example, the user name for Amazon’s Linux image can be found in their user guide.  Alternatively, if you just try to SSH into your instance using “root” for the username, the connection will fail and respond with the required username.  It’s the fastest way I’ve found to find the right username, but if someone knows where I can find it in the AWS management console, please let me know.

3) How do I load my own software on the instance I started?

Amazon’s documentation isn’t very clear on this.  You’ll need to use SCP to upload files from your local machine to your running instance.  The command to upload a source file on your local machine to a target file on your running instance will look like this:

scp -i keyfile.pem source-file-name username@hostname:target-file-name

You’ll use the key file that you generated when you launched your instance (the same one you used for your SSH session).  The default username is the same one you used to establish an SSH session, described above.  The host name (public DNS) is available in the AWS management console, described in the metadata for your running instance.

4) Once I’ve gotten my AMI configured the way I want it, how do I save it for future use?

Amazon’s documentation is somewhat lacking on this subject.  It can be a little difficult to decipher the difference between an S3-backed instance and an EBS-backed instance or to understand the use cases that would motivate you to choose one over the other.  According to Amazon’s documentation:

An Amazon EC2 instance can be launched from an AMI backed by Amazon EBS or from an AMI backed by Amazon S3. Instances launched from AMIs backed by Amazon EBS use Amazon EBS volumes as their root devices. Instances launched from AMIs backed by Amazon S3 use an instance store as the root device (e.g., / or C:).

There’s a bit of ambiguity here.  What does it mean to say that an S3-backed instance has an instance store as the root device?  After reading more documentation, what I think this means is that the state of the S3-backed images is strictly transient.  Once you terminate the instance, any changes you make to the instance configuration are lost.  For S3-backed instances, the root device persistence is on the local disk (instance store) of the machine on which the instance is running, which is only available for the life of the instance.  Conversely, when you stop an EBS-backed instance, its state is automatically persisted to an EBS volume.  Stopping an instance is not the same as terminating an instance.  A terminated instance cannot be restarted.  So, if you want to make changes to your instance configuration and keep it around for future use, you must create an EBS-backed instance, and you should only stop it, don’t terminate it.

Having said all of that, what you really want to do is to create a new AMI based on the running instance you’ve configured.  Although there is a browser-based management console that will let you start instances of existing AMIs, it looks like one must use the command line API tools to save your configured instance.  Once you’ve created the new AMI, you can then instantiate it from the console at a future time.  The command to create the image from the instance follows below.

ec2-create-image instance_id [--name name, --description description, --no-reboot]

The instance_id can be retrieved from the web console for the running instance that you want to save as your new AMI.  The documentation for all of the command line tools can be found here.  It takes a little bit of effort to get your local environment set up to run the command line tools, but the documentation is clear and straightforward, so there’s no point in repeating it here.

I’ve only covered the basics in this post, which I’m turning into a HowTo that I’ll continue to evolve as I work through more complex use cases.  The next step for me is to push some of my existing code into EC2 and investigate how I’ll use Amazon’s cloud features (databases, messaging, etc) so that I can nail down what my target architecture will look like.  I’m also still looking into what my development model will look like – i.e. how much code I can write and test in my local environment versus what I need to move into the cloud, since Amazon does not provide an emulation environment for developers.  More to come.

Architecting for the Cloud

My intention is for most, if not all, of the software I write for this project to run on the server side.  I plan to expose a RESTful Web Service API to consumers, which will authorize my software to retrieve their data from social networking sites and in return provide highly tailored recommendations for movies, books, businesses, etc, and connections to other social networking users.  I’ve outlined the project’s objectives here.  I’m now examining potential server side architectures and prototyping a few options.  The first option I’ve considered is to use Java enterprise technologies and deploy software in the GlassFish Application Server.  Why Java EE?  Because it provides the kind of capabilities that one would want in a reliable, scalable, manageable server side architecture and has support for the necessary transports baked into the API, such as Web Services and JMS, along with XML and JSON marshaling.  Why GlassFish?  The primary reason for selecting GlassFish is that I use it at my day job, and we’ve had good luck using it on a large, enterprise software project, so I believe it’s a quality product.  I also prefer using open source products when I can.

Although I have a high opinion of Glassfish and the JEE suite, I find the arguments in favor of cloud computing increasingly persuasive.  For me, a rather simple and compelling argument has to do with the accessibility of large-scale hardware.  This is a personal project, so I’m using personal resources.  In terms of hardware, that means using my MacBook and my PC server running Ubuntu.  I don’t want to go out and buy a scalable hardware configuration to use in my personal development environment, which  will inevitably be different from the deployment environment that I’d pay for at some Web hosting company’s data center.  We run into this dilemma at my day job all the time.  Our production deployment environment is an expensive N+1 configuration involving many high-end servers to support load balancing and clustering of JEE App Servers, Oracle DBs, JMS message brokers, etc.  We attempt to create representative hardware configurations in our distributed development centers and in multiple testing facilities.  This has turned out to be costly, and it has been difficult to maintain configuration control over multiple environments.  With multiple, external software vendors developing pieces of the system, the cost and complexity is exacerbated.  Just the planning and labor to install, configure and maintain so many resources in our development and test facilities is a large burden.  We’re always discovering a hardware shortfall somewhere, which takes time to resolve, which delays our software deliveries.  Our teams often end up developing and testing in non-representative environments, which leads to late discovery of complex bugs related to scalability and reliability.  As the chief architect on this project, I’m often asked to define the minimal representative hardware environment.  Good luck with that one.

Another reason I find cloud computing valuable is that it takes the hardware problem off of my plate.  I’m a software guy, so I really don’t give a fig about developing and managing an IT infrastructure.  Again, I’ll use my day job as an example.  Because we’re building our own data centers to host our software, I have to spend time attempting to give requirements to our hardware team well in advance of when we could definitively specify those requirements.  For example, how many servers will we need to host our software?  Well, I don’t know for sure until we have a critical mass of software to allow us to empirically create some meaningful performance profiles.  We may have to make several adjustments to the planned deployment configuration to address load balancing issues or resolve performance bottlenecks, yet our hardware team needs to procure hardware months in advance of our software deliveries to create testing environments and to install data centers at remote locations.  This will cause me endless pain when I have to ask the hardware team to change the production deployment configuration after installation.  Adrian Cockcroft recently made similar points when discussing Netflix‘ migration to Amazon’s cloud.

So, I’ve decided to develop my software to run within a cloud, and I’ve settled on the Amazon cloud platform.  I don’t know if it’s the best cloud architecture and toolset, but it does seem to be the most mature and widely used.  That’s good enough for me to move on.

I’ve started reviewing Amazon’s developer documentation, and after evaluating the SOAP and REST APIs, I have initially settled on using their Java SDK.  I’m now working through some of the examples and trying to get a better feel for the programming model.  It looks to me like traditional enterprise computing frameworks, like Java EE, are mutually exclusive with Amazon’s API.  By that I mean Amazon seems to have built its own framework from scratch, supporting scalability (EC2), database access (SimpleDB, RDS), large-scale data processing (MapReduce), messaging (SQS, SNS) and storage (S3, EBS).  There is some indication that Java EE applications can be deployed within the Amazon cloud, but this does not seem to be the model for developing new applications that are “native” to Amazon’s cloud platform.
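As a first taste of that programming model, here is a minimal sketch using the AWS Java SDK to list EC2 instances.  The credentials are placeholders, and this is simply the SDK’s standard client usage, not code from my project:

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2Client;
import com.amazonaws.services.ec2.model.DescribeInstancesResult;
import com.amazonaws.services.ec2.model.Instance;
import com.amazonaws.services.ec2.model.Reservation;

public class Ec2ListSketch {

    public static void main(String[] args) {
        // Placeholder credentials; in practice these come from a properties file or injection.
        AmazonEC2 ec2 = new AmazonEC2Client(new BasicAWSCredentials("accessKey", "secretKey"));

        // Ask EC2 for all instances visible to this account and print their state.
        DescribeInstancesResult result = ec2.describeInstances();
        for (Reservation reservation : result.getReservations()) {
            for (Instance instance : reservation.getInstances()) {
                System.out.println(instance.getInstanceId() + " : " + instance.getState().getName());
            }
        }
    }
}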

My next step is to dive into porting my initial POJO code over to Amazon’s cloud, which means working to become familiar with their API.  I’ll provide progress updates within a few days.