Sunday, August 21, 2011

Vmware offers RabbitMQ as a service with Cloud Foundry

Few days back we looked into RabbitMQLink

Now Vmware has announced that they would provide RabbitMQ as a service with Cloud Foundry.

Seems like a good move which further helps the user to select his own service while building a cloud, rather than working with a single service provider.


Wednesday, August 3, 2011

Scala on VMWare's Cloud Foundry

VMWare announced Cloud Foundry(TM) support for Scala in June at Scaladays 2011. During a recent interview Ramnivas Laddad, Senior Software Engineer at VMWare said "Scala reduces the barrier between idea and application. Cloud Foundry reduces the barrier between application and scalable deployment." You can learn more about this exciting open source project in this interview with Ramnivas and Dekel Tankel, Cloud Foundry Product Marketing at VMWare. Check here for more details http://www.scala-lang.org/node/10454

Saturday, June 25, 2011

What is RabbitMQ?

To understand what RabbitMQ is, first let's try to look into some basic concepts on which RabbitMQ is built on and for .

Assume you have an online store that integrates several systems, such as the Web-based front-end system, the credit card verification system, the retail system, and the shipping system i.e. an end to end business transaction system. The origin of the control flow is usually in the front end. For example, a customer placing an order has to send a request message to the credit card verification system. If the credit card information is validated, the online store sends request messages to the various retail systems, depending on the ordered items. An order for food translates into a purchase order message for the food retailer; an order for shoes translates into a purchase order message for the shoes retailer and so on.

The control flow could also originate from the retailers to the customer. For example, when a retailer updates a catalog, the retail system sends catalog update messages to the store so that the store can display the new items. Similarly, when a shipper changes pickup times, the shipping system sends update messages to all the retailers the system serves so that they can have the shipments ready in time.

OK fine, everything works fine right?, what exactly is the problem?


Have you wondered how the applications are integrated in the example above without enforcing a common interface and also allow each application to initiate interactions with several other applications? An interesting problem isn't it?, interesting and challenging because of some of the problems noted below.
  • How to route messages to one or more of many destinations?
  • How to transform messages to an alternative representation(format) ?
  • How to perform message aggregation, decomposing messages into multiple messages and sending them to their destination, then recomposing the responses into one message to return to the user
  • How to interact with an external repository to augment a message or store it?
  • How to invoke Web services to retrieve data?
  • How to Respond to events or errors?
  • How to provide content and topic-based message routing using the publish/subscribe model?
  • How to take care of applications in an integration solution that could have conflicting quality of service (QoS) requirements?

Message broker provides a medium for resolution of these issues.

What is AMQP?


AMQP is a standard wire-level protocol and semantic framework for high performance enterprise messaging. From the AMQP website: AMQP is an Open Standard for Messaging Middleware. By complying to the AMQP standard, middleware products written for different platforms and in different languages can send messages to one another. AMQP addresses the problem of transporting value-bearing messages across and between organizations in a timely manner.If you are interested in why AMQP check this.


What is the Open Telecom Platform (OTP)?


The Open Telecom Platform (OTP) is a battle-tested library of management, monitoring, and support code for constructing extremely high-performance, reliable, scalable, available distributed network applications. It is written in Erlang.

OK, lets connect the strings and see what and why RabbitMQ
RabbitMQ is an open source message broker software, using the standard AMQP protocol. The RabbitMQ server is written in Erlang and is built on the OTP framework for clustering and failover. The source code is released under the Mozilla Public License.

RabbitMQ offers a reliable, highly available, scalable and portable messaging system with predictable and consistent throughput and latency.

RabbitMQ is 100% open source and 100% based on open standard protocols freeing users from dependency on proprietary vendor-supplied libraries.

RabbitMQ is designed with interoperability capability with other messaging systems: it uses the leading AMQP protocol, the open standard for business messaging, and, through adapters, supports SMTP, STOMP and HTTP for lightweight web messaging.

RabbitMQ is supported by a thriving community of active contributors. A full range of commercial support services are available through the SpringSource division of VMware. Do check here for more details. Also here is the FAQ for RabbitMQ.

Saturday, June 18, 2011

What is Vmware vFabric5?

VMware recently announced vFabric 5, an integrated application platform for virtual and cloud environments. Let's try to understand this in simple terms and also look into the innovation and value add by this product.

What will
VFabric 5 provide?

Combining the Spring development framework for Java and the latest generation of vFabric application services, vFabric 5 will provide the core application platform for building, deploying and running modern applications.

But what is so special, can't I just use Spring framework directly?

vFabric 5 introduces a flexible packaging and licensing model that will allow enterprises to purchase application infrastructure software based on virtual machines, rather than physical hardware CPUs, and to pay only for the licenses in use. In a move to eliminate excess software purchases in anticipation of peak loads, the new model only charges enterprises for the licenses in use.

The model in vFabric 5 more closely aligns to cloud computing models that directly link the cost of software with use, consumption and value delivered to the organization.

Over 3 million developers use the Spring framework to build enterprise Java applications. With vFabric 5, users can gain insight into the performance of their Spring applications with the new Spring Insight Operations.

From business perspective what is the problem that VFabric 5 will solve?

What we can see is Cloud computing is reshaping not just how IT resources are consumed by the business, but how those resources are purchased, licensed and delivered. While application infrastructure technologies have advanced to meet the needs of today's enterprise, to date the business models have remained rigid and out of date.So VFabric 5 seems to be an effort in resolving this issue.

Tell me more how Vmware is implementing the product?

VMware's new model is a new approach in licensing middleware, and it highlights the first time that middleware software is marketed, sold and to some degree engineered on a VM-first basis.

As virtualization technology has matured, one can see most software vendors have shied away from recommending a virtual machine as the preferred way to run software for fear of degraded performance or support complications.

With VMware pushing its own new application platform stack through the SpringSource division and the set of associated acquisitions over the last couple of years, it was perhaps only a matter of time before we started seeing optimizations for virtualized deployment and then a strategy built around a VM-first approach.

Engineered specifically to leverage the server architecture of VMware vSphere, vFabric 5 features the new Elastic Memory for Java capability. Elastic memory allows applications to safely grow and shrink the memory heap as needed to survive load peaks and memory leaks. The feature works to optimize memory management across Java applications via memory ballooning. The envisioned result is greater application server density for Java workloads on vFabric.

Sunday, May 29, 2011

What is MapReduce ?

MapReduce is a parallel programming technique made popular by Google. It is used for processing very very large amounts of data. Such processing can be completed in a reasonable amount of time only by distributing the work to multiple machines in parallel. Each machine processes a small subset of the data. MapReduce is a programming model that lets developers focus on the writing code that processes their data without having to worry about the details of parallel execution.

MapReduce requires modeling the data to be processed as key,value pairs. The developer codes a map function and a reduce function.

The MapReduce runtime calls the map function for each key,value pair. The map function takes as input a key,value pair and produces an output which is another key,value pair.

The MapReduce runtime sorts and groups the output from the map functions by key. It then calls the reduce function passing it a key and a list of values associated with the key. The reduce function is called for each key. The output from the reduce function is a key,value pair. The value is generally an aggregate or something calculated by processing the list of values that were passed in for the input key. The reduce function is called for each intermediate key produced by the map function. The output from the reduce function is the required result.

As an example , let us say you have a large number of log files that contain audit logs for some event such as access to an account. You need to find out how many times each account was accessed in the last 10 years.
Assume each line in the log file is a audit record. We are processing log files line by line.The map and reduce functions would look like this:
map(key , value) {
// key = byte offset in log file
// value = a line in the log file
if ( value is an account access audit log) {
account number = parse account from value
output key = account number, value = 1
}
}

reduce(key, list of values) {
// key = account number
// list of values {1,1,1,1.....}
for each value
count = count + value
output key , count
}
The map function is called for each line in each log file. Lines that are not relevant are ignored. Account number is parsed out of relevant lines and output with a value 1. The MapReduce runtime sorts and groups the output by account number. The reduce function is called for each account. The reduce function aggregates the values for each account, which is the required result.

MapReduce jobs are generally executed on a cluster of machines. Each machine executes a task which is either a map task or reduce task. Each task is processing a subset of the data. In the above example, let us say we start with a set of large input files. The MapReduce runtime breaks the input data into partitions called splits or shards. Each split or shard is processed by a map task on a machine. The output from each map task is sorted and partitioned by key. The outputs from all the maps are merged to create partitions that are input to the reduce tasks.

There can be multiple machines each running a reduce task. Each reduce task gets a partition to process. The partition can have multiple keys. But all the data for each key is in 1 partition. In other words each key can processed by 1 reduce task only.

The number of machines , the number of map tasks , number of reduce tasks and several other things are configurable.

MapReduce is useful for problems that require some processing of large data sets. The algorithm can be broken into map and reduce functions. MapReduce runtime takes care of distributing the processing to multiple machines and aggregating the results.

Apache Hadoop is an open source Java implementation of mapreduce. Stay tuned for future blog / tutorial on mapreduce using hadoop.

Tuesday, May 24, 2011

Comparison of different Paas providers

Found this interesting link on comparison of different Paas providers. Check the link here

Monday, May 23, 2011

Vmware CloudFoundry Architecture

Few days back we discussed a brief introduction to CloudFoundry . Lets now try to explore the architecture of VMware's latest Paas offering called CloudFoundry.

Just to clarify, this article is totally based on my understanding and is not an official document about CloudFoundry in any way, feel free to let me know if my understanding is wrong.

Let's first try to answer a question, since we learned previously that Paas is a super cool solution, one might wonder..

Why aren't lot of companies providing Paas solutions?


Building Platform as a Service (PaaS) is fairly complicated since it involves various complicated processes of building, deploying, or maintaining of various activities like orchestration of all the services internally, then abstracting all of that work, and finally, having to market, sell it, and maintain it. Due to the involvement of heavy investment, very few companies have considered building their own Paas solution. Vmware has interestingly made Cloud Foundry service open source.

What has CloudFoundry been orchestrated in?

Interestingly it is orchestrated entirely in Ruby! No Erlang, no JVM's, all Ruby under the hood.
For a nice technical overview, checkout this webinar

Who orchestrates all the components in CloudFoundry?


To orchestrate all of these moving components, the "brain" of the platform is a Rails 3 application (Cloud Controller) whose role is to store the information about all users, provisioned apps, services, and maintain the state of each component. When you run your CLI (command line client) on a local machine, you are, in fact, talking to the Cloud Controller. Interestingly, the Rails app itself is designed to run on top of the Thin web-server, and is using Ruby 1.9 fibers and async DB drivers - in other words, async Rails 3!

Rails application works hand in hand with is the Health Manager, which is a standalone daemon, which imports all of the CloudController ActiveRecord models, and actively compares to what is in the database against all the chatter between the remaining daemons. When a discrepancy is detected, it notifies the Cloud Controller - simple and an effective way to keep all the distributed state information up to date.

How is Orchestration of the CloudFoundry platform done?

The remainder of the CloudFoundry platform follows a consistent pattern: each service is a Ruby daemon which queries the CloudController when it first boots, subscribes to and publishes to a shared message bus, and also exposes several JSON endpoints for providing health and status information. Not surprisingly, all of the daemons are also powered by Ruby EventMachine under the hood, and hence use Thin and simple Rack endpoints.

The router is responsible for parsing incoming requests and redirecting the traffic to one of the provisioned applications (droplets). To do so, it maintains an internal map of registered URL's and provisioned applications responsible for each. When you provision or decommission a new app server instance, the router table is updated, and the traffic is redirected accordingly. For small deployments, one router will suffice, and in larger deployments, traffic can be load-balanced between multiple routers.

The DEA (Droplet Execution Agent) is the supervisor process responsible for provisioning new applications: it receives the query from the CloudController, sets up the appropriate platform, exports the environment variables, and launches the app server.

Finally, the services component is responsible for provisioning and managing access to resources such as MySQL, Redis, RabbitMQ, and others. Once again, very similar architecture: a gateway Ruby daemon listens to incoming requests and invokes the required start/stop and add/remove user commands. Adding a new or a custom service is as simple as implementing a custom Provisioner class.

What glues all these moving pieces together?


Each of the Ruby daemons above follows a similar pattern: on load, query the Cloud Controller, and also expose local HTTP endpoints to provide health and status information about its own status. But how do these services communicate between each other? Well, through another Ruby-powered service, of course! NATS publish-subscribe message system is a lightweight topic router (powered by EventMachine) which connects all the pieces! When each daemon first boots, it connects to the NATS message bus, subscribes to topics it cares about (ex: provision and heartbeat signals), and also begins to publish its own heartbeats and notifications.

This architecture allows CloudFoundry to easily add and remove new routers, DEA agents, service controllers and so on. Nothing stops you from running all of the above on a single machine, or across a large cluster of servers within your own datacenter.

Distributed Systems with Ruby? Yes!

Building a distributed system with as many moving components as CloudFoundry is no small feat, and it is really interesting to see that the team behind it chose Ruby as the platform of choice. If you look under the hood, you will find Rails, Sinatra, Rack, and a lot of EventMachine code. If you ever wondered if Ruby is a viable platform to build a non-trivial distributed system, then this is great case study and a vote of confidence by VMware. Definitely worth a read through the source!

Another interesting read would be How Cloud Foundry works when a new Application is Deployed

Next time will try to explain CloudFoundry dynamics with a use case and go more into technical depth of each block.



Wednesday, May 18, 2011

What is Infrastructure as a Service?

Infrastructure as a service(Iaas) describes the distribution of access to a computer infrastructure through one management console. The computer infrastructure will be traditionally hosted offsite and companies will pay per use or account to access the service. Wait a minute.. I am getting all confused. First Paas, Saas and now Iaas. Give me a wider picture of where each of these fit.

Here you go, hope this picture answers all the questions..



Okay fine I get it, but what could one possibly gain from using Iaas?


In a traditional computer space in a company, computer infrastructure included personal machines, a server room full of computer systems, storage space and cooling devices, and specialized staff who can maintain the computer structure. This can run into the tens of thousands of dollars and until recently has been a required part of almost every business. Infrastructure as a service now provides companies with the chance to set up their own computing system without the initial expenditure involved in purchasing it. Workers simply login to their own computer and access everything they need to and site company at a pay per use price.


Looks interesting, can I benefit from Iaas if I want to run my own startup?

The benefits of using infrastructure as a service clearly make this a desirable option for many businesses. For a start, there is no initial cost in setting up computer infrastructure, which saves the company thousands of dollars a year. There are no costly upgrades or maintenance fees, as this is all included within the package price and is taken out of the hands of the company using the service. Staffing costs are kept down, as there is no need for an intensive IT department.


Is the server room in my office for me to play basket ball?

Physical space within the office is maximized, with large server rooms becoming obsolete and workers being able to log onto the infrastructure from their own laptops. This also means that infrastructure is a service makes it easier to work from home or when traveling. There are also environmental benefits, with less companies running their own resource intensive server rooms, and many of the companies who offer infrastructure as a service sell the space in a way that there is very little unused computer space or idle server systems.

The benefits of infrastructure as a service greatly outweigh the negatives, and even some previous issues such as information protection, security and downtime have been completely revolutionized, meaning that those working on infrastructure as a service not only save money, but can also increase their productivity. Many systems are being developed as green virtualization solutions so that clients using IaaS can know that they are doing their best to help the environment while they work.

The idea here is to build a base to understand what VMware vCloud Director(will try to explain in next post) is in simple terms.





Saturday, May 14, 2011

What is Cloud Foundry??

One can see lot of enthusiasm about Cloud foundry, Vmware's latest offering.

Okay cut it short, in layman's terms what is it?

To first understand this let's look at some terminologies

What is Saas and Paas?, I hear these too often these days

Using Salesforce.com as an example - they offer the Force.com platform, which provides a database, a programming language, integration features and so on. You can use this platform to build whatever you need/like.

Salesforce also offer their own, prebuilt CRM applications - this is software-as-a-service as the application has been built for you, you simply start using it.

PaaS provides you with the components and tools to build something; SaaS provides you a prebuilt application you can pick up and use straight away. The line can be blurred - again, using the Salesforce example, you can tailor their SaaS offerings by using some of their PaaS technologies.

Okay fine we get an idea of Paas and Saas,

But what exactly are these public and private clouds?

A public cloud is offered as a service via web applications/web services( usually over an Internet connection). Private cloud and internal cloud are deployed inside the firewall and managed by the user organization.

There is another type of cloud, hybrid cloud. A hybrid cloud environment consisting of multiple internal and/or external providers will be typical for most enterprises.
Check this interesting link

Okay this warmed me up a bit, so what is Cloud Foundry now?, should I even care?

Well you should definitely care because of vmware's midas touch :) on any product. But on a serious note it is a great effort to bring multiple frameworks, services, clouds etc under a single roof. Let me explain..

For development of applications running in the cloud there are various options. Google App Engine, Force.com and Microsoft Azure are some big names. They all use their own development platform which means customers on that platform cannot switch to another cloud provider without having to rewrite the application code. The code is stuck to the platform of the provider. It is a bit like having an electric appliance which can only be used with a specific electricity company. If the electricity company has many outages or raises the price of power, you cannot switch to another company. The common name for this is vendor lock-on or Hotel California. You can check in but you cannot check out. The provider will lure you with low prices and when enough customers are in, prices are raised to be able to have a healthy economic model and deliver good return for investors and shareholders.

VMware announced it’s own application development platform called Cloud Foundry in April. It is based on open source software. It can be described as a layer which sits between the cloud provider infrastructure (computing resources) and the software used for development.

As it is open source the application can run on cloud providers which support Cloud Foundry. Even when the cloud provider does not run VMware vSphere as it’s underlying virtualization platform. So if the cloud provider does not comply to SLA’s or raises the price, the app can be moved quite easily to another provider without having to rewrite the app. Wow isn't this super cool, I feel it is kind of moving from dictatorship to democracy :).

Cloud Foundry supports MySQL, MongoDB and Redis [and in] coming months, they claim they would add support for other application services. In the initial release, Spring for Java, Rails and Sinatra for Ruby and Node.js are supported. The system also supports other JVM-based frameworks such as Grails.

Is it Open Source? But Of Course

Cloud Foundry is an open source project with a community and source code today at www.cloudfoundry.org.

You can also check this link

In the next post will try to go in more details about the Cloud Foundry architecture.

Again aim here is to explain in simple terms the technology. Feedback is always appreciated.