SOFTWARE ENVIRONMENTS FOR DISTRIBUTED SYSTEMS AND CLOUDS

This section introduces popular software environments for using distributed and cloud computing systems.

1.4.1 Service-Oriented Architecture (SOA)

In grids/web services an entity is a service, in Java an entity is a Java object, and in CORBA an entity is a CORBA distributed object in a variety of languages.
These architectures build on the traditional seven Open Systems Interconnection (OSI) layers that provide the base networking abstractions.
On top of this we have a base software environment, which would be
- .NET or Apache Axis for web services,
- the Java Virtual Machine for Java, and
- a broker network for CORBA.
On top of this base environment one would build a higher level environment reflecting the special features of the distributed computing environment.
This starts with entity interfaces and inter-entity communication, which rebuild the top four OSI layers but at the entity level and not the bit level.
Figure 1.20 shows the layered architecture for distributed entities used in web services and grid systems.

1.4.1.1 Layered Architecture for Web Services and Grids

The entity interfaces correspond to the Web Services Description Language (WSDL), Java method, and CORBA interface definition language (IDL) specifications.

These interfaces are linked with customized, high-level communication systems: SOAP, RMI, and IIOP.

These communication systems support features such as
- particular message patterns (such as Remote Procedure Call or RPC),
- fault recovery, and
- specialized routing.

Often, these communication systems are built on message-oriented middleware (enterprise bus) infrastructure, which provides rich functionality and supports virtualization of routing, senders, and recipients.

Other capabilities at the entity level include
- different abstractions (such as messages versus packets, virtualized addressing),
- security, drawing on concepts such as Internet Protocol Security (IPsec) and secure sockets in the OSI layers, and
- higher level services for registries, metadata, and management of the entities.

Discovery and information services include, for example,
- CORBA Trading Service,
- UDDI (Universal Description, Discovery, and Integration),
- LDAP (Lightweight Directory Access Protocol), and
- ebXML (Electronic Business using eXtensible Markup Language)

Management services include service state and lifetime support, for example
- CORBA Life Cycle and Persistent states,
- the different Enterprise JavaBeans models,
- Jini’s lifetime model, and
- a suite of web services specifications
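
Discovery and lifetime support are often combined: a service registers under a name, and the entry stays valid only while its lease is renewed, which is the general idea behind lease-based lifetime models such as Jini's. The following is a minimal in-memory sketch of that idea, not the API of UDDI, LDAP, or Jini. `LeaseRegistry`, its methods, and the example endpoint are hypothetical names, and the current time is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass


@dataclass
class Registration:
    endpoint: str
    expires_at: float  # time after which the entry is treated as dead


class LeaseRegistry:
    """Toy registry (hypothetical): lookup by name plus lease-based lifetime."""

    def __init__(self):
        self._entries = {}

    def register(self, name, endpoint, lease_seconds, now):
        """Register or renew a service; the lease runs from `now`."""
        self._entries[name] = Registration(endpoint, now + lease_seconds)

    def lookup(self, name, now):
        """Return the endpoint if the lease is still valid, else None."""
        entry = self._entries.get(name)
        if entry is None or entry.expires_at <= now:
            self._entries.pop(name, None)  # forget expired entries
            return None
        return entry.endpoint
```

A service that stops renewing its lease simply disappears from lookups, so the registry needs no explicit failure detection.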

1.4.1.2 Web Services and Tools

Loose coupling and support of heterogeneous implementations make services more attractive than distributed objects.
There are two choices of service architecture: web services or REST systems.

Web services and REST systems take very different approaches to building reliable, interoperable systems.

Web services approach
- Aims to fully specify all aspects of the service and its environment.
- This specification is carried with communicated messages using Simple Object Access Protocol (SOAP).
- The hosting environment then becomes a universal distributed operating system with fully distributed capability carried by SOAP messages.

REST approach
- Delegates most of the difficult problems to application (implementation-specific) software.
- In a web services language, REST has minimal information in the header, and the message body carries all the needed information.
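
The contrast can be made concrete with a small sketch. The Python below builds a SOAP 1.1 envelope, whose Header element carries metadata alongside the Body, and a REST-style request whose headers stay minimal while the body carries the data. The `SubmitReading` operation, the `urn:example:sensors` namespace, the `MessageID` header entry, and the `/sensors/...` path are all hypothetical, chosen only for illustration.

```python
import json
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:sensors"  # hypothetical application namespace


def build_soap_request(sensor_id, celsius, message_id):
    """Return a SOAP envelope (as text) with metadata in the header."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    # Hypothetical header entry standing in for service-level metadata.
    ET.SubElement(header, f"{{{APP_NS}}}MessageID").text = message_id
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}SubmitReading")
    ET.SubElement(call, f"{{{APP_NS}}}sensorId").text = sensor_id
    ET.SubElement(call, f"{{{APP_NS}}}celsius").text = str(celsius)
    return ET.tostring(envelope, encoding="unicode")


def build_rest_request(sensor_id, celsius):
    """Return a REST-style request: the URL names the resource, the body holds the data."""
    return {
        "method": "PUT",
        "path": f"/sensors/{sensor_id}/reading",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"celsius": celsius}),
    }
```

In the SOAP version, anything the hosting environment needs travels inside the message itself; in the REST version, interpreting the body is left to the application.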

1.4.1.3 The Evolution of SOA

As shown in Figure 1.21, service-oriented architecture (SOA) has evolved over the years.
SOA applies to building grids, clouds, grids of clouds, clouds of grids, clouds of clouds (also known as interclouds), and systems of systems in general.
A large number of sensors provide data-collection services, denoted in the figure as SS (sensor service).
- A sensor can be a ZigBee device, a Bluetooth device, a WiFi access point, a personal computer, a GPS, or a wireless phone, among other things.

Raw data is collected by sensor services.
All the SS devices interact with large or small computers, many forms of grids, databases, the compute cloud, the storage cloud, the filter cloud, the discovery cloud, and so on.
Filter services (fs in the figure) are used to eliminate unwanted raw data, in order to respond to specific requests from the web, the grid, or web services.
A collection of filter services forms a filter cloud.
SOA aims to search for, or sort out, the useful data from the massive amounts of raw data items.
Processing this data will generate useful information, and subsequently, the knowledge for our daily use.
Finally, we make intelligent decisions based on both biological and machine wisdom.

Most distributed systems require a web interface or portal.
Raw data collected by a large number of sensors is passed through a sequence of compute, storage, filter, and discovery clouds to transform it into useful information or knowledge.
The inter-service messages converge at the portal, which is accessed by all users.
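
The flow from raw sensor data through filter services to useful information can be sketched as a chain of small functions. This is an illustrative single-process model only; the reading format, the function names, and the temperature range are assumptions, not part of the figure.

```python
from statistics import mean


def drop_missing(readings):
    """Filter service: discard readings that carry no value."""
    for reading in readings:
        if reading.get("celsius") is not None:
            yield reading


def drop_out_of_range(readings, low=-40.0, high=60.0):
    """Filter service: discard implausible values (the range is an assumption)."""
    for reading in readings:
        if low <= reading["celsius"] <= high:
            yield reading


def filter_cloud(readings, services):
    """A collection of filter services applied one after another."""
    for service in services:
        readings = service(readings)
    return readings


def to_information(readings):
    """Turn filtered data into information: mean temperature per sensor."""
    by_sensor = {}
    for reading in readings:
        by_sensor.setdefault(reading["sensor"], []).append(reading["celsius"])
    return {sensor: mean(values) for sensor, values in by_sensor.items()}
```

Each filter is a generator, so readings stream through the chain one at a time, loosely mirroring how a raw data stream passes through successive clouds.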

1.4.1.4 Grids versus Clouds

The boundary between grids and clouds has become blurred in recent years.
For web services, workflow technologies coordinate or orchestrate services, using specifications that define critical business process models such as two-phase transactions.
- The BPEL Web Service standard is the general approach used in workflow.
- Other important workflow approaches include Pegasus, Taverna, Kepler, Trident, and Swift.

In all approaches, one builds a collection of services that together tackle all or part of a distributed computing problem.
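
The idea of orchestrating a collection of services can be illustrated with a tiny workflow runner that executes services in dependency order. This is a sketch under stated assumptions, not BPEL or any of the workflow systems named above: services are plain Python callables, and `run_workflow` and the example service names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(services, depends_on, initial):
    """Run each service after the services it depends on.

    services:   name -> callable
    depends_on: name -> ordered list of upstream service names
    initial:    input handed to services that have no dependencies
    """
    results = {}
    for name in TopologicalSorter(depends_on).static_order():
        upstream = [results[dep] for dep in depends_on.get(name, ())]
        if upstream:
            results[name] = services[name](*upstream)
        else:
            results[name] = services[name](initial)
    return results


# Hypothetical services that together compute an average.
SERVICES = {
    "clean": lambda values: [v for v in values if v is not None],
    "total": sum,
    "count": len,
    "average": lambda total, count: total / count,
}
DEPENDS_ON = {
    "total": ["clean"],
    "count": ["clean"],
    "average": ["total", "count"],
}
```

Calling `run_workflow(SERVICES, DEPENDS_ON, [1, None, 2, 3])` runs `clean` first, then `total` and `count`, and finally `average`. Independent steps such as `total` and `count` are the ones a real workflow engine could dispatch to different services in parallel.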

In general, a grid system applies static resources, while a cloud emphasizes elastic resources.
For some researchers, the differences between grids and clouds are limited to dynamic resource allocation based on virtualization and autonomic computing.

One can build a grid out of multiple clouds.
This type of grid can do a better job than a pure cloud, because it can explicitly support negotiated resource allocation.
Thus one may end up building a system of systems, such as a cloud of clouds, a grid of clouds, a cloud of grids, or inter-clouds, as a basic SOA architecture.

1.4.2 Trends toward Distributed Operating Systems

The computers in most distributed systems are loosely coupled.
Thus, a distributed system inherently has multiple system images.
Each node machine runs its own independent operating system.
A distributed OS
- promotes resource sharing and fast communication among node machines,
  - typically relying on message passing and RPCs for internode communication,
- manages all resources coherently and efficiently, and
- improves the performance, efficiency, and flexibility of distributed applications.

1.4.2.1 Distributed Operating Systems

There are three approaches for distributing resource management functions in a distributed computer system.
- The first approach is to build a network OS over a large number of heterogeneous OS platforms.
- The second approach is to develop middleware to offer a limited degree of resource sharing.
- The third approach is to develop a truly distributed OS to achieve higher use or system transparency.

Table 1.6 compares the functionalities of three distributed operating systems: Amoeba, DCE, and MOSIX2.

1.4.2.2 Amoeba versus DCE

DCE is a middleware-based system for Distributed Computing Environments.
Amoeba was developed in academia, at the Free University in the Netherlands.
The Open Software Foundation (OSF) has pushed the use of DCE for distributed computing.
However, Amoeba, DCE, and MOSIX2 are still research prototypes that are primarily used in academia.
No successful commercial OS products followed these research systems.

1.4.2.3 MOSIX2 for Linux Clusters

MOSIX2 is a distributed OS, which runs with a virtualization layer in the Linux environment.
This layer provides a partial single-system image to user applications.
MOSIX2
- Supports both sequential and parallel applications,
- Discovers resources and
- Migrates software processes among Linux nodes.

MOSIX2 can manage a Linux cluster or a grid of multiple clusters.
Flexible management of a grid allows owners of clusters to share their computational resources among multiple cluster owners.

1.4.2.4 Transparency in Programming Environments

Figure 1.22 shows the concept of a transparent computing infrastructure for future computing platforms.
The user data, applications, OS, and hardware are separated into four levels.
Data is owned by users, independent of the applications.
The OS provides clear interfaces, standard programming interfaces, or system calls to application programmers.

In future cloud infrastructure,
- Hardware will be separated by standard interfaces from the OS.
- Users will be able to choose from different OSes on top of the hardware devices they prefer to use.
- Users can enable cloud applications as SaaS, which separates user data from specific application programs, so users can switch among different services.
- The data will not be bound to specific applications.

1.4.3 Parallel and Distributed Programming Models

This section explores four programming models for distributed computing with expected scalable performance and application flexibility.
Table 1.7 summarizes three of these models, along with some software tool sets developed in recent years.

MPI is the most popular programming model for message-passing systems.
Google’s MapReduce and BigTable are for effective use of resources from Internet clouds and data centers.
Service clouds demand extending Hadoop, EC2, and S3 to facilitate distributed computing over distributed storage systems.

1.4.3.1 Message-Passing Interface (MPI)

This is the primary programming standard used to develop parallel and concurrent programs to run on a distributed system.
MPI is essentially a library of subprograms that can be called from C or FORTRAN to write parallel programs running on a distributed system.
The idea is to embody clusters, grid systems, and P2P systems with upgraded web services and utility computing applications.
Besides MPI, distributed programming can also be supported with low-level primitives such as the Parallel Virtual Machine (PVM).
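
The message-passing style can be illustrated without an MPI installation. The sketch below is not MPI and does not use its API: it imitates ranks with Python threads and point-to-point send/receive with per-rank queues, purely to show the pattern in which a root rank scatters work and gathers partial results. The function `parallel_sum` and its helpers are hypothetical names.

```python
import queue
import threading


def parallel_sum(data, size=4):
    """Sum `data` using `size` ranks that communicate only by messages."""
    if size < 2:
        raise ValueError("need a root rank and at least one worker rank")
    mailboxes = [queue.Queue() for _ in range(size)]  # one inbox per rank
    result = []

    def send(dest, message):
        mailboxes[dest].put(message)

    def recv(rank):
        return mailboxes[rank].get()  # blocks until a message arrives

    def run(rank):
        if rank == 0:
            # Root: scatter one strided chunk to each worker, then gather.
            for dest in range(1, size):
                send(dest, data[dest - 1 :: size - 1])
            result.append(sum(recv(0) for _ in range(1, size)))
        else:
            # Worker: receive a chunk, compute, send the partial sum back.
            send(0, sum(recv(rank)))

    threads = [threading.Thread(target=run, args=(rank,)) for rank in range(size)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return result[0]
```

No rank reads another rank's data directly; all coordination happens through explicit messages, which is the property that lets the same program structure run across separate machines.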

1.4.3.2 MapReduce

This is a web programming model for scalable data processing on large clusters over large data sets.
The model is applied mainly in web-scale search and cloud computing applications.
The user specifies a Map function to generate a set of intermediate key/value pairs.
Then the user applies a Reduce function to merge all intermediate values with the same intermediate key.
MapReduce is highly scalable to explore high degrees of parallelism at different job levels.
A typical MapReduce computation process can handle terabytes of data on tens of thousands or more client machines.
Hundreds of MapReduce programs can be executed simultaneously.
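
The Map and Reduce steps described above can be sketched with a word count. This is a single-process illustration of the programming model only, with no distribution or fault tolerance; the function names are illustrative rather than taken from any MapReduce library.

```python
from collections import defaultdict


def map_fn(_doc_id, text):
    """Map: emit an intermediate (word, 1) pair for every word."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: merge all intermediate values that share a key."""
    return word, sum(counts)


def map_reduce(inputs, map_fn, reduce_fn):
    """Run map over (key, value) inputs, group by intermediate key, then reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for intermediate_key, intermediate_value in map_fn(key, value):
            groups[intermediate_key].append(intermediate_value)
    return dict(reduce_fn(key, values) for key, values in groups.items())
```

Because each call to `map_fn` depends only on its own input and each call to `reduce_fn` only on one key's values, both phases can be spread across many machines, which is where the scalability of the model comes from.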

1.4.3.3 Hadoop Library

Hadoop offers a software platform that was originally developed by a Yahoo! group.
The package enables users to write and run applications over vast amounts of distributed data.
Users can easily scale Hadoop to store and process petabytes of data in the web space.
Hadoop is
- economical: it comes with an open source version of MapReduce that minimizes overhead in task spawning and massive data communication,
- efficient: it processes data with a high degree of parallelism across a large number of commodity nodes, and
- reliable: it automatically keeps multiple data copies to facilitate redeployment of computing tasks upon unexpected system failures.
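
The reliability point, keeping multiple data copies so tasks can be redeployed after a failure, can be sketched as follows. This is a toy model and not Hadoop's actual placement policy: the round-robin placement, the function names, and the node and block labels are assumptions made for illustration.

```python
def place_replicas(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    return {
        block: [nodes[(i + k) % len(nodes)] for k in range(replication)]
        for i, block in enumerate(blocks)
    }


def reschedule(placement, failed):
    """Pick a surviving replica for each block, or None if all copies are lost."""
    return {
        block: next((node for node in replicas if node not in failed), None)
        for block, replicas in placement.items()
    }
```

With three copies of every block, the loss of any two nodes still leaves at least one replica on which the corresponding task can be rerun.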

1.4.3.4 Open Grid Services Architecture (OGSA)

The development of grid infrastructure is driven by large-scale distributed computing applications.
These applications must count on a high degree of resource and data sharing.
Table 1.8 introduces OGSA as a common standard for general public use of grid services.
Genesis II is a realization of OGSA.
Key features include
- a distributed execution environment,
- Public Key Infrastructure (PKI) services using a local certificate authority (CA),
- trust management, and
- security policies in grid computing.

1.4.3.5 Globus Toolkits and Extensions

Globus is a middleware library jointly developed by the U.S. Argonne National Laboratory and the USC Information Sciences Institute over the past decade.
This library implements some of the OGSA standards for resource discovery, allocation, and security enforcement in a grid environment.
The Globus packages support multisite mutual authentication with PKI certificates.
The current version of Globus, GT 4, has been in use since 2008.
In addition, IBM has extended Globus for business applications.