SOFTWARE ENVIRONMENTS FOR DISTRIBUTED SYSTEMS AND CLOUDS
- Popular software environments for using distributed and cloud computing systems.
1.4.1 Service-Oriented Architecture (SOA)
- In grids/web services an entity is a service, in Java an entity is a Java object, and in CORBA an entity is a CORBA distributed object in a variety of languages.
- These architectures build on the traditional seven Open Systems Interconnection (OSI) layers that provide the base networking abstractions.
- On top of this we have a base software environment, which would be
- .NET or Apache Axis for web services,
- the Java Virtual Machine for Java, and
- a broker network for CORBA.
- On top of this base environment one would build a higher level environment reflecting the special features of the distributed computing environment.
- This starts with entity interfaces and inter-entity communication, which rebuild the top four OSI layers but at the entity level and not the bit level.
- Figure 1.20 shows the layered architecture for distributed entities used in web services and grid systems.
1.4.1.1 Layered Architecture for Web Services and Grids
- The entity interfaces correspond to the
- Web Services Description Language (WSDL),
- Java method, and
- CORBA interface definition language (IDL) specifications.
- These interfaces are linked with customized, high-level communication systems: SOAP, RMI, and IIOP.
- These communication systems support features like
- message patterns (such as Remote Procedure Call or RPC),
- often built on message-oriented middleware (enterprise bus) infrastructure, which provides rich functionality and supports virtualization of routing, senders, and recipients
- fault recovery and specialized routing
- abstractions (such as messages versus packets, virtualized addressing)
- security – such as Internet Protocol Security (IPsec) and secure sockets in the OSI layers
- higher level services for registries, metadata, and management of the entities
- discovery and information services – example
- CORBA Trading Service,
- UDDI (Universal Description, Discovery, and Integration),
- LDAP (Lightweight Directory Access Protocol), and
- ebXML (Electronic Business using eXtensible Markup Language)
- Management services include service state and lifetime support – example
- CORBA Life Cycle and Persistent states,
- the different Enterprise JavaBeans models,
- Jini’s lifetime model, and
- a suite of web services specifications
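- The discovery services listed above share one idea: a service publishes what it offers, and a client looks it up by name. A minimal in-memory sketch of that idea follows. `ServiceRegistry`, `publish`, `lookup`, and the example endpoints are hypothetical names chosen for illustration; they do not reproduce the UDDI, LDAP, ebXML, or CORBA Trading Service APIs.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRegistry:
    """Toy in-memory registry mapping an interface name to known endpoints."""
    _entries: dict = field(default_factory=dict)

    def publish(self, interface: str, endpoint: str) -> None:
        # Record that `endpoint` offers `interface`.
        self._entries.setdefault(interface, []).append(endpoint)

    def lookup(self, interface: str) -> list:
        # Return a copy so callers cannot mutate registry state.
        return list(self._entries.get(interface, []))


registry = ServiceRegistry()
# Hypothetical endpoints, used only to exercise the registry.
registry.publish("WeatherService", "http://node-a.example/weather")
registry.publish("WeatherService", "http://node-b.example/weather")
endpoints = registry.lookup("WeatherService")
```

- A real registry would also store metadata and lifetime information for each entry; the sketch keeps only the name-to-endpoint mapping.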
1.4.1.2 Web Services and Tools
- Loose coupling and support of heterogeneous implementations make services more attractive than distributed objects.
- two choices of service architecture:
- web services or REST systems.
- Web services and REST systems take very different approaches to building reliable interoperable systems.
- Web services approach
- Aims to fully specify all aspects of the service and its environment.
- This specification is carried with communicated messages using Simple Object Access Protocol (SOAP).
- The hosting environment then becomes a universal distributed operating system with fully distributed capability carried by SOAP messages.
- REST approach
- Delegates most of the difficult problems to application (implementation-specific) software.
- In web services terms, a REST message has minimal information in the header, and the message body carries all the needed information.
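- The contrast between the two approaches can be illustrated by building one request in each style. This is a rough sketch, not a complete client: the `MessageID` header element, the `GetQuote` operation, and the `example.org` URL are assumptions made up for the example. Only the SOAP 1.1 envelope namespace is standard.

```python
import json
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_soap_request(operation: str, params: dict, message_id: str) -> str:
    """Wrap a call in a SOAP envelope; metadata travels in the Header."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    # Hypothetical header field standing in for richer specification headers.
    ET.SubElement(header, "MessageID").text = message_id
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, operation)
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def build_rest_request(resource: str, params: dict) -> dict:
    """Describe a REST-style request: a verb, a URL, and a plain body."""
    return {
        "method": "POST",
        "url": f"http://example.org/{resource}",  # hypothetical host
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(params),
    }


soap_message = build_soap_request("GetQuote", {"symbol": "IBM"}, "msg-001")
rest_message = build_rest_request("quotes", {"symbol": "IBM"})
```

- In the SOAP version the environment-level information sits in the envelope header; in the REST version the body is all the application receives, and interpreting it is left to application code.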
1.4.1.3 The Evolution of SOA
- As shown in Figure 1.21, service-oriented architecture (SOA) has evolved over the years.
- SOA applies to building grids, clouds, grids of clouds, clouds of grids, clouds of clouds (also known as interclouds), and systems of systems in general.
- A large number of sensors provide data-collection services, denoted in the figure as SS (sensor service).
- A sensor can be a ZigBee device, a Bluetooth device, a WiFi access point, a personal computer, a GPS device, or a wireless phone, among other things.
- Raw data is collected by sensor services.
- All the SS devices interact with large or small computers, many forms of grids, databases, the compute cloud, the storage cloud, the filter cloud, the discovery cloud, and so on.
- Filter services (fs in the figure) are used to eliminate unwanted raw data, in order to respond to specific requests from the web, the grid, or web services.
- A collection of filter services forms a filter cloud.
- SOA aims to search for, or sort out, the useful data from the massive amounts of raw data items.
- Processing this data will generate useful information, and subsequently, the knowledge for our daily use.
- Finally, we make intelligent decisions based on both biological and machine wisdom.
- Most distributed systems require a web interface or portal.
- Raw data is collected by a large number of sensors.
- This raw data stream is passed through a sequence of compute, storage, filter, and discovery clouds to transform it into useful information or knowledge.
- The inter-service messages converge at the portal, which is accessed by all users.
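- The flow from sensor services through filter services to useful information can be sketched as a small pipeline. This is a toy under stated assumptions: the reading format, the `-999.0` failure sentinel, and the function names `sensor_service`, `filter_service`, and `summarize` are all invented for illustration.

```python
from typing import Callable, Iterable, Iterator

Reading = dict  # assumed shape: {"sensor": "ss-1", "value": 21.5}


def sensor_service(raw: Iterable[Reading]) -> Iterator[Reading]:
    """Stand-in for an SS: emits raw readings unchanged."""
    yield from raw


def filter_service(keep: Callable[[Reading], bool]):
    """Build an fs stage that drops readings failing the predicate."""
    def stage(readings: Iterable[Reading]) -> Iterator[Reading]:
        return (r for r in readings if keep(r))
    return stage


def summarize(readings: Iterable[Reading]) -> dict:
    """Turn filtered data into information: a count and a mean value."""
    values = [r["value"] for r in readings]
    mean = sum(values) / len(values) if values else None
    return {"count": len(values), "mean": mean}


raw_data = [
    {"sensor": "ss-1", "value": 21.5},
    {"sensor": "ss-2", "value": -999.0},  # assumed sentinel for a failed read
    {"sensor": "ss-3", "value": 22.5},
]
# A "filter cloud" here is simply a chain of filter services.
filter_cloud = [
    filter_service(lambda r: r["value"] > -100.0),
    filter_service(lambda r: r["sensor"] != "ss-9"),
]
stream = sensor_service(raw_data)
for fs in filter_cloud:
    stream = fs(stream)
report = summarize(stream)
```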
1.4.1.4 Grids versus Clouds
- The boundary between grids and clouds has become blurred in recent years.
- For web services, workflow technologies are used to coordinate or orchestrate services with certain specifications used to define critical business process models such as two-phase transactions.
- The BPEL Web Service standard is the general approach used in workflow.
- Other important workflow approaches include Pegasus, Taverna, Kepler, Trident, and Swift.
- In all approaches, one is building a collection of services – which together tackle all or part of a distributed computing problem.
- In general, a grid system applies static resources, while a cloud emphasizes elastic resources.
- For some researchers, the differences between grids and clouds are limited to dynamic resource allocation based on virtualization and autonomic computing.
- One can build a grid out of multiple clouds.
- This type of grid can do a better job than a pure cloud, because it can explicitly support negotiated resource allocation.
- Thus one may end up building a system of systems, such as a cloud of clouds, a grid of clouds, a cloud of grids, or inter-clouds, as a basic SOA architecture.
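- The static-versus-elastic distinction can be made concrete with a small calculation. The demand figures, node counts, and per-node capacity below are made-up numbers, and both functions are hypothetical simplifications rather than a model of any real scheduler.

```python
import math


def static_capacity(demand: list, nodes: int, per_node: int) -> list:
    """Grid-style fixed pool: demand above total capacity goes unserved."""
    cap = nodes * per_node
    return [min(d, cap) for d in demand]


def elastic_nodes(demand: list, per_node: int, min_nodes: int = 1) -> list:
    """Cloud-style pool: provision just enough nodes for each period."""
    return [max(min_nodes, math.ceil(d / per_node)) for d in demand]


demand = [40, 120, 300, 80]  # requests per period (illustrative only)
served_static = static_capacity(demand, nodes=2, per_node=50)
nodes_elastic = elastic_nodes(demand, per_node=50)
```

- With a fixed pool of two nodes, the peak periods are capped at 100 requests; the elastic pool instead grows to six nodes at the peak and shrinks afterward.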
1.4.2 Trends toward Distributed Operating Systems
- The computers in most distributed systems are loosely coupled.
- Thus, a distributed system inherently has multiple system images.
- Each node machine runs its own independent operating system.
- The distributed OS –
- Promotes resource sharing
- Provides fast communication among node machines
- Uses message passing and RPCs for internode communication
- Manages all resources coherently and efficiently.
- Improves the performance, efficiency, and flexibility of distributed applications.
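- A remote procedure call layered on message passing can be sketched as follows. The sketch runs in a single process, with a queue standing in for the network, so it shows only the marshal, send, dispatch, and reply steps. `Node`, `rpc`, and `free_memory_mb` are hypothetical names, and JSON is an arbitrary choice of wire format.

```python
import json
from queue import Queue


class Node:
    """A node with an inbox; handlers are the procedures it exports."""

    def __init__(self, name: str):
        self.name = name
        self.inbox = Queue()
        self.handlers = {}

    def export(self, func):
        # Register a procedure under its own name so callers can invoke it.
        self.handlers[func.__name__] = func
        return func

    def serve_one(self) -> None:
        # Receive one request message, run the handler, send the reply.
        request, reply_to = self.inbox.get()
        call = json.loads(request)
        result = self.handlers[call["proc"]](*call["args"])
        reply_to.put(json.dumps({"result": result}))


def rpc(target: Node, proc: str, *args):
    """Client stub: marshal the call, send it, and unmarshal the reply."""
    reply_box = Queue()
    message = json.dumps({"proc": proc, "args": list(args)})
    target.inbox.put((message, reply_box))
    target.serve_one()  # single-process stand-in for the remote node's loop
    return json.loads(reply_box.get())["result"]


node_b = Node("b")


@node_b.export
def free_memory_mb(used: int, total: int) -> int:
    return total - used


answer = rpc(node_b, "free_memory_mb", 3072, 8192)
```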
1.4.2.1 Distributed Operating Systems
- Three approaches for distributing resource management functions in a distributed computer system.
- The first approach is to build a network OS over a large number of heterogeneous OS platforms.
- The second approach is to develop middleware to offer a limited degree of resource sharing.
- The third approach is to develop a truly distributed OS to achieve higher use or system transparency.
- Table 1.6 compares the functionalities of these three distributed operating systems.
1.4.2.2 Amoeba versus DCE
- DCE is a middleware-based system for Distributed Computing Environments.
- Amoeba was developed academically at the Free University in the Netherlands.
- The Open Software Foundation (OSF) has pushed the use of DCE for distributed computing.
- However, the Amoeba, DCE, and MOSIX2 are still research prototypes that are primarily used in academia.
- No successful commercial OS products followed these research systems.
1.4.2.3 MOSIX2 for Linux Clusters
- MOSIX2 is a distributed OS, which runs with a virtualization layer in the Linux environment.
- This layer provides a partial single-system image to user applications.
- MOSIX2
- Supports both sequential and parallel applications,
- Discovers resources and
- Migrates software processes among Linux nodes.
- MOSIX2 can manage a Linux cluster or a grid of multiple clusters.
- Flexible management of a grid allows cluster owners to share their computational resources with one another.
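- Process migration for load balancing can be illustrated with a greedy toy policy: repeatedly move one process from the busiest node to the idlest one. This is not the MOSIX2 algorithm, which the notes do not describe; the policy, the node names, and the functions `migrate_once` and `balance` are assumptions for illustration only.

```python
from typing import Optional


def migrate_once(load: dict) -> Optional[tuple]:
    """Move one process from the busiest node to the idlest node.

    `load` maps node name -> list of process ids. Returns the
    (process, source, destination) moved, or None if no move helps.
    """
    src = max(load, key=lambda n: len(load[n]))
    dst = min(load, key=lambda n: len(load[n]))
    # A move reduces imbalance only if the gap is at least two processes.
    if len(load[src]) - len(load[dst]) < 2:
        return None
    proc = load[src].pop()
    load[dst].append(proc)
    return proc, src, dst


def balance(load: dict) -> list:
    """Repeat single migrations until no move reduces the imbalance."""
    moves = []
    while (move := migrate_once(load)) is not None:
        moves.append(move)
    return moves


cluster = {"node1": [101, 102, 103, 104], "node2": [201], "node3": []}
moves = balance(cluster)
```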
1.4.2.4 Transparency in Programming Environments
- Figure 1.22 shows the concept of a transparent computing infrastructure for future computing platforms.
- The user data, applications, OS, and hardware are separated into four levels.
- Data is owned by users, independent of the applications.
- The OS provides clear interfaces, standard programming interfaces, or system calls to application programmers.
- In future cloud infrastructure,
- Hardware will be separated by standard interfaces from the OS.
- Users will be able to choose from different OSes on top of the hardware devices they prefer to use.
- Users can enable cloud applications as SaaS, to separate user data from specific application programs – hence, users can switch among different services.
- The data will not be bound to specific applications.
1.4.3 Parallel and Distributed Programming Models
- Four programming models for distributed computing with expected scalable performance and application flexibility.
- Table 1.7 summarizes three of these models, along with some software tool sets developed in recent years.
- MPI is the most popular programming model for message-passing systems.
- Google’s MapReduce and BigTable are for effective use of resources from Internet clouds and data centers.
- Service clouds demand extending Hadoop, EC2, and S3 to facilitate distributed computing over distributed storage systems.
1.4.3.1 Message-Passing Interface (MPI)
- This is the primary programming standard used to develop parallel and concurrent programs to run on a distributed system.
- MPI is essentially a library of subprograms that can be called from C or FORTRAN to write parallel programs running on a distributed system.
- The idea is to embody clusters, grid systems, and P2P systems with upgraded web services and utility computing applications.
- Besides MPI, distributed programming can also be supported with low-level primitives such as the Parallel Virtual Machine (PVM).
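- The message-passing style that MPI standardizes can be imitated in a few lines. The sketch below is not MPI: it emulates ranks with threads and mailboxes from the Python standard library, whereas a real program would call routines such as `MPI_Send` and `MPI_Recv` from C or FORTRAN. The `Comm` class and its `send`/`recv` methods are hypothetical stand-ins.

```python
import threading
from queue import Queue


class Comm:
    """Toy communicator: one mailbox per rank, blocking send and recv."""

    def __init__(self, size: int):
        self.size = size
        self._boxes = [Queue() for _ in range(size)]

    def send(self, obj, dest: int) -> None:
        self._boxes[dest].put(obj)

    def recv(self, rank: int):
        return self._boxes[rank].get()


def worker(comm: Comm, rank: int, data: list, out: dict) -> None:
    if rank == 0:
        # Root hands one strided slice of the data to every other rank.
        for r in range(1, comm.size):
            comm.send(data[r::comm.size], dest=r)
        total = sum(data[0::comm.size])  # root's own share
        # Collect one partial sum from each worker rank.
        for _ in range(1, comm.size):
            total += comm.recv(0)
        out["total"] = total
    else:
        chunk = comm.recv(rank)
        comm.send(sum(chunk), dest=0)


def run(size: int, data: list) -> int:
    comm, out = Comm(size), {}
    threads = [
        threading.Thread(target=worker, args=(comm, r, data, out))
        for r in range(size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out["total"]


total = run(4, list(range(1, 101)))
```

- Each rank touches only the data it was sent, and all coordination happens through explicit messages, which is the defining property of the model.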
1.4.3.2 MapReduce
- This is a web programming model for scalable data processing on large clusters over large data sets.
- The model is applied mainly in web-scale search and cloud computing applications.
- The user specifies a Map function to generate a set of intermediate key/value pairs.
- Then the user applies a Reduce function to merge all intermediate values with the same intermediate key.
- MapReduce is highly scalable and can exploit high degrees of parallelism at different job levels.
- A typical MapReduce computation process can handle terabytes of data on tens of thousands or more client machines.
- Hundreds of MapReduce programs can be executed simultaneously.
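- The Map and Reduce steps described above can be shown with the usual word-count example. This is a single-process sketch of the programming model only; it does no distribution, scheduling, or fault handling, and the names `map_fn`, `reduce_fn`, and `map_reduce` are chosen for illustration.

```python
from collections import defaultdict


def map_fn(doc_id: str, text: str):
    """Map: emit an intermediate (word, 1) pair for every word."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: list) -> int:
    """Reduce: merge all values that share the same intermediate key."""
    return sum(counts)


def map_reduce(documents: dict) -> dict:
    # Group intermediate values by key before reducing.
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in map_fn(doc_id, text):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


docs = {"d1": "the cloud and the grid", "d2": "the grid"}
word_counts = map_reduce(docs)
```

- Because each `map_fn` call depends only on its own document and each `reduce_fn` call only on one key's values, both phases can in principle be spread across many machines.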
1.4.3.3 Hadoop Library
- Hadoop offers a software platform that was originally developed by a Yahoo! group.
- The package enables users to write and run applications over vast amounts of distributed data.
- Users can easily scale Hadoop to store and process petabytes of data in the web space.
- Hadoop is
- Economical – it comes with an open source version of MapReduce that minimizes overhead in task spawning and massive data communication.
- Efficient – it processes data with a high degree of parallelism across a large number of commodity nodes.
- Reliable – it automatically keeps multiple data copies to facilitate redeployment of computing tasks upon unexpected system failures.
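- The reliability point can be illustrated by showing why multiple data copies make task redeployment possible. The sketch below is a toy and does not use any Hadoop API: the random placement policy, the function names, and the node and block labels are assumptions, and the replica count is just a parameter.

```python
import random


def place_replicas(blocks: list, nodes: list, copies: int = 3, seed: int = 0) -> dict:
    """Assign each block to `copies` distinct nodes (random placement here)."""
    rng = random.Random(seed)
    return {b: rng.sample(nodes, copies) for b in blocks}


def reschedule(placement: dict, failed: set) -> dict:
    """For each block, pick a surviving node that holds a copy, if any."""
    plan = {}
    for block, holders in placement.items():
        alive = [n for n in holders if n not in failed]
        plan[block] = alive[0] if alive else None
    return plan


nodes = ["n1", "n2", "n3", "n4", "n5"]
placement = place_replicas(["blk0", "blk1", "blk2"], nodes)
# Two nodes fail; with three copies per block, one copy always survives.
plan = reschedule(placement, failed={"n2", "n4"})
```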
1.4.3.4 Open Grid Services Architecture (OGSA)
- The development of grid infrastructure is driven by large-scale distributed computing applications.
- These applications must count on a high degree of resource and data sharing.
- Table 1.8 introduces OGSA as a common standard for general public use of grid services.
- Genesis II is a realization of OGSA.
- Key features include
- a distributed execution environment,
- Public Key Infrastructure (PKI) services using a local certificate authority (CA),
- trust management, and
- security policies in grid computing.
1.4.3.5 Globus Toolkits and Extensions
- Globus is a middleware library jointly developed by the U.S. Argonne National Laboratory and the USC Information Sciences Institute over the past decade.
- This library implements some of the OGSA standards for resource discovery, allocation, and security enforcement in a grid environment.
- The Globus packages support multisite mutual authentication with PKI certificates.
- The current version of Globus, GT 4, has been in use since 2008.
- In addition, IBM has extended Globus for business applications.