SOFTWARE ENVIRONMENTS FOR DISTRIBUTED SYSTEMS AND CLOUDS


 

  • Popular software environments for using distributed and cloud computing systems.

 

1.4.1 Service-Oriented Architecture (SOA)

  • In grids/web services an entity is a service, in Java an entity is a Java object, and in CORBA an entity is a CORBA distributed object in a variety of languages.
  • These architectures build on the traditional seven Open Systems Interconnection (OSI) layers that provide the base networking abstractions.
  • On top of this we have a base software environment, which would be
    • .NET or Apache Axis for web services,
    • the Java Virtual Machine for Java, and
    • a broker network for CORBA.
  • On top of this base environment one would build a higher level environment reflecting the special features of the distributed computing environment.
  • This starts with entity interfaces and inter-entity communication, which rebuild the top four OSI layers but at the entity level and not the bit level.
  • Figure 1.20 shows the layered architecture for distributed entities used in web services and grid systems.

1.4.1.1 Layered Architecture for Web Services and Grids

 

  • The entity interfaces correspond to the
    • Web Services Description Language (WSDL),
    • Java method, and
    • CORBA interface definition language (IDL) specifications.

 

  • These interfaces are linked with customized, high-level communication systems: SOAP, RMI, and IIOP.

 

  • These communication systems support features like
    • message patterns (such as Remote Procedure Call or RPC),
    • fault recovery, and
    • specialized routing.
  • They are often built on message-oriented middleware (enterprise bus) infrastructure, which provides rich functionality and supports virtualization of routing, senders, and recipients.
  • Other features at the entity level include
    • abstractions (such as messages versus packets, virtualized addressing),
    • security, such as Internet Protocol Security (IPsec) and secure sockets in the OSI layers, and
    • higher level services for registries, metadata, and management of the entities.

 

  • Discovery and information services include, for example,
    • CORBA Trading Service,
    • UDDI (Universal Description, Discovery, and Integration),
    • LDAP (Lightweight Directory Access Protocol), and
    • ebXML (Electronic Business using eXtensible Markup Language)
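
A discovery service can be pictured as a registry that maps service descriptions to endpoints. The following is a minimal sketch in Python, not the API of UDDI, LDAP, or any other system listed above; the class `ServiceRegistry`, its methods, and the example endpoint URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRegistry:
    """Toy in-memory registry (hypothetical; not UDDI, LDAP, or ebXML)."""
    _entries: dict = field(default_factory=dict)

    def publish(self, name, endpoint, **metadata):
        # A provider registers an endpoint plus descriptive metadata.
        self._entries[name] = {"endpoint": endpoint, "metadata": metadata}

    def discover(self, **criteria):
        # A client finds services whose metadata matches every criterion.
        return sorted(
            name
            for name, entry in self._entries.items()
            if all(entry["metadata"].get(k) == v for k, v in criteria.items())
        )

    def lookup(self, name):
        # Resolve a service name to its endpoint.
        return self._entries[name]["endpoint"]


registry = ServiceRegistry()
registry.publish("weather", "http://example.org/weather", protocol="SOAP")
registry.publish("maps", "http://example.org/maps", protocol="REST")
registry.publish("news", "http://example.org/news", protocol="REST")

rest_services = registry.discover(protocol="REST")
```

Publishing and discovery are kept separate so that clients never need to know endpoints in advance; they only need the registry.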

 

  • Management services include service state and lifetime support, for example
    • CORBA Life Cycle and Persistent states,
    • the different Enterprise JavaBeans models,
    • Jini’s lifetime model, and
    • a suite of web services specifications
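
Service lifetime support can be illustrated with a lease: a registration that stays valid only while it is renewed. This is a sketch of the general idea under simplifying assumptions, not the API of Jini, CORBA, or Enterprise JavaBeans; the `Lease` class and its methods are hypothetical, and time is passed in explicitly so the example is deterministic.

```python
class Lease:
    """A service registration that expires unless it is renewed."""

    def __init__(self, service, granted_at, duration):
        self.service = service
        self.expires_at = granted_at + duration

    def renew(self, now, duration):
        # Renewal only succeeds while the lease is still valid.
        if now >= self.expires_at:
            return False
        self.expires_at = now + duration
        return True

    def is_active(self, now):
        return now < self.expires_at


# Times are plain numbers (seconds) supplied by the caller.
lease = Lease("printer", granted_at=0, duration=30)
active_at_10 = lease.is_active(now=10)       # within the first lease period
renewed = lease.renew(now=20, duration=30)   # lease now runs until t=50
active_at_45 = lease.is_active(now=45)
active_at_60 = lease.is_active(now=60)       # not renewed again, so expired
```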

 

 

 

1.4.1.2 Web Services and Tools

 

  • Loose coupling and support of heterogeneous implementations make services more attractive than distributed objects.
  • There are two choices of service architecture:
    • web services or REST systems.

 

  • Web services and REST systems take very different approaches to building reliable interoperable systems.

 

  • Web services approach
    • Aims to fully specify all aspects of the service and its environment.
    • This specification is carried with communicated messages using Simple Object Access Protocol (SOAP).
    • The hosting environment then becomes a universal distributed operating system with fully distributed capability carried by SOAP messages.

 

  • REST approach
    • Delegates most of the difficult problems to application (implementation-specific) software.
    • In a web services language, REST has minimal information in the header, and the message body carries all the needed information.
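
The contrast between the two approaches can be sketched by building one message of each kind. This is an illustration only: the operation name `GetTemperature`, the header fields `MessageID` and `ReplyTo`, and the `example.org` URLs are assumptions made for the example, and no request is actually sent.

```python
import json
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def soap_request(operation, params, header_items):
    """Build a SOAP envelope; service-level information travels in the header."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    for key, value in header_items.items():
        ET.SubElement(header, key).text = value
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, operation)
    for key, value in params.items():
        ET.SubElement(op, key).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def rest_request(url, payload):
    """Build a REST-style request; the header is minimal and the body carries the data."""
    return {
        "method": "POST",
        "url": url,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


soap_msg = soap_request(
    "GetTemperature",
    {"city": "Oslo"},
    {"MessageID": "msg-001", "ReplyTo": "http://example.org/client"},
)
rest_msg = rest_request("http://example.org/temperature", {"city": "Oslo"})
```

In the SOAP message the envelope itself describes how the message should be handled; in the REST message that interpretation is left to the application at the URL.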

 

 

 

1.4.1.3 The Evolution of SOA

 

 

  • As shown in Figure 1.21, service-oriented architecture (SOA) has evolved over the years.
  • SOA applies to building grids, clouds, grids of clouds, clouds of grids, clouds of clouds (also known as interclouds), and systems of systems in general.
  • A large number of sensors provide data-collection services, denoted in the figure as SS (sensor service).
    • A sensor can be a ZigBee device, a Bluetooth device, a WiFi access point, a personal computer, a GPS device, or a wireless phone, among other things.

 

 

  • Raw data is collected by sensor services.
  • All the SS devices interact with large or small computers, many forms of grids, databases, the compute cloud, the storage cloud, the filter cloud, the discovery cloud, and so on.
  • Filter services (fs in the figure) are used to eliminate unwanted raw data, in order to respond to specific requests from the web, the grid, or web services.
  • A collection of filter services forms a filter cloud.
  • SOA aims to search for, or sort out, the useful data from the massive amounts of raw data items.
  • Processing this data will generate useful information, and subsequently, the knowledge for our daily use.
  • Finally, we make intelligent decisions based on both biological and machine wisdom.

 

  • Most distributed systems require a web interface or portal.
  • The raw data stream collected by a large number of sensors is passed through a sequence of compute, storage, filter, and discovery clouds, which transform it into useful information or knowledge.
  • The inter-service messages converge at the portal, which is accessed by all users.
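
The flow from sensor services through filter services to useful information can be sketched as a chain of Python generators. This is a toy model under stated assumptions: the function names, the sensor identifiers, and the temperature readings are invented for illustration, and each "service" is just a local function rather than a networked one.

```python
def sensor_service(readings):
    """SS: emit raw readings one at a time."""
    for reading in readings:
        yield reading


def filter_service(stream, predicate):
    """fs: drop raw data that does not satisfy the request."""
    return (item for item in stream if predicate(item))


def compute_service(stream):
    """Turn filtered data into information (here, a simple average)."""
    values = [item["celsius"] for item in stream]
    return sum(values) / len(values) if values else None


raw = [
    {"sensor": "zigbee-1", "celsius": 21.0},
    {"sensor": "wifi-7", "celsius": -999.0},   # faulty reading
    {"sensor": "phone-3", "celsius": 23.0},
]

# A "filter cloud" is sketched as a chain of filter services.
stream = sensor_service(raw)
stream = filter_service(stream, lambda r: r["celsius"] > -100)
stream = filter_service(stream, lambda r: r["sensor"] != "phone-9")
average = compute_service(stream)
```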

 

 

 

1.4.1.4 Grids versus Clouds

 

  • The boundary between grids and clouds has become blurred in recent years.
  • For web services, workflow technologies coordinate or orchestrate services, using specifications that define critical business process models such as two-phase transactions.
    • The BPEL Web Service standard is the general approach used in workflow.
    • Other important workflow approaches include Pegasus, Taverna, Kepler, Trident, and Swift.

 

  • In all approaches, one is building a collection of services – which together tackle all or part of a distributed computing problem.
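
Orchestrating a collection of services amounts to running them in an order that respects their dependencies. The sketch below uses Python's standard `graphlib` module for the ordering; it is not BPEL or any of the workflow systems named above, and the four services (`fetch`, `clean`, `analyze`, `report`) are hypothetical stand-ins for real remote services.

```python
from graphlib import TopologicalSorter


# Each hypothetical service reads the shared context and returns new entries.
def fetch(ctx):
    return {"raw": [3, 1, 2]}


def clean(ctx):
    return {"clean": sorted(ctx["raw"])}


def analyze(ctx):
    return {"total": sum(ctx["clean"])}


def report(ctx):
    return {"report": f"n={len(ctx['clean'])} total={ctx['total']}"}


services = {"fetch": fetch, "clean": clean, "analyze": analyze, "report": report}

# Map each service to the set of services it depends on.
dependencies = {
    "fetch": set(),
    "clean": {"fetch"},
    "analyze": {"clean"},
    "report": {"clean", "analyze"},
}


def run_workflow(services, dependencies):
    """Run every service after all of its dependencies have run."""
    context, order = {}, []
    for name in TopologicalSorter(dependencies).static_order():
        context.update(services[name](context))
        order.append(name)
    return context, order


context, order = run_workflow(services, dependencies)
```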

 

  • In general, a grid system applies static resources, while a cloud emphasizes elastic resources.
  • For some researchers, the differences between grids and clouds are limited only to dynamic resource allocation based on virtualization and autonomic computing.

 

 

  • One can build a grid out of multiple clouds.
  • This type of grid can do a better job than a pure cloud, because it can explicitly support negotiated resource allocation.
  • Thus one may end up building a system of systems, such as a cloud of clouds, a grid of clouds, a cloud of grids, or inter-clouds, as a basic SOA architecture.

 

 

1.4.2 Trends toward Distributed Operating Systems

 

  • The computers in most distributed systems are loosely coupled.
  • Thus, a distributed system inherently has multiple system images.
  • Each node machine runs its own independent operating system.
  • A distributed OS
    • promotes resource sharing,
    • provides fast communication among node machines, using message passing and RPCs for internode communication,
    • manages all resources coherently and efficiently, and
    • improves the performance, efficiency, and flexibility of distributed applications.
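
An RPC lets one node call a procedure on another node as if it were local. The sketch below uses XML-RPC from the Python standard library purely as a convenient stand-in; it is not the RPC mechanism of any particular distributed OS. Both "nodes" run in one process on the loopback interface, and the procedure `free_memory_mb` and its return value are made up for the example.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def free_memory_mb():
    """Procedure exposed by the 'remote' node (returns a made-up value)."""
    return 2048


# Node A: serve the procedure on a free local port chosen by the OS.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(free_memory_mb)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Node B: call the procedure as if it were a local function.
with ServerProxy(f"http://127.0.0.1:{port}") as node_a:
    result = node_a.free_memory_mb()

# Stop the serving loop and release the socket.
server.shutdown()
server.server_close()
```

The caller never handles sockets or message formats directly; the proxy object hides the request and reply messages behind an ordinary method call.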

 

 

1.4.2.1 Distributed Operating Systems

  • There are three approaches to distributing resource management functions in a distributed computer system.
    • The first approach is to build a network OS over a large number of heterogeneous OS platforms.
    • The second approach is to develop middleware to offer a limited degree of resource sharing.
    • The third approach is to develop a truly distributed OS to achieve higher use or system transparency.

 

  • Table 1.6 compares the functionalities of these three distributed operating systems.

 

 

1.4.2.2 Amoeba versus DCE

  • DCE is a middleware-based system for Distributed Computing Environments.
  • Amoeba was developed academically at the Free University in the Netherlands.
  • The Open Software Foundation (OSF) has pushed the use of DCE for distributed computing.
  • However, the Amoeba, DCE, and MOSIX2 are still research prototypes that are primarily used in academia.
  • No successful commercial OS products followed these research systems.

 

1.4.2.3 MOSIX2 for Linux Clusters

 

  • MOSIX2 is a distributed OS, which runs with a virtualization layer in the Linux environment.
  • This layer provides a partial single-system image to user applications.
  • MOSIX2
    • Supports both sequential and parallel applications,
    • Discovers resources and
    • Migrates software processes among Linux nodes.

 

  • MOSIX2 can manage a Linux cluster or a grid of multiple clusters.
  • Flexible management of a grid allows owners of clusters to share their computational resources among multiple cluster owners.

 

 

1.4.2.4 Transparency in Programming Environments

 

 

  • Figure 1.22 shows the concept of a transparent computing infrastructure for future computing platforms.
  • The user data, applications, OS, and hardware are separated into four levels.
  • Data is owned by users, independent of the applications.
  • The OS provides clear interfaces, standard programming interfaces, or system calls to application programmers.

 

  • In future cloud infrastructure,
    • Hardware will be separated by standard interfaces from the OS.
    • Users will be able to choose from different OSes on top of the hardware devices they prefer to use.
    • Users can enable cloud applications as SaaS, to separate user data from specific application programs – hence, users can switch among different services.
    • The data will not be bound to specific applications.

1.4.3 Parallel and Distributed Programming Models

  • This section covers four programming models for distributed computing with expected scalable performance and application flexibility.
  • Table 1.7 summarizes three of these models, along with some software tool sets developed in recent years.

 

 

  • MPI is the most popular programming model for message-passing systems.
  • Google’s MapReduce and BigTable are for effective use of resources from Internet clouds and data centers.
  • Service clouds demand extending Hadoop, EC2, and S3 to facilitate distributed computing over distributed storage systems.

 

 

 

 

1.4.3.1 Message-Passing Interface (MPI)

  • This is the primary programming standard used to develop parallel and concurrent programs to run on a distributed system.
  • MPI is essentially a library of subprograms that can be called from C or FORTRAN to write parallel programs running on a distributed system.
  • The idea is to embody clusters, grid systems, and P2P systems with upgraded web services and utility computing applications.
  • Besides MPI, distributed programming can also be supported with low-level primitives such as the Parallel Virtual Machine (PVM).
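
The message-passing style can be illustrated with explicit send and receive operations between numbered ranks. The sketch below is not MPI and does not use an MPI library: threads and queues from the Python standard library stand in for processes and the network, and the helper functions `send` and `recv` are hypothetical. Rank 0 scatters slices of the data, each worker returns a partial sum, and rank 0 gathers the results.

```python
import queue
import threading

NUM_WORKERS = 3
# One inbox per rank; rank 0 is the coordinator.
inboxes = [queue.Queue() for _ in range(NUM_WORKERS + 1)]


def send(dest, message):
    """Deliver a message to the inbox of rank `dest`."""
    inboxes[dest].put(message)


def recv(rank):
    """Block until a message arrives for `rank`."""
    return inboxes[rank].get()


def worker(rank):
    chunk = recv(rank)            # receive work from rank 0
    send(0, (rank, sum(chunk)))   # send the partial result back


threads = [threading.Thread(target=worker, args=(r,)) for r in range(1, NUM_WORKERS + 1)]
for t in threads:
    t.start()

data = list(range(1, 10))  # the numbers 1..9
for rank in range(1, NUM_WORKERS + 1):
    send(rank, data[rank - 1::NUM_WORKERS])  # scatter a strided slice

partials = dict(recv(0) for _ in range(NUM_WORKERS))  # gather
for t in threads:
    t.join()

total = sum(partials.values())
```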

 

 

1.4.3.2 MapReduce

  • This is a web programming model for scalable data processing on large clusters over large data sets.
  • The model is applied mainly in web-scale search and cloud computing applications.
  • The user specifies a Map function to generate a set of intermediate key/value pairs.
  • Then the user applies a Reduce function to merge all intermediate values with the same intermediate key.
  • MapReduce is highly scalable to explore high degrees of parallelism at different job levels.
  • A typical MapReduce computation process can handle terabytes of data on tens of thousands or more client machines.
  • Hundreds of MapReduce programs can be executed simultaneously.
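
The Map and Reduce roles described above can be sketched with word counting in a single process. This is a minimal illustration of the programming model, not Google's implementation or Hadoop's API; the driver `map_reduce` and the sample documents are assumptions made for the example, and a real system would run the map and reduce calls in parallel on many machines.

```python
from collections import defaultdict


def map_fn(_doc_id, text):
    """Map: emit an intermediate (word, 1) pair for every word."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: merge all values that share the same intermediate key."""
    return word, sum(counts)


def map_reduce(inputs, map_fn, reduce_fn):
    """Sequential driver: map, group by key (shuffle), then reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(k, v) for k, v in sorted(groups.items()))


documents = [("d1", "the cloud and the grid"), ("d2", "the grid")]
word_counts = map_reduce(documents, map_fn, reduce_fn)
```

Because each map call touches only one input record and each reduce call touches only one key, both phases can be split across machines without changing the result.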

 

 

1.4.3.3 Hadoop Library

  • Hadoop offers a software platform that was originally developed by a Yahoo! group.
  • The package enables users to write and run applications over vast amounts of distributed data.
  • Users can easily scale Hadoop to store and process petabytes of data in the web space.
  • Hadoop is
    • Economical – it comes with an open source version of MapReduce that minimizes overhead in task spawning and massive data communication.
    • Efficient – it processes data with a high degree of parallelism across a large number of commodity nodes.
    • Reliable – it automatically keeps multiple data copies to facilitate redeployment of computing tasks upon unexpected system failures.
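
Keeping multiple copies of each data block is what allows work to continue after a node fails. The sketch below illustrates the idea only; it is not Hadoop's placement policy or API. The replication factor of 3, the round-robin placement, and the node and block names are all assumptions made for the example, and the placement assumes the replication factor does not exceed the number of nodes.

```python
REPLICATION = 3  # assumed number of copies per block


def place_replicas(blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin."""
    return {
        block: [nodes[(i + r) % len(nodes)] for r in range(replication)]
        for i, block in enumerate(blocks)
    }


def readable_from(placement, failed):
    """For each block, list the replicas that survive the failed nodes."""
    return {
        block: [node for node in holders if node not in failed]
        for block, holders in placement.items()
    }


nodes = ["n0", "n1", "n2", "n3"]
placement = place_replicas(["b0", "b1"], nodes)
survivors = readable_from(placement, failed={"n1"})
```

A task that was reading a block from the failed node can be redeployed on any node listed in `survivors` for that block.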

 

 

1.4.3.4 Open Grid Services Architecture (OGSA)

  • The development of grid infrastructure is driven by large-scale distributed computing applications.
  • These applications must count on a high degree of resource and data sharing.
  • Table 1.8 introduces OGSA as a common standard for general public use of grid services.
  • Genesis II is a realization of OGSA.
  • Key features include
    • a distributed execution environment,
    • Public Key Infrastructure (PKI) services using a local certificate authority (CA),
    • trust management, and
    • security policies in grid computing.

 

1.4.3.5 Globus Toolkits and Extensions

  • Globus is a middleware library jointly developed by the U.S. Argonne National Laboratory and the USC Information Sciences Institute over the past decade.
  • This library implements some of the OGSA standards for resource discovery, allocation, and security enforcement in a grid environment.
  • The Globus packages support multisite mutual authentication with PKI certificates.
  • The current version of Globus, GT 4, has been in use since 2008.
  • In addition, IBM has extended Globus for business applications.