Server Admin 10.4 Help

About Xgrid

Xgrid, a technology in Mac OS X Server and Mac OS X, simplifies deployment and management of computational grids. Xgrid enables administrators to group computers into grids or clusters, and allows users to easily submit complex computations to groups of computers (local, remote, or both), as either an ad hoc grid or a centrally managed cluster.

Xgrid Terminology

The Xgrid technology uses specific terms for its components and operations. These include:

  • Grid:  a group of computers that can collaboratively complete a job using the Xgrid technology in Mac OS X Server and Mac OS X.
  • Job:  a set of work submitted into a grid from the client to the controller.
  • Task:  a part of a job that one agent in the grid performs at one time.
  • Controller:  an Xgrid controller manages the grid and its work. It is built into Mac OS X Server.
  • Agent:  an Xgrid agent resides on one computer in a grid and runs tasks sent to it by the controller. Any computer running Mac OS X v10.3 or v10.4 can run the Xgrid agent.
  • Client:  any computer running Mac OS X v10.4 or Mac OS X Server v10.4 that submits a job to an Xgrid controller.
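
The relationships among these terms can be sketched as a small data model. This is only an illustration: the class names, field names, and the round-robin assignment are assumptions made for the example, not part of any Xgrid API or a description of how the controller actually schedules work.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    """One part of a job; a single agent performs it at one time."""
    command: str
    assigned_agent: Optional[str] = None  # agent host name, set once assigned


@dataclass
class Job:
    """A set of work that a client submits to the controller."""
    name: str
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Controller:
    """Manages the grid: knows the agents and hands tasks to them."""
    agents: List[str] = field(default_factory=list)

    def submit(self, job: Job) -> None:
        if not self.agents:
            raise ValueError("the grid has no agents")
        # Hypothetical round-robin policy, chosen only to keep the sketch short.
        for index, task in enumerate(job.tasks):
            task.assigned_agent = self.agents[index % len(self.agents)]


# A client submits a three-task job to a grid with two agents.
controller = Controller(agents=["agent-a", "agent-b"])
job = Job("render", [Task(f"render frame {n}") for n in range(3)])
controller.submit(job)
assignments = [task.assigned_agent for task in job.tasks]
```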

Xgrid Basics

Xgrid creates multiple tasks for each job and distributes those tasks among multiple nodes. These nodes can be desktop computers running Mac OS X v10.4 or v10.3, or systems running Mac OS X Server v10.4. Many desktop computers sit idle during the day, most evenings, and on weekends. The assembly of these systems into a computational grid is called "desktop recovery." This method of grid construction allows you to vastly improve your computational capacity without purchasing additional hardware, and Xgrid makes the software configuration a straightforward task.

For a server to function as a controller, Xgrid requires Mac OS X Server v10.4 or later, with a minimum of 256 MB of RAM. To operate as an agent in a grid, Xgrid requires Mac OS X v10.3 or later, with a minimum of 128 MB of RAM (256 MB recommended). All Xgrid participants must have a network connection. As always, the more RAM a system has, the better it will perform, particularly for high-performance computing applications.

A computational grid is a group of computers working together to solve a single problem. The systems in a grid can be loosely coupled, geographically dispersed, and, to some extent, heterogeneous. In contrast, systems in a cluster are often homogeneous, collocated, and strictly managed. Highly dispersed grids, such as SETI@home, allow individuals to donate their spare processor cycles to a cause. In office environments, large rendering or simulation jobs may be distributed across all the systems left idle overnight. These idle systems can even be used to augment a dedicated computational cluster that is available to Xgrid clients at all times. These three distinct grid configurations are discussed in more detail later in this chapter.

Xgrid imposes no practical limit on the amount of computational power it can support. The performance of the grid depends on the participating systems, the software running, and the network, among other factors. Individual applications, however, will strongly influence the performance of the grid. It is up to the user to determine how well a particular application lends itself to deployment on a computational grid. In the best case, application performance may scale linearly with the size of the grid. In the worst case, adding agents to a grid may cause a job to take longer to complete than it would with fewer agents. (In such a situation, tasks become so small that the overhead of distributing the increased number of tasks outweighs the performance gain from using more agents.) Users of the grid should be aware of these considerations.
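
The trade-off between grid size and distribution overhead can be illustrated with a toy model. The model and its numbers are assumptions made for the example, not measurements of Xgrid: the job is split into one task per agent, tasks are handed out one after another, and each handoff costs a fixed amount of time.

```python
def completion_time(work_seconds: float, agents: int, overhead_per_task: float) -> float:
    """Toy model of job completion time with one task per agent.

    Compute time shrinks as work_seconds / agents, while the time spent
    distributing tasks grows as overhead_per_task * agents.
    """
    if agents < 1:
        raise ValueError("at least one agent is required")
    return work_seconds / agents + overhead_per_task * agents


# Assumed figures: 3600 s of total work and 1 s of overhead per task.
times = {n: completion_time(3600, n, 1.0) for n in (1, 10, 60, 600)}
for n, t in times.items():
    print(f"{n:4d} agents -> {t:7.1f} s")
```

In this model, completion time is lowest near the square root of (work / overhead), which is 60 agents for the figures above. With 600 agents the job takes longer than with 60, because the overhead term dominates.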

There are many proprietary projects that allow you to participate in a large computational grid. Often these projects, such as SETI@home and FightAIDS@Home, are tied to a specific scientific purpose. They often have easy-to-install software that enables any volunteer to participate in that particular project, and they often take the form of a screen saver or background process.

But you don't need to think in terms of thousands or millions of seldom-used computers to see the significance of a computational grid. For example, computers used by university students and corporate employees often "work" fewer hours than the hours they sit idle at night or on weekends. These computers could contribute productively to the work of a grid without diminishing their usefulness to the students and employees.

Other grid projects are designed for large-scale computational grids, such as the Globus Alliance (a group founded by universities and researchers), with flexible resource management tools and more intelligent grid deployment methods. Instead of developing neatly packaged applications for a specific grid, such projects provide comprehensive frameworks for application deployment.

Xgrid allows users to participate in a computational grid of their choice, while still giving grid developers the flexibility of a more generic framework for deploying their grid applications. Xgrid provides the primary benefits of both approaches:

  • Easy grid configuration and deployment.
  • Straightforward yet flexible job submission.
  • Automatic controller discovery by both agents and clients.
  • Flexible architecture based on open standards.
  • Support for the UNIX security model including Kerberos single sign-on or regular password authentication.
  • Choice between a command-line interface or an API-based model for grid interaction.

Three Ways to Use Xgrid

Xgrid can be used in tightly coupled clusters, worldwide grids, and everything in between. This immense flexibility allows you to deploy grids of almost any nature. Three main topologies are commonly used, however, and these cover the vast majority of Xgrid deployments. They are:

  • Xgrid clusters
  • Local grids
  • Distributed grids

Xgrid Clusters

Computational clusters are sets of systems entirely dedicated to computation. In a cluster, systems are typically collocated in a rack, connected via gigabit Ethernet or another high-performance network, and strictly managed for maximum performance. Cluster systems are often entirely homogeneous; their operating systems are the same versions, they have the same software installed, and they generally have the same processor, disk, and RAM configurations.

Xgrid allows administrators to easily configure the distributed resource management functionality of the cluster. Each server in the system runs the agent software, and the head node in the cluster runs the controller software. Xgrid distributes tasks across the cluster. In clusters, failure rates are generally very low. Systems are rarely, if ever, offline, and their resources are not shared with general user tasks. Clusters are the most efficient and most expensive model of distributed computing.

Local Grids

Systems that are under common administration in a company, university computer lab, or other managed environment can often be easily assembled into a grid for desktop recovery. Because these systems are often on a local area network, and because they are generally managed by a single organization, they offer good network performance and substantial manageability.

Because these systems are often also used as day-to-day workstations, users can easily interrupt grid tasks by moving the mouse, resetting the system, or even accidentally disconnecting the system from the network. In such cases, a task may fail as part of an Xgrid job. The Xgrid controller will eventually reassign the failed task to another agent, and the job will complete successfully. In local grids, performance is limited by such situations, as well as by the varying performance of any given agent on the grid.
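
The reassignment behavior described above can be sketched as a simple simulation. This is a minimal sketch under stated assumptions, not Xgrid's implementation: the function name, the rotation through agents, and the fixed per-attempt failure probability are all hypothetical.

```python
import random
from collections import deque


def run_job(tasks, agents, failure_rate, seed=0):
    """Simulate a controller that requeues failed tasks until all complete.

    Returns (completed, attempts): a dict mapping each task to the agent
    that finished it, and the total number of attempts made.
    """
    if not agents:
        raise ValueError("the grid has no agents")
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    pending = deque(tasks)
    completed = {}
    attempts = 0
    while pending:
        task = pending.popleft()
        agent = agents[attempts % len(agents)]  # next agent in rotation
        attempts += 1
        if rng.random() < failure_rate:
            # The attempt failed (for example, a user interrupted the agent),
            # so the task goes back in the queue to be handed out again.
            pending.append(task)
        else:
            completed[task] = agent
    return completed, attempts


completed, attempts = run_job(["t1", "t2", "t3", "t4"], ["agent-a", "agent-b"], 0.3)
```

Every task eventually completes, but the number of attempts can exceed the number of tasks, which is how interruptions limit the performance of a local grid.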

Distributed Grids

The Xgrid agent allows a user to specify any IP address or host name for its desired controller. By specifying a grid's controller in this way, a user can dedicate his or her CPU time to that grid no matter where the controller is located. When any system is permitted to donate its time, a distributed grid is formed. The manager of the controller has no direct management control or knowledge of the agent system, but is nonetheless able to harness its CPU time.

Distributed grids have very high task failure rates, but impose a very low administrative burden on the grid administrator. With very large jobs, high task failure rates may not substantially affect the performance of the grid if failed tasks can be rapidly reassigned to another available agent. Network performance may also be a consideration, because data is sent to agents over the Internet rather than over a local network. The monetary cost of such distributed grids is extremely low.
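
A rough way to see why high task failure rates need not cripple a large job: assuming, purely for illustration, that every attempt at a task fails independently with the same probability, the expected number of attempts per task is 1 / (1 - failure rate). The function below is a hypothetical helper for that arithmetic, not part of Xgrid.

```python
def expected_attempts(num_tasks: int, failure_rate: float) -> float:
    """Expected total attempts when each attempt fails independently
    with probability failure_rate and failed tasks are always retried."""
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    return num_tasks / (1 - failure_rate)


# Assumed figures: 10,000 tasks, and half of all attempts fail.
total = expected_attempts(10_000, 0.5)
overhead_factor = total / 10_000
```

Under this assumption, even a 50 percent failure rate only doubles the expected number of attempts, which a grid with enough available agents may be able to absorb.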
