ACADEMIC COMPUTING and COMMUNICATIONS CENTER
ARGO-NEW: General Information
Jargon alert: The argo-new pages use many terms that might be new to you; most are defined in the argo-new glossary. A link followed by an asterisk, such as "the MPI* library", points to the term's definition in the glossary.
Overview-Beowulf clustering
A cluster is a group of PCs that are interconnected by one or more networks and used for serial or parallel program execution. Clustering has been around for quite some time. In 1994 Donald Becker and Thomas Sterling, working at NASA, connected a group of inexpensive off-the-shelf PCs with special software to create a single system that could be scaled up to deliver supercomputer performance. The name assigned to that cluster: Beowulf. Now, the term Beowulf cluster generally refers to any cluster that loosely follows that original design.
Parallel programming on a Beowulf-type cluster is accomplished by dividing a computation into parts that are carried out by multiple processes. Sometimes a single processor runs all the processes. Most complex problems, however, involve processes executing on separate processors. Processes coordinate their activities by explicitly sending and receiving messages. Message passing is the most commonly used method of programming distributed-memory MIMD* systems, and the MPI* library specification is the standard interface for it.
The UIC ACCC Beowulf cluster is called Argo-new; its Internet domain name is argo-new.cc.uic.edu.
Cluster Overview
The ACCC Beowulf
Hardware
Details
Tigger and icarus vs. argo-new
Though tigger, icarus and argo-new are all MIMD* systems, tigger and icarus are shared memory multiprocessor systems whereas argo-new is a distributed memory multicomputer.
Tigger/Icarus
Tigger is a shared memory multiprocessor system (a conventional computer) with multiple processors of the same type (symmetric) connected to multiple memory modules. Access to physical memory and processors is shared among processes. Synchronization among processes is accomplished by reading from and writing to shared memory, which is controlled by a locking mechanism (spinlocks and semaphores). Control of the locks is the responsibility of the user. The path that connects processors to physical memory is internal and transparent to users. The data, along with the executable code, is contained in memory that is shared among the processes. By using threads, portions of a single program may run on different processors.
Argo-new
A distributed memory multicomputer system is a group of complete computers that are connected to one another. Each of the constituent machines, called nodes, is a self-contained unit (it has its own processor and local memory) capable of acting independently of the other machines in the group. The most common method of programming distributed-memory systems is message passing, which permits processes to share data and coordinate their activities.
   Computer        Computer        Computer        Computer        Computer
   (Argo1-1)       (Argo1-2)       (Argo1-3)       (Argo1-4)  ...  (Argo4-4)
+-------------+ +-------------+ +-------------+ +-------------+ +-------------+
| +---------+ | | +---------+ | | +---------+ | | +---------+ | | +---------+ |
| |Processor| | | |Processor| | | |Processor| | | |Processor| | | |Processor| |
| +---------+ | | +---------+ | | +---------+ | | +---------+ | | +---------+ |
|      |      | |      |      | |      |      | |      |      | |      |      |
| +---------+ | | +---------+ | | +---------+ | | +---------+ | | +---------+ |
| |Local    | | | |Local    | | | |Local    | | | |Local    | | | |Local    | |
| |memory   | | | |memory   | | | |memory   | | | |memory   | | | |memory   | |
| +---------+ | | +---------+ | | +---------+ | | +---------+ | | +---------+ |
+-------------+ +-------------+ +-------------+ +-------------+ +-------------+
       |               |               |               |               |
       |               |               |               |               |
+-----------------------------------------------------------------------------+
|                              Ethernet network                               |
+-----------------------------------------------------------------------------+
       |               |               |               |               |
       |               |               |               |               |
   (Argo5-1)       (Argo5-2)       (Argo5-3)       (Argo5-4)  ...  (Argo8-4)
+-------------+ +-------------+ +-------------+ +-------------+ +-------------+
| +---------+ | | +---------+ | | +---------+ | | +---------+ | | +---------+ |
| |Processor| | | |Processor| | | |Processor| | | |Processor| | | |Processor| |
| +---------+ | | +---------+ | | +---------+ | | +---------+ | | +---------+ |
|      |      | |      |      | |      |      | |      |      | |      |      |
| +---------+ | | +---------+ | | +---------+ | | +---------+ | | +---------+ |
| |Local    | | | |Local    | | | |Local    | | | |Local    | | | |Local    | |
| |memory   | | | |memory   | | | |memory   | | | |memory   | | | |memory   | |
| +---------+ | | +---------+ | | +---------+ | | +---------+ | | +---------+ |
+-------------+ +-------------+ +-------------+ +-------------+ +-------------+
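The difference between the two programming models described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how programs are written for argo-new: there, message passing is done with MPI* between separate nodes. Here threads stand in for processors and nodes, a queue.Queue stands in for the network, and the function names are hypothetical.

```python
import queue
import threading


def shared_memory_sum(chunks):
    """Shared-memory style: every worker updates one variable under a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:  # the programmer is responsible for taking the lock
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(chunks):
    """Message-passing style: workers keep data local and send results."""
    inbox = queue.Queue()  # stand-in for the network (an assumption)

    def node(rank, chunk):
        # Nothing is shared; the partial result is sent explicitly.
        inbox.put((rank, sum(chunk)))

    threads = [
        threading.Thread(target=node, args=(rank, c))
        for rank, c in enumerate(chunks)
    ]
    for t in threads:
        t.start()

    total = 0
    for _ in chunks:
        _rank, partial = inbox.get()  # explicit receive, one per worker
        total += partial

    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(shared_memory_sum(data), message_passing_sum(data))
```

Both functions compute the same sum. In the first, correctness depends on the lock; in the second, it depends on every result being sent and received.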
Master Node
Overview
Clients log into the master (also called the front-end), create/edit/compile programs, and submit them for execution on one or more of the compute nodes. Clients do not run their programs on the master. Repeat: user programs MUST NOT be run on the master. Instead, they are submitted to the compute nodes for execution.
Hardware
Operating System
Master Node Filesystems
There are two types of filesystems on argo:
An ext3 filesystem* (Linux third extended filesystem) is one constructed on disk that is physically attached to the machine. Some portion or all of the local disk is used to construct a partition, which becomes the basis for the filesystem.
An NFS filesystem* (Network File System) is a filesystem that is local to one machine but made to appear local to other machines, which access it over a network.
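One way to tell the two filesystem types apart is to read the mount table. The sketch below parses text in the format of Linux's /proc/mounts (device, mount point, type, options, two numbers) and reports which mount a given path lives on. The sample table, including the server name and the /home1 mount point, is invented for illustration and is not taken from argo-new.

```python
# Hypothetical mount table in /proc/mounts format; not argo-new's real one.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw 0 0
homeserver:/export/home1 /home1 nfs rw 0 0
"""


def parse_mounts(text):
    """Return a list of (mount_point, fs_type) pairs from mount-table text."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts.append((fields[1], fields[2]))
    return mounts


def filesystem_of(path, mounts):
    """Return the (mount_point, fs_type) of the deepest mount holding path."""
    best = None
    for mount_point, fs_type in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fs_type)
    return best


def is_network_filesystem(fs_type):
    """True for NFS variants such as 'nfs' and 'nfs4'."""
    return fs_type.startswith("nfs")


if __name__ == "__main__":
    table = parse_mounts(SAMPLE_MOUNTS)
    for path in ("/home1/alice", "/tmp"):
        mount_point, fs_type = filesystem_of(path, table)
        kind = "NFS" if is_network_filesystem(fs_type) else "local"
        print(path, "is on", mount_point, "(" + fs_type + ", " + kind + ")")
```

On a Linux machine the same functions could be fed the contents of /proc/mounts instead of the sample text.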
Compute Nodes
Overview
The compute nodes have ONE AND ONLY ONE purpose: to run user programs. Clients do not have login access to the compute nodes. From the master node, clients submit programs to Torque, which executes them on one or more compute nodes. While clients are prohibited from logging into compute nodes, they may issue remote shell commands from the master to retrieve information. For example:
12:23:54 up 83 days, 1:03, 0 users, load average: 0.00, 0.00, 0.00
Hardware
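The remote-command example in the Compute Nodes overview shows only the output, which has the format printed by uptime. A minimal sketch of how such a query could be scripted from the master follows. The remote shell program (rsh), the node name argo1-1 and the function names are assumptions for illustration; the page does not say which command produced the output.

```python
import re


def remote_command(node, command="uptime", shell="rsh"):
    """Build the argument list for running `command` on `node`.

    The remote shell program is an assumption; the source does not name it.
    """
    return [shell, node, command]


def parse_load_average(uptime_line):
    """Extract the 1, 5 and 15 minute load averages from uptime output."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: " + repr(uptime_line))
    return tuple(float(value) for value in match.groups())


if __name__ == "__main__":
    # On the master, the command could be run with, for example:
    #   subprocess.run(remote_command("argo1-1"), capture_output=True, text=True)
    # Here the output line shown on this page is parsed instead.
    sample = "12:23:54 up 83 days, 1:03, 0 users, load average: 0.00, 0.00, 0.00"
    print(remote_command("argo1-1"))
    print(parse_load_average(sample))
```

Keeping the command builder separate from the parser means the parsing can be checked against saved output without touching a compute node.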
Operating System
Home Server
Overview
User home directories on the master node are physically located on a separate machine whose sole purpose is to act as a file server. The server has four disk partitions of 300GB each. Each partition is NFS-mounted on the master and all compute nodes:
To see which filesystem your home directory is in, type: echo $HOME
Hardware
Operating System
Network
Overview
Internodal communication (communication among the nodes) is more demanding than communication between a node and the external environment. Each node may need to interact with other nodes, either in concert or independently. The purpose of the communication may be:
Ethernet
Each node, master and compute, is connected to an Ethernet network, and the NIC within each compute node is gigabit. The Ethernet NIC may be used for interprocess communication using vanilla MPI as well as for out-of-band management (basically anything except interprocess communication: NIS, NFS, etc.):
+-----------+         Master           Switch           Switch           Switch           Switch
|  Outside  |      +----------+      +---------+      +---------+      +---------+      +---------+
|   World   | <--> eth1    eth0 <--> | | | | | | <--> | | | | | | <--> | | | | | | <--> | | | | | |
+-----------+      +----------+      +---------+      +---------+      +---------+      +---------+
                                          |                |                |                |
                                          |                |                |                |
                                          V                V                V                V
                                     +---------+      +---------+      +---------+      +---------+
                                     | argo4-4 |      | argo8-4 |      | argo12-4|      | argo16-4|
                                     +---------+      +---------+      +---------+      +---------+
                                     | argo4-3 |      | argo8-3 |      | argo12-3|      | argo16-3|
                                     +---------+      +---------+      +---------+      +---------+
                                     |   ...   |      |   ...   |      |   ...   |      |   ...   |
                                     +---------+      +---------+      +---------+      +---------+
                                     | argo1-1 |      | argo5-1 |      | argo9-1 |      | argo13-1|
                                     +---------+      +---------+      +---------+      +---------+
Argo-new Compute Cluster
2007-11-13 ACCC Systems Group