| Academic Computing and Communications Center | ||||||||||
ARGO: General Information |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Jargon alert: The argo pages use many terms that might be new to you; most are defined in the argo glossary. A link followed by an asterisk, such as "the MPI* library", points to the term's definition in the glossary.
Most of what is presented here is not needed to use the cluster. It is made available because some users are interested in the system details; most users may skip it.
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Overview-Beowulf clustering | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
A cluster is a group of PCs that are interconnected by one or more networks and used for serial or parallel program execution. Clustering has been around for quite some time. In 1994 Donald Becker, a NASA researcher, invented a way to connect a group of inexpensive off-the-shelf PCs with special software to create a single system that could be scaled up to deliver supercomputer performance. That cluster was named Beowulf. Today the term Beowulf cluster generally refers to any cluster that loosely follows the Becker model.
Parallel programming on a Beowulf-type cluster is accomplished by dividing a computation into parts and making use of multiple processes. Sometimes a single processor can be used for all the processes. Most complex problems, however, involve processes executing on separate processors. Processes coordinate their activities by explicitly sending and receiving messages. Message passing is the most commonly used method of programming distributed-memory MIMD* systems, and the MPI* library specification is the standard for message passing.
The UIC ACCC Beowulf cluster is called Argo; its Internet domain name is argo.cc.uic.edu.
Cluster Overview
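The idea of dividing a computation into parts and combining results through explicit messages can be sketched in a few lines. This is only an illustration, not how programs are written for argo: it uses Python's standard multiprocessing module in place of MPI, and the names partial_sum and parallel_sum are hypothetical. It also assumes a Unix-like system, where the "fork" start method is available.

```python
import multiprocessing as mp


def partial_sum(conn, chunk):
    """Worker process: compute one part of the result and send it as a message."""
    conn.send(sum(chunk))
    conn.close()


def parallel_sum(values, nworkers=4):
    """Split `values` into `nworkers` parts and add up the partial sums.

    Each worker is a separate process with its own memory; the only way
    a result gets back to the parent is the explicit send/recv pair.
    """
    ctx = mp.get_context("fork")  # assumption: Unix-like OS
    chunks = [values[i::nworkers] for i in range(nworkers)]  # divide the work
    workers = []
    for chunk in chunks:
        parent_end, child_end = ctx.Pipe()
        proc = ctx.Process(target=partial_sum, args=(child_end, chunk))
        proc.start()
        child_end.close()  # the parent only reads from its own end
        workers.append((proc, parent_end))

    total = 0
    for proc, parent_end in workers:
        total += parent_end.recv()  # receive one message per worker
        proc.join()
    return total


if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101))))  # prints 5050
```

An MPI program follows the same pattern, with the library's send and receive calls taking the place of the pipe and with the processes typically placed on different compute nodes.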
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| The ACCC Beowulf | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Hardware
Details
Tigger vs. argo
Though tigger and argo are both MIMD* systems, tigger is a shared memory multiprocessor system whereas argo is a distributed memory multicomputer.
Tigger
Tigger is a shared memory multiprocessor system (a conventional computer) with multiple processors of the same type (symmetric) connected to multiple memory modules. Like your PC or laptop, just bigger. Access to physical memory and processors is shared among processes. Synchronization among processes is accomplished by reading from and writing to shared memory, access to which is controlled by a locking mechanism (spinlocks and semaphores). The path that connects processors to physical memory is internal and transparent to users. The data, along with the executable code, is contained in memory that is shared among the processes. By using threads, portions of a single program may run on different processors.
Argo
A distributed memory multicomputer system is a group of complete computers that are connected to one another. Each of the constituent machines is a self-contained unit (it has its own processors and memory) capable of acting independently of the other machines in the group. The most common method of programming distributed-memory systems is message passing, which permits processes to share data and coordinate their activities.
   Computer       Computer       Computer       Computer            Computer
  (Argo1-1)      (Argo1-2)      (Argo1-3)      (Argo1-4)    ...    (Argo4-4)
+------------+ +------------+ +------------+ +------------+      +------------+
| +--------+ | | +--------+ | | +--------+ | | +--------+ |      | +--------+ |
| |Processor | | |Processor | | |Processor | | |Processor |      | |Processor |
| +--------+ | | +--------+ | | +--------+ | | +--------+ |      | +--------+ |
|     |      | |     |      | |     |      | |     |      |      |     |      |
| +--------+ | | +--------+ | | +--------+ | | +--------+ |      | +--------+ |
| |Local   | | | |Local   | | | |Local   | | | |Local   | |      | |Local   | |
| |memory  | | | |memory  | | | |memory  | | | |memory  | |      | |memory  | |
| +--------+ | | +--------+ | | +--------+ | | +--------+ |      | +--------+ |
+------------+ +------------+ +------------+ +------------+      +------------+
      |              |              |              |                   |
      |              |              |              |                   |
+-----------------------------------------------------------------------------+
|                          Private ethernet network                           |
+-----------------------------------------------------------------------------+
      |              |              |              |                   |
      |              |              |              |                   |
  (Argo5-1)      (Argo5-2)      (Argo5-3)      (Argo5-4)    ...    (Argo8-4)
+------------+ +------------+ +------------+ +------------+      +------------+
| +--------+ | | +--------+ | | +--------+ | | +--------+ |      | +--------+ |
| |Processor | | |Processor | | |Processor | | |Processor |      | |Processor |
| +--------+ | | +--------+ | | +--------+ | | +--------+ |      | +--------+ |
|     |      | |     |      | |     |      | |     |      |      |     |      |
| +--------+ | | +--------+ | | +--------+ | | +--------+ |      | +--------+ |
| |Local   | | | |Local   | | | |Local   | | | |Local   | |      | |Local   | |
| |memory  | | | |memory  | | | |memory  | | | |memory  | |      | |memory  | |
| +--------+ | | +--------+ | | +--------+ | | +--------+ |      | +--------+ |
+------------+ +------------+ +------------+ +------------+      +------------+
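For contrast with the distributed-memory layout, the shared-memory style described for tigger can be sketched with threads that update one variable under a lock. This is a minimal illustration only: Python's threading.Lock stands in for the spinlocks and semaphores mentioned in the text, and the function name locked_count is hypothetical.

```python
import threading


def locked_count(nthreads=4, increments=10000):
    """Have several threads increment one shared counter under a lock.

    All threads read and write the same memory location; the lock
    serializes the updates so that none are lost.
    """
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        for _ in range(increments):
            with lock:  # only one thread at a time may update the counter
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(locked_count())  # prints 40000
```

No messages are exchanged here: the threads coordinate entirely through the shared variable and its lock. On a distributed-memory system such as argo there is no memory common to all nodes, which is why message passing is used instead.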
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Master Node | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Overview
Clients log into the master (also called the front-end), create, edit, and compile programs, and submit them for execution on one or more of the compute nodes. Clients do not run their programs on the master. Repeat: user programs MUST NOT be run on the master. Instead, they are submitted to the compute nodes for execution.
Hardware
Operating System
Master Node Filesystems
There are two types of filesystems on argo:
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Compute Nodes | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Overview
The compute nodes have ONE AND ONLY ONE purpose: to run user programs. Clients do not have login access to the compute nodes. From the master node, clients submit programs to Torque, which executes them on one or more compute nodes. While clients are prohibited from logging into compute nodes, they may issue remote shell commands from the master to retrieve information. For example:
12:23:54 up 83 days, 1:03, 0 users, load average: 0.00, 0.00, 0.00
Hardware
Operating System
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Home Server | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Overview
User home directories are physically located on a separate machine whose sole purpose is to act as a file server. There are four disk partitions of 1.2TB each on the server. Total available space: approximately 4.8TB. The partitions are fiber-attached to the master and NFS-mounted to all compute nodes:
When your account was created, the system created a directory (your home space) in ONE of the four home filesystems. To see which one, type:
echo $HOME
For user jsmith:
jsmith $ echo $HOME
/home/homes50/jsmith
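The same lookup can be done from a program. The sketch below is only an illustration: it assumes that each home directory sits directly under its filesystem's mount point, as in the jsmith example, and the function name home_filesystem is hypothetical.

```python
import os
from pathlib import PurePosixPath


def home_filesystem(home):
    """Return the directory that contains the given home directory."""
    return str(PurePosixPath(home).parent)


if __name__ == "__main__":
    # Fall back to the jsmith example path if HOME is not set.
    print(home_filesystem(os.environ.get("HOME", "/home/homes50/jsmith")))
```

For the path /home/homes50/jsmith the function returns /home/homes50.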
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Scratch Server | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
User scratch directories are physically located on a separate machine that acts as a file server for scratch space. There is a single partition of 770GB (soon to be expanded to 4TB). The scratch server is NFS-mounted to the master and all compute nodes.
When your account was created, the system also made a directory for you in the scratch filesystem. The location is:
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Network | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Overview
Internodal communication (communication among the nodes) is more demanding than nodal communication with the external environment. Each node may need to interact with other nodes, either in concert or independently. The purpose of the communication may be:
Ethernet
Each node, master and compute, is connected to an ethernet network; the NIC in each compute node is gigabit (with jumbo frames). The ethernet NIC may be used for interprocess communication using vanilla MPI as well as for out-of-band management (basically anything other than interprocess communication: NIS, NFS, etc.):
+-----------+         Master           Switch           Switch           Switch           Switch
|  Outside  |      +----------+      +---------+      +---------+      +---------+      +---------+
|   World   | <--> |eth1  eth0| <--> | | | | | | <--> | | | | | | <--> | | | | | | <--> | | | | | |
+-----------+      +----------+      +---------+      +---------+      +---------+      +---------+
                                          |                |                |                |
                                          |                |                |                |
                                          V                V                V                V
                                     +---------+      +---------+      +---------+      +---------+
                                     | argo4-4 |      | argo8-4 |      | argo12-4|      | argo14-4|
                                     +---------+      +---------+      +---------+      +---------+
                                     | argo4-3 |      | argo8-3 |      | argo12-3|      | argo14-3|
                                     +---------+      +---------+      +---------+      +---------+
                                     |   ...   |      |   ...   |      |   ...   |      |   ...   |
                                     +---------+      +---------+      +---------+      +---------+
                                     | argo1-1 |      | argo5-1 |      | argo9-1 |      | argo14-1|
                                     +---------+      +---------+      +---------+      +---------+
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2011-9-9 ACCC Systems Group |
|