A configuration and replication system for cluster nodes
Volnys Borges Bernal
Two important problems related to installation and configuration must be addressed in ‘cluster of workstations’ (COW) systems. The first is to ensure that every node (or workstation) has the same environment: kernel, utilities and configuration. The second is to make installation and configuration easier and faster. This covers operating system installation (kernel, modules, patches, packages, etc.) and configuration (kernel options, boot options, network, name resolution, trust relationships, local and remote file system mounting, welcome messages, services and so on). Doing this by hand is not a problem for a single node, but how can one ensure that the manual task is carried out correctly on all nodes while keeping them consistent? And how long would it take to install and configure each node completely?
A set of tools was developed to help the system manager with (a) the initial node configuration, (b) system reconfiguration when necessary and (c) ensuring that all nodes have the same environment (kernel, configuration and software). This set of tools is called clustermagic.
ClusterMagic Configurator
Configuring a node involves many settings, which are made by modifying several files. ClusterMagic Configurator interacts with the system manager to obtain the cluster configuration information (node names, primary network, high performance network, external network, node disk partitioning, remote mounts, node MAC addresses, domain name, DNS servers, etc.) and then creates the node configuration files automatically.
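As a rough illustration of this idea, the sketch below keeps the cluster description in one place and renders per-node files from it. The figure listing the files actually generated is not reproduced here, so the choice of a hosts file and a resolver file is an assumption, as are all names, addresses and the `Cluster`/`Node` types.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str  # short host name (hypothetical example value below)
    ip: str    # address on the primary network
    mac: str   # MAC address of the primary interface


@dataclass
class Cluster:
    domain: str
    dns_servers: List[str]
    nodes: List[Node]


def render_hosts(cluster: Cluster) -> str:
    """Return hosts-file text listing every node of the cluster."""
    lines = ["127.0.0.1\tlocalhost"]
    for node in cluster.nodes:
        fqdn = f"{node.name}.{cluster.domain}"
        lines.append(f"{node.ip}\t{fqdn}\t{node.name}")
    return "\n".join(lines) + "\n"


def render_resolv_conf(cluster: Cluster) -> str:
    """Return resolver configuration text (search list and DNS servers)."""
    lines = [f"search {cluster.domain}"]
    lines += [f"nameserver {server}" for server in cluster.dns_servers]
    return "\n".join(lines) + "\n"


# Illustrative data only; not taken from the original system.
example = Cluster(
    domain="cluster.example",
    dns_servers=["192.168.1.254"],
    nodes=[
        Node("node01", "192.168.1.1", "00:A0:C9:11:22:01"),
        Node("node02", "192.168.1.2", "00:A0:C9:11:22:02"),
    ],
)
print(render_hosts(example), end="")
print(render_resolv_conf(example), end="")
```

Because every file is derived from the same description, adding a node or changing the DNS server means editing one record and regenerating, rather than touching each node by hand.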
The following figure shows the files that are created for each node and some others that are used on the administration workstation.
The ClusterMagic Configurator also automatically creates the DNS server configuration files for the cluster and the bootptab file, which is used by the BOOTP server in the replication process.
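A minimal sketch of how a bootptab file could be generated from the node list follows. The tag names (`ht`, `ha`, `ip`, `sm`, `gw`, `ds`, `dn`, `hn`, `tc`) follow the bootptab format of the common bootpd server as best recalled and should be checked against the local bootptab manual page. The template name `.cluster`, the node names and the addresses are hypothetical, and this is not the generator used by clustermagic.

```python
from typing import List, Tuple


def bootptab_entry(name: str, mac: str, ip: str) -> str:
    """Format one host entry; shared settings come from the template via tc."""
    hw_addr = mac.replace(":", "").replace("-", "").upper()
    if len(hw_addr) != 12 or any(c not in "0123456789ABCDEF" for c in hw_addr):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return f"{name}:ht=ethernet:ha={hw_addr}:ip={ip}:tc=.cluster:"


def render_bootptab(nodes: List[Tuple[str, str, str]], netmask: str,
                    gateway: str, dns_server: str, domain: str) -> str:
    """Return bootptab text: one shared template entry plus one entry per node."""
    template = (f".cluster:sm={netmask}:gw={gateway}:"
                f"ds={dns_server}:dn={domain}:hn:")
    entries = [bootptab_entry(name, mac, ip) for name, mac, ip in nodes]
    return "\n".join([template] + entries) + "\n"


# Illustrative data only: (node name, MAC address, IP address).
example_nodes = [
    ("node01", "00:a0:c9:11:22:01", "192.168.1.1"),
    ("node02", "00:a0:c9:11:22:02", "192.168.1.2"),
]
bootptab_text = render_bootptab(example_nodes, "255.255.255.0",
                                "192.168.1.254", "192.168.1.254",
                                "cluster.example")
print(bootptab_text, end="")
```

Keeping the common settings in one template entry means that a change of gateway or DNS server touches a single line, while each node entry carries only what is unique to it (MAC and IP address).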
ClusterMagic Replicator
Node replication means installing a node completely by replicating another node, called the womb node. The womb image is a compressed file containing the file hierarchy of that node.
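The text does not name the archive format of the womb image. Assuming, purely for illustration, a gzip-compressed tar file, creating and restoring such an image could be sketched as follows; the function names and the tiny sample hierarchy are hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path


def create_womb_image(root: Path, image: Path) -> None:
    """Pack the file hierarchy under root into a gzip-compressed tar file."""
    with tarfile.open(image, "w:gz") as archive:
        # arcname="." stores paths relative to root, not absolute paths.
        archive.add(root, arcname=".")


def restore_womb_image(image: Path, target: Path) -> None:
    """Unpack the image into target, recreating the womb file hierarchy."""
    with tarfile.open(image, "r:gz") as archive:
        archive.extractall(target)


# Demonstration on a throwaway directory standing in for the womb node.
with tempfile.TemporaryDirectory() as work:
    womb_root = Path(work) / "womb"
    (womb_root / "etc").mkdir(parents=True)
    (womb_root / "etc" / "motd").write_text("welcome\n")

    image_path = Path(work) / "womb.tar.gz"
    new_node_root = Path(work) / "new_node"
    new_node_root.mkdir()

    create_womb_image(womb_root, image_path)
    restore_womb_image(image_path, new_node_root)
    restored = (new_node_root / "etc" / "motd").read_text()

print(restored, end="")
```

Storing relative paths lets the same image be unpacked under whatever mount point the new node's freshly created file systems use.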
A ClusterMagic Replicator diskette was developed to assist with this task. It is a Linux environment specially designed to boot the Linux operating system and execute the replication task. Interaction with the operator takes place through the serial console, which is shown in a window on the administration workstation, and through the node display, which shows messages indicating the current step.
When the node is booted with the replication diskette, the diskette boot sector is loaded and the boot program reads the compressed kernel and loads it into memory. The compressed root file system is then loaded into a ramdisk and kernel initialization starts. Next, the node sends a BOOTP [CRO85] [WIM93] request and the administration workstation answers with the node's identification and some other information (IP address, broadcast address, network address, default gateway, node name, domain name, DNS server and search list). The node network interface and the host name are then configured. At the end of operating system initialization, a script offers the operator two options: single user shell or replication. If the "single user shell" option is selected, a new shell is started for the operator.
If the "replication" option is selected, the system reads the cluster configuration information stored on the administration workstation and starts the node replication. First, the disk is partitioned. The partitions are initialized (mkfs and mkswap) and the partitions that hold file systems are mounted. Then, in the replication step, the womb image is copied to the local file system. A few minutes later, the node configuration files (previously generated by clustermagic) stored on the administration workstation are copied to the local file system. Finally, the boot sector is initialized and the local file systems are unmounted. At this point the node is completely configured and the shutdown script is started. The new node is ready to be booted and will be fully operational.
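The replication sequence described above can be sketched as an ordered list of steps that are run one after another, reporting the current step and stopping at the first failure. This is only an outline under stated assumptions: the step actions are empty placeholders, the stop-on-failure behaviour is assumed rather than documented, and `run_replication` is a hypothetical name, not part of the Replicator diskette.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_replication(steps: List[Step], report: Callable[[str], None]) -> bool:
    """Run the steps in order, reporting each one; stop at the first failure."""
    total = len(steps)
    for index, (label, action) in enumerate(steps, start=1):
        report(f"[{index}/{total}] {label}")
        try:
            action()
        except Exception as error:
            report(f"FAILED: {label}: {error}")
            return False
    report("replication finished; shutting down")
    return True


def placeholder() -> None:
    """Stand-in for the real work of a step (hypothetical; does nothing)."""


# Step order taken from the description in the text.
REPLICATION_STEPS: List[Step] = [
    ("read cluster configuration from the administration workstation", placeholder),
    ("partition the disk", placeholder),
    ("initialize partitions (mkfs, mkswap)", placeholder),
    ("mount the file systems", placeholder),
    ("copy the womb image to the local file system", placeholder),
    ("copy the node configuration files", placeholder),
    ("initialize the boot sector", placeholder),
    ("unmount the local file systems", placeholder),
]

messages: List[str] = []
ok = run_replication(REPLICATION_STEPS, messages.append)
print("\n".join(messages))
```

Passing the reporting function as a parameter mirrors the two output channels mentioned in the text: the same messages could be sent to the serial console and to the node display.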
Note that this procedure requires the operator only to insert the boot diskette, reset the node and select the replication option. The rest of the replication procedure is automatic and takes about 12 minutes, including booting and shutting down, for a total file system size of 700 Mbytes. This also allows a short replacement time in the event of a node failure.
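As a back-of-the-envelope check on these figures, the sketch below computes the effective restore rate implied by 700 Mbytes in 12 minutes and the total time for a cluster replicated one node at a time. The 16-node cluster and the one-at-a-time assumption are hypothetical. Since the 12 minutes include booting and shutting down, the rate is a lower bound on the speed of the copy step itself.

```python
def effective_rate_mb_per_s(size_mb: float, minutes: float) -> float:
    """Average amount of restored file system data per second of wall time."""
    return size_mb / (minutes * 60.0)


def sequential_minutes(node_count: int, minutes_per_node: float) -> float:
    """Total time if nodes are replicated strictly one after another."""
    return node_count * minutes_per_node


rate = effective_rate_mb_per_s(700, 12)   # figures quoted in the text
total = sequential_minutes(16, 12)        # 16 nodes is a hypothetical size

print(f"effective rate: {rate:.2f} Mbytes/s")
print(f"16 nodes, one at a time: {total:.0f} minutes")
```

Under these assumptions the effective rate is a little under 1 Mbyte/s, and a 16-node cluster would take a bit over three hours if replicated sequentially.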
Acknowledgments
We would like to thank the Cluster development team from LSI-EPUSP and FINEP for supporting this work.