ORNL

Guides


The TeraGrid compute cluster at ORNL is primarily intended to support local bridging activities between the ETF and the Neutron Science Community. However, it is also available ETF-wide as part of the collective TeraGrid roaming resource.

Guide Topics:

  1. Help and Information
  2. Hardware
  3. Software
  4. System Access
  5. System Status
  6. File Transfer
  7. File Storage: Disk
  8. File Storage: Archival
  9. Compiling and Porting: MPI Programs
  10. Running: Interactive and Batch Jobs
  11. Running: Grid Jobs (from global TeraGrid guide)
  12. Debugging Programs
  13. Optimizing Programs
  14. References
  15. FAQ

Help and Information

Help from TeraGrid Consultants:

Please send email to help@teragrid.org, or call 866-336-2357 (toll free), to report problems or ask questions related to any TeraGrid systems or services.  Please see the TeraGrid Help Desk Web page for more information.


User News and Information:

Information about system downtimes, maintenance periods, upgrades, etc., is available at http://news.teragrid.org. Updates can be received via email or via the website. Users can manage their news subscriptions to receive information only on the platforms of interest.

Hardware

NSTG Resources:

The ORNL Neutron Science TeraGrid Gateway consists of 28 dual-processor 3.06 GHz Intel Xeon nodes. Sixteen nodes, each with 2.5 GB of memory, are dedicated to batch job processing. The remaining 12 nodes are dedicated to service tasks such as interactive logins and data transfer; four of these are configured with 4 GB of memory and are dedicated to GridFTP.


Software

NSTG Software Stack:

The ORNL NSTG cluster closely follows the TeraGrid Common Software Stack and includes the following software packages:

System Access

Logging in:

Access to the NSTG cluster is available through the portal interface. Users can also access the cluster interactively using ssh with public keys or by using X.509 Certificates. ORNL users can request a DOEgrids certificate from http://pki1.doegrids.org/. An example screen with instructions for completing the form can be found HERE. For ssh access, users should send their public key to help@teragrid.org with a subject line of "Attn: ORNL Account Management". Someone from ORNL will contact you to verify your key and have it installed. Instructions for generating ssh public keys appear below.

For interactive command-line access, users should connect to tg-login.ornl.teragrid.org, which is an alias for either tg-login1.ornl.teragrid.org or tg-login2.ornl.teragrid.org. Always use the alias rather than an individual node name, since maintenance and upgrade schedules may require one of the login nodes to be inaccessible. The alias will always point to a stable interactive node.

Examples of using ssh to connect to the NSTG cluster follow:

ssh username@tg-login.ornl.teragrid.org

ssh -l username tg-login.ornl.teragrid.org
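
To avoid retyping the host and user name, the connection details can be kept in an ssh client configuration file. The following is a minimal sketch, not an ORNL-provided configuration: the host nickname tg-ornl, the file name tg_ssh_config, and the key path are assumptions, and username remains a placeholder.

```shell
# Write a small ssh client configuration to a separate file
# (tg_ssh_config is a hypothetical name; ~/.ssh/config also works).
cat > tg_ssh_config <<'EOF'
Host tg-ornl
    HostName tg-login.ornl.teragrid.org
    User username
    IdentityFile ~/.ssh/id_rsa
EOF

# Connecting would then look like this (left commented out, since it
# needs network access and an installed public key):
# ssh -F tg_ssh_config tg-ornl
```

Pointing HostName at the alias rather than at tg-login1 or tg-login2 keeps the shortcut working when one login node is down for maintenance.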


Generating an ssh public key with UNIX:
  1. Run ssh-keygen
  2. Enter a passphrase when prompted. Passphrases can be any length and may contain spaces and special characters. Strong passphrases with spaces and special characters are recommended, and longer is better.
  3. You should now have two files, ~/.ssh/id_rsa (your private key) and ~/.ssh/id_rsa.pub (your public key). You should send your public key to help@teragrid.org as detailed above. You should protect your private key and not send it to anyone.
  4. Once your public key is installed on our cluster, ssh will prompt you for the passphrase for your key and you will be logged in to the cluster.
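
The UNIX steps above can be sketched as a short command sequence. This is an illustration under stated assumptions, not the required procedure: it writes a throwaway key pair to a temporary directory instead of ~/.ssh, and it passes a dummy passphrase on the command line only so the sketch runs without prompting.

```shell
# Create the key pair in a scratch directory so an existing ~/.ssh/id_rsa
# is never overwritten (the directory is an assumption of this sketch).
keydir=$(mktemp -d)

# -t rsa selects the key type, -f the output file, -q suppresses progress output.
# -N supplies the passphrase non-interactively; this is for demonstration only.
# For a real key, omit -N and type the passphrase at the prompt.
ssh-keygen -q -t rsa -f "$keydir/id_rsa" -N "throwaway example passphrase"

# Two files result: id_rsa (private, keep it) and id_rsa.pub (public, send it).
ls "$keydir"

# Print the fingerprint of the public key, which is handy when ORNL staff
# verify the key with you.
ssh-keygen -l -f "$keydir/id_rsa.pub"
```

For a real key, running plain ssh-keygen as in step 1 and accepting the default location is usually simpler; only the .pub file should ever leave your machine.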


Generating an ssh public key with Windows (PuTTY):
  1. Obtain the PuTTYgen utility from the PuTTY distribution.
  2. Run PuTTYgen and click on Generate in the main window.
  3. For the type of key, choose SSH2 RSA. The default length of 1024 for Number of bits in generated key is sufficient.
  4. Send your public key to help@teragrid.org as detailed above.

Detailed instructions for using PuTTY can be found here.

Detailed instructions for using PuTTYgen can be found here.

You can also use the PuTTYgen utility to load your UNIX private key. Click on Load in PuTTYgen to import your key. You will have to save your key after loading it because PuTTY uses a different format for storing keys than UNIX does.

To connect to a system using your private key, open PuTTY and select Connection -> SSH -> Auth in the Category box. This will allow you to load your private key for authentication. After your key is loaded, connect as usual.

File Transfer

Transferring Files:

There are several ways to transfer files to the TeraGrid. Secure copy (scp) and gsiscp are easy to use but perform poorly when transferring many files or large files. The following is an example of an scp from a local machine to ORNL's TeraGrid cluster (original_file, username, to_dir, and copied_file are placeholders):

% scp original_file username@tg-gridftp.ornl.teragrid.org:/to_dir/copied_file

GridFTP, based on FTP, is a high-performance alternative to scp and gsiscp for copying files. It is optimized for transferring many files, or large files, across high-bandwidth wide-area networks.

% globus-url-copy source_url destination_url

where source_url and destination_url are of the form:

To use secure copy from Windows platforms, download a copy of WinSCP (freeware). Other software packages for file transfer from Windows platforms are listed at the SDSC Security site.

File Storage: Disk

WARNING: It is your responsibility to back up critical data! Please maintain your own copy of important data stored on TeraGrid file systems.

The ORNL NSTG cluster provides limited storage space. Users who need to store large amounts of data should explore other TeraGrid resources with more extensive data facilities such as those provided by the San Diego Supercomputer Center.

Each user has general purpose storage space available in their home directory which currently has no quota limitations (although quotas may be imposed if the need arises). This area can be referenced through the environment variable $TG_CLUSTER_HOME. Files in this area are backed up daily.

Each user has access to a cluster-wide scratch area, which can be referenced through the environment variable $TG_CLUSTER_SCRATCH. This area is purged daily: files that have not been accessed within the past 14 days are erased. These files are not backed up.

Each user has access to a scratch area local to each individual node, which can be referenced through the environment variable $TG_NODE_SCRATCH. Files in this area are purged at the end of each job, and files that have no associated job are purged daily if they have not been accessed within the past 14 days. These files are not backed up.
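
A typical pattern with these areas is to work in scratch and copy anything worth keeping back to the home area. The sketch below illustrates that pattern under assumptions: the directory and file names (example_run, results, output.dat) are hypothetical, and the fallback to temporary directories exists only so the commands can be tried on a machine where the TeraGrid variables are not set.

```shell
# Fall back to temporary directories when the TeraGrid variables are unset
# (an assumption so the sketch can be tried off-cluster).
home_area="${TG_CLUSTER_HOME:-$(mktemp -d)}"
scratch_area="${TG_CLUSTER_SCRATCH:-$(mktemp -d)}"

# Work in a per-run subdirectory of scratch (the name is hypothetical).
run_dir="$scratch_area/example_run"
mkdir -p "$run_dir"
echo "intermediate result" > "$run_dir/output.dat"

# Copy results worth keeping to the home area, which is backed up daily.
mkdir -p "$home_area/results"
cp "$run_dir/output.dat" "$home_area/results/"

# List scratch files last accessed 14 or more days ago. With find,
# -atime +13 matches files whose access age in whole days exceeds 13.
find "$scratch_area" -type f -atime +13
```

Files reported by the final find command are the ones most likely to be removed by the next daily purge, so they are the first candidates to copy elsewhere.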

File Storage: Archival

Archival storage instructions should be available shortly.

Compiling and Porting: MPI Programs

Existing MPI-based parallel programs will typically need to be recompiled for the NSTG cluster. Copy your application file(s) into your home directory and recompile with one of the following commands:

mpicc [options] file.c      (C and C++ files)
mpif77 [options] file.f     (Fortran77 files)
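
As a concrete illustration of the mpicc form above, the following sketch writes a minimal MPI program and compiles it. The file name hello_mpi.c and the options -O2 and -o are assumptions (they are ordinary compiler options that MPI wrapper scripts normally pass through), and the compile step is skipped when mpicc is not on the PATH.

```shell
# Write a minimal MPI program (hello_mpi.c is a hypothetical file name).
cat > hello_mpi.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF

# Compile only where the MPI compiler wrapper is available.
if command -v mpicc >/dev/null 2>&1; then
    mpicc -O2 -o hello_mpi hello_mpi.c
else
    echo "mpicc not found; skipping compilation"
fi
```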

Running: Interactive and Batch Jobs

Jobs are submitted to the NSTG compute nodes through the PBS batch system using the qsub command. The NSTG provides a single execution queue, dqueue, which is also the default. Detailed information on the qsub command is available through the online man pages. Some frequently used qsub parameters are listed here:

Parameter Format
Definition
#PBS -A acct
Causes the job time to be charged to "acct".
#PBS -a time
Declares the time after which the job is eligible for execution.
#PBS -q queue
Directs the job to a specified queue, where "dqueue" is the default.
#PBS -j {eo|oe}
Causes the standard error and standard output to be combined in one file.
  • eo - standard output is added to standard error
  • oe - standard error is added to standard output
#PBS -l resources
Resources (separated by commas, with no spaces between):
  • nodes=n:ppn=2 - Number of nodes (with 2 processors per node)
  • walltime=hh:mm:ss - Total wall-clock time.
  • mem=ngb - Aggregate memory used by the job, in gigabytes.
#PBS -m {a|b|e}
Causes mail to be sent to the user when:
  • a - The job aborts.
  • b - The job begins running.
  • e - The job ends.
#PBS -N name
Sets the job name to "name" instead of the name of the script file.
#PBS -o name
Sets the standard output file to "name" instead of "script.o$PBS_JOBID". "$PBS_JOBID" is an environment variable created by PBS that contains the job identifier.
#PBS -e name
Sets the standard error file to "name" instead of "script.e$PBS_JOBID".
#PBS -S shell
Sets the shell to use. Make sure the full path to the shell is correct.
#PBS -V 
Declares that all environment variables are to be exported to the batch job.
#PBS -W 
Used to set job dependencies between two or more jobs.
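
The parameters above are normally combined in a job script. The following is a minimal sketch rather than an ORNL-supplied template: the script name example_job.pbs, the job name, the executable my_mpi_code, and the 30-minute wall-clock limit are hypothetical, and the mpirun launch line is an assumption, since this guide does not state how MPI programs are started on the cluster.

```shell
# Write a job script requesting 2 nodes (2 processors each) for 30 minutes.
cat > example_job.pbs <<'EOF'
#!/bin/sh
#PBS -N example_job
#PBS -q dqueue
#PBS -l nodes=2:ppn=2,walltime=00:30:00
#PBS -j oe
#PBS -m e
#PBS -V

# PBS sets PBS_O_WORKDIR to the directory from which qsub was run.
cd "$PBS_O_WORKDIR"

# Assumed launch command: 2 nodes x 2 processors = 4 MPI processes,
# placed on the hosts that PBS lists in $PBS_NODEFILE.
mpirun -np 4 -machinefile "$PBS_NODEFILE" ./my_mpi_code
EOF

# Submit only where the PBS client is installed.
if command -v qsub >/dev/null 2>&1; then
    qsub example_job.pbs
else
    echo "qsub not found; job script written but not submitted"
fi
```

Because -j oe merges standard error into standard output, a single output file is produced for the job, and -m e sends mail when the job ends.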

Debugging Programs

Debugging:

Currently, the only debugger available on the NSTG cluster is gdb. To enable debugging, your code must be compiled and linked with the -g option, such as:

mpicc -g -o my_mpi_code my_mpi_code.c

This item will be updated as more tools become available.

FAQ
