FAQ

CACDS ACCOUNTS

How Do I Request a CACDS Account?

The easiest way to request a new account is to fill out the Account Request Form. After we create your account, you will receive an email with further instructions and additional information on synchronizing your password with the rest of campus. Please read and follow the instructions carefully. Also note that a one-hour introductory course is a co-requisite.

How Do I Find Some Quick Start Information?

Consult the CACDS new user orientation for advice on getting started with CACDS resources quickly.

I Can’t Log In (New User / First-Time Login)

After submitting the account request, you must wait for the account-creation confirmation email before attempting to log in. As mentioned in the confirmation email, your password for logging into CACDS resources is your standard Cougarnet account password.

I’ve Forgotten My Password

Your password for logging into CACDS resources is your standard cougarnet account password. You can reset your password here.

How Do I Request an Account for an External Collaborator?

To get the form needed to make an account for an external collaborator, you must first go to the sponsored accounts page and follow the procedure for creating a sponsored Cougarnet account.

How Can I Maintain My CACDS Account After Graduating/Leaving UH?

Your CACDS account (and all associated data) will remain active while your general UH account remains active. You can expect your UH account to be deleted within approximately 90 days of leaving the university.

To maintain your UH account (and hence your CACDS account) you should ensure that a faculty sponsor renews your account annually. This process can be initiated by completing an external collaborator request.

I Am Getting A “Module: Command Not Found” Message When I Log In To My CACDS Account

This message generally indicates that you have deleted or corrupted your shell configuration scripts, which prevents the Module environment from initializing correctly. To fix the issue, you will need to clean up your .bashrc or .cshrc file.
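One way to clean a damaged startup file is to set it aside and start again from the system default. The sketch below assumes the default files live in /etc/skel, which is common on Linux but is not confirmed for CACDS systems. restore_rc is a hypothetical helper name, and the same approach applies to .cshrc.

```shell
# Sketch: back up a possibly corrupted startup file and restore a default copy.
# Assumption: default startup files are kept in /etc/skel (pass another
# directory as the second argument if they live elsewhere).
restore_rc() {
  rc_file="$1"                 # e.g. "$HOME/.bashrc"
  skel_dir="${2:-/etc/skel}"   # directory holding the default startup files
  name=$(basename "$rc_file")

  # Keep the old file so that custom settings can be copied back later
  if [ -f "$rc_file" ]; then
    mv "$rc_file" "$rc_file.bak"
  fi

  # Restore the default only if one exists
  if [ -f "$skel_dir/$name" ]; then
    cp "$skel_dir/$name" "$rc_file"
  else
    echo "no default $name found in $skel_dir" >&2
    return 1
  fi
}
```

After running restore_rc "$HOME/.bashrc", log out and back in, then re-add your own settings from .bashrc.bak one at a time to find the line that broke the Module setup.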

How much does it cost to use CACDS’s systems?

Currently, the services CACDS provides to the UH community are free. Charges may apply for external participants.

REMOTE ACCESS

Why Is My Remote Connection So Slow?

Slow remote GUI response time is actually quite difficult to troubleshoot.

  • First, because “slow” in this regard does not have a fully tangible metric
    • In this case, slow means the connection is impractical for you to work with
  • Second, because it depends on numerous conditions:
    • OS and graphics card/drivers on your system
    • Remote visualization client (PuTTY/Xming)
    • Network reliability (on-campus wired is usually pretty good)
    • Load (CPU, RAM, network, disk I/O) on the machine you are trying to access

For the first two, you need laptop/desktop support, which is provided by your department or college IT support staff.

For the third, you can verify that you are consistently getting at least 100 Mbps: http://bandwidthplace.com/

For the fourth, you can monitor the load on CACDS machines here: http://baragon.hpcc.uh.edu/ganglia/?c=Opuntia

TRAINING

How Do I Receive Training?

CACDS User Support conducts introductory training for new users year-round. Video recordings of the basic courses are available at any time to anyone with an active Cougarnet account.

Periodically throughout the year, CACDS organizes scheduled training classes (typically at the beginning of Fall and Spring semesters). These classes are announced on the CACDS Training Pages.

Outside of the scheduled training classes, training is provided on-demand by CACDS User Support staff. Please send an email to contact@cacds.uh.edu to arrange a training session.

Do I need to bring anything to the training or workshop?

In most cases, no: training and workshops are meant to be hands-on, which means your hands should be on a keyboard while you’re learning. We have desktop computers with the appropriate software stack available for training and workshops.

CACDS SOFTWARE

What Is The Available Software At CACDS?

CACDS maintains information and usage instructions for all currently available software. This information can also be accessed at the CACDS cluster command line using the module help and module whatis commands.

How Do I Request Software To Be Installed/Updated?

CACDS encourages users to install software packages in their $HOME directory. If a package is used by the wider UH community, CACDS will install it as a module for all users.

All requests for new or updated paid-license software on CACDS systems must come from UH faculty. Please submit the Software Request Form attached to the CACDS Software Policy. The software policy also contains information on licensing and cost sharing with CACDS.

Please Note: all software requests require license approval via the University’s General Counsel. This process can take up to 21 business days from receipt of the software request. If you require expedited service, please make that clear on your request form.

NFS STORAGE

How Do I Check My Available NFS Storage?

You can check your $HOME directory disk usage with the following commands:

> cd $HOME
> du -sh

How Do I Request More NFS Storage Space?

Please submit a storage space request by creating a support ticket with a brief justification for the additional storage space.

How Do I Give A Particular User/Group Access To A Directory In My Personal NFS Space?

For a detailed explanation, see this documentation.

I Am Receiving A Locking Authority File Error

If you receive the following error:

> /usr/bin/xauth:  error in locking authority file /nfs/cacds.uh.edu/user...

you have most likely exhausted your NFS storage quota. Please delete unwanted files and/or request more storage space.

I’m Not Seeing My Output When Running From NFS!

NFS improves I/O performance by caching program output on the local host. The output data is only written to the file server when the program is complete. To overcome this issue, you should add the fsync command to your submission script. Please see the wiki page for more details.

I Am Getting The Following Message Using Find

> find ./ -type d -print 
find: WARNING: Hard link count is wrong for .: this may be a bug in your filesystem driver. 

Use the -noleaf option when using the find command:

-noleaf
    Do  not  optimize  by  assuming that directories contain 2 fewer subdirectories than their  hard  link count. 
    This option  is needed  when  searching  file systems that do not follow the Unix directory-link convention, 
    such as CD-ROM or MS-DOS file systems or NFS  volume  mount  points.
 
e.g. find ./ -noleaf -type d -print

HIGH-PERFORMANCE STORAGE

How Do I Check My Available (/scratch) Storage?

Currently there is no high-performance storage, but it may be added in the future based on user demand.

How Do I Request More (/scratch) Storage Space?

Currently there is no high-performance storage.

If you are interested in high-performance storage, please email contact@cacds.uh.edu with the following information:

  1. Name
  2. Name of PI
  3. Brief Justification for the need

JOBS

My Job Is Aborted At Runtime Due To Excessive Disk-Swapping

If your job is aborted at runtime due to disk-swapping, this typically indicates that your simulation is starved of RAM at runtime. A general rule of thumb is to ensure that you request a maximum of 3.2 GB of RAM for every core your simulation requires. For example, if your simulation requires 8 GB of RAM in total, specify that in your Slurm job submission script.
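As a worked example of this rule of thumb, the sketch below rounds a total memory requirement up to a whole number of cores at 3.2 GB per core. cores_for_mem is a hypothetical helper, and reading the rule as "cores = total GB / 3.2, rounded up" is one interpretation of it rather than an official CACDS formula.

```shell
# Sketch: number of cores to request for a given total memory need,
# assuming at most 3.2 GB of RAM per core.
cores_for_mem() {
  mem_gb="$1"   # total memory the simulation needs, in GB
  awk -v m="$mem_gb" -v per=3.2 'BEGIN {
    c = int(m / per)          # whole cores that fit
    if (c * per < m) c++      # round up if memory is left over
    if (c < 1) c = 1          # always request at least one core
    print c
  }'
}

# 8 GB / 3.2 GB per core = 2.5, which rounds up to 3 cores
cores_for_mem 8
```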

If you are unsure how much RAM your job will require, you can use the Ganglia Memory Monitoring tool to see how much RAM your job is using. There should not be a problem as long as your job’s Actual memory graph never reaches the Total in-core memory graph. If it does, the machine has run out of RAM for your job and the system will use the hard disk to read/write data. This can cause your job to take longer than expected and, if the job reads and writes to disk excessively, can crash the system it is running on.

CACDS also provides some large-memory nodes (512GB or 1TB RAM) which can be used for particularly demanding simulations.

How Does Swap Affect Performance?

Once a system starts to run out of RAM, it “swaps” memory out to the hard drive and back in as needed. Hard drives are much slower than RAM, so any process that uses swapped memory will experience a substantial decrease in performance.

What Is A Swap File And Where Is It Located?

There is a swap file on each machine. It is a portion of the hard drive set aside to supplement RAM, or physical memory. The swap file is also known as virtual memory. Usually, the swap file is used as a RAM overflow, if you will. When there isn’t enough RAM to run an application, or applications, the machine will swap between RAM and the swap file. Since accessing the hard drive is much slower than RAM, performance begins to degrade. When a large portion of swap is needed, it can cripple or crash a machine.

Do Different Operating System Versions Affect Performance?

Each machine uses a version of Red Hat, which has various versions available. These different Red Hat builds can have a slight impact on performance.

How Do I Target The Large-Memory Nodes?

If your simulation requires a large amount of memory then you may be eligible to access a limited set of large-memory (512GB) nodes.

You can target these nodes in your submission script by specifying the memory required:

#SBATCH --mem=128G

If your simulation requires all 512GB/1TB of RAM on a large-memory node, you should request all 40/60 cores of that node, respectively, to prevent other users from running on the same machine and using a portion of the RAM. You can request all the cores by specifying the following in your submission script:

#SBATCH -n 40 -N 1

Note that the maximum runtime for jobs using the large memory nodes is 14 days.
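Putting these settings together, a complete submission script for a whole 512GB node might look like the sketch below. Slurm reads its options from comment lines that begin with #SBATCH. The job name and the program name are hypothetical, and whether a 40-core request alone is enough to land on a large-memory node depends on how the CACDS partitions are configured.

```shell
#!/bin/bash
#SBATCH -J bigmem_job          # job name (hypothetical)
#SBATCH -N 1                   # a single node
#SBATCH -n 40                  # all 40 cores, so no other job shares the node's RAM
#SBATCH -t 14-00:00:00         # 14 days, the maximum for large-memory nodes

# Report where the job landed; the fallbacks apply when run outside Slurm
job_info="job ${SLURM_JOB_ID:-none} on $(uname -n) with ${SLURM_NTASKS:-0} tasks"
echo "$job_info"

# Replace with your own program (hypothetical name):
# ./my_simulation
```

Submit the script with sbatch. A shorter -t value is worth setting when you know the job will finish sooner.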

‘Unable to run job: error: no suitable queues’ Message When Submitting Job Script

Please check the partition specified in your Slurm job submission script. Either set it to -p gpu or remove the partition field altogether.

How Can I Monitor The Behavior Of My Running Jobs?

The behavior of running jobs can be monitored using the Ganglia online tool. After using squeue to identify the node(s) on which your jobs are running, use the Ganglia link below to locate the specific node name. Associated with each node is a series of sub-links that you can use to monitor CPU status, memory usage, communication statistics, etc.

CACDS Ganglia
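To find the node names to look up in Ganglia, squeue can print just the job ID and node list of your running jobs. The sketch below wraps that call in a hypothetical helper, my_job_nodes. The SQUEUE variable is only there so the function can be tried on a machine without Slurm.

```shell
# Sketch: print "<job id> <node list>" for each running job of a user.
SQUEUE="${SQUEUE:-squeue}"   # overridable command name (an assumption for testing)

my_job_nodes() {
  user="${1:-${USER:-$(id -un)}}"
  # -h: no header; -u: jobs of this user; -t RUNNING: running jobs only
  # -o "%i %N": print the job ID followed by the allocated node list
  "$SQUEUE" -h -u "$user" -t RUNNING -o "%i %N"
}
```

Each output line gives a job ID and the node(s) it is running on; those node names are what you search for on the Ganglia page.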

My Job Has Created Zombies… What Does That Mean?

Zombie processes are processes that remain running even though your job has terminated within the Slurm batch system. This can happen for various reasons, including ungraceful termination of MPI due to either software or hardware issues.

CACDS plans to run a periodic script to remove these zombie processes from our systems. If the script detects zombie processes left by one of your jobs, you will be notified via email. If you continue to receive zombie notifications regularly, please contact CACDS Support for assistance in finding the cause.

APPLICATIONS

I Need To Install ArcGIS On My System

Please consult the library staff on how to request an installation on your local system.

I am trying to compile my code with mvapich2 but cannot find -lpsm_infinipath

Please compile your code on crcfeib01.crc.nd.edu. Note that this front-end node is accessible from within the campus network or, if you are off campus, through the VPN.

I See Incomplete Images In COMSOL v4+

The OpenGL graphics libraries on our machines are not compatible with COMSOL v4 and later. This results in distorted or incomplete rendered images. To overcome this issue, please set your COMSOL graphics configuration to use software rendering via the following menu option:

Option > Preferences > Graphics > Rendering > Software

My Gaussian Jobs Are Being Aborted Due To /tmp Over-usage

By default, GAUSSIAN jobs write temporary working data to the /tmp directory of the compute node. For large GAUSSIAN jobs, this /tmp space can fill very quickly, causing the machine to slow down or crash (note: machine supervisors may pre-empt machine crashes by killing your jobs).

To avoid this situation, please direct GAUSSIAN to write temporary working files to your own storage allocation (either NFS or, in future, /scratch). Details on how to set this up in your GAUSSIAN scripts can be found here.
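As a rough illustration, Gaussian reads the location of its scratch files from the GAUSS_SCRDIR environment variable, so a job script can point it at a per-job directory before the program starts. The sketch below is not the CACDS-documented procedure. setup_gauss_scratch is a hypothetical helper and the directory layout is an assumption.

```shell
# Sketch: create a per-job scratch directory and point Gaussian at it.
setup_gauss_scratch() {
  base="$1"                            # a directory in your own storage allocation
  job_tag="${SLURM_JOB_ID:-manual}"    # one directory per job; "manual" outside Slurm
  GAUSS_SCRDIR="$base/gaussian_$job_tag"
  export GAUSS_SCRDIR                  # Gaussian writes its scratch files here
  mkdir -p "$GAUSS_SCRDIR"
}
```

In a job script, call setup_gauss_scratch with a directory you own (for example "$HOME/gaussian_scratch") before launching Gaussian, and remove "$GAUSS_SCRDIR" once the job has finished so the scratch files do not count against your quota.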

Question: How can I request ArcGIS Installation and Support?

All GIS software installation and GIS desktop support is handled by the UH library or departmental IT support. You can get the desktop support team’s contact information from the library website.

GRANTS/PROPOSALS

Question: How Do I Acknowledge the CACDS In My Proposals?

To acknowledge collaboration with the CACDS, please refer to this page.

Question: Does the CACDS Provide An Example Of A Data Management Plan?

Please consult one of the CACDS research faculty for more information including sample plans.

Frequently Asked Questions

Question: Why can’t I log in?

Login problems can occur for a number of reasons. Opuntia uses your Cougarnet ID and password for authentication. If you have forgotten your password and would like to request a password reset, please submit a support request. If several failed login attempts are made from a particular IP address, that IP address is blocked, and you will need to submit a support ticket to get it unblocked.

Question: I need a software package for my research. Can you install it for me?

CACDS encourages users to install software packages in their $HOME directory. If a package is used by the wider UH community, CACDS will install it as a module for all users.

Question: How can I submit a job that depends on the completion of another job?

Use the SLURM dependency option as follows, replacing SLURM_JOB_ID with the ID of the job that must finish first:

#SBATCH --dependency=afterany:SLURM_JOB_ID
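The same dependency can be set on the sbatch command line, which saves editing the job ID into the second script by hand. The sketch below captures the first job's ID with sbatch --parsable and passes it to the second submission. submit_after is a hypothetical helper, and the SBATCH variable only exists so the function can be tried without Slurm.

```shell
# Sketch: submit two scripts so that the second starts after the first ends.
SBATCH="${SBATCH:-sbatch}"   # overridable command name (an assumption for testing)

submit_after() {
  first_script="$1"
  second_script="$2"

  # --parsable prints only the job ID (optionally followed by ";cluster")
  first_id=$("$SBATCH" --parsable "$first_script") || return 1
  first_id="${first_id%%;*}"   # drop the cluster name if present

  # afterany: start once the first job has ended, whatever its exit status
  "$SBATCH" --dependency="afterany:$first_id" "$second_script"
}
```

For example, submit_after step1.sh step2.sh queues both jobs at once; step2.sh stays pending until step1.sh has ended.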

Question: How can I run multiple serial tasks inside one job?

Use the following script in your SLURM job submission file:

    # Define variables
    numtasks=20
    np=1
    # Loop through numtasks tasks
    while [ $np -le $numtasks ]
    do
      # Run the task in the background with input and output 
      # depending on the variable np
      ./a.out $np > $np.out &

      # Increment task counter
      np=$((np+1))
    done

    # Wait for all of the tasks to finish
    wait

 

Question: How can I run multiple short, parallel tasks inside one job?

Use the following script in your SLURM job submission file:

    # Specify the list of tasks
    tasklist="task1 task2 task3"

    # Loop through the tasks
    for tsk in $tasklist
    do
      # Run the task $tsk on all tasks allocated to the job
      # (SLURM_NTASKS is set by Slurm when -n/--ntasks is requested)
      mpirun -np $SLURM_NTASKS ./a.out $tsk
    done
