Condo HPC Service

Definition
Through UW’s condo service, users can invest in compute nodes or storage that ARCC places in its common HPC resource (Mount Moran) or common high-performance storage (HPS) resource (BigHorn). Participants in the condo service share the unused portions of these resources with each other and with non-invested users (such as students or occasional users), who may or may not pay a fee for access.

Condo membership is granted to a Principal Investigator (PI) who buys compute nodes or blocks of storage to plug into the existing infrastructure. Unused portions of these resources (e.g., compute cycles) are shared among other members and non-invested users until the invested member requires access. A queue management system gives invested PIs top priority on the share they have purchased whenever they need it: ARCC uses preemption to interrupt community users’ jobs as needed to give invested PIs access to their share.
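
This document does not name Mount Moran’s scheduler, but as a minimal sketch, an owner-priority preemption policy like the one described above could be expressed in a Slurm configuration along these lines. All partition, node, and account names here are hypothetical:

    # slurm.conf excerpt -- a hypothetical sketch, not ARCC's actual configuration.
    # Jobs in a higher-tier partition preempt jobs in lower-tier partitions
    # running on the same nodes.
    PreemptType=preempt/partition_prio
    PreemptMode=REQUEUE

    # The invested PI's partition: top priority on the nodes they purchased,
    # and never preempted itself.
    PartitionName=pi_owner Nodes=node[001-004] PriorityTier=10 AllowAccounts=pi_lab PreemptMode=OFF

    # The community partition shares the same nodes at a lower tier; its jobs
    # are requeued whenever the owner needs the hardware.
    PartitionName=community Nodes=node[001-004] PriorityTier=1 PreemptMode=REQUEUE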

Common infrastructure elements (the environmentally regulated data center, network connectivity, equipment racks, and management and technical staff) allow the PI to focus time and energy on research.

Service Description
The ARCC provides a condo HPC platform to University of Wyoming researchers and their collaborators. Please refer to the Data Center Service, below, for more detailed information concerning shared/common infrastructure available at the University of Wyoming.

Benefits
The condo HPC model gives researchers greater flexibility and computing power, with far less overhead and management burden than owning and operating individual, standalone clusters.

To support a broad range of research interests, the condo HPC resource (Mount Moran) offers a variety of compute nodes, outlined in the node summary table below.

Other benefits include:

  • Dynamic resource expansion
  • Enhanced security through restricted physical access

Lifecycle management
Given how rapidly computational hardware performance improves, compute nodes receive factory support only for the duration of the standard 3-year warranty. During this time, any hardware problems will be corrected as soon as possible. After the warranty expires, compute nodes will be supported on a best-effort basis until they suffer complete failure, are replaced, or reach a service age of five (5) years. Once a node has reached end of life due to failure or obsolescence, it will be removed from service.
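
As a worked illustration of these lifecycle rules only (not ARCC tooling), a node’s support tier can be computed from its purchase date; the dates below are hypothetical:

    # A minimal sketch of the lifecycle policy above; not an ARCC tool.
    # Full factory support for 3 years, best-effort support until year 5,
    # then end of life.
    from datetime import date

    def support_status(purchased: date, today: date) -> str:
        age_years = (today - purchased).days / 365.25
        if age_years < 3:
            return "factory warranty"
        if age_years < 5:
            return "best effort"
        return "end of life"

    print(support_status(date(2013, 1, 15), date(2017, 6, 1)))  # -> best effort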

Buy In
The costs below are for single nodes. Depending on demand, ARCC will make bulk purchases every six months; ARCC staff will coordinate these twice-yearly purchases.

PIs interested in investing in the condo HPC should send their request and any questions to ARCC_info@uwyo.edu.

Node Summary Table:

Node Type     | Description / Cost
Thin Node     | Revised info coming soon.
Thin GPU Node | Revised info coming soon.
Fat Node      | Revised info coming soon.
Fat GPU Node  | Revised info coming soon.

Storage Block Summary Table:

Storage Type  | Description                                                         | Cost
Project Space | 5 TB blocks of GPFS storage in BigHorn (backed up)                  | $300/TB/yr
Scratch Space | 10 TB blocks of high-performance, volatile GPFS storage in BigHorn | $200/TB/yr
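
As a worked example of this pricing (assuming that, since storage is sold in fixed blocks, requests are rounded up to whole blocks; that rounding is an assumption, not stated policy), a 7 TB project-space request would occupy two 5 TB blocks, i.e., 10 TB at $300/TB/yr = $3,000/yr:

    # A minimal sketch of the block pricing above; not an ARCC tool.
    # Rounding up to whole blocks is an assumption, not stated policy.
    import math

    RATES = {"project": (5, 300), "scratch": (10, 200)}  # (block size in TB, $/TB/yr)

    def annual_cost(kind: str, tb_needed: float) -> int:
        block_tb, rate = RATES[kind]
        blocks = math.ceil(tb_needed / block_tb)
        return blocks * block_tb * rate

    print(annual_cost("project", 7))   # 2 blocks -> 10 TB -> 3000 ($/yr)
    print(annual_cost("scratch", 10))  # 1 block  -> 10 TB -> 2000 ($/yr)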

Service Exception
ARCC reserves one day per month for system time. This day is expected to be used rarely; however, PIs should be prepared for service interruptions. Except in emergencies, all users will receive notification at least two days in advance of a planned outage, and ARCC will not schedule maintenance on the day before or after a non-work day.

Related Documentation
  • Account Policy
  • Job Scheduling Policy
  • Acknowledgement of ARCC Compute Resources