petaLibrary System Overview
The UW Libraries and UW/IT have collaborated on a shared digital data repository designed to provide two main functions: 1) a storage service for UW researchers who need to reliably store data and exchange it with students and collaborators anywhere in the world; and 2) a place for UW researchers to store data linked to publications, or datasets that are publications themselves, in compliance with funder requirements.
The petaLibrary is connected to the UW network backbone at 40 Gbps (upgradeable to 80 Gbps), which in turn is connected to the Internet2 research network at 100 Gbps, so data can be delivered to the wide area network (WAN) at high speed. The service is designed and operated so that access to data in the Data Commons is limited only by the receiving endpoint or intervening network connections, not by the UW infrastructure. UW researchers can therefore share and/or host research data without the storage service itself becoming the bottleneck.
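To make the link speeds above concrete, the following minimal sketch estimates an ideal transfer time from the slowest link on the path. The function names, the decimal definition of a terabyte, and the 1 Gbps endpoint are illustrative assumptions, and the result ignores protocol overhead, disk speed, and competing traffic, so it is a lower bound rather than a measured figure.

```python
# Illustrative sketch only: ideal (best-case) transfer times over a network path.
# Assumptions (not from the service description): 1 TB = 10**12 bytes,
# no protocol overhead, and the hypothetical 1 Gbps receiving endpoint below.

ONE_TB_BYTES = 1e12  # decimal terabyte (assumption)


def bottleneck_gbps(*link_speeds_gbps: float) -> float:
    """Return the slowest link on the path, which caps end-to-end throughput."""
    return min(link_speeds_gbps)


def transfer_seconds(size_bytes: float, link_gbps: float) -> float:
    """Ideal time in seconds to move size_bytes over a link_gbps link."""
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9)


# Link speeds quoted in the text: 40 Gbps backbone, 100 Gbps Internet2.
backbone_gbps = 40.0
internet2_gbps = 100.0
endpoint_gbps = 1.0  # hypothetical receiving endpoint

path_limit = bottleneck_gbps(backbone_gbps, internet2_gbps, endpoint_gbps)
print(f"Path limit: {path_limit} Gbps")
print(f"1 TB at path limit: {transfer_seconds(ONE_TB_BYTES, path_limit):.0f} s")
print(f"1 TB at backbone speed: {transfer_seconds(ONE_TB_BYTES, backbone_gbps):.0f} s")
```

In this example the hypothetical 1 Gbps endpoint, not the 40 Gbps backbone, sets the pace, which is the situation the design goal describes.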
The petaLibrary is organized into four distinct areas under two services: Data Commons (Homes and Commons) and Data Curation (Publications and Archive).
Data Commons is led by the UW/IT/Research Support group and is divided into two functional areas:
Homes is a basic tier of research storage that provides 500 GB of free storage to each faculty member for their personal research use.
Commons is a collaborative, project-oriented storage area intended to be used jointly with other researchers (UW and/or beyond) to store data from an active research project. Principal investigators are able to delegate access permissions to other campus users or external collaborators. This area of the storage system is very similar to Bighorn on Mount Moran, but with fewer restrictions on what the system can be used for.
We anticipate making Commons available via the Shibboleth authentication system so that external collaborators may use their home credentials for access (and not have to maintain separate UW credentials). This is similar to the way eduroam provides network access using home-institution credentials.
Data Curation is overseen by the UW Libraries and is divided into two functional areas:
Publications is provided by the University as a service to UW researchers who, by choice or necessity, do not publish their data in research domain-specific repositories, publisher repositories, or other external repositories, either because suitable services don't exist or because the researcher prefers to establish UW (and their research team) as the authoritative home for the data.
Publications is used to present research data, research materials, and accompanying documentation on the internet. All data contained within the Publications store is publicly available for download via a UW/Libraries-provided web service and/or direct web access.
There is no charge for data stored in this repository, but the data must meet publication criteria established by the UW Libraries and cannot be changed once published.
- Archive (under development)
Archive refers to the “cold storage” portion of the petaLibrary. It will serve as a Curation repository for unpublished data that is no longer part of an ongoing research project but has value to the research community.
The system will allow for retrieval of data, but will not allow for modifications to data stored within Archive. If a change is required, the data will be migrated to Data Commons until it meets the criteria for archival again.
Storage allocations are priced by capacity and are available under three distinct billing models:
- Home Space - no-cost allocations for research faculty (500 GB; additional space is charged at the Commons Storage rate).
- Commons Storage - $100 per terabyte per year, billed monthly based on current usage
- Long Term Storage - $1,500 per terabyte for 10 years, billed upfront based on allocation
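The three billing models above can be sketched as simple arithmetic. This is an illustration under stated assumptions, not the billing system's actual logic: the function names are hypothetical, a terabyte is taken as 1000 GB, the monthly Commons charge is assumed to be one twelfth of the annual rate applied to current usage, and fractional terabytes are assumed to be prorated.

```python
# Illustrative sketch of the published rates. Assumptions (not from the source):
# 1 TB = 1000 GB, monthly charge = annual rate / 12, fractional TB prorated.

COMMONS_RATE_PER_TB_YEAR = 100.0  # dollars per TB per year (from the rate list)
LONG_TERM_RATE_PER_TB = 1500.0    # dollars per TB for 10 years (from the rate list)
HOME_FREE_GB = 500.0              # free Home Space allocation (from the rate list)
GB_PER_TB = 1000.0                # decimal conversion (assumption)


def commons_monthly_charge(usage_tb: float) -> float:
    """Monthly Commons Storage charge for the current usage in TB."""
    return usage_tb * COMMONS_RATE_PER_TB_YEAR / 12


def long_term_upfront_charge(allocation_tb: float) -> float:
    """One-time Long Term Storage charge covering 10 years of an allocation."""
    return allocation_tb * LONG_TERM_RATE_PER_TB


def home_monthly_charge(usage_gb: float) -> float:
    """Monthly Home Space charge: only usage above the free 500 GB is billed,
    at the Commons Storage rate."""
    overage_gb = max(0.0, usage_gb - HOME_FREE_GB)
    return commons_monthly_charge(overage_gb / GB_PER_TB)


print(f"Commons, 12 TB in use: ${commons_monthly_charge(12):.2f} per month")
print(f"Long Term, 2 TB allocated: ${long_term_upfront_charge(2):.2f} upfront")
print(f"Home, 1500 GB in use: ${home_monthly_charge(1500):.2f} per month")
```

On these figures, Long Term Storage works out to $150 per terabyte per year over its 10-year term, compared with $100 per terabyte per year for Commons Storage.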