With permission, see also the related CIRC facilities document.
The Washington University School of Medicine, consistently ranked among the top medical schools in the United States by U.S. News & World Report and by funding from the National Institutes of Health, has a rich, 133-year history of basic, clinical, and translational research. Since its founding in 1891, it has contributed groundbreaking discoveries across many areas of medical research. The School’s faculty members serve as the staff physicians at Barnes-Jewish Hospital and St. Louis Children’s Hospital, which together form the academic hub of the BJC HealthCare System, the Medical School’s hospital partner. The School of Medicine and these hospitals, which are perennially recognized for excellence in patient care and provide a superb atmosphere for collaborative translational research and for training students, residents, and fellows, are the principal components of the Washington University Medical Center. The compact nature of this academic medical center, spanning 12 city blocks and 60 buildings, enhances the collaborative opportunities for both basic and translational research.
The Computational Imaging Research Center (CIRC) at Washington University, under the direction of Dr. Daniel Marcus, occupies approximately 8,000 square feet of office and computing lab space within the East Imaging Building of Washington University. The CIRC is a multi-investigator research and engineering laboratory that includes scientists, programmers, software developers, systems administrators, students, and trainees, with a core mission of developing and operating computational tools and informatics software to support medical imaging research. The CIRC, in conjunction with the Mallinckrodt Institute of Radiology’s Research Computing and Informatics Facility (RCIF), maintains a number of XNAT-based imaging databases for both internal and public-facing operations. Together, these systems support over 500 ongoing research studies and over 1,000 active users. In addition to imaging databases and informatics services, the RCIF maintains an on-premises high-performance computing facility (Center for High Performance Computing, CHPC) hosted within the university’s Research Data Center (RDC), as well as storage servers across multiple sites.
The Center for High Performance Computing (CHPC), operated by the Research Computing and Informatics Facility (RCIF), provides massively scaled compute resources to the Washington University imaging community, along with the expertise necessary to tackle computationally intensive scientific projects. The CHPC operates the largest GPU-accelerated, InfiniBand-connected cluster across the combined campuses of Washington University in St. Louis, with over 100 datacenter-class NVIDIA GPUs. The facility’s staff are highly experienced and knowledgeable about HPC and informatics and are committed to helping users succeed in their computational research projects, hosting both in-person and online meetings as well as one-on-one support and project assistance for PIs and users.
The CHPC hosts 3 petaflops (PF) of GPU-accelerated capacity, an HDR InfiniBand interconnect, and access to over 18 petabytes (PB) of tiered data storage on the HDR fabric. The facility maintains both CPU-only and GPU-accelerated computing nodes with the most capable datacenter GPUs commercially available, including modular (SXM) NVIDIA H100, A100 (80 GB), and A100 (40 GB) nodes with dual-socket 5th Generation Intel Xeon and AMD EPYC processors and intra-node NVSwitch, as well as PCIe A100, V100S, V100, T4, and K20 GPU-accelerated nodes. Batch job management on these heterogeneous compute nodes is handled through Slurm. The HDR InfiniBand interconnect provides high-speed, ultra-low-latency communication between the servers and storage, enabling efficient data sharing and distributed computing.
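As a sketch of how jobs target specific hardware on a heterogeneous Slurm cluster, a batch script can request a GPU type through generic resources (GRES); the job, partition, and GRES names below are hypothetical, and actual site configurations vary.

```shell
#!/bin/bash
# Hedged sketch of a Slurm batch script targeting GPU nodes on a
# heterogeneous cluster; partition and GRES names are hypothetical.
#SBATCH --job-name=dwi-preproc       # hypothetical job name
#SBATCH --partition=gpu              # hypothetical GPU partition
#SBATCH --gres=gpu:a100:2            # request two A100s (GRES names vary by site)
#SBATCH --cpus-per-task=16
#SBATCH --mem=128G
#SBATCH --time=24:00:00

srun python preprocess.py            # hypothetical workload
```

Such a script would be submitted with `sbatch`; `sinfo` lists the partitions actually configured on a given cluster.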
Additionally, the RCIF maintains separate virtualization clusters, including a Proxmox-based cluster and a VMware cluster, for running virtual machines. These clusters host various XNAT instances and handle some small-scale pipeline processing. Each Proxmox system has 2 TB RAM, 2x 32-core Intel processors, 1x dual-port InfiniBand card, 1x quad-port 10G/25G Ethernet card, 2x 960 GB NVMe boot drives in RAID1, and 1x 15 TB NVMe local storage drive. The Proxmox cluster supports Rocky Linux v9.x (64-bit). The VMware cluster consists of 216 CPU cores in total and 2 TB RAM. Processing power and RAM for any virtualization cluster machine are dynamically scalable from 1 to 32 cores and up to 256 GB of RAM through the Proxmox Virtual Environment or VMware vSphere Enterprise environment, depending on the underlying hardware. These machines are clustered and provisioned using Proxmox/VMware and the Puppet Open Source configuration management tools. The cluster is networked to the RCIF’s data storage systems (see below) via InfiniBand (Proxmox) or dual-connected 10 Gb Cisco network infrastructure (VMware), allowing rapid read/write access to data feeding compute jobs on the CHPC.
The facility’s storage system comprises a tiered high-performance parallel file system, consisting of both large-volume and fast scratch storage. Over 16 PB of large-volume storage is maintained in a Ceph file system, with an offsite disaster recovery (DR) center hosting 11 PB of compressed data. The Ceph-based system currently hosts the /home directories for CHPC users, the main datasets used by researchers, and a variety of shared human imaging datasets, including data from MIRRIR and the CNDA. At the fastest storage tier is 2 PB of high-throughput scratch storage in a BeeGFS file system with both NVMe- and HDD-based pools for caching data during computation. The NVMe-based pools of the scratch file system reside on servers with dual HDR ports for data transfer rates of up to 400 Gbps, keeping workloads compute-bound rather than I/O-bound.
Secondary data storage includes over 3 PB of tiered SSD and NL-SAS based storage using the OpenZFS file system, which is primarily designated for the virtual machine (VM) cluster. The ZFS system is built on Supermicro hardware with a SAS Host Bus Adapter (HBA) breaking out to daisy-chained DataOn JBOD Storage Chassis, holding up to 60 drives per unit, for high density and easy expandability. This system hosts a full independent backup copy of the lab’s data. Custom backup scripts create daily ZFS snapshots for a perpetual point-in-time history of the archive and initiate incremental backup processes. The automated VM deployment process used in the RCIF incorporates these backups into development systems as writable ‘snap-clones’, providing developers with space-efficient copies of the data without the possibility of harming the live file system. In addition to these functions, the ZFS system also hosts high performance development and production VMs utilizing the SSD-based read and write cache mechanisms.
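The daily snapshot and incremental replication cycle described above can be sketched with standard OpenZFS commands; the dataset names, backup host, and clone target here are hypothetical, and the RCIF’s actual backup scripts are more elaborate.

```shell
#!/bin/bash
# Sketch of a daily ZFS snapshot + incremental replication cycle.
# Dataset names and the backup host are hypothetical.
set -euo pipefail

DATASET="tank/archive"
TODAY="$(date +%Y-%m-%d)"
YESTERDAY="$(date -d yesterday +%Y-%m-%d)"

# Take today's point-in-time snapshot (snapshots are retained in perpetuity).
zfs snapshot "${DATASET}@${TODAY}"

# Replicate only the blocks changed since yesterday's snapshot to the DR host.
zfs send -i "${DATASET}@${YESTERDAY}" "${DATASET}@${TODAY}" | \
    ssh dr-host zfs receive -u backup/archive

# A writable, space-efficient development copy ("snap-clone") of any snapshot:
zfs clone "${DATASET}@${TODAY}" tank/dev-clone
```

Because a clone shares unmodified blocks with its source snapshot, development systems get full copies of the data at negligible storage cost and with no risk to the live file system.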
In addition to the off-site Ceph-based DR, the RCIF also operates a separate ZFS DR system hosted at a data center on another campus. This system holds a matching 1PB of high-capacity storage and receives nightly backups from the primary RCIF ZFS storage system. It also retains snapshots in perpetuity for a completely redundant copy of all ZFS hosted data.
While both ZFS- and Ceph-based primary and DR systems are deployed, the RCIF is currently in the process of migrating ZFS-based data into the Ceph-based system to reduce the maintenance load.
The CNDA (Gurney et al., 2017) is an imaging informatics platform that provides secure data management services for Washington University investigators. The CNDA’s services include automated archiving of imaging studies from all of the University’s research scanners, automated quality control and image processing routines, and secure web-based access to acquired and post-processed data. The CNDA is operated by the Neuroimaging Informatics and Analysis Center (NIAC) at the Washington University School of Medicine under the direction of Dr. Daniel Marcus, Professor of Radiology. The CNDA currently stores over 180,000 individual scans, representing data from a wide range of multi-site and in-house studies. It also manages thousands of non-imaging experiments, including neuropsychological, clinical, biomarker, and behavioral data. The CNDA is built on XNAT (Marcus et al., 2007a; Marcus et al., 2007b), a widely used open source imaging informatics platform developed in Dr. Marcus’ laboratory.
The CNDA was designed to facilitate common data management and productivity tasks for neuroimaging and associated data. Notable features include: 1) support for a range of image upload/download methods, including DICOM, FTP, web services, and web browsers; 2) an extensible data model that simplifies the incorporation of new data types by automatically generating the necessary database tables and relations, user interface components, and search engine plug-ins; 3) quality control modules and audit trails, including a virtual quarantine that houses uploaded data until authorized users have validated them and a complete history profile that tracks all changes made to the managed data; 4) a secure web-based user interface for data entry and access; 5) a sophisticated search engine that builds queries across data types; 6) an online image viewer that supports a number of common neuroimaging formats, including DICOM and Analyze; and 7) a pipeline engine for automating image processing routines.
The CNDA is widely used to support studies that include geographically dispersed data acquisition sites and analysis teams. Example studies include the Dominantly Inherited Alzheimer Network (DIAN), a 15-site study of inherited Alzheimer’s disease that includes PET, MR, neuropsychological, clinical, and tissue data; the Comprehensive Neuro-oncology Data Repository, a 2-site study developing advanced imaging biomarkers for glioblastoma; and INTRUST, a 49-site study of traumatic brain injury and post-traumatic stress disorder in combat veterans. The CNDA supports several options for importing scans from remote sites. Most often, scans are uploaded using a user-friendly web-based tool that removes identifiers from the image file metadata and transfers the files to the CNDA over an encrypted protocol. Alternatively, for sites that acquire a large volume of data, a relay computer can be configured on site to automatically receive data from the scanner, remove identifiers, and forward the files to the CNDA over an encrypted channel. Similarly, a number of tools are available for retrieving data from the CNDA, including web-based downloads, DICOM query/retrieve, and scriptable command line programs.
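As an illustration of scripted retrieval, XNAT-based servers such as the CNDA expose a REST API; a download of all scan files for one session might look like the following sketch, in which the hostname, credentials, and session label are placeholders.

```shell
#!/bin/bash
# Sketch of a scripted download through XNAT's REST API.
# Hostname, credential variables, and session label are placeholders.
set -euo pipefail

HOST="https://cnda.example.edu"
SESSION="CNDA_E00001"

# Authenticate once; XNAT returns a session token to reuse on later calls.
JSESSION="$(curl -s -u "${XNAT_USER}:${XNAT_PASS}" "${HOST}/data/JSESSION")"

# Fetch all scan files for the session as a single zip archive.
curl -s --cookie "JSESSIONID=${JSESSION}" \
    "${HOST}/data/experiments/${SESSION}/scans/ALL/files?format=zip" \
    -o "${SESSION}.zip"
```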
The CNDA implements a number of features and procedures to ensure the security and integrity of the data it hosts and full HIPAA compliance. All data coming into and out of the CNDA are transmitted over secure channels using SSL. All data are stored on a ZFS raidz2 storage system with disaster recovery and offsite backup. Snapshots of the relational database are taken nightly, enabling reconstruction of the database from any time point in the study. Access to study data is restricted to authorized users who are assigned specific access privileges (create, read, edit, delete) according to their role in the study. All logins and access to data are tracked in the internal audit system.
The CNDA runs on the NRG’s shared virtual infrastructure, which consists of 36 VMware ESXi hosts. The CNDA has 32 virtual CPU cores for production virtual machines and between 15 and 32 virtual CPU cores for development virtual machines. It is allocated up to 96 GB of vRAM for production and up to 48 GB of vRAM for development. The virtual infrastructure and CNDA primary storage are backed by a shared cluster of ZFS data storage containing a mix of SAS and SSD storage. The CNDA consumes approximately 300 TB of data storage. All physical systems related to the CNDA are connected via dual 10 Gb/s Ethernet connections.
The CNDA offers a number of features to monitor and maintain the quality of acquired data. As data are uploaded to the system, sequence details (e.g., flip angle, repetition time) are validated against a study-specific protocol to ensure that the acquisition is compliant. Noncompliant scans are flagged in the system for immediate follow-up. Automated image analysis routines are then executed to determine overall image quality specific to the acquisition type. For fMRI, for example, signal-to-noise and subject movement histograms are generated. For DTI, fractional anisotropy maps are generated. The CNDA also supports radiological evaluation and manual quality assessments that studies can optionally use. The output from these routines is available to users in web-based reports and is flagged when key values fall outside acceptable limits.
MIRRIR is an imaging informatics platform that makes large, anonymized computational imaging datasets available to university researchers. MIRRIR is built on an XNAT foundation (Marcus et al., 2007a; Marcus et al., 2007b), a widely used open source imaging informatics platform developed in Dr. Marcus’ laboratory. MIRRIR is connected to BJC HealthCare’s PACS and is able to pull large sets of anonymized clinical imaging sessions and associated radiology reads.
MIRRIR continually queries the PACS to maintain a searchable database of DICOM metadata that is used to locate and pull session data. IRB-approved researchers can request imaging data by accession number or by general search criteria. The resulting images can be deposited as a project in MIRRIR, using XNAT’s imaging session organization; delivered to the university’s HIPAA-compliant WUSTL Box cloud storage system; or delivered to an RCIF file system to be made available for processing on the RCIF’s high-performance computing platform (CHPC).
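MIRRIR’s internal tooling is not detailed here, but a DICOM metadata query of the kind it performs can be illustrated with DCMTK’s findscu utility; the AE titles, host, port, and accession number below are placeholders, not MIRRIR’s actual configuration.

```shell
# Generic illustration of a study-level DICOM C-FIND metadata query using
# DCMTK's findscu. AE titles, host, port, and accession number are
# placeholders, not MIRRIR's actual configuration.
findscu -S \
    -aet RESEARCH_SCU -aec PACS_AET \
    -k QueryRetrieveLevel=STUDY \
    -k AccessionNumber="ACC1234567" \
    -k StudyInstanceUID= \
    -k StudyDescription= \
    pacs.example.edu 104
```

Empty keys (e.g., StudyInstanceUID) are return keys: the PACS fills them in, which is how a metadata index can be built without transferring pixel data.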
For MIRRIR-delivered imaging data, containerized pipeline processing is available via XNAT’s Container Service plugin and a Docker swarm hosted in the RCIF’s VMware virtualization environment. GPU-enabled processing is available both through MIRRIR’s Docker swarm and, at much larger scale, via the RCIF’s CHPC.
XNAT is a highly secure imaging platform, and security is enhanced further by requiring an on-site or university VPN connection. All logins and data access are audited.
The RCIF recently began a shared datasets program in which high-value datasets are made available to Washington University researchers on a subscription basis. These data are made available on the RCIF’s High Performance Computing (HPC) cluster via file system access. Currently, approximately a dozen high-value datasets are available, including data from the Human Connectome Project, the UK Biobank, OASIS, and others. Researchers are encouraged to apply to have their data shared as part of this program; in return, shared data are stored without storage fees.
Gurney, J., Olsen, T., Flavin J., Ramaratnam, M., Archie, K., Ransford, J., Herrick R., Wallace, L., Cline, J., Horton, W., Marcus, D.S. (2017) The Washington University Central Neuroimaging Data Archive. Neuroimage, 144(Pt B): 287-293.
Marcus, D.S., Olsen, T., Ramaratnam, M., and Buckner, R.L. (2007a) The Extensible Neuroimaging Archive Toolkit (XNAT): An informatics platform for managing, exploring, and sharing neuroimaging data. Neuroinformatics, 5: 11-34.
Marcus, D.S., Archie, K.A., Olsen, T., Ramaratnam, M. (2007b) The open source neuroimaging research enterprise. J. Digital Imaging, 20: 130-138.