The Oak Ridge Leadership Computing Facility (OLCF) runs the No. 3 supercomputer in the world, supported by a petascale file system, to facilitate scientific discovery. In this paper, using the daily file system metadata snapshots collected over 500 days, we have studied the behavioral trends of 1,362 active users and 380 projects across 35 science domains. In particular, we have analyzed both individual and collective behavior of users and projects, highlighting needs from individual communities and the overall requirements to operate the file system. We have analyzed the metadata across three dimensions, namely (i) the projects’ file generation and usage trends, using quantitative file system-centric metrics, (ii) scientific user behavior on the file system, and (iii) the data sharing trends of users and projects. To the best of our knowledge, our work is the first of its kind to provide comprehensive insights on user behavior from multiple science domains through metadata analysis of a large-scale shared file system. We envision that this study will provide valuable insights for the design, operation, and management of storage systems at scale.
Contributors: Hyogi Sim, Sudharshan Vazhkudai, Raghul Gunasekaran
A model-driven provisioning tool to assist storage system designers and administrators in reconciling key figures of merit (cost, capacity, performance, disk size, rebuild times, redundancy), answering what-if scenarios, and determining the relative importance of spare parts in maximizing data availability, both during initial system provisioning and continuous operations. Validated the tool with field observed data from Spider operations.
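The kind of what-if question described above can be illustrated with simple arithmetic. The sketch below is not the provisioning tool itself: the `RaidConfig` fields, the RAID group layout, and the rebuild rate are all hypothetical parameters chosen for illustration. It shows only how disk size, redundancy, usable capacity, and rebuild time trade off against each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaidConfig:
    """Hypothetical description of one candidate storage configuration."""
    num_disks: int            # total drives in the system
    disk_tb: float            # capacity of one drive, in terabytes
    data_disks: int           # data drives per RAID group (e.g. 8 in 8+2)
    parity_disks: int         # parity drives per RAID group (e.g. 2 in 8+2)
    rebuild_mb_per_s: float   # assumed sustained rebuild rate for one drive


def usable_capacity_tb(cfg: RaidConfig) -> float:
    """Capacity left for data after parity overhead; leftover drives are ignored."""
    group_size = cfg.data_disks + cfg.parity_disks
    full_groups = cfg.num_disks // group_size
    return full_groups * cfg.data_disks * cfg.disk_tb


def rebuild_hours(cfg: RaidConfig) -> float:
    """Time to rewrite one whole drive at the assumed rebuild rate."""
    disk_mb = cfg.disk_tb * 1_000_000  # decimal units: 1 TB = 10^6 MB
    return disk_mb / cfg.rebuild_mb_per_s / 3600


if __name__ == "__main__":
    # What-if: same drive count and layout, two different drive sizes.
    for size_tb in (2.0, 4.0):
        cfg = RaidConfig(num_disks=100, disk_tb=size_tb, data_disks=8,
                         parity_disks=2, rebuild_mb_per_s=50.0)
        print(size_tb, usable_capacity_tb(cfg), round(rebuild_hours(cfg), 1))
```

In this toy model, doubling the drive size doubles both the usable capacity and the rebuild time, which is the sort of tension between figures of merit that a provisioning model has to reconcile.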
Contributors: Sudharshan Vazhkudai, Feiyi Wang, Sarp Oral, Devesh Tiwari, Lipeng Wan, Qing Cao
The Multi-Tiered Storage project is an effort to abstract the multiple storage tiers seen in HPC centers, e.g., a distributed SSD-based burst buffer, PFS, warm archive and deep archival storage. The team is designing scalable NoSQL database catalogs, a unified namespace, and policy-based data migration to present a virtual, on-demand storage hierarchy for application workflows.
Contributors: Woong Shin, Sudharshan Vazhkudai, Hyogi Sim, Sarp Oral
The Oak Ridge Leadership Computing Facility (OLCF) at Oak Ridge National Laboratory (ORNL) has a long history of deploying the world's fastest supercomputers to enable open science. Coupled with other OLCF computational resources, Titan <http://www.olcf.ornl.gov/titan>, a 27 petaflop/second Cray XK7 system, radically increases the I/O demand beyond the capabilities of the existing Spider parallel file system. The new Lustre <http://www.lustre.org> parallel file system is designed to provide roughly 30 petabytes of usable capacity to open science users at OLCF at an aggregate bandwidth of more than 1 terabyte/second.
Contributors: Sarp Oral, James Simmons, Feiyi Wang, Youngjae Kim, Douglas Fuller, Jason Hill, Dustin Leverman, Blake Caldwell, and Sudharshan Vazhkudai
The Spider system at the Oak Ridge National Laboratory’s (ORNL) Leadership Computing Facility (OLCF) was the world’s largest-scale Lustre parallel file system at the time of deployment. The project started in 2005 and was successfully deployed and delivered to users in 2008. Envisioned as a shared parallel file system, Spider is capable of delivering 240 gigabytes/second of aggregate throughput on a file system with 10 petabytes of usable space, and it supports over 26,000 clients concurrently accessing the file system to serve the workloads of the OLCF’s diverse computational platforms.
Contributors: Galen Shipman, David Dillow, Sarp Oral, Feiyi Wang, Ross Miller, and Jason Hill
High Performance Storage System (HPSS) <http://www.hpss-collaboration.org/> is the result of over a decade of collaboration among five Department of Energy laboratories and IBM, with significant contributions by universities and other laboratories worldwide.
HPSS can manage petabytes of data on disk and robotic tape libraries, providing highly flexible and scalable hierarchical storage management that keeps recently used data on disk and less recently used data on tape. Through the use of cluster, LAN and/or SAN technology, HPSS aggregates the capacity and performance of many computers, disks, and tape drives into a single virtual archival system of exceptional size and versatility. This approach enables HPSS to easily meet otherwise unachievable demands of total storage capacity, file sizes, data rates, and number of objects stored.
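The idea of keeping recently used data on disk and less recently used data on tape can be sketched with a toy two-tier model. This is an illustration of the general hierarchical storage management concept only, not HPSS's policy engine or API: the `TwoTierStore` class, its slot-based disk limit, and the least-recently-used migration rule are assumptions made for the example.

```python
from collections import OrderedDict


class TwoTierStore:
    """Toy model: a bounded 'disk' tier in front of an unbounded 'tape' tier."""

    def __init__(self, disk_slots: int):
        self.disk_slots = disk_slots   # hypothetical limit: number of files on disk
        self.disk = OrderedDict()      # file name -> size, least recently used first
        self.tape = {}                 # file name -> size

    def access(self, name: str, size: int = 0) -> str:
        """Touch a file and return where it was found: 'disk', 'tape', or 'new'."""
        if name in self.disk:
            found = "disk"
            size = self.disk.pop(name)
        elif name in self.tape:
            found = "tape"
            size = self.tape.pop(name)  # stage the file back onto disk
        else:
            found = "new"
        self.disk[name] = size          # most recently used entry goes last
        while len(self.disk) > self.disk_slots:
            old_name, old_size = self.disk.popitem(last=False)
            self.tape[old_name] = old_size  # migrate the least recently used file
        return found


if __name__ == "__main__":
    store = TwoTierStore(disk_slots=2)
    for name in ("a", "b", "c", "a"):
        print(name, store.access(name, size=1))
    print("disk:", list(store.disk), "tape:", list(store.tape))
```

A production system would decide migration by capacity, age, and class of data rather than by a simple file count, but the access pattern is the same: a read of migrated data stages it back to the faster tier and displaces something colder.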
Technology Integration members develop and maintain components of HPSS responsible for the administrative interfaces, low level data management, and logging.
Contributors: Deryl Steinert, Vicky White, and Tom Barron
The Oak Ridge Leadership Computing Facility (OLCF) at Oak Ridge National Laboratory (ORNL) has been running numerous Lustre file systems at medium to very large scales (in terms of performance, capacity, and client counts) since 2005. The Lustre file system architecture began as a DOE-funded research project in 1999. Lustre is an open-source project.
Because they run at very large scales, OLCF systems often face challenges with Lustre that are not visible at any other facility or data center. To anticipate and address these challenges ahead of time, the Technology Integration Group actively evaluates the Lustre development branches.
Contributors: James Simmons, Jason Hill, and Sarp Oral
The Ceph file and storage system is an emerging technology. It started as a doctoral research project at the University of California Santa Cruz, and was funded by the DOE/NNSA involving LLNL, LANL, and Sandia National Laboratories. Ceph is an open-source project and offers key storage features such as object replication and object placement. The Technology Integration Group's Ceph Evaluation project aims to gain a comprehensive understanding of the Ceph technology, deliver a Proof of Concept (POC) test deployment, and optimize the POC installation based on OLCF's use cases and requirements.
Contributors: Feiyi Wang, Douglas Fuller, James Simmons, Jason Hill, Dustin Leverman, Blake Caldwell, and Sarp Oral
Towards a Resilient and Scalable Infrastructure for Big Data is a project funded by the Oak Ridge National Laboratory (ORNL) Laboratory Directed Research and Development (LDRD) program. The project investigates candidate architectures for a scalable object storage system that can efficiently solve scientific and commercial Big Data problems. Key areas of interest are asynchronous object storage APIs and I/O programming models, object storage architectures, I/O scheduling for Quality of Service (QoS), heterogeneous storage media, and object replication and morphing.
Contributors: Bradley Settlemyer, Feiyi Wang, and Sarp Oral
Although Lustre is heavily used in the high-performance computing (HPC) domain (supporting more than 70% of the Top500 <http://www.top500.org> supercomputers in the world), there are few user-level diagnostic and analysis tools specific to Lustre. To better serve Oak Ridge Leadership Computing Facility (OLCF) users, the Technology Integration Group started developing custom user tools for the Lustre file system in 2006. A series of tools emerged as a result of this effort.
The first of these tools, libLUT, contains header files, man pages, buildable source code, and a sample integration module for the Cray XT.
Another tool developed in this effort is SpeedCopy (spdcp). Spdcp is a batch-aware copy tool: when necessary, it automatically spawns a batch job to acquire the client resources needed to improve the efficiency of the copy operation. In addition, the copy is Lustre-aware and preserves the Lustre striping metadata.
The I/O Tracing and Allocation Library tool, IOTA, was also developed as part of this effort. IOTA is an I/O tuning and profiling tool that helps applications make more efficient use of system resources.
Contributors: Ken Matney, James Simmons, and Douglas Fuller
Status: Completed, Deprecated
The lack of user-level parallel file system tools has been a long-standing problem in the HPC community. Increased size and complexity of scientific data sets only make this problem worse.
A collaboration has been established between OLCF/ORNL, LLNL, LANL, ANL, DDN, Auburn University, and others to address this problem. So far the collaboration has produced early versions of parallel list, copy, remove, find, and tar. Work on parallel compare and sync tools is planned next.
Contributors: Feiyi Wang, Douglas Fuller, Sarp Oral
The performance of a parallel file system is evaluated by exercising a predefined set of I/O workloads on the target file system. This process makes it possible to measure the current performance response and compare it with past values. The Technology Integration Group is heavily invested in developing and running I/O benchmarks at various levels, such as the block level or the Lustre file system level, to support procurement, deployment, performance configuration, and health assessment of Oak Ridge Leadership Computing Facility (OLCF) parallel file systems. Of the tools developed in-house, fair-lio is a block-level synthetic I/O exerciser designed as a speed/skew benchmark. The memory buffers are accessed in a round-robin fashion over the submitted requests so that any scatter/gather issues are seen by all targets in a balanced manner. The fair-lio benchmark is an open-source project and was also used in the OLCF-3 Scalable Storage System (SSS) procurement effort to assess the block-level performance of offered solutions. It was distributed as part of a benchmark suite designed to establish the performance profile of the proposed OLCF-3 SSS by executing a series of scripts and synthetic benchmarks representative of common I/O workloads.
Contributors: Sarp Oral, Youngjae Kim, Feiyi Wang, and James Simmons
Large-scale systems are heavily-shared resource environments where a mix of applications run concurrently and compete for network and storage resources. It is essential to characterize the runtime behavior of these applications in order to provision system resources and understand the impact of resource contention on an application’s performance.
Traditionally, applications are characterized and benchmarked using fine-grained performance tools on a quiescent machine. However, the runtime behavior of an application is affected by the availability of shared resources and also varies with the needs of the specific scientific user.
This project has two objectives. The first is to characterize the I/O usage patterns of user applications from the perspective of backend storage servers while minimizing overhead and interference with the application's I/O activities. The second is to build an I/O-aware decision making capability that can be used to guide decisions about issues such as scheduling.
At OLCF, the Spider storage system is available as four partitions, and users may use any or all of them at run time. The immediate goal of the project is to enable scientific users to choose the best file system partition based on current loads and what they know about the I/O behavior of their application.
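A minimal sketch of such a partition choice follows. It is not the project's decision-making capability: the per-partition (read load, write load) values, the `write_fraction` parameter, the weighted score, and the partition names in the example are all hypothetical, chosen only to show how current load and knowledge of an application's I/O mix could be combined.

```python
from typing import Mapping, Tuple


def pick_partition(loads: Mapping[str, Tuple[float, float]],
                   write_fraction: float) -> str:
    """Pick the partition whose current load least affects this application.

    loads maps a partition name to (read_load, write_load), each in [0, 1].
    write_fraction is the share of the application's I/O that is writes.
    """
    if not loads:
        raise ValueError("no partitions reported")
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")

    def score(name: str) -> float:
        # Weight each kind of load by how much the application depends on it.
        read_load, write_load = loads[name]
        return write_fraction * write_load + (1.0 - write_fraction) * read_load

    # Iterate names in sorted order so that ties resolve deterministically.
    return min(sorted(loads), key=score)


if __name__ == "__main__":
    current = {          # hypothetical partition names and load readings
        "part1": (0.9, 0.1),
        "part2": (0.2, 0.8),
        "part3": (0.5, 0.5),
        "part4": (0.6, 0.6),
    }
    print(pick_partition(current, write_fraction=0.9))  # write-heavy job
    print(pick_partition(current, write_fraction=0.1))  # read-heavy job
```

With these example readings, a write-heavy job is steered to the partition with little write traffic even though its read load is high, while a read-heavy job is steered elsewhere.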
Contributors: Raghul Gunasekaran, Sudharshan Vazhkudai, Yang Liu (NCSU), and Xiaosong Ma (NCSU)
Understanding workload characteristics is critical for optimizing and improving the performance of current systems and software, and for architecting new storage systems based on observed workload patterns. In this project, we characterize the scientific workloads of the HPC (High Performance Computing) storage cluster, Spider, at the Oak Ridge Leadership Computing Facility (OLCF). Spider provides an aggregate bandwidth of over 240 gigabytes/second with over 10 petabytes of RAID 6 formatted capacity. We characterize the system utilization, the read and write requests, the idle time, and the ratio of read requests to write requests for the storage system. We also study applications' I/O characteristics by collecting block-level and RPC (remote procedure call) trace information from the OSSs (object storage servers). The analysis of the block-level traces and RPC logs will help optimize individual applications' behavior and profile application I/O access patterns.
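As an illustration of the kind of summary such a characterization produces, the sketch below computes the read/write mix from a list of request records. The record format, a pair of operation ('R' or 'W') and size in bytes, is an assumption made for the example; it is not the format of the block-level traces or RPC logs collected on Spider.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_requests(records: Iterable[Tuple[str, int]]) -> dict:
    """Summarize (operation, size_in_bytes) records, where operation is 'R' or 'W'."""
    counts = Counter()
    bytes_moved = Counter()
    for op, size in records:
        if op not in ("R", "W"):
            raise ValueError(f"unknown operation: {op!r}")
        counts[op] += 1
        bytes_moved[op] += size
    total = counts["R"] + counts["W"]
    return {
        "reads": counts["R"],
        "writes": counts["W"],
        # Fraction of all requests that are reads; 0.0 for an empty trace.
        "read_fraction": counts["R"] / total if total else 0.0,
        "read_bytes": bytes_moved["R"],
        "write_bytes": bytes_moved["W"],
    }


if __name__ == "__main__":
    trace = [("R", 4096), ("W", 1048576), ("W", 1048576), ("R", 4096)]
    print(summarize_requests(trace))
```

Counting requests and bytes separately matters because a workload can be balanced by request count while being dominated by writes in volume, as in the example trace.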
Contributors: Youngjae Kim, Raghul Gunasekaran, Sarp Oral, and David Dillow
Open Scalable File Systems (OpenSFS) is a non-profit industry organization that supports vendor-neutral development and promotion of Lustre, an open-source file system that supports many of the world’s largest and most complex computing environments. Oak Ridge National Laboratory is one of the four founders of the OpenSFS organization. Oak Ridge Leadership Computing Facility (OLCF) at ORNL is a leader in the Lustre community and also is an active member of OpenSFS. OLCF participates in almost all OpenSFS efforts, particularly in the Technical Working Group (TWG), the Benchmarking Working Group (BWG), and the Community Development Working Group (CDWG).
Jason Hill and John Carrier (Cray) lead the TWG, which focuses on gathering technical requirements for new Lustre feature developments and major bug fixes, generating statements of work/requests for proposals, developing resource allocation proposals, working with contractors on technical requirements, and creating high-level estimates of the impacts and costs of meeting requirements.
The BWG is led by Sarp Oral and Richard Roloff (Cray). It focuses on researching the primary I/O workloads in high-performance parallel file system configurations; studying existing benchmarking tools to produce a report that matches the available tools to the defined workloads; producing a set of instructions for downloading, executing, and analyzing the recommended benchmarking tools; and working with OpenSFS to make progress on these goals, either through contracting resources or through other affiliations.
The CDWG focuses on facilitating discussions and drafting a community development model for Lustre, taking into consideration the needs/concerns of Lustre vendors, delivering short and long term goals to the OpenSFS Board of Directors for community development of Lustre, ensuring proper execution of community involvement in Lustre, creating an open forum for community discussion, and advising the OpenSFS Board of Directors on the community’s goals and perspective on Lustre.
Contributors: Jason Hill, Sarp Oral, and James Simmons
The Technology Integration Group played a key role in the architecture and implementation of the National Climate Computing Research Center (NCRC), a joint effort between Oak Ridge National Laboratory (ORNL) and the National Oceanic and Atmospheric Administration (NOAA). NCRC's primary compute platform, Gaea, has evolved through a three-phase deployment into a 1.1 petaflop/second system supporting atmospheric research. Group members helped specify, procure, and test the machine through each deployment phase. In addition, the Technology Integration Group has assisted the NOAA user community in porting and developing tools to take advantage of this compute platform.
Contributors: Douglas Fuller and Sarp Oral