11 Questions
What is the purpose of training a network in this context?
To predict physical parameters of the universe
What is TF-IO integrated with in Figure 9?
The DAOS libdfs I/O library
What benefit does the TF-IO integration provide?
Bypassing POSIX and operating system kernel inefficiencies
What was the read bandwidth achieved by DAOS?
96 GiB/s (768 Tbps)
What is notable about DAOS's IOP/s and latency?
Very high IOP/s and remarkably low latency
What is the approximate latency achieved by DAOS?
0.3ms
How does DAOS handle small file reads?
At a rate of 551K/sec
What is the significance of Figure 8?
It reports the performance results of DAOS
What is the random write speed achieved by DAOS?
825K/sec
How does DAOS handle file creation?
At a rate of 1.5M/sec (empty) and 689K/sec (3901 bytes)
What is the key differentiation of DAOS?
Very high IOP/s, MDop/s, and remarkably low latency
Study Notes
DAOS and Google Cloud HPC Performance
- Dean Hildebrand, Technical Director in the Google Cloud Office of the CTO, praises DAOS' performance, stating it is rare to see such good performance from a single storage system across all four dimensions.
- The CosmoFlow AI application, leveraging the TensorFlow framework with DAOS, demonstrated high performance during the SC'22 conference.
- The CosmoFlow training application benchmark is part of the MLPerf HPC benchmark suite, involving the training of a 3D convolutional neural network for N-body cosmology simulation data.
DAOS Configuration and Performance
- The DAOS configuration for the IO500 benchmark runs consisted of 32 DAOS clients, 17 DAOS servers, and a 102TB storage configuration.
- The benchmark demonstrated high-bandwidth performance (even exceeding that of Lustre on some workloads) combined with ultra-low-latency storage access and tremendous scalability.
- DAOS achieved extremely high efficiency, realizing over 94% of the published VM network and Local-SSD bandwidth.
DAOS Features and Benefits
- DAOS uses a key-value architecture, which avoids many POSIX limitations and differentiates it from other storage solutions.
- DAOS features low-latency, built-in data protections, and end-to-end data integrity, making it suitable for workloads where small file, small IO, and/or many metadata operations per second (MDop/s) performance is critical.
- DAOS eliminates many metadata and locking issues of traditional POSIX-based filesystems, providing direct access to both data and metadata.
HPC-in-the-Cloud and Google Cloud HPC Toolkit
- Google Cloud has the hardware capability to speed the most computationally intensive HPC workloads with fast processors and access to GPU and TPU accelerators.
- The Google Cloud HPC Toolkit simplifies the process of deploying HPC workloads in the cloud, featuring DAOS as part of its integration.
- DAOS is recommended for any workload where small file, small IO, and/or many metadata operations per second (MDop/s) performance is critical.
Use Cases and Benefits of DAOS and Cloud Storage
- Use cases for DAOS and cloud storage include traditional HPC, HPDA, and AI/ML applications.
- Users can leverage Google Cloud Storage (GCS) for data ingestion from sources across the globe and long-term retention of data at low cost.
- DAOS can be used for high-performance analysis, drastically reducing the execution time and cost of high-performance applications.
The performance of CosmoFlow AI application on Google Cloud, demonstrated by Google and Intel teams at SC'22 conference, achieving rare TensorFlow-IO performance across all four dimensions.
Make Your Own Quizzes and Flashcards
Convert your notes into interactive study material.
Get started for free