The venerable storage format, tape, is the topic of the fifth article in our Storage Innovation blog series. If you’re shocked by the mention of tape in a series about storage innovation, you may be pleasantly surprised by how far tape has evolved. In this post, I’ll cover the essentials that research organizations need to know about tape storage, and highlight some recent advances that make tape more attractive than ever.
Why is tape storage still relevant for research computing?
First off, why should anyone care about tape storage? The reasons haven’t really changed: it’s cheap and reliable.
So why is cheap and reliable storage a requirement rather than a “nice to have”? Because the volume of research data continues to expand dramatically across all fields. Data is pouring in from scientific instruments, sensors, multimedia streams, spreadsheets, databases, and documents, as well as from the HPC systems that have historically produced big data. On top of the sheer volume, funding agencies are ratcheting up the requirements for Data Management Plans (DMPs), increasing the pressure on institutions to provide reliable and affordable long-term storage resources for all of their researchers.
For several decades, we’ve heard pundits predict the imminent demise of tape-based storage. Yet it continues to evolve and thrive. Indeed, vendors are actively pursuing tape technology innovations and announcing new capabilities. I discussed these advances with Matt Starr of Spectra Logic earlier in this blog series. Matt argued that while disk is increasingly challenged to eke out more areal density improvements, tape continues to make rapid capacity improvements, which will only make it more compelling going forward.
Can I have my cake (cheap and reliable storage) and eat it too (easy access)?
Researchers need a reliable and cheap place to keep all of their data. And, of course, they want convenience. They don’t want to worry about the data until they need it; when they need the data, they want it at their fingertips, without having to ask for a new account, learn a new tool, or call their IT admin. Furthermore, data access is a consideration not only for the researchers producing the data, but also in the context of funding agency policies on the dissemination and sharing of research results.
Much as multi-level memory hierarchies balance cost and performance without end-user involvement, tape storage systems (hardware and software) have evolved to the point where they can be integrated into an overall storage hierarchy with a consistent access interface. In addition, demand for automated storage solutions in the commercial sector has prompted new product lines at different price points, making the technology available to research organizations with relatively modest hardware and staffing budgets. While HPSS (High Performance Storage System) remains an option, particularly for large-scale deployments, a growing number of tape management and Hierarchical Storage Management (HSM) solutions, such as the Spectra Logic BlackPearl, are available for simple, cost-effective deployment in research data environments.
Where might tape fit into my overall storage strategy?
Most universities and labs are on a search for cheaper ways to provide big, reliable storage for their entire research community. And it’s a struggle — not just in HPC, but also for the other 90+% of campus researchers who need more storage than their local systems can handle.
In the typical data lifecycle, researchers work with their data very actively for weeks or months, and then need to retain the data for years or even decades with only infrequent access. Research data protection, retention, and accessibility (dissemination and sharing with the larger community) are critical components of DMPs.
Referring back to the first article in this blog series, we believe there is value in approaching research storage as a shared collection of workload-specific tiers, instead of a collection of monolithic systems tied to specific owners and serving multiple purposes. With this in mind, here are some opportunities to effectively use tape storage as part of an overall campus strategy:
- Near-line storage behind disk-based active storage in an HSM setup.
- Reliable near-line or backup storage for HPC scratch file systems.
- Reliable backup for other campus storage systems, including object storage, lab servers, cloud storage, etc.
- Reliable archival storage for data that is not being actively used, but that must be kept.
Ideally, and if the price is right, a campus would have a single system that provides active storage and automatically migrates data to archive storage (e.g., via HSM), all behind a simple, powerful user interface. This is what we’ve developed with Spectra Logic (Globus for BlackPearl): a solution that combines the cost advantages of tape with the performance advantages of disk, all with the familiar Globus UI. Globus also works with HPSS, a strong contender for backup and archival storage at very large scale.
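To make the idea of automatic migration concrete, here is a minimal sketch of the kind of age-based rule an HSM-style system might apply when deciding what to move from disk to tape. It is an illustration only, not how BlackPearl, HPSS, or Globus actually choose data to migrate. The names `FileRecord` and `select_for_archive`, the example paths, and the 180-day idle threshold are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """Hypothetical catalog entry for a file on the active (disk) tier."""
    path: str
    size_bytes: int
    last_access: datetime


def select_for_archive(files, now, idle_days=180):
    """Return files idle for longer than `idle_days`, largest first.

    The 180-day default is an arbitrary placeholder; a real policy would
    be tuned to local retention rules and usage patterns.
    """
    cutoff = now - timedelta(days=idle_days)
    idle = [f for f in files if f.last_access < cutoff]
    # Migrating the largest idle files first frees the most disk space soonest.
    return sorted(idle, key=lambda f: f.size_bytes, reverse=True)


# Made-up example data, for illustration only.
now = datetime(2024, 1, 1)
files = [
    FileRecord("/project/a/raw.h5", 500 * 10**9, datetime(2023, 1, 15)),
    FileRecord("/project/a/notes.txt", 10**4, datetime(2023, 12, 20)),
    FileRecord("/project/b/sim.tar", 2 * 10**12, datetime(2022, 6, 1)),
]

for record in select_for_archive(files, now):
    print(record.path)
```

In a real deployment the researcher never sees this step: the file stays visible through the same interface, and only its physical location changes.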
Ultimately, researchers want to be able to conveniently save, access, search, and optionally share their data. Of course, researchers prefer blazingly fast storage that is free. In our experience, data transfer rates are less of an issue than usability hurdles in many cases, so you may be able to provide storage that is fast enough and cheap enough to keep your researchers happy.
When comparing the cost of storage options, look beyond the cost of (media + hardware + software + utilities + maintenance) per TB. On-site staff requirements for installation, ongoing administration, and user assistance often vary widely between solutions. Read more about these frequently overlooked costs in this blog post.
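As a rough illustration of why staffing belongs in the comparison, the sketch below computes cost per TB with and without staff time. The `StorageOption` class, its field names, and every number in the example are placeholders invented for this sketch, not real prices or measurements. All infrastructure costs are assumed to be totals over the evaluation period.

```python
from dataclasses import dataclass


@dataclass
class StorageOption:
    """Hypothetical cost model; infrastructure costs are period totals."""
    name: str
    capacity_tb: float
    media: float
    hardware: float
    software: float
    utilities: float
    maintenance: float
    staff_fte: float        # fraction of a full-time person needed to run it
    fte_annual_cost: float  # loaded annual cost of one full-time person
    years: float            # length of the evaluation period

    def infrastructure_cost_per_tb(self) -> float:
        """(media + hardware + software + utilities + maintenance) per TB."""
        total = (self.media + self.hardware + self.software
                 + self.utilities + self.maintenance)
        return total / self.capacity_tb

    def total_cost_per_tb(self) -> float:
        """Infrastructure cost per TB plus staff cost over the period."""
        staff = self.staff_fte * self.fte_annual_cost * self.years
        return self.infrastructure_cost_per_tb() + staff / self.capacity_tb


# Placeholder figures, chosen only to make the arithmetic easy to follow.
example = StorageOption(
    name="example-archive", capacity_tb=1000,
    media=20_000, hardware=50_000, software=10_000,
    utilities=10_000, maintenance=10_000,
    staff_fte=0.5, fte_annual_cost=100_000, years=5,
)

print(example.infrastructure_cost_per_tb())
print(example.total_cost_per_tb())
```

With these made-up inputs, staff time accounts for more of the per-TB cost than all of the infrastructure line items combined, which is exactly the kind of effect a per-TB hardware quote hides.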
By providing a uniform, easy-to-use interface to a variety of back-end storage solutions, Globus is committed to helping all researchers manage their data without having to spend extra time and effort learning new commands or asking their IT admins for help. The Globus connectors for Spectra Logic’s BlackPearl and HPSS remove one significant hurdle for sites integrating tape into their storage strategies.
In our next article, we’ll be discussing the relevance of cloud-based storage to research use cases – stay tuned and thanks for reading!