
Archive System

Overview

The BSC Archive is a Hierarchical Storage Management (HSM) system designed for long-term data storage. It combines high-speed disk storage with high-capacity tape storage to provide cost-effective archival solutions for large research datasets.


System Architecture

The Archive operates as a two-tier HSM system:

Tier   Purpose                                      Access Speed
Disk   Recently accessed or frequently used data    Immediate
Tape   Long-term archival storage                   Minutes to hours

Data is automatically migrated between tiers based on access patterns and retention policies.


Key Characteristics

Storage Capacity: Large-scale storage optimized for datasets requiring long-term preservation without immediate access needs.

Access Patterns:

  • Recent data: Available on disk with immediate access
  • Archived data: Stored on tape, requires retrieval time
  • Retrieval time: Minutes to hours depending on tape library load


Usage Guidelines

File Size Recommendations

Archive policy generally dictates that only files over 1GB should be archived. However, be wary of very large files:

  • Tapes have a 12TB capacity
  • A single 10TB file would leave 2TB on the tape, which goes to waste unless smaller files fill it
  • Multiple smaller (but still 1GB+) files can share tape space more efficiently
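As a rough way to find candidates for consolidation, the sketch below lists files under the 1GB guideline. It assumes "1GB" means 1 GiB (1073741824 bytes); the function name `list_small_files` is hypothetical and not part of any Archive tooling.

```shell
# List regular files smaller than the 1GB guideline so they can be bundled.
# Assumption: the threshold is taken as 1 GiB = 1073741824 bytes.
THRESHOLD_BYTES=1073741824

list_small_files() {
    dir=${1:-.}
    # The "c" suffix makes find compare exact byte counts. "-size -1G" would
    # round sizes up to whole GiB units and match only empty files.
    find "$dir" -type f -size "-${THRESHOLD_BYTES}c" -print
}
```

Running `list_small_files /path/to/dataset` before a transfer shows which files fall below the threshold and are better archived as part of a bundle.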

Tip: Consolidate small files into compressed archives (tar.gz, zip) before archiving to improve storage efficiency.
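A minimal sketch of the consolidation step using standard tar. The function name `bundle_dir` and any paths passed to it are illustrative assumptions, not an Archive-provided tool.

```shell
# Bundle a directory of small files into one compressed archive.
# bundle_dir is a hypothetical helper: bundle_dir <source_dir> <output.tar.gz>
bundle_dir() {
    src_dir=$1
    out_file=$2
    # -C switches to the parent directory so the archive stores relative paths.
    tar -czf "$out_file" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
    # Listing the archive confirms it is readable before originals are removed.
    tar -tzf "$out_file" > /dev/null
}
```

For example, `bundle_dir results/run01 run01.tar.gz` produces a single file to archive. Bundles still need to exceed 1GB to meet the size guideline above.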


Key Limitations

Important Considerations

  • Tape data is not immediately accessible - retrieval may take minutes to hours
  • Not suitable for active analysis, real-time processing, or frequent access patterns
  • Best for long-term preservation, compliance, and infrequently accessed datasets

Frequently Asked Questions

How can I access the Archive?

Contact the data management team at datamanagement@bsc.es for access credentials and instructions.

How can I check which files are on tape vs. disk?

On transfer nodes:

hsmFileState filename

The output shows whether the file is migrated (on tape) or resident (on disk).

On other nodes:

stat filename

Check the Blocks field in the output: 0 or 1 means the file is migrated (on tape); more than 1 means it is resident (on disk).

Check multiple files:

for i in *; do hsmFileState "$i"; done
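On nodes without hsmFileState, the block-count rule above can be applied to several files at once. The sketch below assumes GNU coreutils stat, where the `%b` format prints the allocated block count; `classify_blocks` and `archive_state` are hypothetical helper names.

```shell
# Apply the rule above: 0 or 1 blocks = migrated, more than 1 = resident.
classify_blocks() {
    if [ "$1" -le 1 ]; then
        echo migrated
    else
        echo resident
    fi
}

# Print "<state> <name>" for each file given on the command line.
# Assumption: GNU stat, where -c '%b' prints the number of allocated blocks.
archive_state() {
    for f in "$@"; do
        blocks=$(stat -c '%b' -- "$f") || continue
        printf '%s %s\n' "$(classify_blocks "$blocks")" "$f"
    done
}
```

Calling `archive_state *` in a directory prints one line per file. An empty file also reports 0 blocks, so it will be labelled as migrated by this heuristic even when it is on disk.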

Accessing Files on Tape

Files are automatically recalled from tape when you access them (e.g., opening, reading, copying). No manual retrieval command is needed. Recall may take minutes to hours depending on tape library load.
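Because any read triggers a recall, several files can be requested ahead of a job by touching each one with a small read. This is only a sketch: the helper name `prefetch` is hypothetical, and whether the whole file is on disk once the read returns depends on the recall mode configured on the system, which is not stated here.

```shell
# Start recalls for several files by reading one byte from each.
# Assumption: a read is enough to trigger the automatic recall described above.
prefetch() {
    for f in "$@"; do
        # Each read runs in the background so the recalls are requested together.
        head -c 1 -- "$f" > /dev/null &
    done
    # Block until every background read has returned.
    wait
}
```

For example, `prefetch input_a.nc input_b.nc` could be run some time before the data is needed. Requesting many files at once adds to the tape library load, so it may be worth asking the data management team before recalling large batches.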


Support

For assistance with the Archive system, contact the data management team at datamanagement@bsc.es.