
Archive System

Overview

The BSC Archive is a Hierarchical Storage Management (HSM) system designed for long-term data storage. It combines high-speed disk storage with high-capacity tape storage to provide cost-effective archival solutions for large research datasets.


System Architecture

The Archive operates as a two-tier HSM system:

Tier    Purpose                                          Access Speed
Disk    Recently accessed or frequently used data        Immediate
Tape    Long-term archival storage                       Minutes to hours

Data is automatically migrated between tiers based on access patterns and retention policies.


Key Characteristics

Storage Capacity: Large-scale storage optimized for datasets requiring long-term preservation without immediate access needs.

Access Patterns:

  • Recent data: Available on disk with immediate access
  • Archived data: Stored on tape, requires retrieval time
  • Retrieval time: Minutes to hours depending on tape library load


Usage Guidelines

File Size Recommendations

Archive policy generally dictates that only files over 1GB should be archived. However, be wary of very large files:

  • Tapes have a 12TB capacity
  • A single 10TB file would leave 2TB of wasted space on the tape
  • Multiple smaller (but still 1GB+) files can share tape space more efficiently

Tip: Consolidate small files into compressed archives (tar.gz, zip) before archiving to improve storage efficiency.
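As a minimal sketch of the consolidation step, the commands below bundle a directory of small files into a single tar.gz and list its contents as a check. The directory and file names (results, run_1.txt, and so on) are hypothetical and created in a scratch location only for the demonstration; substitute your own paths.

```shell
#!/bin/sh
# Sketch: bundle a directory of small files into one compressed archive
# before copying it to the Archive. All paths here are hypothetical.
set -eu

workdir=$(mktemp -d)                 # scratch area for the demonstration
mkdir -p "$workdir/results"
for i in 1 2 3; do
    echo "sample $i" > "$workdir/results/run_$i.txt"
done

# Create one tar.gz holding the whole directory (-C keeps stored paths relative).
tar -czf "$workdir/results.tar.gz" -C "$workdir" results

# List the contents to verify the bundle before archiving it.
tar -tzf "$workdir/results.tar.gz"
```

Copying the single results.tar.gz to the Archive, rather than the individual files, means one larger object is written instead of many small ones.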


Key Limitations

Important Considerations

  • Tape data is not immediately accessible; retrieval may take minutes to hours
  • Not suitable for active analysis, real-time processing, or frequent access patterns
  • Best for long-term preservation, compliance, and infrequently accessed datasets

Frequently Asked Questions

How can I access the Archive?

Contact the data management team at datamanagement@bsc.es for access credentials and instructions.

How can I check which files are on tape vs. disk?

Use ls -lh to check file status (the -h option prints human-readable sizes such as 1.2G):

ls -lh /path/to/directory

Key indicator: Files on tape show as 0 bytes (or a very small size, such as 4096 bytes) even if they are actually large. Such an entry is a "stub"; the actual data is on tape.

Note

Empty files may also appear as 0 bytes. A 0-byte file could be either a tape stub or a genuinely empty file.

Example:

-rw-r--r-- 1 user group    0 Nov 17 12:00 large_file_on_tape.dat  ← On tape
-rw-r--r-- 1 user group 1.2G Nov 16 11:00 file_on_disk.dat        ← On disk

Find all 0-byte files (tape stubs, plus any genuinely empty files):

find . -type f -size 0
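Because stubs may also show a small size such as 4096 bytes, the 0-byte search can miss some of them. The sketch below widens the search to every regular file of at most 4096 bytes. The 4096-byte cutoff is taken from the indicator described above, the function name list_stub_candidates is hypothetical, and the results are only candidates: genuinely small files on disk match as well.

```shell
#!/bin/sh
# Sketch: list files small enough to be tape stubs (0 to 4096 bytes).
set -eu

# list_stub_candidates DIR: print regular files of at most 4096 bytes.
# "-size -4097c" means "fewer than 4097 bytes"; the c suffix counts bytes.
list_stub_candidates() {
    find "$1" -type f -size -4097c
}

# Demonstration on scratch files (hypothetical names).
demo=$(mktemp -d)
: > "$demo/empty.dat"                                              # 0 bytes
dd if=/dev/zero of="$demo/stub_sized.dat" bs=4096 count=1 2>/dev/null  # 4096 bytes
dd if=/dev/zero of="$demo/larger.dat" bs=5000 count=1 2>/dev/null      # 5000 bytes

list_stub_candidates "$demo"   # lists empty.dat and stub_sized.dat only
```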

Accessing Files on Tape

Files are automatically recalled from tape when you access them (e.g., opening, reading, copying). No manual retrieval command is needed. Recall may take minutes to hours depending on tape library load.
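Since reading a file is enough to trigger a recall, one way to avoid waiting in the middle of a job is to read the needed files once beforehand. The sketch below does this by sending each file to /dev/null. The helper name prefetch and the demo file are hypothetical, and it is an assumption that the read waits until the recall has finished; for large batches, consider checking with the data management team first.

```shell
#!/bin/sh
# Sketch: read files once so that any tape recall starts before they are needed.
set -eu

# prefetch FILE...: read each file fully and discard the data.
prefetch() {
    for f in "$@"; do
        cat -- "$f" > /dev/null && echo "read: $f"
    done
}

# Demonstration on a scratch file (hypothetical name).
demo=$(mktemp -d)
echo "payload" > "$demo/dataset.dat"
prefetch "$demo/dataset.dat"
```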


Support

For assistance with the Archive system, contact the data management team at datamanagement@bsc.es.