Archive System
Overview
The BSC Archive is a Hierarchical Storage Management (HSM) system designed for long-term data storage. It combines high-speed disk storage with high-capacity tape storage to provide cost-effective archival solutions for large research datasets.
System Architecture
The Archive operates as a two-tier HSM system:
| Tier | Purpose | Access Speed |
|---|---|---|
| Disk | Recently accessed or frequently used data | Immediate |
| Tape | Long-term archival storage | Minutes to hours |
Data is automatically migrated between tiers based on access patterns and retention policies.
Key Characteristics
Storage Capacity: Large-scale storage optimized for datasets that require long-term preservation but not immediate access.
Access Patterns:
- Recent data: stored on disk, with immediate access
- Archived data: stored on tape, requires retrieval time
- Retrieval time: minutes to hours, depending on tape library load
Usage Guidelines
File Size Recommendations
As a general policy, only files larger than 1 GB should be archived. However, be careful with very large files:
- Tapes have a capacity of 12 TB
- A single 10 TB file would leave 2 TB of wasted space on the tape
- Multiple smaller (but still 1 GB+) files can share tape space more efficiently
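Before you request archival, a `find` size filter can show which files in a directory already meet the size guideline. This is a minimal sketch, and the function names `list_archive_ready` and `list_consolidate_first` are hypothetical. GNU `find` interprets `1G` as 1 GiB (1,073,741,824 bytes) and rounds sizes up to whole units. As a result, `-size +1G` matches files strictly larger than 1 GiB, which is close to, but not exactly, the 1 GB in the policy.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: split the files under a directory by the ~1 GB guideline.
# GNU find's -size uses the apparent file size, rounded up to whole GiB units.

# Files strictly larger than 1 GiB: candidates to archive as-is
list_archive_ready() {
  find "${1:-.}" -type f -size +1G -print
}

# Files of 1 GiB or less: consider bundling these before archiving
list_consolidate_first() {
  find "${1:-.}" -type f ! -size +1G -print
}

# Example (hypothetical path):
#   list_archive_ready /path/to/project
```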
Tip: Consolidate small files into compressed archives (tar.gz, zip) before archiving, so that they meet the 1 GB guideline and share tape space efficiently.
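The sketch below applies the tip above with standard `tar`. It assumes the small files live in a single directory, and the function name `consolidate_dir` is hypothetical. After creating the archive, the function lists its contents. This is a cheap way to confirm the archive is readable before you do anything with the originals.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pack a directory of small files into one .tar.gz.
# Usage: consolidate_dir SRC_DIR OUT_FILE
consolidate_dir() {
  local src="$1" out="$2"
  # -C switches to the parent directory so the archive stores relative paths
  tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  # Confirm the archive can be listed (i.e. is readable) before relying on it
  tar -tzf "$out" > /dev/null || return 1
  echo "Created $out ($(du -h "$out" | cut -f1))"
}

# Example (hypothetical paths):
#   consolidate_dir /path/to/results /path/to/results.tar.gz
```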
Key Limitations
Important Considerations
- Tape data is not immediately accessible; retrieval may take minutes to hours
- Not suitable for active analysis, real-time processing, or frequent access patterns
- Best for long-term preservation, compliance, and infrequently accessed datasets
Frequently Asked Questions
How can I access the Archive?
Contact the data management team at datamanagement@bsc.es for access credentials and instructions.
How can I check which files are on tape vs. disk?
On transfer nodes:
```bash
hsmFileState filename
```
The output shows whether the file is migrated (on tape) or resident (on disk).
On other nodes:
```bash
stat filename
```
In the output, check the Blocks value. A value of 0 or 1 means the file is migrated (on tape); more than 1 means it is resident (on disk).
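On nodes without `hsmFileState`, you can wrap the block-count rule above in a small function. This sketch assumes GNU `stat`, where `-c '%b'` prints the number of allocated blocks, and the function name `guess_hsm_state` is hypothetical. Treat the result as a heuristic only. Empty files (and, on some file systems, very small ones) can also report 0 or 1 blocks, so use `hsmFileState` on a transfer node when you need a definitive answer.

```bash
#!/usr/bin/env bash
# Hypothetical helper: guess whether a file is on tape from its allocated
# block count, using the rule "0 or 1 blocks = migrated".
# Requires GNU stat; %b is the number of allocated blocks.
guess_hsm_state() {
  local f="$1" blocks
  blocks=$(stat -c '%b' -- "$f") || return 1
  if (( blocks <= 1 )); then
    echo "$f: migrated (likely on tape, $blocks blocks)"
  else
    echo "$f: resident (on disk, $blocks blocks)"
  fi
}

# Example (hypothetical path):
#   guess_hsm_state /path/to/archive/data.tar.gz
```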
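To check multiple files (for example, every file in the current directory):
```bash
# Loop over a glob instead of parsing ls, so names with spaces work; skip directories
for f in *; do [ -f "$f" ] && hsmFileState "$f"; done
```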
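The loop above only covers the current directory. To check a whole directory tree, `find -print0` can pass files to `hsmFileState` one at a time while keeping unusual file names intact. This is a minimal sketch: the function name `hsm_state_tree` is hypothetical, and it assumes you are on a transfer node where `hsmFileState` is available.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run hsmFileState on every regular file under a directory.
# Assumes hsmFileState is on PATH (transfer nodes only).
hsm_state_tree() {
  local dir="${1:-.}"
  # -print0 with read -d '' keeps names containing spaces or newlines intact
  find "$dir" -type f -print0 |
    while IFS= read -r -d '' f; do
      # Redirect stdin so the command cannot consume the file list from find
      hsmFileState "$f" < /dev/null
    done
}

# Example (hypothetical path):
#   hsm_state_tree /path/to/project
```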
Accessing Files on Tape
Files are automatically recalled from tape when you access them (e.g., by opening, reading, or copying them). No manual retrieval command is needed. A recall may take minutes to hours, depending on tape library load.
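Because any access triggers a recall, you can stage files back from tape ahead of time, for example before submitting a job that needs them. That way the job itself does not stall. This is a minimal sketch, assuming that reading a file end to end is enough to trigger the automatic recall described above; `prefetch_files` is a hypothetical name. The function reads files one after another rather than in parallel, a design choice intended to avoid sending many simultaneous requests to the tape library.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read each file once so the HSM recalls it from tape.
# Assumption: a full read triggers the recall and waits until it completes.
prefetch_files() {
  local f start
  for f in "$@"; do
    [[ -f "$f" ]] || { echo "skip (not a regular file): $f" >&2; continue; }
    start=$SECONDS
    cat -- "$f" > /dev/null || { echo "failed: $f" >&2; return 1; }
    echo "ready: $f ($(( SECONDS - start ))s)"
  done
}

# Example (hypothetical path), run in the background so the shell stays usable:
#   prefetch_files /path/to/archive/*.tar.gz > prefetch.log 2>&1 &
```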
Support
For assistance with the Archive system, contact the data management team at datamanagement@bsc.es.