Archive System
Overview
The BSC Archive is a Hierarchical Storage Management (HSM) system designed for long-term data storage. It combines high-speed disk storage with high-capacity tape storage to provide cost-effective archival solutions for large research datasets.
System Architecture
The Archive operates as a two-tier HSM system:
| Tier | Purpose | Access Speed |
|---|---|---|
| Disk | Recently accessed or frequently used data | Immediate |
| Tape | Long-term archival storage | Minutes to hours |
Data is automatically migrated between tiers based on access patterns and retention policies.
Key Characteristics
Storage Capacity
Large-scale storage optimized for datasets requiring long-term preservation without immediate access needs.
Access Patterns
- Recent data: Available on disk with immediate access
- Archived data: Stored on tape, requires retrieval time
- Retrieval time: Minutes to hours depending on tape library load
Usage Guidelines
File Size Recommendations
Archive policy generally dictates that only files larger than 1 GB should be archived. However, be wary of very large files:
- Tapes have a 12 TB capacity
- A single 10 TB file leaves only 2 TB on that tape, which is wasted unless other files fit into it
- Multiple smaller (but still 1 GB+) files can share tape space more efficiently
Tip: Consolidate small files into compressed archives (tar.gz, zip) before archiving to improve storage efficiency.
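The consolidation step can be sketched as follows. This is a minimal example, assuming a `tar` with gzip support (GNU or BSD). The function name `bundle_for_archive` and the variable `MIN_ARCHIVE_BYTES` are hypothetical, and reading "1GB" as 10^9 bytes is an assumption about how the policy counts.

```shell
#!/usr/bin/env bash
# Minimal sketch: pack a directory of small files into one compressed tarball
# and warn if the result is still below the archive size guideline.
# bundle_for_archive and MIN_ARCHIVE_BYTES are illustrative names, not BSC tools.

MIN_ARCHIVE_BYTES=${MIN_ARCHIVE_BYTES:-1000000000}  # assumed meaning of "1GB"

bundle_for_archive() {
    local src_dir=$1 out_file=$2
    [ -d "$src_dir" ] || { echo "not a directory: $src_dir" >&2; return 1; }

    # -C makes the paths stored in the tarball relative to the parent directory.
    tar -czf "$out_file" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1

    # Listing the tarball confirms that it can be read back.
    tar -tzf "$out_file" > /dev/null || return 1

    local size
    size=$(( $(wc -c < "$out_file") ))
    if [ "$size" -lt "$MIN_ARCHIVE_BYTES" ]; then
        echo "warning: $out_file is $size bytes, below the archive guideline" >&2
    fi
    echo "$out_file"
}
```

A call might look like `bundle_for_archive ./results_2023 ./results_2023.tar.gz`, where both paths are placeholders. The warning goes to stderr so that the printed tarball path can still be captured by a calling script.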
Key Limitations
Important Considerations
- Tape data is not immediately accessible - retrieval may take minutes to hours
- Not suitable for active analysis, real-time processing, or frequent access patterns
- Best for long-term preservation, compliance, and infrequently accessed datasets
Frequently Asked Questions
How can I access the Archive?
Contact the data management team at datamanagement@bsc.es for access credentials and instructions.
How can I check which files are on tape vs. disk?
Use ls -lh to check file status (the -h flag prints human-readable sizes such as 1.2G):
ls -lh /path/to/directory
Key indicator: Files on tape show as 0 bytes (or very small size like 4096 bytes) even if they're actually large files. This is a "stub" - the actual data is on tape.
Note
Empty files may also appear as 0 bytes. A 0-byte file could be either a tape stub or a genuinely empty file.
Example:
-rw-r--r-- 1 user group 0 Nov 17 12:00 large_file_on_tape.dat ← On tape
-rw-r--r-- 1 user group 1.2G Nov 16 11:00 file_on_disk.dat ← On disk
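Find 0-byte files, which are candidates for being on tape (this also matches genuinely empty files and misses stubs that report a small non-zero size):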
find . -type f -size 0
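A broader search that also catches stubs reporting a small non-zero size can be sketched as below. It assumes stubs never exceed 4096 bytes, a figure taken from the description above rather than from a documented limit, and `list_stub_candidates` is a hypothetical helper name. The output is a list of candidates only, since genuinely empty or tiny files match as well.

```shell
#!/usr/bin/env bash
# Minimal sketch: list regular files whose reported size is at most max_bytes.
# list_stub_candidates is an illustrative name; 4096 is an assumed upper bound.

list_stub_candidates() {
    local dir=${1:-.} max_bytes=${2:-4096}
    # "-size -Nc" matches files strictly smaller than N bytes, so add 1
    # to include files of exactly max_bytes.
    find "$dir" -type f -size "-$((max_bytes + 1))c" | sort
}
```

For example, `list_stub_candidates /path/to/directory` prints one candidate path per line, sorted by name.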
Accessing Files on Tape
Files are automatically recalled from tape when you access them (e.g., opening, reading, copying). No manual retrieval command is needed. Recall may take minutes to hours depending on tape library load.
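Since a read is what triggers the recall, one way to avoid waiting inside an analysis job is to read the needed files ahead of time. The sketch below assumes that reading a file to the end is enough to bring it back to disk. `prefetch_files` is a hypothetical helper, not a BSC-provided command.

```shell
#!/usr/bin/env bash
# Minimal sketch: read each file once so that any recall happens up front.
# prefetch_files is an illustrative name, not part of the Archive tooling.

prefetch_files() {
    local f failed=0
    for f in "$@"; do
        # Reading the file is the access that triggers recall; the data is discarded.
        if cat -- "$f" > /dev/null; then
            echo "ready: $f"
        else
            echo "failed: $f" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

A call such as `prefetch_files /path/to/directory/*.dat` blocks until every file has been read, so it may run for minutes to hours. Recalling many files at once also adds to the tape library load.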
Support
For assistance with the Archive system, contact the data management team at datamanagement@bsc.es.