Supported file formats
The BSC Dataverse accepts files in any format, so researchers can upload a wide range of data. However, to maximize the usability and longevity of your data, we strongly encourage the use of standardized, widely used file formats recognized by your research community.
Examples of recommended formats:
| File Type | Preferred File Formats | Non-Preferred File Formats |
|---|---|---|
| Audio | Uncompressed and lossless WAV or AIFF (.wav, .aiff); compressed and lossless FLAC (.flac); compressed MP3 (.mp3) | AAC (.m4a); Ogg Vorbis (.ogg); Windows Media Audio (.wma) |
| Image | Uncompressed TIFF (.tif, .tiff); compressed and lossless PNG (.png); compressed JPEG (.jpg, .jpeg) | Adobe Photoshop (.psd); Windows Bitmap (.bmp); Raw Image Data (.raw) |
| Geospatial Data | ESRI Shapefile (.shp); GeoJSON (.geojson); NetCDF (.nc) | Proprietary or unsupported geospatial formats |
| Simulation Data | NetCDF (.nc); HDF5 (.h5); VTK (.vtk); ParaView (.pvd) | Custom binary formats without documentation |
| Spreadsheet/Tabular Data | Plain text with UTF-8 encoding, tab-separated or comma-separated (.tsv, .csv) | Excel (.xlsx) |
| Text | Plain text (.txt, .md); XML (.xml); PDF/A (.pdf) combined with the original file | Microsoft Word (.docx) |
| Code/Scripts | Python (.py); R (.R, .RData); MATLAB (.m, .mat); Jupyter Notebooks (.ipynb) | Non-commented or non-documented scripts |
| Video | MPEG-4 (.mp4) | AVI (.avi); Flash Video (.flv); Windows Media Video (.wmv) |
| Genomics Data | FASTA (.fasta); FASTQ (.fastq); VCF (.vcf) | Unsupported proprietary genomic file formats |
| Statistical Data | R scripts and data (.R, .RData); SPSS scripts (.sps); STATA scripts (.do) | SPSS proprietary data (.sav); STATA proprietary data (.dta) |
| Qualitative Data | Plain text (.txt); PDF/A (.pdf) combined with the original file | Workspace dumps without clear documentation |
| Compressed or Archive Files | ZIP (.zip); GZIP/TAR (.tar.gz) | Proprietary or encrypted compression formats without password-sharing information |
| Visualization Files | PNG (.png); JPEG (.jpg, .jpeg); Scalable Vector Graphics (.svg) | Complex 3D visualization files without metadata |
| Transcription | PDF/A (.pdf) combined with tab-separated or comma-separated values (.csv, .tsv) | Word documents (.docx) |
- Preferred formats: These formats are widely used, well-supported, and promote long-term accessibility.
- Non-preferred formats: These formats are less suitable due to their proprietary nature, lack of documentation, or reduced compatibility.
- Metadata and documentation: Always provide sufficient metadata or documentation to accompany datasets, regardless of file type.
- Anonymization: Ensure no sensitive or personal data is included in submissions.
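
Before uploading, it can help to check a dataset's files against the table above. The following is a minimal sketch of such a check, not a BSC Dataverse feature: the names `PREFERRED`, `NON_PREFERRED`, and `classify` are hypothetical, and the extension sets are simply copied from the table. It looks only at file extensions, so it cannot tell PDF/A from ordinary PDF, nor detect undocumented scripts or missing metadata.

```python
from pathlib import Path

# Extensions copied from the "Preferred File Formats" column of the table.
# Hypothetical helper data, not an official Dataverse list.
PREFERRED = {
    ".wav", ".aiff", ".flac", ".mp3",            # audio
    ".tif", ".tiff", ".png", ".jpg", ".jpeg",    # image / visualization
    ".svg",
    ".shp", ".geojson", ".nc",                   # geospatial
    ".h5", ".vtk", ".pvd",                       # simulation
    ".tsv", ".csv",                              # tabular
    ".txt", ".md", ".xml", ".pdf",               # text (.pdf should be PDF/A)
    ".py", ".r", ".rdata", ".m", ".mat", ".ipynb",  # code
    ".mp4",                                      # video
    ".fasta", ".fastq", ".vcf",                  # genomics
    ".sps", ".do",                               # statistical scripts
    ".zip",                                      # archives (.tar.gz handled below)
}

# Extensions copied from the "Non-Preferred File Formats" column.
NON_PREFERRED = {
    ".m4a", ".ogg", ".wma",
    ".psd", ".bmp", ".raw",
    ".xlsx", ".docx",
    ".avi", ".flv", ".wmv",
    ".sav", ".dta",
}


def classify(filename: str) -> str:
    """Return 'preferred', 'non-preferred', or 'unlisted' for a file name."""
    name = Path(filename).name.lower()
    # Path.suffix would report only ".gz", so check the double extension first.
    if name.endswith(".tar.gz"):
        return "preferred"
    suffix = Path(name).suffix
    if suffix in PREFERRED:
        return "preferred"
    if suffix in NON_PREFERRED:
        return "non-preferred"
    return "unlisted"


if __name__ == "__main__":
    for example in ["results.csv", "survey.xlsx", "run01.tar.gz", "mesh.bin"]:
        print(f"{example}: {classify(example)}")
```

A result of `unlisted` does not mean the file is rejected, since any format can be uploaded. It only signals that the format is absent from the table and may deserve extra documentation.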