Supported file formats
The BSC Dataverse accepts files in any format, so researchers can upload a wide range of data. However, to maximize the usability and longevity of your data, we strongly encourage you to use standardized, widely used file formats recognized by your research community.
Examples of recommended formats:
File Type | Preferred File Formats | Non-Preferred File Formats |
---|---|---|
Audio | Uncompressed and lossless WAV or AIFF (.wav, .aiff); compressed and lossless FLAC (.flac); compressed MP3 (.mp3) | AAC (.m4a); Ogg Vorbis (.ogg); Windows Media Audio (.wma) |
Image | Uncompressed TIFF (.tif, .tiff); compressed and lossless PNG (.png); compressed JPEG (.jpg, .jpeg) | Adobe Photoshop (.psd); Windows Bitmap (.bmp); Raw Image Data (.raw) |
Geospatial Data | ESRI Shapefile (.shp); GeoJSON (.geojson); NetCDF (.nc) | Proprietary or unsupported geospatial formats |
Simulation Data | NetCDF (.nc); HDF5 (.h5); VTK (.vtk); ParaView (.pvd) | Custom binary formats without documentation |
Spreadsheet/Tabular Data | Plain text with UTF-8 encoding, tab-separated or comma-separated (.tsv, .csv) | Excel (.xlsx) |
Text | Plain text (.txt, .md); XML (.xml); PDF/A (.pdf) combined with the original file | Microsoft Word (.docx) |
Code/Scripts | Python (.py); R (.R, .RData); MATLAB (.m, .mat); Jupyter Notebooks (.ipynb) | Non-commented or non-documented scripts |
Video | MPEG-4 (.mp4) | AVI (.avi); Flash Video (.flv); Windows Media Video (.wmv) |
Genomics Data | FASTA (.fasta); FASTQ (.fastq); VCF (.vcf) | Unsupported proprietary genomic file formats |
Statistical Data | R scripts and data (.R, .RData); SPSS scripts (.sps); STATA scripts (.do) | SPSS proprietary data (.sav); STATA proprietary data (.dta) |
Qualitative Data | Plain text (.txt); PDF/A (.pdf) combined with the original file | Workspace dumps without clear documentation |
Compressed or Archive Files | ZIP (.zip); GZIP/TAR (.tar.gz) | Proprietary or encrypted compression formats without password-sharing information |
Visualization Files | PNG (.png); JPEG (.jpg, .jpeg); Scalable Vector Graphics (.svg) | Complex 3D visualization files without metadata |
Transcription | PDF/A (.pdf) combined with tab-separated or comma-separated values (.csv, .tsv) | Word documents (.docx) |
- Preferred formats: These formats are widely used, well-supported, and promote long-term accessibility.
- Non-preferred formats: These formats are less suitable due to their proprietary nature, lack of documentation, or reduced compatibility.
- Metadata and documentation: Always provide sufficient metadata or documentation to accompany datasets, regardless of file type.
- Anonymization: Ensure no sensitive or personal data is included in submissions.
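Before uploading, it can help to check a dataset's file names against the table above. The following is only a minimal sketch, not part of the BSC Dataverse tooling: the `classify` and `report` helpers are hypothetical names, and the two extension sets are a partial, hand-copied subset of the table. Extensions that depend on context (for example `.pdf`, which is preferred only as PDF/A alongside the original file) are deliberately left out and reported as "unlisted".

```python
from pathlib import PurePath

# Assumption: partial subset of the table above, copied by hand.
# Extend or adjust these sets to match your research community's practice.
PREFERRED = {
    ".wav", ".aiff", ".flac", ".mp3", ".tif", ".tiff", ".png", ".jpg",
    ".jpeg", ".geojson", ".nc", ".h5", ".tsv", ".csv", ".txt", ".md",
    ".xml", ".mp4", ".zip", ".svg",
}
NON_PREFERRED = {
    ".m4a", ".ogg", ".wma", ".psd", ".bmp", ".raw", ".xlsx", ".docx",
    ".avi", ".flv", ".wmv", ".sav", ".dta",
}


def classify(filename: str) -> str:
    """Return 'preferred', 'non-preferred', or 'unlisted' for a file name."""
    name = filename.lower()
    # .tar.gz is a two-part extension, so handle it before taking the suffix.
    if name.endswith(".tar.gz"):
        return "preferred"
    suffix = PurePath(name).suffix
    if suffix in PREFERRED:
        return "preferred"
    if suffix in NON_PREFERRED:
        return "non-preferred"
    return "unlisted"


def report(filenames):
    """Map each file name to its classification."""
    return {name: classify(name) for name in filenames}


if __name__ == "__main__":
    sample = ["results.xlsx", "grid.nc", "run.tar.gz", "notes.docx", "model.bin"]
    for name, status in report(sample).items():
        print(f"{status:14} {name}")
```

A check like this looks only at file extensions. It cannot tell whether a script is commented, whether a binary format is documented, or whether a file contains personal data, so those points still need a manual review.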