Hub documentation

Storage Buckets

Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

Storage Buckets

Storage Buckets are a repo type on the Hugging Face Hub providing S3-like object storage, powered by the Xet storage backend. Unlike Git-based repositories (models, datasets, Spaces), buckets are non-versioned and mutable, designed for use cases where you need simple, fast storage such as training checkpoints, logs, intermediate artifacts, or any large collection of files that doesn’t need version control.

You can interact with buckets using the Hub web interface, the hf CLI, or the Python API.

Buckets are available to all users and organizations. See hf.co/storage for pricing details.

Buckets vs Repositories

The Hub offers two types of storage: Git-based repositories for versioned, collaborative work and buckets for fast, mutable object storage.

Feature Repositories (Git-based) Storage Buckets
Versioning Full Git history None (mutable, overwrite-in-place)
Types Models, Datasets, Spaces Standalone bucket
Primary use case Publishing finished artifacts Working storage / intermediate data
Operations Hub API, Git push/pull S3-like sync, cp, rm
Deduplication Xet chunk-level Xet chunk-level
Pull Requests Yes No
Model/Dataset Cards Yes No

Use repositories when you want version history, collaboration features (PRs, discussions), and library integrations. Use buckets when you need fast, mutable storage for data that changes frequently β€” files can be overwritten or deleted in place.

Creating a Bucket

From the Hub UI

  1. Navigate to huggingface.co/new-bucket:
  1. Specify the owner of the bucket: this can be either you or any of the organizations you’re affiliated with.

  2. Enter a bucket name.

  3. Choose whether the bucket should be public or private.

  4. Optionally, preselect CDN pre-warming regions to cache your data closer to your compute from the start.

After creating the bucket, you should see the bucket page:

From the CLI

# Create a bucket under your namespace
hf buckets create my-bucket

# Create a private bucket
hf buckets create my-bucket --private

# Create a bucket under an organization
hf buckets create my-org/shared-bucket

From Python

from huggingface_hub import create_bucket

# Create a bucket under your namespace
create_bucket("my-bucket")

# Create a private bucket
create_bucket("my-bucket", private=True)

# Create a bucket under an organization
create_bucket("my-org/shared-bucket")

For the full Python API reference including deleting, moving, and listing buckets, see the huggingface_hub Buckets guide.

Browsing Buckets on the Hub

Every bucket has a page on the Hub where you can browse its contents, navigate directories, and view file details. Bucket pages are available at https://huggingface.co/buckets/<owner>/<bucket-name>.

You can also list bucket contents from the CLI:

# List files in a bucket (with human-readable sizes)
hf buckets list julien-c/my-training-bucket -h
                     Feb 17 14:46  art/
                     Feb 17 14:58  arxivqa/
                     Feb 17 15:02  arxivqa2/
                     Feb 17 15:04  arxivqa3/
                     Feb 17 14:47  captcha/
                     Feb 17 14:53  captcha2/
                     Feb 24 17:22  julien/

# Recursive listing
hf buckets list julien-c/my-training-bucket/art -h -R
    423.6 MB         Feb 17 14:29  art/train-00000-of-00011.parquet
    441.0 MB         Feb 17 14:29  art/train-00001-of-00011.parquet
    521.7 MB         Feb 17 14:29  art/train-00002-of-00011.parquet
    481.4 MB         Feb 17 14:29  art/train-00003-of-00011.parquet
    444.6 MB         Feb 17 14:29  art/train-00004-of-00011.parquet
    461.6 MB         Feb 17 14:29  art/train-00005-of-00011.parquet
    466.4 MB         Feb 17 14:29  art/train-00006-of-00011.parquet
    486.3 MB         Feb 17 14:29  art/train-00007-of-00011.parquet
    477.0 MB         Feb 17 14:29  art/train-00008-of-00011.parquet
    454.0 MB         Feb 17 14:29  art/train-00009-of-00011.parquet
    483.1 MB         Feb 17 14:29  art/train-00010-of-00011.parquet

# Tree view
hf buckets list julien-c/my-training-bucket --tree -h -R
                        β”œβ”€β”€ art/
423.6 MB  Feb 17 14:29  β”‚   β”œβ”€β”€ train-00000-of-00011.parquet
441.0 MB  Feb 17 14:29  β”‚   β”œβ”€β”€ train-00001-of-00011.parquet
521.7 MB  Feb 17 14:29  β”‚   β”œβ”€β”€ train-00002-of-00011.parquet
481.4 MB  Feb 17 14:29  β”‚   β”œβ”€β”€ train-00003-of-00011.parquet
444.6 MB  Feb 17 14:29  β”‚   β”œβ”€β”€ train-00004-of-00011.parquet
461.6 MB  Feb 17 14:29  β”‚   β”œβ”€β”€ train-00005-of-00011.parquet
466.4 MB  Feb 17 14:29  β”‚   β”œβ”€β”€ train-00006-of-00011.parquet
486.3 MB  Feb 17 14:29  β”‚   β”œβ”€β”€ train-00007-of-00011.parquet
477.0 MB  Feb 17 14:29  β”‚   β”œβ”€β”€ train-00008-of-00011.parquet
454.0 MB  Feb 17 14:29  β”‚   β”œβ”€β”€ train-00009-of-00011.parquet
483.1 MB  Feb 17 14:29  β”‚   └── train-00010-of-00011.parquet
                        β”œβ”€β”€ arxivqa/
495.9 MB  Feb 17 14:32  β”‚   β”œβ”€β”€ train-00000-of-00164.parquet
518.3 MB  Feb 17 14:32  β”‚   β”œβ”€β”€ train-00001-of-00164.parquet
495.5 MB  Feb 17 14:32  β”‚   β”œβ”€β”€ train-00002-of-00164.parquet
486.6 MB  Feb 17 14:32  β”‚   β”œβ”€β”€ train-00003-of-00164.parquet
490.4 MB  Feb 17 14:32  β”‚   β”œβ”€β”€ train-00004-of-00164.parquet
...

Managing Files

You can upload and download files directly from the bucket page on the Hub, or use the CLI and Python API for programmatic access. Bucket files are referenced using hf://buckets/ paths (e.g., hf://buckets/username/my-bucket/path/to/file). The hf buckets cp command handles individual file transfers while hf buckets sync is better suited for directories. All commands work in both directions β€” local-to-remote and remote-to-local.

Uploading files

For quick uploads, you can drag and drop files directly on the bucket page in your browser. For programmatic use, hf buckets cp copies individual files into a bucket. The source is a local path and the destination is an hf://buckets/ path. You can also pipe data from stdin, which is handy for programmatically generated content.

CLI:

# Upload a single file
hf buckets cp ./model.safetensors hf://buckets/username/my-bucket/models/model.safetensors

# Upload from stdin
cat config.json | hf buckets cp - hf://buckets/username/my-bucket/config.json

In Python, use batch_bucket_files to upload one or more files in a single call. Each entry is a tuple of (local_path, remote_path).

Python:

from huggingface_hub import batch_bucket_files

batch_bucket_files(
    "username/my-bucket",
    add=[
        ("./model.safetensors", "models/model.safetensors"),
        ("./config.json", "models/config.json"),
    ],
)

For more upload options (raw bytes, combined upload+delete, etc.), see the huggingface_hub upload guide.

Downloading files

You can download individual files directly from the bucket page on the Hub by clicking on them. For programmatic access, downloading mirrors the upload syntax β€” swap the source and destination in hf buckets cp. You can also stream a file to stdout by using - as the destination, which lets you pipe bucket contents directly into other tools.

CLI:

# Download a single file
hf buckets cp hf://buckets/username/my-bucket/models/model.safetensors ./model.safetensors

# Download to stdout and pipe
hf buckets cp hf://buckets/username/my-bucket/config.json - | jq .

In Python, use download_bucket_files with a list of (remote_path, local_path) tuples.

Python:

from huggingface_hub import download_bucket_files

download_bucket_files(
    "username/my-bucket",
    files=[
        ("models/model.safetensors", "./local/model.safetensors"),
        ("config.json", "./local/config.json"),
    ],
)

For faster downloads using pre-fetched metadata, see the huggingface_hub download guide.

Syncing directories

The sync command works like rsync or aws s3 sync β€” it compares source and destination and only transfers files that have changed. This is the most efficient way to keep a local directory and a bucket in sync. By default, sync only adds and updates files. Pass --delete to also remove files at the destination that no longer exist at the source. Use --dry-run to preview what would happen without actually transferring anything.

CLI:

# Upload a local directory to a bucket
hf buckets sync ./data hf://buckets/username/my-bucket/data

# Download from a bucket to a local directory
hf buckets sync hf://buckets/username/my-bucket/data ./data

# Sync with deletion of extraneous files
hf buckets sync ./data hf://buckets/username/my-bucket/data --delete

# Preview what would be synced without executing
hf buckets sync ./data hf://buckets/username/my-bucket/data --dry-run

# Plan and apply: review the sync plan before executing
hf buckets sync ./data hf://buckets/username/my-bucket/data --plan sync-plan.jsonl
# ... review the plan file, then apply it
hf buckets sync --apply sync-plan.jsonl

hf sync is a convenient alias for hf buckets sync.

Python:

from huggingface_hub import sync_bucket

# Upload a local directory to a bucket
sync_bucket("./data", "hf://buckets/username/my-bucket/data")

# Download from a bucket to a local directory
sync_bucket("hf://buckets/username/my-bucket/data", "./data")

The sync command supports filtering (--include, --exclude), comparison modes (--ignore-times, --existing), and a plan-and-apply workflow to review operations before executing them. For the full set of options, see the huggingface_hub sync guide.

Deleting files

Since buckets are non-versioned, deletions are immediate and permanent β€” there is no way to recover a deleted file. Use --dry-run to double-check before removing files, especially when using --recursive.

CLI:

# Remove a single file
hf buckets rm username/my-bucket/old-model.bin

# Remove all files under a prefix
hf buckets rm username/my-bucket/logs/ --recursive

# Preview what would be deleted
hf buckets rm username/my-bucket/checkpoints/ --recursive --dry-run

Python:

from huggingface_hub import batch_bucket_files

batch_bucket_files("username/my-bucket", delete=["old-model.bin", "logs/debug.log"])

For more deletion options (pattern-based filtering, recursive removal, etc.), see the huggingface_hub delete guide.

Pre-warming and CDN

Buckets live on the Hub’s global storage by default. For workloads where storage location directly affects throughput you can pre-warm bucket data to bring it closer to your compute.

Pre-warming caches files at edge locations near specific cloud providers and regions, so your jobs read data locally instead of pulling it across regions. This is especially useful for:

  • Training clusters that need fast access to large datasets or checkpoints
  • Multi-region setups where different parts of a pipeline run in different clouds
  • Distributing large artifacts to many consumers worldwide

See hf.co/storage for available regions and details on enabling pre-warming.

Use Cases

Training checkpoints and logs

When running training jobs (e.g., via Jobs), save checkpoints and logs to a bucket. Unlike a Git repo, you can overwrite the latest checkpoint without accumulating version history, and sync ensures only changed data is transferred.

# After each evaluation step, sync checkpoints to a bucket
hf sync ./checkpoints hf://buckets/my-org/training-run-42/checkpoints

Because buckets are built on Xet, successive checkpoints where large parts of the model are frozen benefit from chunk-level deduplication. Only the changed chunks are uploaded.

Data processing pipelines

Buckets serve as staging areas for data processing workflows. Process raw data, write intermediate outputs to a bucket, then promote the final artifact to a versioned Dataset repository when the pipeline completes. This keeps your versioned repo clean while giving your pipeline fast mutable storage.

Note that transferring data from a Bucket to a repository without reuploading is not yet available, but is on the roadmap.

Agentic storage

AI agents need scratch storage for intermediate results, tool outputs, traces, and working memory. Buckets provide a Hub-native place for this data: fast mutable access without Git overhead, standard Hugging Face permissions, and addressable via hf://buckets/ paths across the Hub ecosystem.

Rolling backups

Buckets are well-suited for maintaining rolling backups. With a Git-based Dataset repository, deleting outdated files doesn’t free storage β€” Git history retains every past version, so you’d need to squash commits or rewrite history to actually reclaim space. With buckets, old files are truly gone once deleted, and you only pay for what’s currently stored.

# Sync today's backup, removing files that no longer exist locally
hf sync ./daily-backup hf://buckets/my-user/backups/latest --delete

Pricing

Storage Buckets are billed based on the amount of data stored, with simple per-TB pricing. Enterprise plans benefit from dedup-based billing, where shared chunks across files directly reduce the billed footprint.

For current pricing tiers and volume discounts, see hf.co/storage. For general billing information, see the Billing documentation.

Update on GitHub