Welcome to the Hugging-Verse

@stevhliu|Mar 10, 2025 (4m ago)329 views

I often see people ask, what is Hugging Face? The most common answer is usually some variant of "Hugging Face is the GitHub of machine learning".

It is a good one-liner considering the breadth of Hugging Face's quest to enable collaborative machine learning (ML). But it also hides the depth of what we do. The Hugging Face ecosystem, or Hugging-Verse, is quite expansive and it can often feel like we are doing too much without a strong focus.

But I believe the Hugging-Verse needs to be expansive precisely because we're pursuing such an ambitious quest.

Image generated from the linoyts/huggy_flux_fal_lora checkpoint.

Think about games like Baldur's Gate 3 or Elden Ring. They're enormous games you can easily spend 100+ hours on in a single playthrough. Alongside the main quest, there are many side quests that add a ton of richness, worldbuilding, and depth.

Similarly, the main Hugging Face libraries and Hub platform advance the main quest. All the other libraries, research projects, courses, tools and services are side quests that enrich the Hugging-Verse.

I get that it can be overwhelming if you're new though, so here is my high-level walkthrough.

✦If you only take one thing away, remember, side quests are optional. Focus on a main library like Transformers and using the Hub at first if you aren't sure where to start.

Libraries

The Hugging-Verse contains many libraries related to nearly every aspect of ML.

The main ones, like Transformers and Diffusers, provide access to pretrained models for inference and fine-tuning. They're able to do a little bit of everything.

Side libraries focus on specific ML topics such as:

distributed training (Accelerate, picotron, nanotron)
serving large language models in production (Text Generation Inference)
evaluation (Lighteval, Leaderboards)
on-device (Transformers.js, Optimum)

Where you start in the Hugging-Verse depends on what you're trying to do. There aren't defined progression paths.

Two general routes I think most people take are:

start learning ML with Transformers

adapting a side library for their own work or research

My recommendation is to pick 1 if you're a beginner. Otherwise, feel free to choose 1, 2, or browse the table below to see what libraries are available and create your own progression path.

Hub

The Hub is the most visible part of Hugging Face. This is probably what most people think of when they think of Hugging Face (or Facehuggers from Aliens).

It is a Git-based platform to access or share models, datasets, and Spaces. Every public repository provides free storage for your ML artifacts. Social features like posts, articles, and the Community tab encourage collaboration and discussion.

Model repositories include a widget to run inference with a model on your browser. The widget is powered by the Serverless Inference API, but it also supports other inference providers like Replicate and fal.

Dataset repositories have a Dataset Viewer for easily previewing a datasets contents. You can also run SQL queries (or ask AI to craft a SQL query for you) on the dataset with the Data Studio feature to explore it in more detail. Again, this all takes place in the browser.

Caleb

@calebfahlgren

·Follow

Replying to @calebfahlgren

You can now query datasets with natural language 🗣️ ▶︎ Powered by Deepseek V3 🐳 via Inference Providers ▶︎ Use SQL to query datasets entirely in the browser via @duckdb. You'll find the best performance for smaller datasets

Watch on Twitter

Read 1 reply

Spaces is a scaffold to create and deploy ML apps with Gradio, Streamlit, HTML, and even Docker containers. A free tier Space runs on a basic 16GB CPU, but you can upgrade to more powerful GPUs and persistent storage if your app needs it. PRO subscribers have access to ZeroGPU, a shared cluster of A100 GPUs that are automatically allocated to a Space to complete a workload and then released to the next Space.

HuggingChat

HuggingChat is an open version of ChatGPT, providing access to some of the latest models like DeepSeek-R1. You can modify the system prompt of each model and create or use assistants.

HuggingChat is powered by Text Generation Inference, and it is also available on macOS for running language models locally.

Research

Hugging Face is also engaged in research projects that empower the entire ML ecosystem. The goal isn't necessarily to train the best models. Instead, we're trying to create new and interesting research that benefits everyone.

Collaboration >>> competition.

As an example, the recently released The Ultra-Scale Playbook is a culmination of everything the research team has learned about distributed training. It is freely available to anyone who is interested in scaling training of large language models to thousands of GPUs.

Thomas Wolf

@Thom_Wolf

·Follow

After 6+ months in the making and burning over a year of GPU compute time, we're super excited to finally release the "Ultra-Scale Playbook" Check it out here: hf.co/spaces/nanotro… A free, open-source, book to learn everything about 5D parallelism, ZeRO, fast CUDA kernels,

3.9K

Read 110 replies

Some other research projects include:

open-r1: fully reproduce the DeepSeek-R1 model, an open and powerful reasoning model comparable to the OpenAI o-series models.
SmolLM: release high-quality pretraining datasets (FineWeb, Cosmopedia) and small language and vision language models.

These types of projects enable other researchers to build on top of our work and push the field forward together.

Courses

Educational content, such as the courses and cookbook, at hf.co/learn provide an accessible starting point for learning ML. A large part of our quest hinges on lowering the barrier for everyone to learn ML and convert them into active participants.

There are several courses - agents, reinforcement learning, diffusion, etc. - available on hf.co/learn. You can also find more of our courses, such as quantization fundamentals, we collaborated on with other learning platforms like DeepLearning.AI.