Steven Liu

Redesigning the Transformers documentation

June 04, 2024

2/10/25 - Edited to consolidate the redesign process and motivation into a single post.

When I joined Hugging Face over 3 years ago, the Transformers docs were very different from the current main version. They focused on training and inference with text models for natural language tasks (text classification, summarization, language modeling, etc.).

As transformer models increasingly became the default architecture, the docs expanded significantly to include new modalities and usage patterns. But new content was added incrementally without really considering how the audience and Transformers have evolved.

I think this is why the docs experience (DocX) feels disjointed, difficult to navigate, and outdated. Basically, a whole mess.

A redesign is necessary to make sense of this mess. The goal is to:

  1. Write for developers interested in building products with AI.
  2. Foster an organic docs structure to enable sustainable growth and scalability, instead of rigidly adhering to a predefined structure.
  3. Create a more unified DocX by integrating content rather than amending it to the existing docs.

The North Star

AI moves fast, and it's easy to lose sight of what we think Transformers should offer. You can interpret the library through a couple of lenses, and none of them is wrong, but each captures only a single facet of Transformers.

The Transformers library is a library of pretrained models (mostly Transformer architectures but not only).

  • Sylvain Gugger

There can be new modalities, use cases, and APIs, but at the end of the day, Transformers is really a collection of pretrained models for training or inference.

This is the North Star for guiding the redesign.

A new audience

The Transformers docs were initially written for machine learning engineers, researchers, and model tinkerers.

Now that AI is mainstream and not just a fad, developers are interested in learning how to build AI into products. This means recognizing that developers interact with docs differently than machine learning engineers and researchers do.

Two important distinctions are that developers want to start from working code and that they prefer a short explanation of how it works over deep theory.

With the redesign, you will start with code and a simple explanation of how it works for a more complete and beginner-friendly onboarding experience (some basic prerequisite knowledge is still required).
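
To make that concrete, here is a hedged sketch of the kind of code-first snippet such an onboarding could open with; it is not the docs' actual example, and the checkpoint name is only an illustrative choice.

```python
# A minimal, code-first example of the kind the redesigned docs lead with.
# The checkpoint below is an illustrative placeholder, not a docs requirement.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("The new docs are much easier to navigate."))
# -> [{'label': 'POSITIVE', 'score': 0.99...}]
```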

Once developers have a basic understanding, they can progressively level up their Transformers knowledge.

Toward a more organic structure

One of my first projects at Hugging Face aligned the docs with Diátaxis, a documentation approach based on user needs (learning, solving, understanding, reference).

Somewhere along the way, I started using Diátaxis as a plan instead of a guide. I tried to force content to fit neatly into one of the 4 prescribed categories.

Natural content structures were blocked from emerging, and the docs grew bloated. Docs about one topic soon spanned several sections, because that was what the structure dictated, not because it made sense.

It's okay if the structure is complex, but it's not okay if it's complex and not easy to use.

The redesign will allow the docs to grow organically according to the North Star.

Natively integrated content

New content has been layered progressively on top of the previous content instead of being integrated into the overall docs.

Tree rings provide a climatological record of the past (drought, flood, wildfire, etc.). The docs also have their own tree rings that capture their evolution.

  1. Not just text: Transformer models are adapted for computer vision, audio, multimodal, and more.
  2. Large language models (LLMs): Transformer models are scaled to billions of parameters, leading to new interaction types (prompting and chat). There are more docs about how to efficiently train LLMs, such as parameter-efficient fine-tuning (PEFT) methods, distributed training, and data parallelism.
  3. Optimization: Training or inference with LLMs can be a challenge unless you are GPU Rich. So now, there is a ton of interest in how to democratize LLMs for the GPU Poor. There are more docs about quantization, FlashAttention, optimizing the key-value cache, Low-Rank Adaptation (LoRA), and more.
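
To ground that last item, here is a hedged sketch of loading an LLM with 4-bit quantization, one of the optimization topics now covered in the docs; the checkpoint name and settings are illustrative assumptions, not prescribed values.

```python
# Hedged sketch: load a causal LM with 4-bit quantization (bitsandbytes).
# The checkpoint and compute dtype are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder checkpoint

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",  # requires accelerate
)

inputs = tokenizer("The docs redesign aims to", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```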

Each phase incrementally added new content to the docs, unbalancing and obscuring the previous parts. Content sprawls over a greater surface, and navigation is more complex.

The redesign will help rebalance the overall DocX. Content will be native and integrated rather than added on.

🤗 Shout out to @evilpingwin for the feedback and motivation to redesign the docs.