Organizing this mess
@stevhliu | July 2, 2024
Recap: The last post, Unraveling the mess, identified the major issues in the Transformers documentation by asking three questions. Who are the intended users? How is the content arranged? What is the current state of the documentation?
The most frequent and enduring feedback about the Transformers documentation is its navigational complexity.
I mentioned this problem in my last two posts about how the documentation needs to trend toward a more organic structure and how it is currently a jungle of information.
To solve this issue:
- Create a diagram (block diagram, Venn diagram, mind map) of how content is related and observe what structures organically develop.
- Experiment with deep and shallow hierarchies for the content.
# The North Star
Before exploring the diagrams and structures, it's critical to align them with the intended purpose of the library, because there are a couple of lenses you can view Transformers through:
- by modality (text, audio, vision, multimodal)
- by API (`transformers.Pipeline` and `transformers.Trainer`; a minimal sketch follows below)
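To make the API lens concrete, here's a minimal sketch of the two entry points. The task and example text are illustrative, not from the docs themselves.

```python
# A minimal sketch of the API lens, assuming transformers is installed.
# The task and example text are illustrative.
from transformers import pipeline

# transformers.Pipeline: high-level inference in a couple of lines.
classifier = pipeline("sentiment-analysis")
print(classifier("The documentation is getting easier to navigate!"))

# transformers.Trainer is the training-side counterpart; a fuller sketch
# of it appears in the block diagram section below.
```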
Both are valid, but they're limited interpretations. Transformers is more than that.
> The Transformers library is a library of pretrained models (mostly Transformer architectures but not only).
>
> - Sylvain Gugger
This is the North Star, and our guiding principle for restructuring the documentation.
Those other lenses are important for organizing content, but they shouldn't be the foundation the library is grounded on.
# Content mapping
The diagrams I chose focus on showing how separate parts of Transformers are related and what that conveys about the library as a whole.
## Block diagram
- At the highest level, you'll find content about the library itself: design choices and how to contribute.
- The second layer is the core:
  - pretrained models: how to use the base model and tokenizer classes (see the sketch after this list), how to export models to other formats for production, how to quantize models for training or inference, and task guides (a collection of recipes for training models to perform inference on a variety of tasks)
  - training: how to finetune pretrained models with the `transformers.Trainer` API, distributed training on GPUs/CPUs and parallelism schemes, and hardware-specific training content for devices like TPUs and Apple silicon
  - inference: how to use the high-level `transformers.Pipeline` API for inference, using LLMs for text generation, chat applications with LLMs, and optimizing inference with PyTorch and TensorFlow
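Here's a minimal sketch of those base classes in action; the checkpoint name is illustrative.

```python
# A minimal sketch of loading a pretrained model and its tokenizer with
# the base Auto classes. The checkpoint name is illustrative.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Transformers is a library of pretrained models.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence length, hidden size)
```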
You can see how structure develops within the block diagram.
For example, the training content is mainly composed of three substructures: the `transformers.Trainer` API, distributed training, and hardware-specific training.
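To ground the first of those substructures, here's a minimal, hedged sketch of the `transformers.Trainer` workflow. It assumes a tokenized `train_dataset` prepared elsewhere, and the checkpoint and hyperparameters are illustrative, not a recommended recipe.

```python
# A minimal sketch of the transformers.Trainer workflow. Assumes a
# tokenized `train_dataset` prepared elsewhere; names are illustrative.
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

args = TrainingArguments(
    output_dir="checkpoints",
    per_device_train_batch_size=8,
    num_train_epochs=1,
)

train_dataset = ...  # assumed: a tokenized dataset prepared elsewhere

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```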
## Venn diagram
Another way to map the content is with a Venn diagram.
With a Venn diagram, the content is distinctly separated into two categories: training and inference.
The overlap highlights how training and inference are related by:
- the base classes that make up each pretrained model for training and inference
- the task guides that demonstrate an example end-to-end process of training and inference for a specific task
- the quantization schemes available for training and inference (sketched below)
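As a minimal sketch of the inference side of that overlap, here's one way to load a quantized model. It assumes the bitsandbytes and accelerate libraries are installed and a GPU is available; the checkpoint name is illustrative.

```python
# A minimal sketch of loading a 4-bit quantized model for inference,
# assuming bitsandbytes and accelerate are installed and a GPU is
# available. The checkpoint name is illustrative.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # illustrative checkpoint
    quantization_config=quant_config,
    device_map="auto",
)
```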
There are some subtle differences from the block diagram though.
Deploy to production is aligned with inference because it's common to export a model to a format (TFLite, TorchScript) optimized for inference on specific hardware or frameworks. This is especially useful for deploying models to production instances.
With the block diagram, it is possible to keep deploy to production in the pretrained model layer because it's a general feature of any Transformers pretrained model.
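For concreteness, here's a minimal sketch of the TorchScript export path mentioned above. The checkpoint is illustrative, and the dummy input exists only so the model can be traced.

```python
# A minimal sketch of exporting a model to TorchScript, one of the
# production formats mentioned above. The dummy input is only for tracing.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", torchscript=True
)
model.eval()

dummy = tokenizer("a dummy input for tracing", return_tensors="pt")
traced = torch.jit.trace(model, (dummy["input_ids"], dummy["attention_mask"]))
torch.jit.save(traced, "traced_model.pt")
```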
## Mind map
A mind map is a more open type of diagram. Unlike a block or Venn diagram, it doesn't impose any structure or hierarchy. You're only looking at how content is connected in the context of Transformers.
# Structure
Structure is how content is organized. It needs to convey that Transformers is a library of pretrained models that supports training and inference for any modality.
There are 2 important considerations for structuring content:
- exact vs fuzzy match: How precisely should content be classified? If it is too exact, you lose flexibility. But if it is too vague, you lose clarity.
- narrow vs broad: How narrow or broad should the structure be? (See the sketch after this list.)
  - Broad structures are more accessible because they offer more choices upfront, but they can be overwhelming.
  - Narrow structures are more approachable because they offer fewer choices upfront, at the expense of having to go deeper.
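To make the trade-off concrete, here's a hypothetical sketch contrasting the two shapes, using nested Python structures as stand-ins for a docs table of contents. All section names are illustrative.

```python
# Hypothetical tables of contents illustrating the broad vs. narrow
# trade-off. All section names are illustrative.

# Broad and shallow: everything is one click away, but there are many
# choices upfront.
broad = [
    "Pipeline", "Trainer", "Quantization", "Distributed training",
    "TPU training", "Text generation", "Chat", "Deploy to production",
]

# Narrow and deep: fewer choices upfront, but readers must go deeper.
narrow = {
    "Training": {
        "Trainer": [],
        "Distributed training": [],
        "Hardware-specific training": ["TPUs", "Apple silicon"],
    },
    "Inference": {
        "Pipeline": [],
        "LLMs": ["Text generation", "Chat"],
        "Deploy to production": [],
    },
}
```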
# Conclusion
Diagrams organize content by helping us see how its pieces are interrelated, and structures arrange the organized content in a way that represents the library as a whole.
In the previous two posts, I described why the documentation needs a redesign and outlined the major issues with its current form. This post focused on how to organize the content and how to structure it.
With the mess identified and a general plan for tidying it up, it's time to get to work. ✌️
I'll be focusing on actually redesigning the documentation now. I'll be back with more updates once the redesign is complete!