Steven Liu

Unraveling the mess

June 18, 2024

recap:

The last post, Making sense of this mess, discussed the motivation and goal behind redesigning the Transformers documentation. The 3 main takeaways are to write for developers, allow structure to grow organically, and integrate content instead of adding it on.

The first step in redesigning the Transformers documentation is to get a good look at the mess.

This can be daunting. A bit like cleaning out your parents' holiday decorations accumulated over many years. You're overwhelmed by the sheer number of Santas and Easter bunnies.

But once you've scoped out the mess, it is easier to create a plan to make sense of it all. So, I need to:

  1. Understand my intended users and what they think about the mess. Then I can tailor the documentation to the users and address their pain points.
  2. Understand how content is arranged. Then I can rearrange the content in a way that makes sense and reduces friction.
  3. Understand the current state of the documentation. Then I can identify how to improve the content.

Who are you writing for?

There are 2 main types of users: machine learning researchers and engineers (MLEs), and software engineers or developers.

This is a very generalized description of MLEs and developers.

MLEs have an in-depth understanding of how transformer models work. MLEs run experiments, train models, and optimize performance. MLEs convert machine learning ideas/concepts into code. MLEs bridge the gap between research and production. MLEs might describe the documentation as:

Developers may not have a deep understanding of how transformer models work. Developers integrate code into their tech stacks or build on top of it. Developers consume code. Developers create products. Developers might describe the documentation as:

Welcome to the jungle

"I feel thin, sort of stretched, like butter scraped over too much bread."

  - J.R.R. Tolkien, The Fellowship of the Ring

This is an accurate description of the Transformers documentation.

Users have difficulty surfacing the information they need because it is spread too widely. It is hard to retrace their steps after going down a rabbit hole of links. They have to plod through the documentation because they are unsure and lost. Too much time is spent navigating rather than coding.

For example, let's say you're a developer and you want to find content about how to generate text. Where do you start?

This example relies on manually navigating the documentation. You could use ⌘+K to search for something, but you need to know the exact words to search for, which may not be possible if you're new to ML.

You start by Googling something like "hugging face text generation" (most visitors enter the documentation through Google rather than directly). You click on the third link, which describes Text generation strategies. You scroll until you find some code. You find code that generates a translation. Not interested. You keep scrolling until you finally reach a section about different decoding strategies. You copy and paste the code examples and start experimenting with them. Uh-oh, you get some unintelligible output. What do you do?

If you scanned up the left navigation bar, you might've clicked on the Generation with LLMs tutorial and found a description of common pitfalls and the solution to your issue.

If you scanned down the left navigation bar, you might've clicked on the Troubleshoot guide, but your error is not there. You keep going until you reach the Text Generation API. As a beginner to Transformers, you are overwhelmed by all the classes, methods, and parameters. Even worse, you have no clue what is responsible for the error you're encountering. You abort and go back to the top of the left navigation bar where you finally see Generation with LLMs and the common pitfalls and solutions described there.

At this point, you might feel overwhelmed and frustrated. Your initial impression is that this is a mess. Content is spread across 3 sections!

Goldilocks and the Three Bears

In the fairy tale Goldilocks and the Three Bears, Goldilocks tries 3 things (porridge, chairs, and beds) belonging to a family of 3 bears. Each time, she finds 2 extremes and 1 that is just right.

MLEs probably find that there is, for example:

Developers probably find that there is, for example:

Conclusion

The documentation needs to find a middle path. Beginner-friendly while still valuable for experts. Detailed but not overwhelming. Practical without being too theoretical.

Just right.

Stay tuned for the next post, which explores diagrams for mapping content and architectures for structuring it.