Whether you’re baking your first loaf of bread or building a PC from scratch, there’s nothing like a good YouTube video to guide you through the process. But as helpful as step-by-step YouTube how-tos can be, I’ve often wished for a written guide to go along with the video, something that gives me an overview of the entire process and helps me jump back to a previous step.
Enter Gemini, which can natively watch and summarize YouTube videos: give it the URL of any video and ask for a summary. (ChatGPT can’t “watch” YouTube videos without third-party tools, and Claude will simply ask you to copy and paste a transcript.)
Even better, you can also get Gemini to extract and write out the steps from a how-to video, complete with linked time stamps. All you need is the right prompt.
To be clear, I’m not suggesting you use this prompt to replace a YouTube how-to; instead, the idea here is to create a text supplement that makes it easier to follow along during a stressful DIY project, whether you’re kneading dough or juggling PC components.
Here’s the prompt (crafted by Claude after a fair amount of back-and-forth with me):
Watch this YouTube video: [URL]
Create a step-by-step guide based on what’s actually shown and said. Start with a Materials/Tools List — every tool, ingredient, setting, or material mentioned or shown before or during the process.
Then, for each step: give it a number, write a short action-oriented title, then describe exactly what to do using only what the video demonstrates. Include timestamps. If the presenter mentions a specific tool, setting, ingredient, measurement, or material, include it. Where the presenter explains the reason for a step, include that reasoning too. Don’t add context, tips, or advice that isn’t in the video — just document what’s there.
Finally, add a credit for the content creator’s YouTube channel and provide the URL for the actual YouTube video.
This prompt does a few key things. It gives you a list of tools, parts, and/or ingredients up front. It tells Gemini to write out the steps without embellishing them or inventing steps of its own. It asks for timestamps, which Gemini returns as links that let you jump straight to the right spot in the YouTube video. And it directs Gemini to credit the YouTube creator.
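If Gemini ever returns a timestamp as plain text rather than a link, you can build the link yourself. The sketch below is a minimal illustration, not anything Gemini produces: it relies on YouTube watch URLs accepting a `t` parameter for the start time in seconds, and the function names and the `VIDEO_ID` placeholder are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def timestamp_to_seconds(stamp: str) -> int:
    """Convert 'SS', 'MM:SS', or 'HH:MM:SS' into a number of seconds."""
    seconds = 0
    for part in stamp.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def linked_timestamp(video_url: str, stamp: str) -> str:
    """Return video_url with a t=<seconds>s start-time parameter set."""
    parts = urlparse(video_url)
    query = parse_qs(parts.query)
    # Overwrite any existing start time rather than appending a second one.
    query["t"] = [f"{timestamp_to_seconds(stamp)}s"]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


if __name__ == "__main__":
    # VIDEO_ID is a placeholder, not a real video.
    print(linked_timestamp("https://www.youtube.com/watch?v=VIDEO_ID", "1:30"))
```

Opening the resulting URL should start playback at the step you want to revisit.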
I’d suggest running the prompt using Gemini’s “Fast” (or “Flash”) mode for shorter videos (less than five minutes) or “Pro” for longer videos, particularly those that involve more than a dozen steps. The “Thinking” version of Gemini probably isn’t the right choice here, as we’re trying to extract information from a source rather than solve a complex math or programming problem.
Finally, this prompt demonstrates how we keep humans in the loop — namely, by putting AI in the middle of the process, with human YouTube creators at the beginning and those of us performing the actual DIY work at the end.



