Today, I’m sharing my recipe tracking app powered by machine learning, how it works, and some things I struggled with (mostly Next.js).
Okay, okay. Yes, I’m also tired of the industry shoving AI/ML into every product that has no business doing any AI/ML. It’s like blockchains and NFTs all over again, but with way more money, promise, and fantasy.
BUT HEAR ME OUT! For the last few months, I’ve been working on a new app to revolutionize how I store and search my recipes using some basic machine learning techniques. It’s got one monthly active user (me) and generates a profit of -$12/mo. Customer satisfaction is through the roof! But enough of my success story, let me share why I even started this project.
4 years ago, I started taking notes on my cooking. I love cooking, but I’ve been pretty sloppy with improving myself for a long time. ADHD always pushes me to new shiny recipes while my fried rice or mapo tofu stagnates. My partner kindly explained how tired she was of me making the same mistakes over and over again. I should have learned by now that she does not like spicy soup nearly as much as I do, or that our dumpling wrappers were too salty last time, or that David Chang’s Bo Saam recipe is an atrocity to both blood pressure and diabetes.
All those notes live in a single 55MB, 117-page Google Doc. I didn’t intend to make a monster. I just wanted to science up my cooking! Jokes aside, I take a bunch of notes whenever I cook and I’ve become a lot more adventurous and consistent because of it! I write down all my modifications and how well they turned out. Sometimes, just eating my food inspires me; “oh, I should try potato starch next time!” But my ADHD also means I filled my doc full of recipes I will never cook. They’re just there for inspiration. This has led to a huge growth in my doc and now it’s falling apart.
Unfortunately, I’ve outgrown this doc. It sucks in a lot of ways. It crashes if I try to search on my phone. It’s so damn long that my eyes glaze over trying to read even the recipe names.
So I made a web app that uses some machine learning ideas to organize my recipes, support fancy search, and improve discoverability. Along the way, I learned Next.js, which feels really popular on Reddit, so I’ll share some of my thoughts on that.
Quick Definitions for the non-Machine Learning folks
Already an AI/ML aficionado? Skip to the next section!
I’m a noob at ML/AI so I’ll share my definitions that are likely somewhat wrong:
- embedding – a vector of numbers representing something. For example, in word2vec, vec is the embedding representing the word.
- feature extraction – taking some thing and making an embedding out of it. For example, in word2vec, the process of applying word2vec on a word to get a vector/embedding is the feature extraction. You can think of it as extracting the important “features” out of some input and getting something that represents the important bits.
Cool things you can do with embeddings:
- In some embedding spaces like word2vec, you can add and subtract them and get resulting vectors that make sense: king - man + woman ≈ queen.
- There are surprisingly simple ways of combining vectors / embeddings like averaging or adding.
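To make the vector arithmetic concrete, here is a tiny sketch. The three-dimensional vectors are hand-made toy values (an assumption for illustration, not real word2vec output), and the helper names `add`, `subtract`, `cosineSimilarity`, and `nearestWord` are made up for this example.

```typescript
type Vector = number[];

// Element-wise a + b (assumes equal lengths).
function add(a: Vector, b: Vector): Vector {
  return a.map((x, i) => x + (b[i] ?? 0));
}

// Element-wise a - b (assumes equal lengths).
function subtract(a: Vector, b: Vector): Vector {
  return a.map((x, i) => x - (b[i] ?? 0));
}

// Cosine similarity: 1 means same direction, 0 means orthogonal.
function cosineSimilarity(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Hand-made toy vectors, NOT real word2vec output.
// Dimensions: [royalty, maleness, femaleness].
const king: Vector = [1, 1, 0];
const queen: Vector = [1, 0, 1];
const man: Vector = [0, 1, 0];
const woman: Vector = [0, 0, 1];
const vocabulary: Record<string, Vector> = { king, queen, man, woman };

// Return the vocabulary word whose vector points closest to `target`.
function nearestWord(target: Vector, words: Record<string, Vector>): string {
  let best = "";
  let bestScore = -Infinity;
  for (const [word, vector] of Object.entries(words)) {
    const score = cosineSimilarity(target, vector);
    if (score > bestScore) {
      best = word;
      bestScore = score;
    }
  }
  return best;
}

const analogy = add(subtract(king, man), woman); // [1, 0, 1]
console.log(nearestWord(analogy, vocabulary)); // prints "queen"
```

Real embedding spaces are much noisier than this, so the analogy usually lands near the expected word rather than exactly on it.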
Check out this visualization of my recipes! It flattens the 384-dimensional vectors into 2 dimensions using UMAP (versus PCA or t-SNE). Try searching “eggplant” or “rice” and you’ll find things are pretty localized. Zoom in and you’ll likely find some surprisingly good clusters. On the other hand, there are some bad examples, like ramen, pasta, and spaghetti, which are not related at all.
Hard to use? Try visiting it directly in WizMap on a computer.
You can also see the distribution of individual words and how they cluster based on this model:
Hard to use? Try visiting it directly in WizMap on a computer.
Still not excited? Read Embeddings are underrated (HN discussion). It’s a short but fun blog post that talks in more detail.
A Front Page of Inspiration
I want a front page where everything is a bit different. Diversity is important! A random list of recipes is more inspiring.
What I currently have is awful. I read the table of contents for my doc and it’s a huge wall of text, where my eyes glaze over immediately. It’s always in the same order. At one point, I grouped related recipes together, but that actually breaks my “inspiration” use case because I’ll have 20 recipes in a row about making rice or tacos, and everything else is chronological because I was too lazy to organize all 300+ recipes.
https://www.whatthefuckshouldimakefordinner.com is a funny website but it’s really kind of useless. You can only see one recipe at a time and most are boring or irrelevant to me.
I tried out Paprika but it seems more focused on organizing your recipes and meal-planning rather than exploring. There’s no “random” section.
Finally, I just feel bored looking at these things. It’s nearing Thanksgiving and every recipe website is showing me 20 ways to roast a turkey. PaprikaApp just shows me sorted by last added date.
So I made a random front page! When you go to my list of recipes, the app randomizes all my recipes, every time. Don’t like what you see? Refresh!
Still don’t like it? Try searching for something (and see the next section)!
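For the curious, randomizing a list on every page load can be as small as a Fisher-Yates shuffle. This is a generic sketch, not necessarily how the app does it; the function name `shuffled` and the injectable `random` parameter are assumptions made for the example.

```typescript
// Return a shuffled copy of `items` using the Fisher-Yates algorithm.
// `random` is injectable so the shuffle can be made deterministic in tests.
function shuffled<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const result = [...items];
  for (let i = result.length - 1; i > 0; i--) {
    // Pick a position from the not-yet-fixed prefix [0, i] and swap it into place.
    const j = Math.floor(random() * (i + 1));
    const tmp = result[i]!;
    result[i] = result[j]!;
    result[j] = tmp;
  }
  return result;
}

const recipeNames = ["Fried Rice", "Mapo Tofu", "Dumplings", "Congee"];
console.log(shuffled(recipeNames)); // a different order on (almost) every call
```

Each refresh just calls the shuffle again, so no state needs to be stored between visits.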
Searching and Organizing Recipes
Existing platforms do a poor job on searching and organizing recipes.
In PaprikaApp, you can organize your recipes into categories and folders but it’s completely manual. Also, I’ve never found a good method for categorizing recipes. Some people do it by primary ingredient (beef, chicken, bean) but I feel like I should just search for that ingredient and the recipes should just show up. Also, what happens when I am more okay with substituting ingredients? Or when a commenter tried a recipe with beef instead of chicken and it was great? Maybe I’m searching for “shrimp” but the recipe I want says “prawns”. I personally need something much more flexible.
On websites like AllRecipes, you can search all the recipes in the world but it’s pretty strict. If I search “mashed potatoes” (plural), it’ll ignore every recipe that spells it “mashed potato” (singular). Again, I need something that is flexible.
Basic Search Implementation
I went with a super basic search implementation, where I tokenize the query into “words” and then search for each word as a substring in each recipe. Then I give each recipe a score using TF-IDF.
When searching “bacon and eggs” (with debug mode enabled), we give more weight to “bacon” and “eggs” because those terms are more unique and rarer among recipes. “and” is a useless term: it appears in nearly every recipe, so it provides no useful information about relevance. So we should rank recipes that say “bacon” and “eggs” much higher!
Think about TF-IDF in terms of Information Gain. Does this term meaningfully select for recipes?
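Here is a minimal sketch of substring search scored with TF-IDF. It assumes one common variant (raw substring count times `log(N / documentFrequency)`); the app's exact formula and weighting may differ, and the `Recipe` shape and function names are invented for the example.

```typescript
interface Recipe {
  name: string;
  text: string;
}

interface SearchResult {
  recipe: Recipe;
  score: number;
}

// Split a query into lowercase "words".
function tokenize(query: string): string[] {
  return query.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Count non-overlapping occurrences of `needle` as a substring of `haystack`.
function countOccurrences(haystack: string, needle: string): number {
  if (needle.length === 0) return 0;
  let count = 0;
  let index = haystack.indexOf(needle);
  while (index !== -1) {
    count++;
    index = haystack.indexOf(needle, index + needle.length);
  }
  return count;
}

function search(query: string, recipes: Recipe[]): SearchResult[] {
  const texts = recipes.map((r) => `${r.name} ${r.text}`.toLowerCase());
  const scores = recipes.map(() => 0);
  for (const term of tokenize(query)) {
    const counts = texts.map((t) => countOccurrences(t, term));
    const documentFrequency = counts.filter((c) => c > 0).length;
    if (documentFrequency === 0) continue;
    // A term found in every recipe gets log(1) = 0, so it adds nothing.
    const idf = Math.log(recipes.length / documentFrequency);
    counts.forEach((count, i) => {
      scores[i] = (scores[i] ?? 0) + count * idf;
    });
  }
  return recipes
    .map((recipe, i) => ({ recipe, score: scores[i] ?? 0 }))
    .filter((result) => result.score > 0)
    .sort((a, b) => b.score - a.score);
}

const sampleRecipes: Recipe[] = [
  { name: "Bacon and Eggs", text: "Fry bacon and eggs." },
  { name: "Eggs Benedict", text: "Poach the eggs and toast muffins." },
  { name: "Tomato Soup", text: "Simmer tomatoes and onions." },
];
console.log(search("bacon and eggs", sampleRecipes).map((r) => r.recipe.name));
```

In this toy data “and” appears in all three recipes, so its weight is zero and only “bacon” and “eggs” affect the ranking.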
Massage Typos And Related Spellings
I have lots of typos in my recipes. I spell ratatouille 3 ways. Cooking against transliterated recipes is hard too, like chow mein vs chao mian. In Peru, they say chaufa instead of chow fan. All these mean the same thing but are spelled differently.
To fix typos and near-words, I calculate Levenshtein distance from the input word to all words found in my recipes. This distance measures the number of single-character edits (insertions, deletions, or substitutions) needed to convert one word to another. PostgreSQL supports it too, through its fuzzystrmatch extension!
Levenshtein distance also handles plural words really well:
- waffles <-> waffle has a distance of 1
- potatoes <-> potato has a distance of 2
- eggs <-> egg has a distance of 1
Notice how the singular word usually covers the plural word if you search by substring: “egg” is a substring of “eggs”, “potato” is a substring of “potatoes”. Substring search gives you more complete results and so the app should encourage substring terms.
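The distances above can be checked with a short dynamic-programming implementation. This is the textbook algorithm, shown as a sketch; the `closestWords` helper and its default threshold of 2 are assumptions for illustration, not values from the app.

```typescript
// Classic Levenshtein distance, keeping only two rows of the DP table.
function levenshtein(a: string, b: string): number {
  // previous[j] = distance between the first i-1 chars of `a` and first j chars of `b`.
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current.push(
        Math.min(
          previous[j]! + 1, // delete a character from `a`
          current[j - 1]! + 1, // insert a character into `a`
          previous[j - 1]! + cost, // substitute (or keep) a character
        ),
      );
    }
    previous = current;
  }
  return previous[b.length]!;
}

// Words from the vocabulary within `maxDistance` edits of the input, closest first.
function closestWords(input: string, vocabulary: string[], maxDistance = 2): string[] {
  return vocabulary
    .map((word) => ({ word, distance: levenshtein(input.toLowerCase(), word.toLowerCase()) }))
    .filter((entry) => entry.distance <= maxDistance)
    .sort((a, b) => a.distance - b.distance)
    .map((entry) => entry.word);
}

console.log(levenshtein("waffles", "waffle")); // 1
console.log(levenshtein("potatoes", "potato")); // 2
console.log(closestWords("ratatouile", ["rice", "ratatouille", "potato"])); // ["ratatouille"]
```

Comparing against every word is fine for a few hundred recipes; a much larger vocabulary would probably want an index or a database-side function instead.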
Suggest Semantically Related Search Terms
This is where ML embeddings really take over. Say I am just exploring recipes or maybe I have a vague idea of what recipe I am looking for. Was it shrimp? Was it crawfish? I’m not sure. Did I save the recipe as noodles or pasta? Who knows. I want my app to tell me what other terms might be appropriate to search with:
- shrimp -> seafood, crawfish, lobster, crab, prawns
- bread -> dough, sandwich, wheat
- pasta -> spaghetti, noodles, cheese, pizza, italian
This is actually pretty simple to implement! If I take all my recipe content and tokenize it into words, then generate embeddings for every word I see, then I can search for words that have embeddings closest to my search terms. Vector support in PostgreSQL is pretty easy to use, but PostgreSQL awkwardly stores the vectors as a JSON string.
From here, you just click on each suggested term that you want to add. I thought about doing this automatically, so you don’t need to click, but I couldn’t really find a good threshold for what to include. Also, some very related words aren’t good to merge. I might actually be searching directly for a shrimp recipe, and not a general seafood recipe. If I want a general search, I can just click all the terms I’m okay with.
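The suggestion step boils down to a nearest-neighbor lookup over word embeddings. The sketch below does it in memory with made-up three-dimensional vectors; the real app stores 384-dimensional embeddings in PostgreSQL, so treat the data, the `WordEmbedding` shape, and `suggestTerms` as assumptions for illustration.

```typescript
interface WordEmbedding {
  word: string;
  vector: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Return up to `limit` vocabulary words most similar to `query`, best first.
function suggestTerms(query: string, vocabulary: WordEmbedding[], limit = 5): string[] {
  const target = vocabulary.find((entry) => entry.word === query);
  if (!target) return []; // unknown word: nothing to compare against
  return vocabulary
    .filter((entry) => entry.word !== query)
    .map((entry) => ({ word: entry.word, similarity: cosine(target.vector, entry.vector) }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, limit)
    .map((entry) => entry.word);
}

// Toy vectors, hand-picked so seafood words point in a similar direction.
const wordEmbeddings: WordEmbedding[] = [
  { word: "shrimp", vector: [0.9, 0.1, 0] },
  { word: "prawns", vector: [0.85, 0.15, 0] },
  { word: "crab", vector: [0.8, 0.2, 0.1] },
  { word: "bread", vector: [0, 0.1, 0.9] },
  { word: "dough", vector: [0.05, 0.2, 0.95] },
];
console.log(suggestTerms("shrimp", wordEmbeddings, 2)); // ["prawns", "crab"]
```

Returning a ranked list and letting the user click terms sidesteps the problem of picking a similarity threshold.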
Search Through Linked Content
This was a big deal for me. This makes my recipe app SUPER worthwhile. When I drop a link to a recipe, I want to be able to search by the contents of that link.
Why is this so powerful?
Let’s say I drop a YouTube link. If I can search through the title, description, and subtitles/transcript, then I can search by what the video is about! I just need to paste a link like https://www.youtube.com/watch?v=gUlkwSHOT7Q and then search “congee”.
Other times, I’ll search for “kale” and a surprising recipe comes up because someone commented that the recipe “works great with kale”.
Including linked content allows me to be a lot more creative about what I can cook because it’s a lot more fuzzy.
This was kind of tricky to implement. I had to extract the links out of the content I save on my website, then submit them to a personal instance of ArchiveBox. When archiving finishes, ArchiveBox posts to a webhook callback on my web app, and my web app pulls the archive information (transcripts, htmltotext.txt, etc.) and puts that into the database for searching. I also need to regenerate embeddings whenever this happens. But, it works!
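The first step of that pipeline, pulling links out of saved content, might look roughly like this. It is a regex-based sketch with an invented `extractLinks` helper; the ArchiveBox submission and webhook handling are left out because they depend on details of the setup that are not described here, and `example.com` is a placeholder URL.

```typescript
// Find http(s) links in free-form recipe notes, in order, without duplicates.
function extractLinks(content: string): string[] {
  // Stop at whitespace and at characters that usually wrap a URL in prose.
  const matches = content.match(/https?:\/\/[^\s<>"')\]]+/g) ?? [];
  const seen = new Set<string>();
  const links: string[] = [];
  for (const raw of matches) {
    // Drop sentence punctuation that got glued to the end of the URL.
    const cleaned = raw.replace(/[.,;:!?]+$/, "");
    if (!seen.has(cleaned)) {
      seen.add(cleaned);
      links.push(cleaned);
    }
  }
  return links;
}

const notes =
  "Congee video: https://www.youtube.com/watch?v=gUlkwSHOT7Q. " +
  "Also try (https://example.com/kale-salad), " +
  "then https://www.youtube.com/watch?v=gUlkwSHOT7Q again.";
console.log(extractLinks(notes));
```

Deduplicating here avoids archiving the same page twice when a link is pasted more than once in a recipe.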
Related Recipes Should Float Together
Say I’m looking at my recipe for “InstantPot White Rice”. My app should tell me other recipes that are similar:
- Stove Top White Rice
- InstantPot White Rice
- Oven Fried Rice
- InstantPot Brown Rice
- Mango Sticky Rice
- Kimchi Fried Rice
All these recipes are somewhat related to each other, either because they’re white rice recipes, or derived from white rice. Other times, recipes might be related by cooking type or a special ingredient.
With embeddings, you can actually see all the “rice” recipes clustered in the bottom right, near korean-kimbab and other Asian recipes.
Hard to use? Try visiting it directly in WizMap.
So for each recipe, I just show the recipes whose embeddings are “closest” to it.
Demo
That’s it! You can see the web app in read-only mode at https://recipes.href.cat/recipes.
I also dumped some embedding visualizations in Quick Definitions for the non-Machine Learning folks.
Wish List and TODOs
I cut out a bunch of stuff because I don’t need the features. My app works well enough! Nonetheless, they still seem like good ideas, just not worth the effort anymore.
OCR for Image Searchability
Sometimes I take a picture of a recipe in a book, or someone sends me a hand-written recipe. It’d be nice if I could search for the text in images that I’ve uploaded.
This turned out to be more difficult than I thought it would be. Word embeddings are super easy. You shove words into a model and it spits out embeddings. OCR models seem trained either on handwriting or printed text, so you have to use both models to handle both cases. It does seem like the Donut model can handle both. But these models specialize in single-line text, so you need to submit images cropped to a single line or word, which requires another model.
Alternatively, I could use tesseract.js which seems a little slow but could work well since I’m already using Node and feature extraction is usually a background job. The more popular, recommended, and efficient option is PaddleOCR but that requires Python.
Label images using AI
I wish I could just upload pictures of recipes I’ve cooked and then I could search “peppers” or “broccoli” to find the recipe.
Next.js really got in the way here. With limited runtime allocated to free tier and no native background job support, I just couldn’t justify spending so much time implementing something I don’t really need. I’ve only uploaded like 4 photos to my recipes. Everything is possible with more time but this feature was more cool than useful.
Combine TF-IDF With Embedding Pooling
I’m currently using a BERT model with average pooling, which averages the embeddings of each token (or word) to generate one for the sentence. In my case, I concatenate all the sentences from the title, notes, and linked content to make one super long sentence and then I get a pooled average embedding for that to represent the whole recipe. Every word is equal weight in this average.
I already weigh the recipe name and the words I write more heavily than the linked content, mostly because linked content can be a little spammy. If I also use TF-IDF, then the embeddings for “saffron” would weigh more heavily than “and” or “cook”. I think this would give me much more interesting results, but I also find really good results with what I currently have.
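Swapping the plain average for a weighted one is a small change. The sketch below assumes each token already has an embedding and a weight (for example its TF-IDF score); the function name `weightedMeanPool` and the toy two-dimensional vectors are made up for the example, and none of this calls a real model.

```typescript
// Weighted mean pooling: sum(weight_t * vector_t) / sum(weight_t).
// With all weights equal, this reduces to ordinary average pooling.
function weightedMeanPool(tokenVectors: number[][], weights: number[]): number[] {
  if (tokenVectors.length === 0 || tokenVectors.length !== weights.length) {
    throw new Error("need one weight per token vector");
  }
  const dims = tokenVectors[0]!.length;
  const pooled: number[] = Array.from({ length: dims }, () => 0);
  let totalWeight = 0;
  tokenVectors.forEach((vector, t) => {
    const weight = weights[t]!;
    totalWeight += weight;
    for (let d = 0; d < dims; d++) {
      pooled[d] = pooled[d]! + weight * vector[d]!;
    }
  });
  if (totalWeight === 0) throw new Error("weights must not sum to zero");
  return pooled.map((value) => value / totalWeight);
}

// Toy token embeddings for "saffron" and "and".
const tokenEmbeddings = [
  [1, 0], // saffron
  [0, 1], // and
];
console.log(weightedMeanPool(tokenEmbeddings, [1, 1])); // [0.5, 0.5]   plain average
console.log(weightedMeanPool(tokenEmbeddings, [3, 1])); // [0.75, 0.25] saffron dominates
```

Rare terms pull the recipe's embedding toward themselves, while filler words barely move it.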
Sort front page of inspiration by maximum distance between recipe embeddings
This would give more diversity in the semantic meaning of the items. For example, “Fried Rice” shows up first on the list. Don’t show related recipes like “White Rice”, “Fried Noodles”, or “Coconut Rice” until later down the list. Showing them together is less inspiring and diverse.
I think, in theory, I would cluster the embeddings and then use some simple LeetCode-style algorithm to spread out the clusters based on frequency. But I’m happy with purely random too so this is as far as I go.
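One alternative that skips clustering entirely is a greedy farthest-point ordering: always show next whichever recipe is least similar to everything already shown. This is a different technique from the clustering idea above, sketched with toy two-dimensional embeddings; `diverseOrder`, `EmbeddedRecipe`, and the “Apple Pie” entry are invented for the example.

```typescript
interface EmbeddedRecipe {
  name: string;
  embedding: number[];
}

function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const similarity = normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
  return 1 - similarity;
}

// Greedy farthest-point ordering, starting from the first recipe.
// minDistance[i] tracks how far recipe i is from its nearest already-placed recipe.
function diverseOrder(recipes: EmbeddedRecipe[]): EmbeddedRecipe[] {
  const minDistance = recipes.map(() => Infinity);
  const placed = recipes.map(() => false);
  const order: number[] = [];
  let current = 0;
  for (let step = 0; step < recipes.length; step++) {
    placed[current] = true;
    order.push(current);
    let next = -1;
    let best = -Infinity;
    for (let i = 0; i < recipes.length; i++) {
      if (placed[i]) continue;
      const d = cosineDistance(recipes[i]!.embedding, recipes[current]!.embedding);
      if (d < minDistance[i]!) minDistance[i] = d;
      if (minDistance[i]! > best) {
        best = minDistance[i]!;
        next = i;
      }
    }
    current = next; // -1 only after the last recipe has been placed
  }
  return order.map((i) => recipes[i]!);
}

const frontPage: EmbeddedRecipe[] = [
  { name: "Fried Rice", embedding: [1, 0] },
  { name: "White Rice", embedding: [0.95, 0.05] },
  { name: "Apple Pie", embedding: [0, 1] },
  { name: "Fried Noodles", embedding: [0.8, 0.2] },
];
console.log(diverseOrder(frontPage).map((r) => r.name));
// ["Fried Rice", "Apple Pie", "Fried Noodles", "White Rice"]
```

To keep the page from always looking the same, the starting recipe could be chosen at random before running the ordering.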
Deduplicate Linked Content
I used ArchiveBox for link content extraction, but one weird situation is that it uses multiple extraction methods, so if I concatenate them all, I might get duplicate content, which makes it weigh more heavily. However, I need to consider all the extraction methods because some work better than others for specific content. For example, DOM dumps of YouTube are basically useless, but yt-dlp output is super valuable.
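A cheap first pass at deduplication is to merge the extractor outputs line by line and drop lines that were already seen. The sketch below only catches exact duplicates after normalizing case and whitespace, so near-duplicates would still slip through; `dedupeExtractedText` and the sample strings are invented for the example.

```typescript
// Merge text from several extraction methods, keeping each distinct line once.
// Lines are compared case-insensitively with whitespace collapsed.
function dedupeExtractedText(outputs: string[]): string {
  const seen = new Set<string>();
  const kept: string[] = [];
  for (const output of outputs) {
    for (const line of output.split(/\r?\n/)) {
      const key = line.toLowerCase().replace(/\s+/g, " ").trim();
      if (key.length === 0 || seen.has(key)) continue;
      seen.add(key);
      kept.push(line.trim());
    }
  }
  return kept.join("\n");
}

// Hypothetical outputs from two extraction methods for the same page.
const fromDomDump = "Easy Congee\nSimmer rice for an hour.";
const fromTranscript = "easy  congee\nSimmer rice for an hour.\nTop with scallions.";
console.log(dedupeExtractedText([fromDomDump, fromTranscript]));
// Easy Congee
// Simmer rice for an hour.
// Top with scallions.
```

Putting the most trusted extractor first means its version of a line is the one that gets kept.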
Next.js, was it worth it?
As part of this endeavor, I decided to pick up Next.js and with it, Tailwind CSS and React. TLDR: I wouldn’t do it again specifically because of Next.js. It just feels like a very young framework with no batteries included. I think I’d stick to Django for the backend, but use Tailwind CSS and React (or Svelte) for the frontend. IDK, web dev is kind of a mess.
Pros – Benefits of Next.js
- The development flow is really cool. I can make a commit on a branch and they’ll have a subdomain pointing to a deploy of that branch. Push auto-deploys in seconds (or however long your builds take). You can easily push a change and get a domain to test against very easily. It’s really interesting!
- Server Actions are really interesting. You basically don’t need to build out an API at all. It auto-generates endpoints for whatever functions you mark as “server actions” and it creates an HTTP based RPC call from the client side to that function. Feature building becomes really really easy because you don’t have to think super hard about what API form you want and how to lock it down (CSRF, auth, etc). It just does. The ergonomics are really good too, you just put the function handle as the “action” in your form or whatever and it’ll do the right thing.
- Biphasic programming with Server Side Rendering (SSR) is a new-ish React feature, not specific to Next.js, but is super interesting nonetheless. You can write a component as either server- or client-side and they can mix-and-match throughout the tree. A server component can access server resources like databases, internal APIs, and server hardware without restrictions. A client component can access all client APIs and maintain client state info like clicks, typing, input, filesystem for better interactivity. All of this happens pretty transparently!
- They have an insane amount of caching and it’s pretty good and fast.
- Edge runtime is kind of a cool concept, to run specific parts of your app like routing at edge to speed things up.
- Partial Pre-Rendering (PPR) is cool too, to pre-render and statically generate as much of the page as possible, until you encounter dynamic code. I honestly don’t know if the speedup is good, but the concept is cool.
Cons – My Struggles with Next.js
- 6 second request timeouts on Vercel’s platform means I can’t do longer-running tasks in a straight-forward manner. This pushed me to self-host.
- If you self-host, you lose like half of the pros, unless you use OpenNext. Most of all, you lose the cool development flow. The caching and edge runtime is kind of whatever to me.
- Next.js implemented Middleware hella weirdly. They crippled middleware by forcing only one single middleware ever. It runs in the very restricted Edge Runtime (no db access). There is no opt out.
- Auth is hella weird. I tried to use auth.js which is Next.js’ official auth solution. Middleware mid-ness really holds back auth. You can work around this by using an oauth service, but you can’t have any of your users, roles, or permissions stored in your personal database. A third party auth service MUST handle scoping and permissions too.
- Invalidating caches is really manual and you need it often. What would be better? Declaring reactivity behind caches and data sources. It’s exhausting to invalidate all the caches for all the things you need. In a complex app, you’ll inevitably miss a spot and the app will just behave strangely. Next.js caches pretty much everything so it’s something you’ll run into and have to think about pretty quickly. No, reactive caches aren’t a thing AFAIK, but it sounds good, right?
- No indicators for when links are loading or forms are submitting. When you submit forms or click links, the browser shows indicators that it’s in progress. Since Next.js pushes EVERYTHING into JS, there are no indicators. The linter aggressively pushes the Next.js Link component. Prefetching is nice if you’re on a powerful machine but it gives no indication when you click, it’s just a slow page. Submitting forms? Gotta implement your own indicator too because it’s just a stuck page for like 1 second.
- No obvious way to handle background tasks, and they acknowledge it
Source Code
Interested in any particular implementation? Check out the source code on GitHub.
Feel free to ask me any questions and I’ll try my best to answer.
Special shoutout to transformers.js, which allowed me to run machine learning models in JavaScript in the browser and in Node on the server. Crazy how far we’ve come.
Are you hiring?
Context: I’m looking for a job. I built this app as part of my year away from capitalism. Airtable laid me off in September 2023 (14 months ago) and I spent that time taking art classes, teaching nutrition to elementary school kids, building all the side-projects and personal apps that I’ve been itching to work on, and doing some traveling. However, capitalism is a reality of my life and I need to pay rent.
About me: I’m a generalist software engineer, 10 years into my career, primarily experienced in backend python and javascript/typescript. I have some light experience with frontend dev so you can consider me a backend-leaning fullstack engineer. I love building things and digging through layers. If you’re looking for someone to gain deep expertise in some system, or someone to understand how systems fit together, I’m your person. I’ve worked from operating systems up to client apps and I know a lot about a lot of stuff. I’m someone who loves learning! I’ve worked at companies with 25 employees, up to Apple with 15k+ engineers.
About you: I am staying in San Francisco so either you’re local or you’re okay with remote work. My other preferences are flexible but they fall along:
- Small-ish with fewer than 500 employees but more than 10
- Works in AI-ish stuff
- Has cool people working on stuff
- Encourages wearing many hats or supports team changes or working cross functionally
- I tend to value cash compensation and liquid stocks over private equity
- Start work in January or you’re okay with me taking all of December off
If this sounds like a decent match, reach out to m.hire.rx[at]href.cat or drop a comment here or wherever I posted this and I’ll be in touch! I might reply in January depending on how my personal life flows.