120423 - louis030195

https://louis030195.medium.com/using-openai-to-increase-time-spent-on-your-blog-3f138d5ae6aa

https://www.tiktok.com/@louis03011995/video/7222247048391789830?lang=en

title:

- how I built a reco engine, no ML knowledge needed
- how I built a reco engine without ML knowledge
- how I built a reco engine with OpenAI

app dev

- give outline
- overview
- interactive examples
- show status: same tech in Ava, most popular Obsidian plugin
- increase status (stars, etc., positive feedback)
- how to build
- example
- use Embedbase
- why Embedbase?
- problem faced
- why did I need embeddings
- unstructured data, scale, hard to do another way, embeddings hard to manage -> Embedbase
- (optional) embeddings
- start
- example
- brain
- obsidian
- ava
- ava links
- how it works

## Overview

## interactive examples

ava

## brain

I started using a note-taking app a few years ago and gathered everything in it over time, eventually reaching 1 million words. As it scaled, I would often get lost and couldn't find my notes. I experimented with building a plugin that recommends similar notes, which allowed me to connect and generalise my ideas. For example, "activation energy" is originally a concept from chemistry, and seeing my "velocity" and "momentum" mental model notes alongside it gave me the insight that activation energy is also a powerful mental model.

## all

![[pika-1681547885672-1x.png]]

Today we're going to learn how to build a recommendation system for a social publishing platform like medium.com.

The full source code of the project we are going to build is available [here](https://github.com/different-ai/embedbase-recommendation-engine-example).

You can also try an interactive version [here](https://embedbase-recommendation-engine-example.vercel.app/).
You can also deploy the final version on Vercel right now:

[![Vercel Platform](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https%3A%2F%2Fgithub.com%2Fdifferent-ai%2Fembedbase-recommendation-engine-example%2Ftree%2Fcomplete&env=EMBEDBASE_API_KEY&envDescription=Embedbase%20API%20key%20is%20necessary%20to%20use%20Embedbase%20Cloud%2C%20you%20can%20also%20self-host%20it!&envLink=https%3A%2F%2Fapp.embedbase.xyz%2Fsignup)

https://www.loom.com/share/0bbdf3ca54dd4010872df2ad2b695e37

Hey, I'm Louis. Recently I built a recommendation system for a note-taking app that I use, and today I want to share how to reproduce the same thing for an app like medium.com, where you have recommendations on the side. You can implement the same experience in your product to increase user retention. Check out the link below to follow a step-by-step tutorial to do so.

# Overview

![[Pasted image 20230415101954.png]]

We're going to cover a few things here, but the big picture is that:

1. We will need to turn our blog posts into a numerical representation called "embeddings" and store them in a database.
2. We will get the blog posts most similar to the one currently being read.
3. We will display these similar blog posts on the side.

Concretely, it only consists of:

1. `embedbase.dataset('recsys').batchAdd('<my blog posts>')`
2. `embedbase.dataset('recsys').search('<my blog post content>')`

Here, "search" allows you to get recommendations.

Yay! Two lines of code to implement the core of a recommendation system.

## Diving into the implementation

As a reminder, we will create a **recommendation engine for your publishing platform**. We will be using NextJS and tailwindcss. We will also use [Embedbase](https://embedbase.xyz/) as a database.
Other libraries used include:

- `gray-matter` to parse Markdown front-matter (used to store document metadata, useful when you get the recommended results)
- `swr` to easily fetch data from NextJS API endpoints
- `heroicons` for icons
- Last, `react-markdown` to display nice Markdown to the user

Alright let's get started 🚀

Here's what you'll need for this tutorial:

- An [Embedbase API key](https://app.embedbase.xyz/signup). Embedbase is a database that allows you to find the "most similar results". Not all databases are suited for this kind of job. Today we'll be using Embedbase, which allows you to do just that: it finds the "semantic similarity" between a search query and stored content.

You can now `clone` the repository like so:

```bash
git clone https://github.com/different-ai/embedbase-recommendation-engine-example
```

Open it with your favourite IDE, and install the dependencies:

```bash
npm i
```

Now you should be able to run the project:

```bash
npm run dev
```

Write the Embedbase API key you just created in `.env.local`:

```
EMBEDBASE_API_KEY="<YOUR KEY>"
```

### Creating a few blog posts

As you can see, the `_posts` folder contains a few blog posts, with some front-matter `yaml` metadata that gives additional information about each file.

![[pika-1681489692627-1x.png]]

ℹ️ Disclaimer: **GPT-4 has been used to write these blog posts, don't take them as my words or as anything valuable.**

### Preparing and storing the documents

The first step requires us to store our blog posts in Embedbase.

To read the blog posts we've just written, we will need to implement a small piece of code that parses the Markdown front-matter and stores it in the documents' metadata. This additional information will improve the recommendation experience.
To do so, we will be using the library called `gray-matter`. Let's paste the following code in `lib/api.ts`:

```ts
import fs from 'fs'
import { join } from 'path'
import matter from 'gray-matter'

// Get the absolute path to the posts directory
const postsDirectory = join(process.cwd(), '_posts')

export function getPostBySlug(slug: string, fields: string[] = []) {
  const realSlug = slug.replace(/\.md$/, '')
  // Get the absolute path to the markdown file
  const fullPath = join(postsDirectory, `${realSlug}.md`)
  // Read the markdown file as a string
  const fileContents = fs.readFileSync(fullPath, 'utf8')
  // Use gray-matter to parse the post metadata section
  const { data, content } = matter(fileContents)

  type Items = {
    [key: string]: string
  }

  const items: Items = {}

  // Store each field in the items object
  fields.forEach((field) => {
    if (field === 'slug') {
      items[field] = realSlug
    }
    if (field === 'content') {
      items[field] = content
    }
    if (typeof data[field] !== 'undefined') {
      items[field] = data[field]
    }
  })

  return items
}
```

Now we can write the script that will store our documents in Embedbase. Create a file `sync.ts` in the folder `scripts`.

You'll need the `glob` library and the Embedbase SDK, `embedbase-js`, to list files and interact with the API.

In Embedbase, the concept of `dataset` represents one of your data sources, for example, the food you eat, your shopping list, customer feedback, or product reviews. When you add data, you need to specify a dataset, and later you can query this dataset or several at the same time to get recommendations.
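When there is a lot of data, it is sent to Embedbase in batches rather than all at once, so it may help to look at the batching step in isolation. The following is only a minimal sketch: `toBatches` is a hypothetical helper that is not part of `embedbase-js`, the example chunks are made up, and the batch size of 100 simply mirrors the value used in this tutorial's sync script.

```typescript
// Hypothetical helper (not part of embedbase-js): split an array into
// consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Made-up example data: 250 chunks that all belong to one post.
const exampleChunks = Array.from({ length: 250 }, (_, i) => ({
  data: `chunk ${i}`,
  metadata: { path: "hello-world" },
}));

// 250 chunks become batches of 100, 100, and 50.
const exampleBatches = toBatches(exampleChunks, 100);
console.log(exampleBatches.map((batch) => batch.length)); // [ 100, 100, 50 ]
```

Each batch could then be passed to one `batchAdd` call, which keeps individual requests small.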
Alright, let's finally implement the script to send your data to Embedbase. Paste the following code in `scripts/sync.ts`:

```ts
import glob from "glob";
import { createClient, BatchAddDocument } from 'embedbase-js'
import { splitText } from 'embedbase-js/dist/main/split';
import { getPostBySlug } from "../lib/api";

try {
  // load the .env.local file to get the api key
  require("dotenv").config({ path: ".env.local" });
} catch (e) {
  console.log("No .env file found" + e);
}

// you can find the api key at https://app.embedbase.xyz
const apiKey = process.env.EMBEDBASE_API_KEY;
// this is using the hosted instance
const url = 'https://api.embedbase.xyz'
const embedbase = createClient(url, apiKey)

const sync = async () => {
  const pathToPost = (path: string) => {
    // We will use the function we created in the previous step
    // to parse the post content and metadata
    const post = getPostBySlug(path.split("/").slice(-1)[0], [
      'title',
      'date',
      'slug',
      'excerpt',
      'content'
    ])
    return {
      data: post.content,
      metadata: {
        path: post.slug,
        title: post.title,
        date: post.date,
        excerpt: post.excerpt,
      }
    }
  };

  // read all files under _posts/* with .md extension
  const documents = glob.sync("_posts/**/*.md").map(pathToPost);

  // split each post into chunks; every chunk keeps the metadata of its post
  const chunks: BatchAddDocument[] = []
  documents.forEach((document) =>
    splitText(document.data, {}, async ({ chunk, start, end }) => chunks.push({
      data: chunk,
      metadata: document.metadata,
    }))
  )

  const datasetId = `recsys`
  console.log(`Syncing to ${datasetId} ${chunks.length} documents`);

  // sending batches is useful when you send a lot of data
  const batchSize = 100;
  // add to embedbase by batches of size 100
  return Promise.all(
    chunks.reduce((acc: BatchAddDocument[][], chunk, i) => {
      if (i % batchSize === 0) {
        acc.push(chunks.slice(i, i + batchSize));
      }
      return acc;
    }, [])
      // here we are using the batchAdd method to send the documents to embedbase
      .map((batch) => embedbase.dataset(datasetId).batchAdd(batch))
  )
    .then((e) => e.flat())
    .then((e) => console.log(`Synced ${e.length} documents to ${datasetId}`, e))
    .catch(console.error);
}

sync();
```

Great, you can run it now:

```bash
npx tsx ./scripts/sync.ts
```

This is what you should be seeing:

![[ray-so-export.png]]

Protip: you can even visualise your data now in the Embedbase dashboard (which is [open-source](https://github.com/different-ai/embedbase/tree/main/dashboard)), [here](https://app.embedbase.xyz/dashboard/explorer/recsys?page=0):

![[pika-1681545554475-1x.png]]

Or ask questions about it using ChatGPT [here](https://app.embedbase.xyz/dashboard/playground) (make sure to tick the "recsys" dataset):

![[pika-1681545757224-1x.png]]

### Implementing the recommendation function

Now, we want to be able to get recommendations for our blog posts. We will add an API endpoint (if you are unfamiliar with NextJS API pages, check out [this](https://vercel.com/docs/concepts/functions/serverless-functions)) in `pages/api/recommend.ts`:

```ts
import type { NextApiRequest, NextApiResponse } from "next";
import { createClient } from "embedbase-js";

// Let's create an Embedbase client with our API key
const embedbase = createClient("https://api.embedbase.xyz", process.env.EMBEDBASE_API_KEY);

export default async function recommend (req: NextApiRequest, res: NextApiResponse) {
  const query = req.body.query;
  if (!query) {
    res.status(400).json({ error: "Missing query" });
    return;
  }

  const datasetId = "recsys";
  // in this case even if we call the function search,
  // we actually get recommendations
  const results = await embedbase.dataset(datasetId).search(query, {
    // We want to get the first 4 results
    limit: 4,
  });

  res.status(200).json(results);
}
```

### Building the blog interface

All we have to do now is connect it all in a friendly user interface and we're done!
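One thing the interface may want to handle: the sync script stores chunks rather than whole posts, so a search can plausibly return several chunks from the same post, including chunks of the post currently being read. Below is a minimal sketch of a post-processing step under stated assumptions. The `SearchResult` shape (a `similarity` score plus the `metadata` stored at sync time) is an assumption that should be checked against the Embedbase SDK, and `toRecommendations` is a hypothetical helper, not part of `embedbase-js`.

```typescript
// Assumed shape of one search result; only the fields used here are listed.
// The metadata fields are the ones stored by the sync script.
interface SearchResult {
  similarity: number;
  data: string;
  metadata: { path: string; title: string };
}

// Hypothetical helper: keep the best-scoring chunk per post and drop
// chunks that belong to the post currently being read.
function toRecommendations(results: SearchResult[], currentPath: string): SearchResult[] {
  const bestByPath = new Map<string, SearchResult>();
  for (const result of results) {
    const path = result.metadata.path;
    if (path === currentPath) continue;
    const best = bestByPath.get(path);
    if (!best || result.similarity > best.similarity) {
      bestByPath.set(path, result);
    }
  }
  // Highest similarity first
  return Array.from(bestByPath.values()).sort((a, b) => b.similarity - a.similarity);
}

// Made-up results for illustration only
const sampleResults: SearchResult[] = [
  { similarity: 0.99, data: "...", metadata: { path: "post-a", title: "Post A" } },
  { similarity: 0.8, data: "...", metadata: { path: "post-b", title: "Post B" } },
  { similarity: 0.9, data: "...", metadata: { path: "post-b", title: "Post B" } },
  { similarity: 0.85, data: "...", metadata: { path: "post-c", title: "Post C" } },
];

const recommendations = toRecommendations(sampleResults, "post-a");
console.log(recommendations.map((r) => r.metadata.path)); // [ 'post-b', 'post-c' ]
```

If this filtering is applied, it may be worth requesting more than 4 results from the endpoint so that enough distinct posts remain afterwards.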
#### Components

As a reminder, we are using tailwindcss for the styling, which allows you to "Rapidly build modern websites without ever leaving your HTML":

```tsx
// components/BlogSection.tsx
```

```tsx
// components/ContentSection.tsx
```

```tsx
// components/Markdown.tsx
```

#### Pages

When it comes to pages, we'll tweak `pages/index.tsx` to redirect to the first blog post page:

```tsx
// pages/index.tsx
```

And create this post page that will use the components we previously built in addition to the `recommend` api endpoint:

```tsx
// pages/posts/[post].tsx
```

Head to the browser to see the results (http://localhost:3000).

You should see this:

![[pika-1681547885672-1x.png]]

Of course, feel free to tweak the style 😁.

## Closing thoughts

In summary, we have:

- Created a few blog posts
- Prepared and stored our blog posts in Embedbase
- Created the recommendation engine in a few lines of code
- Built an interface to display blog posts and their recommendations

Thank you for reading this blog post. You can find the complete code on the ["complete" branch of the repository](https://github.com/different-ai/embedbase-recommendation-engine-example/tree/complete).

If you liked this blog post, leave a star ⭐️ on https://github.com/different-ai/embedbase. If you have any feedback, [issues are highly appreciated ❤️](https://github.com/different-ai/embedbase/issues/new/choose). If you want to self-host it, please [book a demo](https://cal.com/potato/20min) 😁. Please feel free to contact us for any help or [join the Discord community](https://discord.gg/pMNeuGrDky) 🔥.
## Further reading

- [Check out this GitHub Action that automatically indexes your blog posts on every push](https://github.com/different-ai/embedbase-recommendation-engine-example/blob/complete/.github/workflows/index.yaml)
- Examples and other resources on [GPT-4 powered Embedbase documentation](https://docs.embedbase.xyz)

In `pages/posts/[slug].tsx`, we will make a small edit to show recommendations on the side:

```tsx
<div className="flex">
```

```tsx
<BlogSection />
```

![[Pasted image 20230413183529.png]]

npm i react-markdown

mkdir components

touch components/Markdown.tsx

https://github.com/different-ai/embedbase-recommendation-engine-example

```
louisbeaumont@Louiss-Air:~/Library/Mobile Documents/com~apple~CloudDocs/Documents/embedbase-reco-engine-example$ npm run sync

> [email protected] sync
> npx tsx ./scripts/sync.ts

Syncing to recsys 52 documents
Synced 52 documents to recsys
```