https://louis030195.medium.com/using-openai-to-increase-time-spent-on-your-blog-3f138d5ae6aa
https://www.tiktok.com/@louis03011995/video/7222247048391789830?lang=en
title:
how I built a reco engine without ML knowledge
how I built a reco engine with OpenAI
app dev
give
outline
overview
interactive examples
show status:
same tech as in Ava, the most popular Obsidian plugin
increase status (stars, etc., positive feedback)
how to build example
use embedbase
why embedbase? problem faced
why did i need embeddings
unstructured data, scale, hard to do another way, embeddings hard to manage -> embedbase
(optional) embeddings
start example brain obsidian ava
ava links
how it works
## Overview
## Interactive examples
Ava
## Brain
I started using a note-taking app a few years ago, gathered everything in it over time, and eventually reached 1 million words.
As it scaled, I would often get lost and couldn't find my notes, so I experimented with building a plugin that recommends similar notes. It allowed me to connect and generalise my ideas. For example, "activation energy" is originally a concept from chemistry; seeing my "velocity" and "momentum" mental-model notes next to it gave me the insight that activation energy is itself a powerful mental model.
## All
![[pika-1681547885672-1x.png]]
Today we're going to learn how to build a recommendation system for a social publishing platform like medium.com.
The source code of the whole project is available [here](https://github.com/different-ai/embedbase-recommendation-engine-example). You can also try an interactive version [here](https://embedbase-recommendation-engine-example.vercel.app/).
You can also deploy the final version on Vercel right now:
[Deploy with Vercel](https://vercel.com/new/clone?repository-url=https%3A%2F%2Fgithub.com%2Fdifferent-ai%2Fembedbase-recommendation-engine-example%2Ftree%2Fcomplete&env=EMBEDBASE_API_KEY&envDescription=Embedbase%20API%20key%20is%20necessary%20to%20use%20Embedbase%20Cloud%2C%20you%20can%20also%20self-host%20it!&envLink=https%3A%2F%2Fapp.embedbase.xyz%2Fsignup)
https://www.loom.com/share/0bbdf3ca54dd4010872df2ad2b695e37
Hey, I'm Louis. Recently I built a recommendation system for a note-taking app that I use, and today I want to show how to reproduce the same thing for an app like medium.com, where recommendations appear on the side. You can implement the same experience in your product to increase user retention.
Check out the link below to follow a step-by-step tutorial.
# Overview
![[Pasted image 20230415101954.png]]
We're going to cover a few things here, but the big picture is that:
1. We will turn our blog posts into a numerical representation called "embeddings" and store them in a database.
2. We will get the blog posts most similar to the one currently being read.
3. We will display these similar blog posts on the side.
Concretely, it boils down to two calls:
1. `embedbase.dataset('recsys').batchAdd('<my blog posts>')`
2. `embedbase.dataset('recsys').search('<my blog post content>')`
Here, "search" allows you to get recommendations.
Yay! Two lines of code to implement the core of a recommendation system.
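To build some intuition for what "search" does here: conceptually, the query is turned into an embedding and compared with every stored embedding, and the closest ones come back as recommendations. The toy sketch below illustrates that idea in memory, assuming cosine similarity as the metric and hand-written 3-dimensional vectors. Real embeddings come from a model and have hundreds or thousands of dimensions, and Embedbase's actual internals may differ; `cosineSimilarity`, `mostSimilar` and `StoredPost` are hypothetical names used only for illustration.

```typescript
// Hypothetical shape of a stored post: an id plus its (toy) embedding vector.
interface StoredPost {
  id: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|), between -1 and 1 for non-zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as "not similar to anything"
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored posts by similarity to the query embedding and keep the top `limit`.
function mostSimilar(
  query: number[],
  posts: StoredPost[],
  limit: number,
): { id: string; similarity: number }[] {
  return posts
    .map((p) => ({ id: p.id, similarity: cosineSimilarity(query, p.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}

// Usage with toy vectors
const posts: StoredPost[] = [
  { id: "momentum", embedding: [0.9, 0.1, 0.0] },
  { id: "velocity", embedding: [0.8, 0.2, 0.1] },
  { id: "recipes", embedding: [0.0, 0.1, 0.9] },
];
console.log(mostSimilar([1, 0, 0], posts, 2).map((r) => r.id)); // [ 'momentum', 'velocity' ]
```

A brute-force scan like this is fine for a handful of posts; a dedicated service such as Embedbase takes care of storing the vectors and running the similarity search for you.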
## Diving into the implementation
As a reminder, we will create a **recommendation engine for your publishing platform** using NextJS and tailwindcss. We will also use [Embedbase](https://embedbase.xyz/) as a database.
Other libraries we use:
- `gray-matter` to parse Markdown front-matter (used to store document metadata, useful when you get the recommended results)
- `swr` to easily fetch data from NextJS API endpoints
- `heroicons` for icons
- Finally, `react-markdown` to render nice Markdown for the user
Alright, let's get started!
Here's what you'll need for this tutorial:
- An [Embedbase API key](https://app.embedbase.xyz/signup). Embedbase is a database that lets you find the "most similar results" to a query. Not all databases are suited for this kind of job; Embedbase does just that by finding the "semantic similarity" between a search query and stored content.
You can now `clone` the repository like so:
```bash
git clone https://github.com/different-ai/embedbase-recommendation-engine-example
```
Open it with your favourite IDE, and install the dependencies:
```bash
npm i
```
Now you should be able to run the project:
```bash
npm run dev
```
Write the Embedbase API key you just created in `.env.local`:
```
EMBEDBASE_API_KEY="<YOUR KEY>"
```
### Creating a few blog posts
As you can see, the `_posts` folder contains a few blog posts, each with some `yaml` front-matter metadata that gives additional information about the file.
![[pika-1681489692627-1x.png]]
⚠️ Disclaimer: **GPT-4 was used to write these blog posts, so don't take them as my own words or as anything valuable.**
### Preparing and storing the documents
The first step requires us to store our blog posts in Embedbase.
To read the blog posts we've just written, we need a small piece of code that parses the Markdown front-matter and stores it as document metadata; this additional information improves the recommendation experience. We will use the `gray-matter` library for this. Paste the following code in `lib/api.ts`:
```ts
import fs from 'fs'
import { join } from 'path'
import matter from 'gray-matter'

// Get the absolute path to the posts directory
const postsDirectory = join(process.cwd(), '_posts')

export function getPostBySlug(slug: string, fields: string[] = []) {
  const realSlug = slug.replace(/\.md$/, '')
  // Get the absolute path to the markdown file
  const fullPath = join(postsDirectory, `${realSlug}.md`)
  // Read the markdown file as a string
  const fileContents = fs.readFileSync(fullPath, 'utf8')
  // Use gray-matter to parse the post metadata section
  const { data, content } = matter(fileContents)

  type Items = {
    [key: string]: string
  }

  const items: Items = {}

  // Store each field in the items object
  fields.forEach((field) => {
    if (field === 'slug') {
      items[field] = realSlug
    }
    if (field === 'content') {
      items[field] = content
    }
    if (typeof data[field] !== 'undefined') {
      items[field] = data[field]
    }
  })

  return items
}
```
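To make the `{ data, content }` split concrete, here is a dependency-free sketch that mimics the shape of what `matter()` returns for very simple front-matter. `parseFrontMatter` is a hypothetical helper, not gray-matter's implementation: it only understands a leading `---` block of flat `key: value` lines, whereas gray-matter runs a real YAML parser that handles quoting, nesting and typed values, so keep using gray-matter in the project.

```typescript
// Hypothetical, simplified stand-in for gray-matter's `matter()`:
// it only understands a leading `---` block of flat `key: value` lines.
type ParsedPost = { data: Record<string, string>; content: string };

function parseFrontMatter(file: string): ParsedPost {
  const lines = file.split(/\r?\n/);
  // No opening delimiter: the whole file is content
  if (lines[0].trim() !== "---") {
    return { data: {}, content: file };
  }
  // Find the closing `---` delimiter
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end === -1) {
    return { data: {}, content: file };
  }
  const data: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const sep = line.indexOf(":");
    if (sep === -1) continue; // skip lines that are not `key: value`
    const key = line.slice(0, sep).trim();
    // Strip optional matching single or double quotes around the value
    const value = line.slice(sep + 1).trim().replace(/^(['"])(.*)\1$/, "$2");
    data[key] = value;
  }
  // Everything after the closing delimiter is the Markdown body
  return { data, content: lines.slice(end + 1).join("\n") };
}

// Usage
const raw = `---
title: 'Activation energy'
date: '2023-04-14'
excerpt: A concept borrowed from chemistry
---
# Activation energy
Body text.`;
const sample = parseFrontMatter(raw);
console.log(sample.data.title); // Activation energy
console.log(sample.content);
```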
Now we can write the script that stores our documents in Embedbase. Create a file `sync.ts` in the `scripts` folder.
You'll need the `glob` library to list files and the Embedbase SDK, `embedbase-js`, to interact with the API. The script also uses `dotenv` to load `.env.local`.
In Embedbase, the concept of `dataset` represents one of your data sources, for example, the food you eat, your shopping list, customer feedback, or product reviews.
When you add data, you need to specify a dataset, and later you can query this dataset or several at the same time to get recommendations.
Alright, let's finally implement the script that sends your data to Embedbase. Paste the following code in `scripts/sync.ts`:
```ts
import glob from "glob";
import { createClient, BatchAddDocument } from 'embedbase-js'
import { splitText } from 'embedbase-js/dist/main/split';
import { getPostBySlug } from "../lib/api";

try {
  // load the .env.local file to get the api key
  require("dotenv").config({ path: ".env.local" });
} catch (e) {
  console.log("No .env file found" + e);
}

// you can find the api key at https://app.embedbase.xyz
const apiKey = process.env.EMBEDBASE_API_KEY;
// this is using the hosted instance
const url = 'https://api.embedbase.xyz'
const embedbase = createClient(url, apiKey)

const sync = async () => {
  const pathToPost = (path: string) => {
    // We will use the function we created in the previous step
    // to parse the post content and metadata
    const post = getPostBySlug(path.split("/").slice(-1)[0], [
      'title',
      'date',
      'slug',
      'excerpt',
      'content'
    ])
    return {
      data: post.content,
      metadata: {
        path: post.slug,
        title: post.title,
        date: post.date,
        excerpt: post.excerpt,
      }
    }
  };

  // read all files under _posts/* with .md extension
  const documents = glob.sync("_posts/**/*.md").map(pathToPost);

  // split each post into smaller chunks; every chunk keeps its post's metadata
  // so that a recommendation can be traced back to the original post
  const chunks: BatchAddDocument[] = []
  documents.forEach((document) =>
    splitText(document.data, {}, ({ chunk }) => chunks.push({
      data: chunk,
      metadata: document.metadata,
    }))
  )

  const datasetId = `recsys`
  console.log(`Syncing to ${datasetId} ${chunks.length} documents`);
  const batchSize = 100;
  // add to embedbase by batches of size 100
  // (batching is useful when you send a lot of data)
  return Promise.all(
    chunks
      .reduce((acc: BatchAddDocument[][], _chunk, i) => {
        if (i % batchSize === 0) {
          acc.push(chunks.slice(i, i + batchSize));
        }
        return acc;
      }, [])
      // here we are using the batchAdd method to send the documents to embedbase
      .map((batch) => embedbase.dataset(datasetId).batchAdd(batch))
  )
    .then((e) => e.flat())
    .then((e) => console.log(`Synced ${e.length} documents to ${datasetId}`, e))
    .catch(console.error);
}

sync();
```
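The sync script relies on two ideas: splitting long posts into chunks, and sending those chunks in batches. Here is a dependency-free sketch of both, assuming a simple character window with overlap. `splitText` from embedbase-js may split differently (for example by tokens rather than characters), so chunk boundaries will not match; `chunkText` and `toBatches` are hypothetical helpers, and `toBatches` produces the same grouping as the `reduce` in the script.

```typescript
// Split `text` into windows of at most `size` characters, where consecutive
// windows share `overlap` characters so context is not cut off abruptly.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once the current window reaches the end of the text
    if (start + size >= text.length) break;
  }
  return chunks;
}

// Group items into arrays of at most `batchSize` elements.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

console.log(chunkText("abcdefghij", 4, 1)); // [ 'abcd', 'defg', 'ghij' ]
console.log(toBatches([1, 2, 3, 4, 5], 2).length); // 3
```

The overlap trades a little redundancy for better recall: a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.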
Great, you can now run the sync script:
```bash
npx tsx ./scripts/sync.ts
```
This is what you should be seeing:
![[ray-so-export.png]]
Pro tip: you can now visualise your data in the Embedbase dashboard (which is [open-source](https://github.com/different-ai/embedbase/tree/main/dashboard)), [here](https://app.embedbase.xyz/dashboard/explorer/recsys?page=0):
![[pika-1681545554475-1x.png]]
Or ask questions about it using ChatGPT [here](https://app.embedbase.xyz/dashboard/playground) (make sure to tick the "recsys" dataset):
![[pika-1681545757224-1x.png]]
### Implementing the recommendation function
Now we want to get recommendations for our blog posts, so we will add an API endpoint in `pages/api/recommend.ts` (if you are unfamiliar with NextJS API routes, check out [this](https://vercel.com/docs/concepts/functions/serverless-functions)):
```ts
import type { NextApiRequest, NextApiResponse } from "next";
import { createClient } from "embedbase-js";

// Let's create an Embedbase client with our API key
const embedbase = createClient("https://api.embedbase.xyz", process.env.EMBEDBASE_API_KEY);

export default async function recommend(req: NextApiRequest, res: NextApiResponse) {
  // the client sends the content of the post being read as `query`
  const query = req.body.query;
  if (!query) {
    res.status(400).json({ error: "Missing query" });
    return;
  }

  const datasetId = "recsys";
  // in this case even if we call the function search,
  // we actually get recommendations
  const results = await embedbase.dataset(datasetId).search(query, {
    // We want to get the first 4 results
    limit: 4,
  });

  res.status(200).json(results);
}
```
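Because the query is the content of the post being read, that post's own chunks will usually be the closest matches, and one post can also show up several times (once per chunk). A minimal post-processing sketch, assuming each search result carries a numeric `similarity` and the `metadata.path` we stored during the sync; the exact result type in embedbase-js may contain more fields. `pickRecommendations` is a hypothetical helper, and you would likely ask `search` for a larger `limit` than 4 before filtering.

```typescript
// Assumed minimal shape of a search result: only the fields this helper reads.
interface SearchResult {
  similarity: number;
  data: string;
  metadata?: { path?: string; title?: string };
}

// Drop chunks belonging to the current post, keep only the best-scoring
// chunk per post, and return at most `limit` posts ordered by similarity.
function pickRecommendations(
  results: SearchResult[],
  currentPath: string,
  limit: number,
): SearchResult[] {
  const bestByPath = new Map<string, SearchResult>();
  for (const r of results) {
    const path = r.metadata?.path;
    // Skip results without a path and the post currently being read
    if (!path || path === currentPath) continue;
    const seen = bestByPath.get(path);
    if (!seen || r.similarity > seen.similarity) {
      bestByPath.set(path, r);
    }
  }
  return Array.from(bestByPath.values())
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, limit);
}
```

This could run either in the API route before `res.status(200).json(...)` or on the client, as long as the current post's `slug` is known.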
### Building the blog interface
All we have to do now is connect it all in a friendly user interface and we're done!
#### Components
As a reminder, we are using tailwindcss for styling, which lets you "Rapidly build modern websites without ever leaving your HTML":
```tsx
// components/BlogSection.tsx
```
```tsx
// components/ContentSection.tsx
```
```tsx
// components/Markdown.tsx
```
#### Pages
When it comes to pages, we'll tweak `pages/index.tsx` to redirect to the first blog post page:
```tsx
// pages/index.tsx
```
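One way to do that redirect, sketched under assumptions: Next.js lets `getServerSideProps` and `getStaticProps` return `{ redirect: { destination, permanent } }` or `{ notFound: true }`. The hypothetical `firstPostRedirect` below builds that object from a list of post filenames, picking the alphabetically first `.md` file; the repository's page may choose the "first" post differently (for example by `date`).

```typescript
// Shape of the redirect object Next.js accepts from getServerSideProps / getStaticProps.
interface NextRedirect {
  redirect: { destination: string; permanent: boolean };
}

// Hypothetical helper: redirect to the alphabetically first Markdown post,
// or report "not found" when there are no posts.
function firstPostRedirect(filenames: string[]): NextRedirect | { notFound: true } {
  const slugs = filenames
    .filter((f) => f.endsWith(".md"))
    .map((f) => f.replace(/\.md$/, ""))
    .sort();
  if (slugs.length === 0) return { notFound: true };
  // Temporary redirect, since the "first" post can change as posts are added
  return { redirect: { destination: `/posts/${slugs[0]}`, permanent: false } };
}
```

In `pages/index.tsx`, a `getServerSideProps` could return `firstPostRedirect(fs.readdirSync(join(process.cwd(), '_posts')))`.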
And create the post page, which uses the components we previously built along with the `recommend` API endpoint:
```tsx
// pages/posts/[post].tsx
```
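The post page has to send the current post's content to `/api/recommend`. A minimal sketch of a fetcher you could call from the function you pass to `swr`, assuming a POST with a JSON body `{ query }`, which matches what the endpoint reads from `req.body.query` (Next.js parses JSON bodies sent with `Content-Type: application/json`). `fetchRecommendations` and `FetchLike` are hypothetical names; the fetch implementation is a parameter, so in the browser you would pass the global `fetch`, and in tests a stub.

```typescript
// Minimal subset of the Fetch API used here, so any fetch-compatible function fits.
type FetchLike = (
  input: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// POST the current post's content as `query` and return the parsed JSON results.
async function fetchRecommendations(query: string, fetchImpl: FetchLike): Promise<unknown> {
  const res = await fetchImpl("/api/recommend", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query }),
  });
  // The endpoint answers 400 when the query is missing
  if (!res.ok) {
    throw new Error(`recommend request failed with status ${res.status}`);
  }
  return res.json();
}
```

For example, `fetchRecommendations(post.content, fetch)` returns the results the endpoint computed with `search`.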
Head to the browser to see the results (http://localhost:3000).
You should see this:
![[pika-1681547885672-1x.png]]
Of course, feel free to tweak the style!
## Closing thoughts
In summary, we have:
- Created a few blog posts
- Prepared and stored our blog posts in Embedbase
- Created the recommendation engine in a few lines of code
- Built an interface to display blog posts and their recommendations
Thank you for reading this blog post. You can find the complete code on the ["complete" branch of the repository](https://github.com/different-ai/embedbase-recommendation-engine-example/tree/complete).
If you liked this blog post, leave a star ⭐️ on https://github.com/different-ai/embedbase. If you have any feedback, [issues are highly appreciated ❤️](https://github.com/different-ai/embedbase/issues/new/choose). If you want to self-host it, please [book a demo](https://cal.com/potato/20min).
Please feel free to contact us for any help or [join the Discord community](https://discord.gg/pMNeuGrDky).
## Further reading
- [Check out this GitHub Action that automatically indexes your blog posts on every push](https://github.com/different-ai/embedbase-recommendation-engine-example/blob/complete/.github/workflows/index.yaml)
- Examples and other resources in the [GPT-4 powered Embedbase documentation](https://docs.embedbase.xyz)
In `pages/posts/[slug].tsx`, we will slightly edit the layout to show recommendations on the side:
```tsx
<div className="flex">
```
```tsx
<BlogSection />
```
![[Pasted image 20230413183529.png]]
```bash
npm i react-markdown
mkdir components
touch components/Markdown.tsx
```
```
$ npm run sync
> npx tsx ./scripts/sync.ts
Syncing to recsys 52 documents
Synced 52 documents to recsys
```