Tutorial: First Step in a RAG pipeline, Data Ingestion (Part 2) - Databases Configuration.
Before we can store our data, we need to set up the two databases we’ll be using in this tutorial: the vector database hosted by Qdrant and the CloudFlare D1 database. As mentioned in the previous part of this tutorial, you must create an account for both of these platforms. Let’s get started.
Creating a Qdrant Database
The steps for creating a Qdrant database are straightforward. After creating your account, you need to create a cluster and then generate an API key to interact with the cluster from another application.
A free-tier cluster includes a single node with the following resources:
Resource | Value |
---|---|
RAM | 1 GB |
vCPU | 0.5 |
Disk space | 4 GB |
Nodes | 1 |
This configuration supports serving about 1 M vectors of
The steps to create a Qdrant Cloud cluster are the following:
- Start in the Clusters section of the Cloud Dashboard.
- Select Clusters and then click + Create.
- In the Create a cluster screen select Free.
- When you’re ready, select Create. It takes some time to provision your cluster.
After creating the cluster, the page will automatically guide you to the API key creation process. Save this key in a secure location, as we’ll need it later, and don’t share it with anyone.
Library to Manage Qdrant Database
To streamline interactions with the Qdrant database, I’ve created a utility script in TypeScript named qdrant.ts. To begin using the Qdrant API, we must install the @qdrant/js-client-rest npm package, for example by running npm install @qdrant/js-client-rest in the project directory.
Once installed, we can import the QdrantClient class into our script and define the data point type that will be stored in the database.
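As a minimal sketch of what such a data point type could look like: the names NotePoint and makePoint and the payload shape are assumptions for illustration, not the definitions from the original script. In qdrant.ts the client class would be imported with import { QdrantClient } from "@qdrant/js-client-rest"; the snippet leaves that import out so it runs on its own.

```typescript
// Hypothetical shape of a point stored in Qdrant: an ID, the embedding
// vector, and an optional payload. The ID is what lets us find the
// matching row in the D1 table later.
type NotePoint = {
  id: number;
  vector: number[];
  payload?: Record<string, unknown>;
};

// Build a point and fail early if the vector length does not match the
// size configured for the collection.
function makePoint(id: number, vector: number[], vectorSize: number): NotePoint {
  if (vector.length !== vectorSize) {
    throw new Error(`Expected a vector of size ${vectorSize}, got ${vector.length}`);
  }
  return { id, vector };
}
```

Checking the vector length on the application side is a design choice of this sketch; it surfaces a mismatched embedding before any request is sent.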
Finally, I’ve exported a utility class called QdrantDatabase containing all the necessary functions for managing our database.
The constructor of this class accepts the name of the collection to be created, allowing you to create multiple collections if needed. It also takes the size of the vectors to be stored in that collection. We’ll be using a vector size of […]. The client is created by the createClient method and is used for all interactions with the vector database. To create the client, you’ll need the API key and cluster URL provided during the creation process described in the previous section. The constructor also creates the collection to store the vectors, using the createCollection method, if a collection with the specified name doesn’t already exist.
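The "create the collection only if it is missing" logic can be sketched roughly as follows. This is not the original class: QdrantDatabaseSketch, CollectionClient, and ensureCollection are hypothetical names, and the Cosine distance is an assumption. The sketch is written against a small interface listing only the two client methods it calls, getCollections and createCollection, both of which QdrantClient provides; the real createClient would return something like new QdrantClient({ url, apiKey }). Because a constructor cannot await, the check lives in a separate async method here.

```typescript
// Only the client methods this sketch needs; they mirror the methods of
// the same name on QdrantClient from @qdrant/js-client-rest.
interface CollectionClient {
  getCollections(): Promise<{ collections: { name: string }[] }>;
  createCollection(
    name: string,
    config: { vectors: { size: number; distance: "Cosine" | "Dot" | "Euclid" } },
  ): Promise<unknown>;
}

class QdrantDatabaseSketch {
  private readonly client: CollectionClient;
  private readonly collectionName: string;
  private readonly vectorSize: number;

  constructor(client: CollectionClient, collectionName: string, vectorSize: number) {
    this.client = client;
    this.collectionName = collectionName;
    this.vectorSize = vectorSize;
  }

  // Create the collection only if no collection with that name exists.
  // Resolves to true when a new collection was created.
  async ensureCollection(): Promise<boolean> {
    const { collections } = await this.client.getCollections();
    if (collections.some((c) => c.name === this.collectionName)) {
      return false;
    }
    await this.client.createCollection(this.collectionName, {
      vectors: { size: this.vectorSize, distance: "Cosine" }, // distance is an assumption
    });
    return true;
  }
}
```

Passing the client in, rather than building it inside the class, keeps the sketch runnable without network access and makes it easy to substitute a fake client in tests.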
Finally, the class includes two additional methods: addPoints for storing points in the database and search for conducting similarity searches for a query vector within the database.
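Here is a hedged sketch of what these two methods might do, written as standalone functions so the snippet runs by itself. The PointClient interface, the StoredPoint and ScoredHit types, and the default limit of 5 are assumptions for illustration; upsert and search are the QdrantClient methods they are modelled on.

```typescript
type StoredPoint = { id: number; vector: number[]; payload?: Record<string, unknown> };
type ScoredHit = { id: number | string; score: number };

// The two client methods used below, modelled on QdrantClient's
// upsert and search.
interface PointClient {
  upsert(collection: string, args: { wait?: boolean; points: StoredPoint[] }): Promise<unknown>;
  search(collection: string, args: { vector: number[]; limit?: number }): Promise<ScoredHit[]>;
}

// Store (insert or overwrite) a batch of points in the collection.
async function addPoints(client: PointClient, collection: string, points: StoredPoint[]): Promise<void> {
  if (points.length === 0) return; // nothing to send
  // wait: true asks the server to apply the change before responding.
  await client.upsert(collection, { wait: true, points });
}

// Return the hits most similar to the query vector, best match first.
async function search(
  client: PointClient,
  collection: string,
  queryVector: number[],
  limit = 5,
): Promise<ScoredHit[]> {
  return client.search(collection, { vector: queryVector, limit });
}
```

The id of each hit is what we would later use to look up the corresponding text in the D1 notes table.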
Create a CloudFlare D1 database and a table to store the text
We’ll use Node.js to create and configure our CloudFlare D1 database. Before we begin, you’ll need to choose either a CloudFlare Worker or a CloudFlare Pages project, depending on your specific needs.
I explain how to create a project with AstroJS and Cloudflare Pages in a separate tutorial (link to AstroJS documentation).
Now, let’s create a database within your project using Wrangler’s d1 create command, for example npx wrangler d1 create rag-database.
The first time you run this command, it will open your browser to prompt you to log in to your CloudFlare account. I’ve named the database rag-database, but feel free to choose a different name if desired. This command will also print a [[d1_databases]] configuration block in your terminal, containing the binding name, the database name, and a valid database ID.
Copy these lines into your wrangler.toml configuration file, which was created when you initialised the project with Wrangler. We’ll use the database ID to connect to the database later.
Next, let’s create a table. In this application, we’ll create a notes table in D1, which will allow us to store notes and later retrieve them with the ID field stored in Qdrant. To create this table, run a SQL CREATE TABLE command using wrangler d1 execute.
You can also test the database by adding one entry to your table with a SQL INSERT statement.
Generate a CloudFlare API token
Before moving on to the next part, it’s necessary to create an API token for making calls to the CloudFlare D1 and Workers AI APIs.
To create the token, navigate to the “My Profile” page, which is located in the drop-down menu when you click on the user logo in the top right corner of the CloudFlare dashboard. Once there, go to “API Tokens” and select “Create Token.” Assign a desired name to the token and add the following permissions:
After completing the remaining steps, securely store the API token in a private location. We’ll use it in the next part of the tutorial.
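To give an idea of how this token will be used, here is a small sketch that builds a query request for the D1 REST API, assuming the endpoint POST /accounts/{account_id}/d1/database/{database_id}/query. The helper name buildD1Query, the placeholder IDs, and the text column of the notes table are assumptions for illustration; adapt them to your own account and schema.

```typescript
type D1Request = {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
};

// Build (but do not send) a parameterised SQL query against a D1 database.
function buildD1Query(
  accountId: string,
  databaseId: string,
  apiToken: string,
  sql: string,
  params: string[] = [],
): D1Request {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/d1/database/${databaseId}/query`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`, // the API token created above
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ sql, params }),
    },
  };
}

// Example: insert one note. The `text` column name is hypothetical.
const request = buildD1Query(
  "<ACCOUNT_ID>",
  "<DATABASE_ID>",
  "<API_TOKEN>",
  "INSERT INTO notes (text) VALUES (?)",
  ["My first note"],
);
// To send it: const response = await fetch(request.url, request.init);
```

Keeping the request construction separate from the fetch call makes it easy to inspect what would be sent before using a real token.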
Now that we’ve completed the setup, we’re ready to start storing our processed data in both databases. Let’s see how to do that in the next part of this tutorial.