In the first part of this series, we built a simple, command-line-only RAG application. In this post, we’ll turn our command-line application into a web API using FastAPI.
The Goal for This Post
By the end of this post, you will have a web API that can answer questions about the documents in the data directory. You will be able to interact with the API using curl or the Swagger UI.
Why FastAPI?
FastAPI is a modern, fast (high-performance) web framework for building APIs with Python. It’s easy to use, has great documentation, and comes with a built-in Swagger UI that allows you to interact with your API from the browser.
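If you haven't used it before, a minimal app is only a few lines. This throwaway example (hello.py is just an illustration, not part of our project) is already enough to get automatic interactive docs:

# hello.py -- a minimal FastAPI app, purely to illustrate the framework.
from fastapi import FastAPI

app = FastAPI()


@app.get("/hello")
def hello(name: str = "world"):
    """A trivial endpoint; FastAPI documents it automatically at /docs."""
    return {"message": f"Hello, {name}!"}

# Run with: uvicorn hello:app --reload
# then open http://localhost:8000/docs for the interactive Swagger UI.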
The Code
Here is the complete code for our FastAPI application.
requirements.txt
langchain
faiss-cpu
pdfplumber
fastapi
uvicorn
python-dotenv
huggingface_hub
app/models.py
from pydantic import BaseModel


class Query(BaseModel):
    question: str
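Because Query is a Pydantic model, FastAPI validates incoming request bodies against it for us. A quick way to convince yourself (a standalone snippet, run from the project root, not part of the app itself):

from app.models import Query

# Valid input parses into a typed object.
q = Query(question="What is attention?")
print(q.question)

# A missing or wrongly typed "question" field raises a ValidationError;
# inside FastAPI this surfaces as an automatic 422 response.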
app/main.py
from fastapi import FastAPI, HTTPException
import uvicorn
import os
from dotenv import load_dotenv
from app.models import Query
from app.rag_logic import (
    load_documents,
    split_documents,
    create_and_store_embeddings,
    load_retriever,
    create_rag_chain,
)

load_dotenv()

app = FastAPI()


def load_vectordb():
    """Build and persist the vector store on first run; reuse it afterwards."""
    vectordb_path = os.getenv("VECTOR_DB_DIR")
    if not vectordb_path:
        raise ValueError("VECTOR_DB_DIR environment variable not set.")
    data_dir = os.getenv("DATA_DIR")
    if not data_dir:
        raise ValueError("DATA_DIR environment variable not set.")
    if not os.path.exists(vectordb_path) or not os.listdir(vectordb_path):
        # First run: load, chunk, embed, and persist the documents.
        documents = load_documents(data_dir)
        chunks = split_documents(documents)
        create_and_store_embeddings(chunks, vectordb_path)
    else:
        print("Vector store already exists.")


@app.post("/ask")
def ask_question(query: Query):
    """Answers a question using the RAG pipeline."""
    try:
        vectordb_path = os.getenv("VECTOR_DB_DIR")
        if not vectordb_path:
            raise ValueError("VECTOR_DB_DIR environment variable not set.")
        retriever = load_retriever(vectordb_path)
        local_llm_url = os.getenv("LOCAL_LLM_URL")
        if not local_llm_url:
            raise ValueError("LOCAL_LLM_URL environment variable not set.")
        rag_chain = create_rag_chain(retriever, local_llm_url)
        response = rag_chain.invoke({"input": query.question})
        return {"answer": response["answer"]}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


if __name__ == "__main__":
    # Build the vector store (if needed) before serving requests.
    load_vectordb()
    uvicorn.run(app, host="0.0.0.0", port=8000)
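main.py leans on app/rag_logic.py, which we built in the first part of this series. For reference, here is a rough sketch of what those helpers look like. Treat it as an outline rather than the exact part-1 code: depending on your LangChain version you may also need the langchain-community and langchain-openai packages (plus sentence-transformers for local embeddings), and create_rag_chain here assumes LOCAL_LLM_URL points at an OpenAI-compatible server such as LM Studio.

# app/rag_logic.py -- sketch only; see part 1 for the full implementation.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.document_loaders import DirectoryLoader, PDFPlumberLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI


def load_documents(data_dir):
    # Load every PDF under the data directory with pdfplumber.
    loader = DirectoryLoader(data_dir, glob="**/*.pdf", loader_cls=PDFPlumberLoader)
    return loader.load()


def split_documents(documents):
    # Split into overlapping chunks so retrieval stays focused.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    return splitter.split_documents(documents)


def create_and_store_embeddings(chunks, vectordb_path):
    # Embed the chunks and persist a FAISS index to disk.
    embeddings = HuggingFaceEmbeddings()
    FAISS.from_documents(chunks, embeddings).save_local(vectordb_path)


def load_retriever(vectordb_path):
    # Load the FAISS index back and expose it as a retriever.
    embeddings = HuggingFaceEmbeddings()
    vectordb = FAISS.load_local(
        vectordb_path, embeddings, allow_dangerous_deserialization=True
    )
    return vectordb.as_retriever()


def create_rag_chain(retriever, local_llm_url):
    # Assumes an OpenAI-compatible local server (e.g. LM Studio on port 1234).
    llm = ChatOpenAI(base_url=f"{local_llm_url}/v1", api_key="not-needed")
    prompt = ChatPromptTemplate.from_template(
        "Answer the question using only the context below.\n\n"
        "Context:\n{context}\n\nQuestion: {input}"
    )
    docs_chain = create_stuff_documents_chain(llm, prompt)
    # create_retrieval_chain is what makes invoke({"input": ...}) return
    # a dict with an "answer" key, as main.py expects.
    return create_retrieval_chain(retriever, docs_chain)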
How to Run the Code

1. Install the dependencies:

pip install -r requirements.txt

2. Create a .env file: In the root of the project, create a .env file with the following environment variables:

LOCAL_LLM_URL=http://localhost:1234
VECTOR_DB_DIR=vector_store
DATA_DIR=data
3. Run the application from the project root, as a module so the app package imports resolve:

python -m app.main
4. Interact with the API: You can send questions with curl:

curl -X POST "http://localhost:8000/ask" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is attention?"}'

Or you can use the Swagger UI by navigating to http://localhost:8000/docs in your browser.
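If you'd rather call the endpoint from Python (say, in a quick test script), something like this works too. It uses the requests package, which is not in our requirements.txt:

# ask.py -- hypothetical client script; requires `pip install requests`.
import requests

resp = requests.post(
    "http://localhost:8000/ask",
    json={"question": "What is attention?"},
    timeout=120,  # local models can take a while to answer
)
resp.raise_for_status()
print(resp.json()["answer"])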
What’s Next
In the next post, we’ll containerize our application with Docker and refactor our code to support multiple LLM providers. Stay tuned!
Full implementation: hadywalied/AskAttentionAI