Building a RAG from Scratch: A Beginner’s Guide (Part 2: Building a Web API)


In the first part of this series, we built a simple, command-line-only RAG application. In this post, we’ll turn our command-line application into a web API using FastAPI.



The Goal for This Post

By the end of this post, you will have a web API that can answer questions about the documents in the data directory. You will be able to interact with the API using curl or the Swagger UI.



Why FastAPI?

FastAPI is a modern, fast (high-performance) web framework for building APIs with Python. It’s easy to use, has great documentation, and comes with a built-in Swagger UI that allows you to interact with your API from the browser.



The Code

Here is the complete code for our FastAPI application.



requirements.txt

langchain
faiss-cpu
pdfplumber
fastapi
uvicorn
python-dotenv
huggingface_hub



app/models.py

from pydantic import BaseModel


class Query(BaseModel):
    question: str

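The Query model does more than declare a field: Pydantic validates the request body for us, so a malformed payload is rejected before our handler ever runs (FastAPI turns the validation error into a 422 response). A quick sketch of that behavior, using the same model:

```python
from pydantic import BaseModel, ValidationError


class Query(BaseModel):
    question: str


# A well-formed payload parses cleanly...
q = Query(question="What is attention?")

# ...while a missing required field raises a ValidationError.
try:
    Query()
except ValidationError as e:
    errors = e.errors()  # structured details FastAPI returns to the client
```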


app/main.py

from fastapi import FastAPI, HTTPException
import uvicorn
import os
from dotenv import load_dotenv

from app.models import Query
from app.rag_logic import (
    load_documents,
    split_documents,
    create_and_store_embeddings,
    load_retriever,
    create_rag_chain,
)

load_dotenv()

app = FastAPI()


def load_vectordb():
    """Build the vector store from the documents in DATA_DIR if it does not exist yet."""
    vectordb_path = os.getenv("VECTOR_DB_DIR")
    if not vectordb_path:
        raise ValueError("VECTOR_DB_DIR environment variable not set.")
    data_dir = os.getenv("DATA_DIR")
    if not data_dir:
        raise ValueError("DATA_DIR environment variable not set.")
    if not os.path.exists(vectordb_path) or not os.listdir(vectordb_path):
        documents = load_documents(data_dir)
        chunks = split_documents(documents)
        create_and_store_embeddings(chunks, vectordb_path)
    else:
        print("Vector store already exists.")


@app.post("/ask")
def ask_question(query: Query):
    """Answers a question using the RAG pipeline."""
    try:
        vectordb_path = os.getenv("VECTOR_DB_DIR")
        if not vectordb_path:
            raise ValueError("VECTOR_DB_DIR environment variable not set.")
        retriever = load_retriever(vectordb_path)
        local_llm_url = os.getenv("LOCAL_LLM_URL")
        if not local_llm_url:
            raise ValueError("LOCAL_LLM_URL environment variable not set.")
        rag_chain = create_rag_chain(retriever, local_llm_url)
        response = rag_chain.invoke({"input": query.question})
        return {"answer": response["answer"]}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


if __name__ == "__main__":
    # Build the vector store first; note this only runs when the file is
    # executed directly, not when the server is started with the uvicorn CLI.
    load_vectordb()
    uvicorn.run(app, host="0.0.0.0", port=8000)


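One small refactor worth considering: main.py repeats the same getenv-and-raise pattern for each variable. A helper (hypothetical name require_env, not part of the post's repo) would keep that check in one place:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, raising if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise ValueError(f"{name} environment variable not set.")
    return value
```

With this in place, `vectordb_path = require_env("VECTOR_DB_DIR")` replaces three lines per variable.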


How to Run the Code

  1. Install the dependencies:

    pip install -r requirements.txt
    
  2. Create a .env file:

    Create a .env file in the root of the project and add the following environment variables:

    LOCAL_LLM_URL=http://localhost:1234
    VECTOR_DB_DIR=vector_store
    DATA_DIR=data
    
  3. Run the application:

    python -m app.main

    (Running it as a module from the project root keeps the app package importable; python app/main.py would fail on the from app.models import, since Python would only put the app/ directory itself on the import path.)
    
  4. Interact with the API:

    You can interact with the API using curl:

    curl -X POST "http://localhost:8000/ask" -H "Content-Type: application/json" -d '{
      "question": "What is attention?"
    }'
    

    Or you can use the Swagger UI by navigating to http://localhost:8000/docs in your browser.
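If you would rather call the endpoint from Python than from curl, a standard-library sketch (no third-party HTTP client assumed) looks like this:

```python
import json
from urllib import request


def build_ask_request(question: str, base_url: str = "http://localhost:8000") -> request.Request:
    """Build the POST request for the /ask endpoint."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return request.Request(
        f"{base_url}/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = "http://localhost:8000") -> str:
    """Send the question to the running API and return the answer string."""
    with request.urlopen(build_ask_request(question, base_url)) as resp:
        return json.loads(resp.read())["answer"]


if __name__ == "__main__":
    print(ask("What is attention?"))
```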



What’s Next

In the next post, we’ll containerize our application with Docker and refactor our code to support multiple LLM providers. Stay tuned!

Full implementation: hadywalied/AskAttentionAI


