Harnessing LangChain for Enhanced PDF Question Answering

Chapter 1: Introduction to LangChain and RAG

Large language models (LLMs) like GPT-4 can answer questions across a remarkable range of topics, but their knowledge is confined to the data on which they were trained. While LLMs excel at language comprehension, instruction following, and basic reasoning, they often lack up-to-date knowledge, particularly when it comes to proprietary or domain-specific data.

To address these limitations, the technique known as Retrieval Augmented Generation (RAG) has gained traction. This approach involves retrieving relevant documents and integrating them into the prompt, directing the LLM to formulate responses based solely on this information. This not only enhances the contextual background for the language model but also ensures its responses are anchored in factual data.
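
As a rough sketch of the idea (the context and question here are invented placeholders, not output from any real retriever), a RAG prompt simply places the retrieved text ahead of the question and instructs the model to rely on it alone:

retrieved_context = "Law 3: A match is played by two teams, each with a maximum of eleven players..."
question = "How many players are on a team?"

prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{retrieved_context}\n\n"
    f"Question: {question}"
)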

In this guide, we will explore how to leverage LangChain to interact with your data, specifically focusing on PDF files. LangChain serves as an open-source framework designed for developing applications that utilize LLMs. Once you become adept at working with PDFs, adapting to other data formats will be a seamless transition.

Chapter 2: Setting the Stage

Let's visualize a scenario: you are an aspiring football referee eager to master the rules of the game. While purchasing a physical copy of the laws of the game is one option, you may be intrigued by the potential of creating your own RefGPT. This system aims to provide a ChatGPT-like interface for referees, granting them access to the most current rules of the game.

For our RefGPT to be effective, it must have a reliable knowledge base to draw from, thereby ensuring it remains updated on the laws of the game. Below is a visual representation of the workflow involved in retrieval augmented generation (RAG).

Chapter 3: Document Loading and Processing

For this guide, we will utilize the 2022/23 edition of the Laws of the Game book, which you can download as a PDF. The first step is to load the document and segment it into manageable chunks.

LangChain's document loaders are designed to transform our data source (in this case, a PDF) into a standardized document object that encompasses both the content and its associated metadata.

import os
from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("./LawsofTheGame2022_23.pdf")
pages = loader.load()  # Load the PDF as a list of document objects, one per page
trimmed_pages = pages[10:200]  # Skip the front matter and focus on relevant pages
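
As a quick sanity check (the metadata shown in the comment is illustrative and depends on the PDF), each loaded page is a document object carrying both content and metadata:

print(len(trimmed_pages))
print(trimmed_pages[0].page_content[:200])  # First 200 characters of the first kept page
print(trimmed_pages[0].metadata)  # e.g. {'source': './LawsofTheGame2022_23.pdf', 'page': 10}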

As we examine the content, we can see that the Laws of the Game provide a universal framework for football, ensuring consistency across all levels of play.

Next, we can utilize a text splitter to break down the document into smaller, semantically coherent chunks.

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,   # Maximum characters per chunk
    chunk_overlap=150  # Overlap to preserve context across chunk boundaries
)

docs = text_splitter.split_documents(trimmed_pages)
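
It is worth confirming the split before moving on; the exact chunk count depends on the document's text density, so treat this as a sketch:

print(f"Split {len(trimmed_pages)} pages into {len(docs)} chunks")
print(docs[0].page_content[:200])  # Inspect the first chunk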

Chapter 4: Understanding Vector Stores and Embeddings

With our documents segmented into manageable chunks, we need a method to retrieve them efficiently for question answering. This is where the concepts of embeddings and vector stores become essential.

Embeddings create a numerical representation of the text, capturing its semantic meaning. By converting text into vectors, we can evaluate the similarity between different pieces of content.

To facilitate this process, we will utilize OpenAI's embedding models.

from dotenv import load_dotenv
from langchain.embeddings.openai import OpenAIEmbeddings

load_dotenv()  # Reads OPENAI_API_KEY from a local .env file
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

embeddings = OpenAIEmbeddings()
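
As a small illustration of the similarity claim above (the three sentences are our own inventions, not part of the pipeline), semantically close statements should yield vectors with a higher cosine similarity:

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: closer to 1.0 means more similar meaning
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

red_card = embeddings.embed_query("A player shown a red card must leave the field")
sent_off = embeddings.embed_query("A sending-off means the player leaves the pitch")
goalkeeper = embeddings.embed_query("The goalkeeper may handle the ball in the penalty area")

print(cosine_similarity(red_card, sent_off))    # Expected: relatively high
print(cosine_similarity(red_card, goalkeeper))  # Expected: noticeably lower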

For our vector store, we will implement Chroma, a lightweight and in-memory solution ideal for our needs.

from langchain.vectorstores import Chroma

vectordb = Chroma.from_documents(
    documents=docs,
    embedding=embeddings,
    persist_directory="path_to_persist_directory"  # Where Chroma persists the index on disk
)
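
With the index built, a quick similarity search shows retrieval in action; the question and the value of k here are arbitrary choices for illustration:

results = vectordb.similarity_search("When is a penalty kick awarded?", k=3)
for doc in results:
    print(doc.metadata.get("page"), doc.page_content[:100])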

Chapter 5: Retrieval and Question Answering

Retrieval is the cornerstone of our RAG framework. For this tutorial, we will use LangChain's ConversationalRetrievalChain, which enables interactive data querying while maintaining conversation history.

from langchain.chat_models import ChatOpenAI
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain.chains import ConversationalRetrievalChain

llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    model_name='gpt-3.5-turbo',
    temperature=0.0  # Deterministic output for factual question answering
)

conversational_memory = ConversationBufferWindowMemory(
    memory_key='chat_history',
    k=5,  # Keep the last five exchanges in memory
    return_messages=True
)

qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=vectordb.as_retriever(),
    memory=conversational_memory
)

This setup allows us to pose questions and receive answers, with the ability to ask follow-up queries based on previous interactions.
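
For instance (the question below is our own example, and the exact answer depends on the retrieved chunks), the chain takes a question and returns an answer while the memory records the exchange:

result = qa({"question": "How long does a football match last?"})
print(result["answer"])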

Chapter 6: Initializing Our Chat Agent

With a functional RAG system in place, we can enhance our capabilities by incorporating agents. Agents utilize LLMs to determine a series of actions based on user input, allowing for dynamic decision-making.

from langchain.agents import Tool

tools = [
    Tool(
        name='Knowledge Base',
        func=qa.run,
        description=(
            'Utilize this tool for general knowledge inquiries.'
        )
    )
]

from langchain.agents import initialize_agent

agent = initialize_agent(
    agent='chat-conversational-react-description',
    tools=tools,
    llm=llm,
    verbose=True,
    max_iterations=3,
    early_stopping_method='generate',
    memory=conversational_memory
)

Chapter 7: Interacting with the Chat Agent

Now, we can engage with our agent just as we did with the retrieval chain, posing various questions to assess its functionality.

query = "What is the maximum number of players allowed on the field?"
agent.run(query)

This showcases the agent's potential for providing accurate answers; asking follow-up questions also lets us test its ability to recall previous exchanges, as in the sketch below.
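
A hypothetical follow-up, relying on the conversational memory to resolve what "that number" refers to:

follow_up = "Does that number include the goalkeeper?"
agent.run(follow_up)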

Chapter 8: Tracing with LangSmith

Finally, we conclude by examining how to visualize and debug our agent's interactions using LangSmith. This tool simplifies the process of logging runs, enabling us to inspect the inputs and outputs of each component within our system.

To set up tracing, we must configure environment variables to connect with LangSmith.

from uuid import uuid4

unique_id = uuid4().hex[0:8]  # Short random suffix to keep project names unique

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"RefGPT_RAG_Chromadb - {unique_id}"
os.environ["LANGCHAIN_API_KEY"] = os.getenv("LANGCHAIN_API_KEY")  # Assumes LANGCHAIN_API_KEY is set in .env
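
Once these variables are set, any subsequent call is logged to the project named above; the query here is just a sample to generate a trace:

agent.run("What equipment is a player required to wear?")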

By utilizing these tools, we can effectively track and evaluate our agent's performance, ensuring it operates as intended.

Credits

The diagrams illustrating the workings of embeddings and vector stores were inspired by the LangChain short course on DeepLearning.ai.

References

  • DeepLearning.AI: LangChain Chat with Your Data
  • LangChain Docs
  • LangSmith Docs
