
Advanced Conversational QA

Conversing with LLMs is a great way to demonstrate their capabilities, but adding chat history and external context makes the orchestration considerably more complex. In this example, we'll show how to use Runnables to construct a conversational QA system that can answer questions, remember previous exchanges, and draw on external context.

The first step is to load our context (in this example, the 2022 State of the Union speech). This is also a good place to instantiate our retriever and memory classes.

import { ChatOpenAI } from "langchain/chat_models/openai";
import { HNSWLib } from "langchain/vectorstores/hnswlib";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { BufferMemory } from "langchain/memory";
import * as fs from "fs";

/* Initialize the LLM to use to answer the question */
const model = new ChatOpenAI({});
/* Load in the file we want to do question answering over */
const text = fs.readFileSync("state_of_the_union.txt", "utf8");
/* Split the text into chunks */
const textSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000 });
const docs = await textSplitter.createDocuments([text]);
/* Create the vectorstore and initialize it as a retriever */
const vectorStore = await HNSWLib.fromDocuments(docs, new OpenAIEmbeddings());
const retriever = vectorStore.asRetriever();
/* Initialize our BufferMemory store */
const memory = new BufferMemory({
  memoryKey: "chatHistory",
});
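
Before wiring up any chains, it can help to sanity-check the retriever on its own. The snippet below is a minimal sketch; the sample query and logging are illustrative and not part of the original example:

/* Illustrative sanity check: fetch the most relevant chunks for a sample query. */
const sampleDocs = await retriever.getRelevantDocuments(
  "What did the president say about Justice Breyer?"
);
// Typically returns the top 4 chunks unless `k` is configured on the retriever.
console.log(sampleDocs.map((doc) => doc.pageContent.slice(0, 80)));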

Next, we need a small helper to serialize our chat history, converting an array of messages into a single string.

import { Document } from "langchain/document";

/* Ensure our chat history is always passed in as a string */
const serializeChatHistory = (chatHistory: string | Array<string>) => {
  if (Array.isArray(chatHistory)) {
    return chatHistory.join("\n");
  }
  return chatHistory;
};

Our conversational system performs two main LLM queries. The first is question answering: given some context and chat history, answer the user's question. The second is question generation: when chat history exists, the LLM rephrases the follow-up into a standalone question that incorporates the past conversation and the current question. Let's create prompts for both.

import { PromptTemplate } from "langchain/prompts";

/**
 * Create a prompt template for generating an answer based on context and
 * a question.
 *
 * Chat history will be an empty string if it's the first question.
 *
 * inputVariables: ["chatHistory", "context", "question"]
 */
const questionPrompt = PromptTemplate.fromTemplate(
  `Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
CHAT HISTORY: {chatHistory}
----------------
CONTEXT: {context}
----------------
QUESTION: {question}
----------------
Helpful Answer:`
);

/**
 * Creates a prompt template for __generating a question__ to then ask an LLM
 * based on previous chat history, context and the question.
 *
 * inputVariables: ["chatHistory", "question"]
 */
const questionGeneratorTemplate =
  PromptTemplate.fromTemplate(`Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.
----------------
CHAT HISTORY: {chatHistory}
----------------
FOLLOWUP QUESTION: {question}
----------------
Standalone question:`);
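
To see exactly what the model will receive, you can render a template directly with format. This is purely illustrative; the sample chat history and question below are made up:

const renderedPrompt = await questionGeneratorTemplate.format({
  chatHistory:
    "Human: What did the president say about Justice Breyer?\nAI: He thanked him for his service.",
  question: "Was it nice?",
});
// Logs the filled-in standalone-question prompt with both variables substituted.
console.log(renderedPrompt);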

Now we can start writing our main question answering sequence. For this, we'll put everything together using a RunnableSequence and one helper function that abstracts the last processing step.

import { RunnableSequence } from "langchain/schema/runnable";
import { StringOutputParser } from "langchain/schema/output_parser";
import { LLMChain } from "langchain/chains";
import { formatDocumentsAsString } from "langchain/util/document";

/**
 * A helper function which performs the LLM call and saves the context to memory.
 */
const handleProcessQuery = async (input: {
  question: string;
  context: string;
  chatHistory?: string | Array<string>;
}) => {
  const chain = new LLMChain({
    llm: model,
    prompt: questionPrompt,
    outputParser: new StringOutputParser(),
  });

  const { text } = await chain.call({
    ...input,
    chatHistory: serializeChatHistory(input.chatHistory ?? ""),
  });

  await memory.saveContext(
    {
      human: input.question,
    },
    {
      ai: text,
    }
  );

  return text;
};

const answerQuestionChain = RunnableSequence.from([
  {
    question: (input: {
      question: string;
      chatHistory?: string | Array<string>;
    }) => input.question,
  },
  {
    question: (previousStepResult: {
      question: string;
      chatHistory?: string | Array<string>;
    }) => previousStepResult.question,
    chatHistory: (previousStepResult: {
      question: string;
      chatHistory?: string | Array<string>;
    }) => serializeChatHistory(previousStepResult.chatHistory ?? ""),
    context: async (previousStepResult: {
      question: string;
      chatHistory?: string | Array<string>;
    }) => {
      // Fetch relevant docs and serialize to a string.
      const relevantDocs = await retriever.getRelevantDocuments(
        previousStepResult.question
      );
      const serialized = formatDocumentsAsString(relevantDocs);
      return serialized;
    },
  },
  handleProcessQuery,
]);

In the code above, the RunnableSequence takes a single question input. That input is then piped to the next step, where we perform the following operations:

  1. Pass the question through unchanged.
  2. Serialize the chat history into a string, if it's been passed in.
  3. Fetch relevant documents from the retriever and serialize them into a string (see the sketch below).
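
For intuition, formatDocumentsAsString (used in step 3) essentially concatenates the pageContent of each retrieved document. A rough, minimal equivalent, shown only for illustration:

import { Document } from "langchain/document";

// Rough equivalent of formatDocumentsAsString, for intuition only.
const formatDocs = (docs: Document[]): string =>
  docs.map((doc) => doc.pageContent).join("\n\n");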

After this, we can create a RunnableSequence that generates a standalone question from the past history and the current question.

const generateQuestionChain = RunnableSequence.from([
  {
    question: (input: {
      question: string;
      chatHistory: string | Array<string>;
    }) => input.question,
    chatHistory: async () => {
      const memoryResult = await memory.loadMemoryVariables({});
      return serializeChatHistory(memoryResult.chatHistory ?? "");
    },
  },
  questionGeneratorTemplate,
  model,
  // Take the result of the above model call, and pass it through to the
  // next RunnableSequence chain which will answer the question
  {
    question: (previousStepResult: { text: string }) => previousStepResult.text,
  },
  answerQuestionChain,
]);

The steps here are largely the same. We take a question as input and query our memory store for the chatHistory. Next, we pipe those values into our prompt template and then into the LLM to perform the request. Finally, we take the (unparsed) result of the LLM call and pass it to answerQuestionChain as a key-value pair whose key is question.
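
Because we configured BufferMemory with memoryKey: "chatHistory", loadMemoryVariables returns the saved conversation under that key as a single string. A quick, illustrative check (the logged value is an example, not real output):

const memoryVariables = await memory.loadMemoryVariables({});
console.log(memoryVariables);
// e.g. { chatHistory: "Human: What did the president say about Justice Breyer?\nAI: The president thanked him..." }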

Now that our two main operations are defined, we can create a RunnableBranch, which runs the first Runnable whose check function returns true. We also have to pass a fallback Runnable for the case where every check returns false (this should never occur in practice with our specific example).

import { RunnableBranch } from "langchain/schema/runnable";

const branch = RunnableBranch.from([
  [
    async () => {
      const memoryResult = await memory.loadMemoryVariables({});
      const isChatHistoryEmpty = !memoryResult.chatHistory.length;

      return isChatHistoryEmpty;
    },
    answerQuestionChain,
  ],
  [
    async () => {
      const memoryResult = await memory.loadMemoryVariables({});
      const isChatHistoryPresent =
        !!memoryResult.chatHistory && memoryResult.chatHistory.length;

      return isChatHistoryPresent;
    },
    generateQuestionChain,
  ],
  answerQuestionChain,
]);

The checks are fairly simple: if no chat history has been saved yet, we answer the question directly; if history exists, we first rephrase the question into a standalone one.
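
Because the second condition is just the negation of the first, an equivalent formulation uses a single check plus the fallback. This is a hypothetical alternative sketch, not the code used in this example:

const simplerBranch = RunnableBranch.from([
  [
    async () => {
      // If any chat history has been saved, rephrase the question first.
      const { chatHistory } = await memory.loadMemoryVariables({});
      return Boolean(chatHistory && chatHistory.length);
    },
    generateQuestionChain,
  ],
  // Otherwise (first question), answer directly.
  answerQuestionChain,
]);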

Lastly we create our full chain which takes in a question, runs the RunnableBranch to determine which Runnable to use, and then returns the result!

/* Define our chain which calls the branch with our input. */
const fullChain = RunnableSequence.from([
  {
    question: (input: { question: string }) => input.question,
  },
  branch,
]);

/* Invoke our `Runnable` with the first question */
const resultOne = await fullChain.invoke({
  question: "What did the president say about Justice Breyer?",
});

console.log({ resultOne });
/**
 * {
 *   resultOne: 'The president thanked Justice Breyer for his service and described him as an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court.'
 * }
 */

/* Invoke our `Runnable` again with a followup question */
const resultTwo = await fullChain.invoke({
  question: "Was it nice?",
});

console.log({ resultTwo });
/**
 * {
 *   resultTwo: "Yes, the president's description of Justice Breyer was positive."
 * }
 */
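
If you're curious what ended up in memory, you can dump it after the two exchanges. Note that for the follow-up turn it is the rephrased standalone question produced by generateQuestionChain that gets saved, since that is what handleProcessQuery received. The output below is illustrative only:

const { chatHistory } = await memory.loadMemoryVariables({});
console.log(chatHistory);
/**
 * Human: What did the president say about Justice Breyer?
 * AI: The president thanked Justice Breyer for his service...
 * Human: <standalone rephrasing of "Was it nice?">
 * AI: Yes, the president's description of Justice Breyer was positive.
 */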

That's it! Now we can have a full contextual conversation with our LLM.

The full code for this example can be found below.

import { ChatOpenAI } from "langchain/chat_models/openai";
import { HNSWLib } from "langchain/vectorstores/hnswlib";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { BufferMemory } from "langchain/memory";
import * as fs from "fs";
import { RunnableBranch, RunnableSequence } from "langchain/schema/runnable";
import { PromptTemplate } from "langchain/prompts";
import { StringOutputParser } from "langchain/schema/output_parser";
import { LLMChain } from "langchain/chains";
import { formatDocumentsAsString } from "langchain/util/document";

export const run = async () => {
  /* Initialize the LLM to use to answer the question */
  const model = new ChatOpenAI({});
  /* Load in the file we want to do question answering over */
  const text = fs.readFileSync("state_of_the_union.txt", "utf8");
  /* Split the text into chunks */
  const textSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000 });
  const docs = await textSplitter.createDocuments([text]);
  /* Create the vectorstore */
  const vectorStore = await HNSWLib.fromDocuments(docs, new OpenAIEmbeddings());
  const retriever = vectorStore.asRetriever();

  const serializeChatHistory = (chatHistory: string | Array<string>) => {
    if (Array.isArray(chatHistory)) {
      return chatHistory.join("\n");
    }
    return chatHistory;
  };

  const memory = new BufferMemory({
    memoryKey: "chatHistory",
  });

  /**
   * Create a prompt template for generating an answer based on context and
   * a question.
   *
   * Chat history will be an empty string if it's the first question.
   *
   * inputVariables: ["chatHistory", "context", "question"]
   */
  const questionPrompt = PromptTemplate.fromTemplate(
    `Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
CHAT HISTORY: {chatHistory}
----------------
CONTEXT: {context}
----------------
QUESTION: {question}
----------------
Helpful Answer:`
  );

  /**
   * Creates a prompt template for __generating a question__ to then ask an LLM
   * based on previous chat history, context and the question.
   *
   * inputVariables: ["chatHistory", "question"]
   */
  const questionGeneratorTemplate =
    PromptTemplate.fromTemplate(`Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.
----------------
CHAT HISTORY: {chatHistory}
----------------
FOLLOWUP QUESTION: {question}
----------------
Standalone question:`);

  const handleProcessQuery = async (input: {
    question: string;
    context: string;
    chatHistory?: string | Array<string>;
  }) => {
    const chain = new LLMChain({
      llm: model,
      prompt: questionPrompt,
      outputParser: new StringOutputParser(),
    });

    const { text } = await chain.call({
      ...input,
      chatHistory: serializeChatHistory(input.chatHistory ?? ""),
    });

    await memory.saveContext(
      {
        human: input.question,
      },
      {
        ai: text,
      }
    );

    return text;
  };

  const answerQuestionChain = RunnableSequence.from([
    {
      question: (input: {
        question: string;
        chatHistory?: string | Array<string>;
      }) => input.question,
    },
    {
      question: (previousStepResult: {
        question: string;
        chatHistory?: string | Array<string>;
      }) => previousStepResult.question,
      chatHistory: (previousStepResult: {
        question: string;
        chatHistory?: string | Array<string>;
      }) => serializeChatHistory(previousStepResult.chatHistory ?? ""),
      context: async (previousStepResult: {
        question: string;
        chatHistory?: string | Array<string>;
      }) => {
        // Fetch relevant docs and serialize to a string.
        const relevantDocs = await retriever.getRelevantDocuments(
          previousStepResult.question
        );
        const serialized = formatDocumentsAsString(relevantDocs);
        return serialized;
      },
    },
    handleProcessQuery,
  ]);

  const generateQuestionChain = RunnableSequence.from([
    {
      question: (input: {
        question: string;
        chatHistory: string | Array<string>;
      }) => input.question,
      chatHistory: async () => {
        const memoryResult = await memory.loadMemoryVariables({});
        return serializeChatHistory(memoryResult.chatHistory ?? "");
      },
    },
    questionGeneratorTemplate,
    model,
    // Take the result of the above model call, and pass it through to the
    // next RunnableSequence chain which will answer the question
    {
      question: (previousStepResult: { text: string }) =>
        previousStepResult.text,
    },
    answerQuestionChain,
  ]);

  const branch = RunnableBranch.from([
    [
      async () => {
        const memoryResult = await memory.loadMemoryVariables({});
        const isChatHistoryEmpty = !memoryResult.chatHistory.length;

        return isChatHistoryEmpty;
      },
      answerQuestionChain,
    ],
    [
      async () => {
        const memoryResult = await memory.loadMemoryVariables({});
        const isChatHistoryPresent =
          !!memoryResult.chatHistory && memoryResult.chatHistory.length;

        return isChatHistoryPresent;
      },
      generateQuestionChain,
    ],
    answerQuestionChain,
  ]);

  const fullChain = RunnableSequence.from([
    {
      question: (input: { question: string }) => input.question,
    },
    branch,
  ]);

  const resultOne = await fullChain.invoke({
    question: "What did the president say about Justice Breyer?",
  });

  console.log({ resultOne });
  /**
   * {
   *   resultOne: 'The president thanked Justice Breyer for his service and described him as an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court.'
   * }
   */

  const resultTwo = await fullChain.invoke({
    question: "Was it nice?",
  });

  console.log({ resultTwo });
  /**
   * {
   *   resultTwo: "Yes, the president's description of Justice Breyer was positive."
   * }
   */
};
