Use local LLMs

The popularity of projects like PrivateGPT, llama.cpp, and GPT4All underscores the importance of running LLMs locally.

LangChain integrates with Ollama to run several open-source LLMs locally with GPU support.

For example, here we show how to run Llama 2 locally (e.g., on your laptop) using local embeddings, a local vector store, and a local LLM. You can find other open-source models supported by Ollama in the Ollama model library.

This tutorial is designed for Node.js running on macOS with at least 16 GB of RAM.

Setup

First, install the packages needed for local embeddings and vector storage. For this demo, we'll use Llama 2 through Ollama as our LLM, Transformers.js for embeddings, and HNSWLib as a vector store for retrieval. We'll also install cheerio for scraping, though you can use any loader.

npm install @xenova/transformers
npm install hnswlib-node
npm install cheerio

You'll also need to set up Ollama and run a local instance by following the installation instructions for your platform.
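
For example, after installing Ollama you can pull the Llama 2 weights from the command line. The Ollama desktop app (or ollama serve) then exposes a local API at http://localhost:11434, which the code below connects to:

ollama pull llama2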

Document loading

Next, we need to load some documents. We'll use a blog post on agents as an example.

import { CheerioWebBaseLoader } from "langchain/document_loaders/web/cheerio";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { HNSWLib } from "langchain/vectorstores/hnswlib";
import { HuggingFaceTransformersEmbeddings } from "langchain/embeddings/hf_transformers";

const loader = new CheerioWebBaseLoader(
  "https://lilianweng.github.io/posts/2023-06-23-agent/"
);
const docs = await loader.load();

const splitter = new RecursiveCharacterTextSplitter({
  chunkOverlap: 0,
  chunkSize: 500,
});

const splitDocuments = await splitter.splitDocuments(docs);

const vectorstore = await HNSWLib.fromDocuments(
  splitDocuments,
  new HuggingFaceTransformersEmbeddings()
);

const retrievedDocs = await vectorstore.similaritySearch(
  "What are the approaches to Task Decomposition?"
);

console.log(retrievedDocs[0]);

/*
  Document {
    pageContent: 'Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.',
    metadata: {
      source: 'https://lilianweng.github.io/posts/2023-06-23-agent/',
      loc: { lines: [Object] }
    }
  }
*/
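
By default, HuggingFaceTransformersEmbeddings downloads a small embedding model and runs it locally via Transformers.js. As a rough sketch, you can also pick a specific model and limit how many chunks the similarity search returns (the modelName option and the "Xenova/all-MiniLM-L6-v2" model shown here are assumptions; check the embeddings class in your LangChain version):

const embeddings = new HuggingFaceTransformersEmbeddings({
  // Hypothetical example model; any Transformers.js-compatible embedding model should work.
  modelName: "Xenova/all-MiniLM-L6-v2",
});

const store = await HNSWLib.fromDocuments(splitDocuments, embeddings);

// Return only the top 2 most similar chunks (the default is 4).
const topDocs = await store.similaritySearch(
  "What are the approaches to Task Decomposition?",
  2
);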

Composable chain

We can compose a chain for retrieval by combining the retriever with a prompt and the local model.

The chain formats the prompt template using the input values provided and passes the formatted string to Llama 2 (or whichever LLM you specify).

In this case, the documents returned by the vector store-backed retriever are converted to strings and passed into the {context} variable of the prompt:

import { CheerioWebBaseLoader } from "langchain/document_loaders/web/cheerio";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { HNSWLib } from "langchain/vectorstores/hnswlib";
import { Ollama } from "langchain/llms/ollama";
import { PromptTemplate } from "langchain/prompts";
import {
  RunnableSequence,
  RunnablePassthrough,
} from "langchain/schema/runnable";
import { StringOutputParser } from "langchain/schema/output_parser";
import { HuggingFaceTransformersEmbeddings } from "langchain/embeddings/hf_transformers";
import { formatDocumentsAsString } from "langchain/util/document";

const loader = new CheerioWebBaseLoader(
  "https://lilianweng.github.io/posts/2023-06-23-agent/"
);
const docs = await loader.load();

const splitter = new RecursiveCharacterTextSplitter({
  chunkOverlap: 0,
  chunkSize: 500,
});

const splitDocuments = await splitter.splitDocuments(docs);

const vectorstore = await HNSWLib.fromDocuments(
  splitDocuments,
  new HuggingFaceTransformersEmbeddings()
);

const retriever = vectorstore.asRetriever();

// Prompt
const prompt =
  PromptTemplate.fromTemplate(`Answer the question based only on the following context:
{context}

Question: {question}`);

// Llama 2 7b wrapped by Ollama
const model = new Ollama({
  baseUrl: "http://localhost:11434",
  model: "llama2",
});

const chain = RunnableSequence.from([
  {
    context: retriever.pipe(formatDocumentsAsString),
    question: new RunnablePassthrough(),
  },
  prompt,
  model,
  new StringOutputParser(),
]);

const result = await chain.invoke(
  "What are the approaches to Task Decomposition?"
);

console.log(result);

/*
Based on the provided context, there are three approaches to task decomposition:

1. Using simple prompts like "Steps for XYZ" or "What are the subgoals for achieving XYZ?" to elicit a list of tasks from a language model (LLM).
2. Providing task-specific instructions, such as "Write a story outline" for writing a novel, to guide the LLM in decomposing the task into smaller subtasks.
3. Incorporating human inputs to help the LLM learn and improve its decomposition abilities over time.
*/
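
Because the composed chain is a runnable, you can also stream the answer from the local model instead of waiting for the full response. A minimal sketch (chunk granularity depends on the model and your LangChain version):

const stream = await chain.stream(
  "What are the approaches to Task Decomposition?"
);

// With StringOutputParser at the end of the chain, each chunk is a string.
for await (const chunk of stream) {
  process.stdout.write(chunk);
}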

RetrievalQA

For an even simpler flow, use the preconfigured RetrievalQAChain.

This will use a default QA prompt and will retrieve from the vector store.

You can still pass in a custom prompt if desired.

type: "stuff" (see here) means that all the docs will be added (stuffed) into a prompt.

import { RetrievalQAChain, loadQAStuffChain } from "langchain/chains";
import { CheerioWebBaseLoader } from "langchain/document_loaders/web/cheerio";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { HNSWLib } from "langchain/vectorstores/hnswlib";
import { Ollama } from "langchain/llms/ollama";
import { PromptTemplate } from "langchain/prompts";
import { HuggingFaceTransformersEmbeddings } from "langchain/embeddings/hf_transformers";

const loader = new CheerioWebBaseLoader(
  "https://lilianweng.github.io/posts/2023-06-23-agent/"
);
const docs = await loader.load();

const splitter = new RecursiveCharacterTextSplitter({
  chunkOverlap: 0,
  chunkSize: 500,
});

const splitDocuments = await splitter.splitDocuments(docs);

const vectorstore = await HNSWLib.fromDocuments(
  splitDocuments,
  new HuggingFaceTransformersEmbeddings()
);

const retriever = vectorstore.asRetriever();

// Llama 2 7b wrapped by Ollama
const model = new Ollama({
  baseUrl: "http://localhost:11434",
  model: "llama2",
});

const template = `Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:`;

const QA_CHAIN_PROMPT = new PromptTemplate({
  inputVariables: ["context", "question"],
  template,
});

// Create a retrieval QA chain that uses a Llama 2-powered QA stuff chain with a custom prompt.
const chain = new RetrievalQAChain({
  combineDocumentsChain: loadQAStuffChain(model, { prompt: QA_CHAIN_PROMPT }),
  retriever,
  returnSourceDocuments: true,
  inputKey: "question",
});

const response = await chain.call({
  question: "What are the approaches to Task Decomposition?",
});

console.log(response);

/*
  {
    text: 'Thanks for asking! There are several approaches to task decomposition, which can be categorized into three main types:\n' +
      '\n' +
      '1. Using language models with simple prompting (e.g., "Steps for XYZ."), or asking for subgoals for achieving XYZ.\n' +
      '2. Providing task-specific instructions, such as writing a story outline for writing a novel.\n' +
      '3. Incorporating human inputs to decompose tasks.\n' +
      '\n' +
      'Each approach has its advantages and limitations, and the choice of which one to use depends on the specific task and the desired level of complexity and adaptability. Thanks for asking!',
    sourceDocuments: [
      Document {
        pageContent: 'Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.',
        metadata: [Object]
      },
      Document {
        pageContent: 'Fig. 1. Overview of a LLM-powered autonomous agent system.\n' +
          'Component One: Planning#\n' +
          'A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\n' +
          'Task Decomposition#',
        metadata: [Object]
      },
      Document {
        pageContent: 'Challenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.',
        metadata: [Object]
      },
      Document {
        pageContent: 'Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.',
        metadata: [Object]
      }
    ]
  }
*/
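
Because returnSourceDocuments is set, the response includes the retrieved chunks alongside the answer, so you can surface where the answer came from. For example:

// Log the source URL of each chunk used to answer the question.
for (const doc of response.sourceDocuments) {
  console.log(doc.metadata.source);
}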
