TokenTextSplitter

Finally, TokenTextSplitter splits a raw text string by first converting the text into BPE tokens, then splitting those tokens into chunks, and converting the tokens within each chunk back into text.

import { Document } from "langchain/document";
import { TokenTextSplitter } from "langchain/text_splitter";

const text = "foo bar baz 123";

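const splitter = new TokenTextSplitter({
  encodingName: "gpt2", // BPE encoding used to tokenize the text
  chunkSize: 10, // maximum number of tokens per chunk
  chunkOverlap: 0, // number of tokens shared between consecutive chunks
});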

const output = await splitter.createDocuments([text]);
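
To make the encode, chunk, decode sequence described above concrete, here is a minimal sketch of the idea. It is not the library's implementation: `splitByTokens`, `Tokenizer`, and `toyTokenizer` are hypothetical names introduced for illustration, and the toy tokenizer treats whitespace-separated words as tokens instead of running real BPE.

```typescript
// Hypothetical tokenizer interface; a real BPE tokenizer would return integer token ids.
type Tokenizer = {
  encode: (text: string) => string[];
  decode: (tokens: string[]) => string;
};

// Assumption: whitespace-separated words stand in for BPE tokens.
const toyTokenizer: Tokenizer = {
  encode: (text) => text.split(/\s+/).filter((t) => t.length > 0),
  decode: (tokens) => tokens.join(" "),
};

function splitByTokens(
  text: string,
  chunkSize: number,
  chunkOverlap: number,
  tokenizer: Tokenizer = toyTokenizer
): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be in [0, chunkSize)");
  }

  // Step 1: convert the text into tokens.
  const tokens = tokenizer.encode(text);

  // Step 2: slide a window of chunkSize tokens, advancing by chunkSize - chunkOverlap.
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < tokens.length; start += step) {
    // Step 3: convert the tokens of this chunk back into text.
    chunks.push(tokenizer.decode(tokens.slice(start, start + chunkSize)));
    // Stop once the window has reached the end of the token list.
    if (start + chunkSize >= tokens.length) break;
  }
  return chunks;
}

console.log(splitByTokens("foo bar baz 123", 2, 0));
// [ 'foo bar', 'baz 123' ]
```

With a real BPE encoding the chunk boundaries fall on subword tokens, so a chunk can begin or end in the middle of a word; the toy tokenizer hides that detail.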