Sitemap
Extending WebBaseLoader, SitemapLoader loads a sitemap from a given URL, then scrapes and loads every page listed in the sitemap, returning each page as a Document.
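To make the first step concrete, here is a minimal standard-library sketch of pulling page URLs out of a sitemap. It is an illustration only, not SitemapLoader's actual implementation: the sample sitemap, the `extract_page_urls` helper, and the example.com URLs are all hypothetical, and the XML is inlined so the snippet runs without network access.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample sitemap, inlined so the example runs offline.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs/intro</loc></url>
</urlset>"""


def extract_page_urls(sitemap_xml: str) -> list[str]:
    """Return the page URLs found in the sitemap's <url><loc> elements."""
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


page_urls = extract_page_urls(SAMPLE_SITEMAP)
print(page_urls)
```

Each URL collected this way would then be fetched and turned into one Document.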
Pages are scraped concurrently, with a limit on concurrent requests that defaults to 2 per second. If you aren't concerned about being a good citizen, you control the server being scraped, or you don't care about load, you can raise this limit. Note that while raising it speeds up scraping, it may cause the server to block you. Be careful!
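The idea of bounded concurrent scraping can be sketched with `asyncio`. This is a simplified illustration under stated assumptions, not the loader's own code: `fetch_page` is a hypothetical stand-in that sleeps instead of making an HTTP request, and the semaphore caps how many requests are in flight at once, which is a simpler stand-in for a true per-second rate limit.

```python
import asyncio


async def fetch_page(url: str) -> str:
    """Hypothetical stand-in for an HTTP GET; sleeps instead of using the network."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


async def fetch_all(urls: list[str], max_concurrent: int = 2) -> tuple[list[str], int]:
    """Fetch every URL with at most `max_concurrent` requests in flight.

    Returns the pages in input order and the peak concurrency observed.
    """
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def fetch_one(url: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:  # wait here while the limit is reached
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fetch_page(url)
            finally:
                in_flight -= 1

    pages = await asyncio.gather(*(fetch_one(u) for u in urls))
    return list(pages), peak


# Hypothetical URLs, used only to exercise the limiter.
urls = [f"https://example.com/page/{i}" for i in range(6)]
pages, peak = asyncio.run(fetch_all(urls, max_concurrent=2))
print(len(pages), peak)
```

Raising `max_concurrent` in this sketch shortens total run time, at the cost of more simultaneous load on the server, which mirrors the trade-off described above.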
Overview
Integration details
Class | Package | Local | Serializable | JS support |
---|---|---|---|---|
SitemapLoader | langchain_community | ✅ | ❌ | ✅ |
Loader features
Source | Document Lazy Loading | Native Async Support |
---|---|---|
SitemapLoader | ✅ |