| author | Philipp Tanlak <philipp.tanlak@gmail.com> | 2025-11-24 20:54:57 +0100 |
|---|---|---|
| committer | Philipp Tanlak <philipp.tanlak@gmail.com> | 2025-11-24 20:57:48 +0100 |
| commit | b1e2c8fd5cb5dfa46bc440a12eafaf56cd844b1c (patch) | |
| tree | 49d360fd6cbc6a2754efe93524ac47ff0fbe0f7d /content/docs/configuration/url-filter.md | |
Docs
Diffstat (limited to 'content/docs/configuration/url-filter.md')
| -rw-r--r-- | content/docs/configuration/url-filter.md | 42 |
1 files changed, 42 insertions, 0 deletions
diff --git a/content/docs/configuration/url-filter.md b/content/docs/configuration/url-filter.md
new file mode 100644
index 0000000..80d3544
--- /dev/null
+++ b/content/docs/configuration/url-filter.md
@@ -0,0 +1,42 @@
+---
+title: 'URL Filter'
+weight: 4
+prev: /docs/getting-started
+---
+
+The `allowedURLs` and `blockedURLs` config options let you specify lists of URL patterns (in the form of regular expressions) that are allowed or blocked during scraping.
+
+```javascript {filename="Configuration"}
+export const options = {
+  url: "http://example.com/",
+  allowedURLs: ["/articles/.*", "/authors/.*"],
+  blockedURLs: ["/authors/admin"],
+  // ...
+};
+```
+
+## Allowed URLs
+
+This config option controls which URLs are allowed to be visited during scraping. When no value is provided, all URLs are allowed to be visited unless otherwise blocked.
+
+When a list of URL patterns is provided, only URLs matching one or more of these patterns are allowed to be visited.
+
+```javascript {filename="Configuration"}
+export const options = {
+  url: "http://example.com/",
+  allowedURLs: ["/products/"],
+};
+```
+
+## Blocked URLs
+
+This config option controls which URLs are blocked from being visited during scraping.
+
+When a list of URL patterns is provided, URLs matching one or more of these patterns are blocked from being visited.
+
+```javascript {filename="Configuration"}
+export const options = {
+  url: "http://example.com/",
+  blockedURLs: ["/restricted"],
+};
+```
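
Review note: the rules added in this patch can be read as a simple decision procedure, sketched below. This is only an illustration, not the scraper's actual implementation. The helper name `isURLAllowed` is hypothetical, and the sketch assumes that patterns are matched unanchored against the full URL string and that a blocked match takes precedence over an allowed match, neither of which the documentation states explicitly.

```javascript
// Hypothetical helper (not part of the documented tool): decides whether a
// URL would be visited under the allowedURLs / blockedURLs rules.
// Assumptions: patterns are unanchored regular expressions tested against the
// full URL, and a blocked match always wins over an allowed match.
function isURLAllowed(url, { allowedURLs = [], blockedURLs = [] } = {}) {
  const matchesAny = (patterns) =>
    patterns.some((pattern) => new RegExp(pattern).test(url));

  // A URL matching any blocked pattern is rejected.
  if (matchesAny(blockedURLs)) return false;

  // With no allow list, every non-blocked URL is accepted.
  if (allowedURLs.length === 0) return true;

  // Otherwise the URL must match at least one allowed pattern.
  return matchesAny(allowedURLs);
}

const options = {
  url: "http://example.com/",
  allowedURLs: ["/articles/.*", "/authors/.*"],
  blockedURLs: ["/authors/admin"],
};

console.log(isURLAllowed("http://example.com/articles/1", options)); // true
console.log(isURLAllowed("http://example.com/authors/admin", options)); // false
console.log(isURLAllowed("http://example.com/about", options)); // false
```

Under these assumptions, `/authors/admin` is rejected even though it also matches the allowed pattern `/authors/.*`, which is the behavior the first configuration example in the patch appears to rely on.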