# URL Filter
The `allowedURLs` and `blockedURLs` config options let you specify lists of URL patterns, written as regular expressions, that are allowed or blocked during scraping.
```javascript
export const options = {
  url: "http://example.com/",
  allowedURLs: ["/articles/.*", "/authors/.*"],
  blockedURLs: ["/authors/admin"],
  // ...
};
```
### `allowedURLs`
This config option controls which URLs are allowed to be visited during scraping. When no value is provided, all URLs are allowed to be visited unless they are otherwise blocked.
When a list of URL patterns is provided, only URLs matching one or more of these patterns are allowed to be visited.
Example:
```javascript
export const options = {
  url: "http://example.com/",
  allowedURLs: ["/products/"],
};
```
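As a rough illustration of the allow-list behaviour described above, the following sketch tests a URL against a list of patterns. The function name `isAllowedURL` is hypothetical and not part of the tool's API. The sketch also assumes that each pattern is an unanchored regular expression tested against the full URL string, which this page does not specify.

```javascript
// Hypothetical helper, for illustration only (not the tool's actual implementation).
// Assumption: each pattern is an unanchored regular expression
// tested against the full URL string.
function isAllowedURL(url, allowedURLs = []) {
  // No patterns configured: every URL is allowed.
  if (allowedURLs.length === 0) return true;
  // Otherwise at least one pattern must match.
  return allowedURLs.some((pattern) => new RegExp(pattern).test(url));
}

console.log(isAllowedURL("http://example.com/products/42", ["/products/"])); // true
console.log(isAllowedURL("http://example.com/about", ["/products/"])); // false
console.log(isAllowedURL("http://example.com/about")); // true
```

Under these assumptions, a pattern such as `"/products/"` matches any URL containing that path segment, so anchors (`^`, `$`) would be needed for a stricter match.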
### `blockedURLs`
This config option controls which URLs are blocked from being visited during scraping.
When a list of URL patterns is provided, URLs matching one or more of these patterns are blocked from being visited.
Example:
```javascript
export const options = {
  url: "http://example.com/",
  blockedURLs: ["/restricted"],
};
```
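To show how the two options might interact, here is a minimal sketch of a combined check. The names `matchesAny` and `shouldVisit` are hypothetical, and two behaviours are assumptions rather than documented facts: patterns are treated as unanchored regular expressions tested against the full URL, and a match in `blockedURLs` takes precedence over a match in `allowedURLs`.

```javascript
// Hypothetical helpers, for illustration only (not the tool's actual implementation).
// Assumption: patterns are unanchored regular expressions tested against the full URL.
function matchesAny(url, patterns) {
  return patterns.some((pattern) => new RegExp(pattern).test(url));
}

// Assumption: a blocked match wins over an allowed match.
function shouldVisit(url, { allowedURLs = [], blockedURLs = [] } = {}) {
  if (matchesAny(url, blockedURLs)) return false;
  // An empty allow list means every non-blocked URL may be visited.
  if (allowedURLs.length === 0) return true;
  return matchesAny(url, allowedURLs);
}

const filters = {
  allowedURLs: ["/articles/.*", "/authors/.*"],
  blockedURLs: ["/authors/admin"],
};

console.log(shouldVisit("http://example.com/articles/1", filters)); // true
console.log(shouldVisit("http://example.com/authors/admin", filters)); // false
console.log(shouldVisit("http://example.com/contact", filters)); // false
```

Checking the block list first keeps the rule simple: a URL such as `/authors/admin` stays excluded even though it also matches the broader `/authors/.*` allow pattern.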