Diffstat (limited to 'docs/configuration')
-rw-r--r--  docs/configuration/caching.md         37
-rw-r--r--  docs/configuration/depth.md           23
-rw-r--r--  docs/configuration/domain-filter.md   44
-rw-r--r--  docs/configuration/link-following.md  29
-rw-r--r--  docs/configuration/proxies.md         13
-rw-r--r--  docs/configuration/rate-limiting.md   14
-rw-r--r--  docs/configuration/starting-url.md    14
-rw-r--r--  docs/configuration/url-filter.md      42
8 files changed, 0 insertions, 216 deletions
diff --git a/docs/configuration/caching.md b/docs/configuration/caching.md
deleted file mode 100644
index 4a06435..0000000
--- a/docs/configuration/caching.md
+++ /dev/null
@@ -1,37 +0,0 @@
-# Caching
-
-The `cache` config option allows you to enable file-based request caching. When enabled, every request is cached together with its raw response. Once the cache is populated and you re-run the scraper, requests are served directly from the cache.
-
-This also allows you to modify your scraping script afterwards and collect new results immediately, since pages are read from the cache instead of being fetched again.
-
-Example:
-
-```javascript
-export const config = {
- url: "http://example.com/",
- cache: "file",
- // ...
-};
-```
-
-### Cache File
-
-When caching is enabled using the `cache: "file"` option, a `.cache` file named after your scraping script will be created.
-
-Example:
-
-```bash
-$ flyscrape run hackernews.js # Will populate: hackernews.cache
-```
-
-### Shared cache
-
-If you want to share a cache between different scraping scripts, you can specify where the cache file should be stored.
-
-```javascript
-export const config = {
- url: "http://example.com/",
- cache: "file:/some/path/shared.cache",
- // ...
-};
-```
diff --git a/docs/configuration/depth.md b/docs/configuration/depth.md
deleted file mode 100644
index cabb0fa..0000000
--- a/docs/configuration/depth.md
+++ /dev/null
@@ -1,23 +0,0 @@
-# Depth
-
-The `depth` config option allows you to specify how deep the scraping process should follow links from the initial URL.
-
-When no value is provided or `depth` is set to `0`, link following is disabled and only the initial URL is scraped.
-
-Example:
-
-```javascript
-export const config = {
- url: "http://example.com/",
- depth: 2,
- // ...
-};
-```
-
-With the config provided in the example, the scraper would follow links like this:
-
-```
-http://example.com/ (depth = 0, initial URL)
-↳ http://example.com/deeply (depth = 1)
- ↳ http://example.com/deeply/nested (depth = 2)
-```
diff --git a/docs/configuration/domain-filter.md b/docs/configuration/domain-filter.md
deleted file mode 100644
index e8adc30..0000000
--- a/docs/configuration/domain-filter.md
+++ /dev/null
@@ -1,44 +0,0 @@
-# Domain Filter
-
-The `allowedDomains` and `blockedDomains` config options allow you to specify a list of domains which are accessible or blocked during scraping.
-
-```javascript
-export const config = {
- url: "http://example.com/",
- allowedDomains: ["subdomain.example.com"],
- // ...
-};
-```
-
-### `allowedDomains`
-
-This config option controls which additional domains are allowed to be visited during scraping. The domain of the initial URL is always allowed.
-
-You can also allow all domains to be accessible by setting `allowedDomains` to `["*"]`. To then further restrict access, you can specify `blockedDomains`.
-
-Example:
-
-```javascript
-export const config = {
- url: "http://example.com/",
- allowedDomains: ["*"],
- // ...
-};
-```
-
-### `blockedDomains`
-
-This config option controls which additional domains are blocked from being accessed. By default, all domains other than the domain of the initial URL or those specified in `allowedDomains` are blocked.
-
-`blockedDomains` is best used in conjunction with `allowedDomains: ["*"]`, allowing the scraper to access all domains except those specified in `blockedDomains`.
-
-Example:
-
-```javascript
-export const config = {
- url: "http://example.com/",
- allowedDomains: ["*"],
- blockedDomains: ["google.com", "bing.com"],
- // ...
-};
-```
diff --git a/docs/configuration/link-following.md b/docs/configuration/link-following.md
deleted file mode 100644
index 6522ce8..0000000
--- a/docs/configuration/link-following.md
+++ /dev/null
@@ -1,29 +0,0 @@
-# Link Following
-
-The `follow` config option allows you to specify a list of CSS selectors that determine which links the scraper should follow.
-
-When no value is provided, the scraper will follow all links found with the `a[href]` selector.
-
-Example:
-
-```javascript
-export const config = {
- url: "http://example.com/",
- follow: [".pagination > a[href]", ".nav a[href]"],
- // ...
-};
-```
-
-### Following non `href` attributes
-
-For special cases where the link is not stored in the `href` attribute, you can specify a selector that ends in a different attribute.
-
-Example:
-
-```javascript
-export const config = {
- url: "http://example.com/",
- follow: [".articles > div[data-url]"],
- // ...
-};
-```
diff --git a/docs/configuration/proxies.md b/docs/configuration/proxies.md
deleted file mode 100644
index 19434dc..0000000
--- a/docs/configuration/proxies.md
+++ /dev/null
@@ -1,13 +0,0 @@
-# Proxies
-
-The `proxies` config option allows you to specify a list of HTTP(S) proxies that should be used during scraping. When multiple proxies are provided, the scraper will pick a proxy at random for each request.
-
-Example:
-
-```javascript
-export const config = {
- url: "http://example.com/",
- proxies: ["https://my-proxy.com:3128", "https://my-other-proxy.com:8080"],
- // ...
-};
-```
diff --git a/docs/configuration/rate-limiting.md b/docs/configuration/rate-limiting.md
deleted file mode 100644
index c3014d1..0000000
--- a/docs/configuration/rate-limiting.md
+++ /dev/null
@@ -1,14 +0,0 @@
-# Rate Limiting
-
-The `rate` config option allows you to specify the rate at which the scraper sends out requests. The rate is measured in _Requests per Second_ (RPS) and can be set as a whole number or a decimal, to account for shorter and longer request intervals.
-
-When no `rate` is specified, rate limiting is disabled and the scraper will send out requests as fast as it can.
-
-Example:
-
-```javascript
-export const config = {
- url: "http://example.com/",
- rate: 50,
-};
-```
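-
-Decimal rates are useful when requests should go out slower than once per second. As a minimal sketch (the value `0.5` is only an illustration), a rate of `0.5` RPS works out to roughly one request every two seconds:
-
-```javascript
-export const config = {
- url: "http://example.com/",
- // 0.5 requests per second, i.e. roughly one request every two seconds.
- rate: 0.5,
-};
-```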
diff --git a/docs/configuration/starting-url.md b/docs/configuration/starting-url.md
deleted file mode 100644
index d5c0965..0000000
--- a/docs/configuration/starting-url.md
+++ /dev/null
@@ -1,14 +0,0 @@
-# Starting URL
-
-The `url` config option allows you to specify the initial URL at which the scraper should start its scraping process.
-
-When no value is provided, the scraper will not start and will exit immediately.
-
-Example:
-
-```javascript
-export const config = {
- url: "http://example.com/",
- // ...
-};
-```
diff --git a/docs/configuration/url-filter.md b/docs/configuration/url-filter.md
deleted file mode 100644
index e2feda8..0000000
--- a/docs/configuration/url-filter.md
+++ /dev/null
@@ -1,42 +0,0 @@
-# URL Filter
-
-The `allowedURLs` and `blockedURLs` config options allow you to specify a list of URL patterns (in the form of regular expressions) that determine which URLs are accessible or blocked during scraping.
-
-```javascript
-export const config = {
- url: "http://example.com/",
- allowedURLs: ["/articles/.*", "/authors/.*"],
- blockedURLs: ["/authors/admin"],
- // ...
-};
-```
-
-### `allowedURLs`
-
-This config option controls which URLs are allowed to be visited during scraping. When no value is provided, all URLs are allowed to be visited unless otherwise blocked.
-
-When a list of URL patterns is provided, only URLs matching one or more of these patterns are allowed to be visited.
-
-Example:
-
-```javascript
-export const config = {
- url: "http://example.com/",
- allowedURLs: ["/products/"],
-};
-```
-
-### `blockedURLs`
-
-This config option controls which URLs are blocked from being visited during scraping.
-
-When a list of URL patterns is provided, URLs matching one or more of these patterns are blocked from being visited.
-
-Example:
-
-```javascript
-export const config = {
- url: "http://example.com/",
- blockedURLs: ["/restricted"],
-};
-```
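-
-Because the patterns are regular expressions, they can express more than literal path prefixes. The following sketch uses purely illustrative patterns: it allows article pages with a numeric ID and blocks any URL containing a query string (note the double backslashes needed to escape regex metacharacters inside JavaScript strings):
-
-```javascript
-export const config = {
- url: "http://example.com/",
- // Only visit article pages with a numeric ID.
- allowedURLs: ["/articles/\\d+"],
- // Block any URL that contains a query string.
- blockedURLs: ["\\?"],
-};
-```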