Diffstat (limited to 'public/docs/configuration/index.xml')
| -rw-r--r-- | public/docs/configuration/index.xml | 686 |
1 files changed, 686 insertions, 0 deletions
diff --git a/public/docs/configuration/index.xml b/public/docs/configuration/index.xml new file mode 100644 index 0000000..85c2f0d --- /dev/null +++ b/public/docs/configuration/index.xml @@ -0,0 +1,686 @@ +<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"> + <channel> + <title>Flyscrape – Configuration</title> + <link>https://flyscrape.com/docs/configuration/</link> + <description>Recent content in Configuration on Flyscrape</description> + <generator>Hugo -- gohugo.io</generator> + <language>en-us</language> + + <atom:link href="https://flyscrape.com/docs/configuration/index.xml" rel="self" type="application/rss+xml" /> + + + + + + + + <item> + <title>Starting URL</title> + <link>https://flyscrape.com/docs/configuration/starting-url/</link> + <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate> + + <guid>https://flyscrape.com/docs/configuration/starting-url/</guid> + <description> + + + <p>The <code>url</code> config option allows you to specify the initial URL at which the scraper should start its scraping process.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">url</span><span class="o">:</span> <span class="s2">&#34;http://example.com/&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="c1">// ... 
+</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> +<h2>Multiple starting URLs<span class="absolute -mt-20" id="multiple-starting-urls"></span> + <a href="#multiple-starting-urls" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>In case you have more than one URL you want to scrape (or to start from) you can specify them with the <code>urls</code> config option.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">urls</span><span class="o">:</span> <span class="p">[</span> +</span></span><span class="line"><span class="cl"> <span class="s2">&#34;http://example.com/&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="s2">&#34;http://anothersite.com/&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span 
class="s2">&#34;http://yetanothersite.com/&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="p">],</span> +</span></span><span class="line"><span class="cl"> <span class="c1">// ... +</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> + + </description> + </item> + + <item> + <title>Depth</title> + <link>https://flyscrape.com/docs/configuration/depth/</link> + <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate> + + <guid>https://flyscrape.com/docs/configuration/depth/</guid> + <description> + + + <p>The <code>depth</code> config option allows you to specify how deep the scraping process should follow links from the initial URL.</p> +<p>When no value is provided or <code>depth</code> is set to <code>0</code> link following is disabled and it will only scrape the initial URL.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span 
class="nx">url</span><span class="o">:</span> <span class="s2">&#34;http://example.com/&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="nx">depth</span><span class="o">:</span> <span class="mi">2</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="c1">// ... +</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> +<p>With the config provided in the example the scraper would follow links like this:</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><pre><code>http://example.com/ (depth = 0, initial URL) +↳ http://example.com/deeply (depth = 1) + ↳ http://example.com/deeply/nested (depth = 2)</code></pre><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-0"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> 
+ </button> + </div> +</div> + + </description> + </item> + + <item> + <title>Domain Filter</title> + <link>https://flyscrape.com/docs/configuration/domain-filter/</link> + <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate> + + <guid>https://flyscrape.com/docs/configuration/domain-filter/</guid> + <description> + + + <p>The <code>allowedDomains</code> and <code>blockedDomains</code> config options allow you to specify a list of domains which are accessible or blocked during scraping.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">options</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">url</span><span class="o">:</span> <span class="s2">&#34;http://example.com/&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="nx">allowedDomains</span><span class="o">:</span> <span class="p">[</span><span class="s2">&#34;subdomain.example.com&#34;</span><span class="p">],</span> +</span></span><span class="line"><span class="cl"> <span class="c1">// ... 
+</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> +<h2>Allowed Domains<span class="absolute -mt-20" id="allowed-domains"></span> + <a href="#allowed-domains" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>This config option controls which additional domains are allowed to be visited during scraping. The domain of the initial URL is always allowed.</p> +<p>You can also allow all domains to be accessible by setting <code>allowedDomains</code> to <code>[&quot;*&quot;]</code>. 
To then further restrict access, you can specify <code>blockedDomains</code>.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">options</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">url</span><span class="o">:</span> <span class="s2">&#34;http://example.com/&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="nx">allowedDomains</span><span class="o">:</span> <span class="p">[</span><span class="s2">&#34;*&#34;</span><span class="p">],</span> +</span></span><span class="line"><span class="cl"> <span class="c1">// ... +</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> +<h2>Blocked Domains<span class="absolute -mt-20" id="blocked-domains"></span> + <a href="#blocked-domains" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>This config option controls which additional domains are blocked from being accessed. 
By default all domains other than the domain of the initial URL or those specified in <code>allowedDomains</code> are blocked.</p> +<p>You can best use <code>blockedDomains</code> in conjunction with <code>allowedDomains: [&quot;*&quot;]</code>, allowing the scraping process to access all domains except what&rsquo;s specified in <code>blockedDomains</code>.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">options</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">url</span><span class="o">:</span> <span class="s2">&#34;http://example.com/&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="nx">allowedDomains</span><span class="o">:</span> <span class="p">[</span><span class="s2">&#34;*&#34;</span><span class="p">],</span> +</span></span><span class="line"><span class="cl"> <span class="nx">blockedDomains</span><span class="o">:</span> <span class="p">[</span><span class="s2">&#34;google.com&#34;</span><span class="p">,</span> <span class="s2">&#34;bing.com&#34;</span><span class="p">],</span> +</span></span><span class="line"><span class="cl"> <span class="c1">// ... 
+</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> + + </description> + </item> + + <item> + <title>URL Filter</title> + <link>https://flyscrape.com/docs/configuration/url-filter/</link> + <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate> + + <guid>https://flyscrape.com/docs/configuration/url-filter/</guid> + <description> + + + <p>The <code>allowedURLs</code> and <code>blockedURLs</code> config options allow you to specify a list of URL patterns (in form of regular expressions) which are accessible or blocked during scraping.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">options</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">url</span><span class="o">:</span> <span class="s2">&#34;http://example.com/&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="nx">allowedURLs</span><span class="o">:</span> <span class="p">[</span><span 
class="s2">&#34;/articles/.*&#34;</span><span class="p">,</span> <span class="s2">&#34;/authors/.*&#34;</span><span class="p">],</span> +</span></span><span class="line"><span class="cl"> <span class="nx">blockedURLs</span><span class="o">:</span> <span class="p">[</span><span class="s2">&#34;/authors/admin&#34;</span><span class="p">],</span> +</span></span><span class="line"><span class="cl"> <span class="c1">// ... +</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> +<h2>Allowed URLs<span class="absolute -mt-20" id="allowed-urls"></span> + <a href="#allowed-urls" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>This config option controls which URLs are allowed to be visited during scraping. 
When no value is provided, all URLs are allowed to be visited unless otherwise blocked.</p> +<p>When a list of URL patterns is provided, only URLs matching one or more of these patterns are allowed to be visited.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">options</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">url</span><span class="o">:</span> <span class="s2">&#34;http://example.com/&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="nx">allowedURLs</span><span class="o">:</span> <span class="p">[</span><span class="s2">&#34;/products/&#34;</span><span class="p">],</span> +</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> +<h2>Blocked URLs<span class="absolute -mt-20" id="blocked-urls"></span> + <a href="#blocked-urls" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>This config option controls which URLs are blocked from being visited during scraping.</p> 
+<p>When a list of URL patterns is provided, URLs matching one or more of these patterns are blocked from being visited.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">options</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">url</span><span class="o">:</span> <span class="s2">&#34;http://example.com/&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="nx">blockedURLs</span><span class="o">:</span> <span class="p">[</span><span class="s2">&#34;/restricted&#34;</span><span class="p">],</span> +</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> + + </description> + </item> + + <item> + <title>Link Following</title> + <link>https://flyscrape.com/docs/configuration/link-following/</link> + <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate> + + <guid>https://flyscrape.com/docs/configuration/link-following/</guid> + <description> + + + <p>The <code>follow</code> config option allows you to specify a list 
of CSS selectors that determine which links the scraper should follow.</p> +<p>When no value is provided the scraper will follow all links found with the <code>a[href]</code> selector.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">url</span><span class="o">:</span> <span class="s2">&#34;http://example.com/&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="nx">follow</span><span class="o">:</span> <span class="p">[</span> +</span></span><span class="line"><span class="cl"> <span class="s2">&#34;.pagination &gt; a[href]&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="s2">&#34;.nav a[href]&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="p">],</span> +</span></span><span class="line"><span class="cl"> <span class="c1">// ... 
+</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> +<h2>Following non <code>href</code> attributes<span class="absolute -mt-20" id="following-non-href-attributes"></span> + <a href="#following-non-href-attributes" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>For special cases where the link is not to be found in the <code>href</code>, you specify a selector with a different ending attribute.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">url</span><span class="o">:</span> <span class="s2">&#34;http://example.com/&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="nx">follow</span><span class="o">:</span> <span class="p">[</span> +</span></span><span class="line"><span class="cl"> <span class="s2">&#34;.articles &gt; div[data-url]&#34;</span><span class="p">,</span> 
+</span></span><span class="line"><span class="cl"> <span class="p">],</span> +</span></span><span class="line"><span class="cl"> <span class="c1">// ... +</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> + + </description> + </item> + + <item> + <title>Concurrency</title> + <link>https://flyscrape.com/docs/configuration/concurrency/</link> + <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate> + + <guid>https://flyscrape.com/docs/configuration/concurrency/</guid> + <description> + + + <p>The concurrency setting controls the number of simultaneous requests that the scraper can make. This is specified in the configuration object of your scraping script.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="c1">// Specify the number of concurrent requests. 
+</span></span></span><span class="line"><span class="cl"><span class="c1"></span> <span class="nx">concurrency</span><span class="o">:</span> <span class="mi">5</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-0"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> +<p>In the above example, the scraper will make up to 5 requests at the same time.</p> +<p>If the concurrency setting is not specified, there is no limit to the number of concurrent requests.</p> + + </description> + </item> + + <item> + <title>Rate Limiting</title> + <link>https://flyscrape.com/docs/configuration/rate-limiting/</link> + <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate> + + <guid>https://flyscrape.com/docs/configuration/rate-limiting/</guid> + <description> + + + <p>The <code>rate</code> config option allows you to specify at which rate the scraper should send out requests. 
The rate is measured in <em>Requests per Minute</em> (RPM).</p> +<p>When no <code>rate</code> is specified, rate limiting is disabled and the scraper will send out requests as fast as it can.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">options</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">url</span><span class="o">:</span> <span class="s2">&#34;http://example.com/&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="nx">rate</span><span class="o">:</span> <span class="mi">100</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> + + </description> + </item> + + <item> + <title>Retry</title> + <link>https://flyscrape.com/docs/configuration/retry/</link> + <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate> + + <guid>https://flyscrape.com/docs/configuration/retry/</guid> + <description> + + + <p>The retry feature allows the scraper to automatically retry failed 
requests. This is particularly useful when dealing with unstable networks or servers that occasionally return error status codes.</p> +<p>The retry feature is automatically enabled and will retry requests that return the following HTTP status codes:</p> +<ul> +<li>403 Forbidden</li> +<li>408 Request Timeout</li> +<li>425 Too Early</li> +<li>429 Too Many Requests</li> +<li>500 Internal Server Error</li> +<li>502 Bad Gateway</li> +<li>503 Service Unavailable</li> +<li>504 Gateway Timeout</li> +</ul> +<h3>Retry Delays<span class="absolute -mt-20" id="retry-delays"></span> + <a href="#retry-delays" class="subheading-anchor" aria-label="Permalink for this section"></a></h3><p>After a failed request, the scraper will wait for a certain amount of time before retrying the request. The delay increases with each consecutive failed attempt, according to the following schedule:</p> +<ul> +<li>1st retry: 1 second delay</li> +<li>2nd retry: 2 seconds delay</li> +<li>3rd retry: 5 seconds delay</li> +<li>4th retry: 10 seconds delay</li> +</ul> + + </description> + </item> + + <item> + <title>Caching</title> + <link>https://flyscrape.com/docs/configuration/caching/</link> + <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate> + + <guid>https://flyscrape.com/docs/configuration/caching/</guid> + <description> + + + <p>The <code>cache</code> config option allows you to enable file-based request caching. When enabled, every request is cached together with its raw response. 
When the cache is populated and you re-run the scraper, requests will be served directly from cache.</p> +<p>This also allows you to modify your scraping script afterwards and collect new results immediately.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">url</span><span class="o">:</span> <span class="s2">&#34;http://example.com/&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="nx">cache</span><span class="o">:</span> <span class="s2">&#34;file&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="c1">// ... 
+</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> +<h3>Cache File<span class="absolute -mt-20" id="cache-file"></span> + <a href="#cache-file" class="subheading-anchor" aria-label="Permalink for this section"></a></h3><p>When caching is enabled using the <code>cache: &quot;file&quot;</code> option, a <code>.cache</code> file will be created with the name of your scraping script.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Terminal</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ flyscrape run hackernews.js <span class="c1"># Will populate: hackernews.cache</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon 
pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> +<h3>Shared cache<span class="absolute -mt-20" id="shared-cache"></span> + <a href="#shared-cache" class="subheading-anchor" aria-label="Permalink for this section"></a></h3><p>In case you want to share a cache between different scraping scripts, you can specify where to store the cache file.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">url</span><span class="o">:</span> <span class="s2">&#34;http://example.com/&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="nx">cache</span><span class="o">:</span> <span class="s2">&#34;file:/some/path/shared.cache&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="c1">// ... 
+</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> + + </description> + </item> + + <item> + <title>Proxies</title> + <link>https://flyscrape.com/docs/configuration/proxies/</link> + <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate> + + <guid>https://flyscrape.com/docs/configuration/proxies/</guid> + <description> + + + <p>The proxy feature allows you to route your scraping requests through a specified HTTP(S) proxy. This can be useful for bypassing IP-based rate limits or accessing region-restricted content.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="c1">// Specify a single HTTP(S) proxy URL. 
+</span></span></span><span class="line"><span class="cl"><span class="c1"></span> <span class="nx">proxy</span><span class="o">:</span> <span class="s2">&#34;http://someproxy.com:8043&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-0"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> +<p>In the above example, all scraping requests will be routed through the proxy at <code>http://someproxy.com:8043</code>.</p> +<h2>Multiple Proxies<span class="absolute -mt-20" id="multiple-proxies"></span> + <a href="#multiple-proxies" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>You can also specify multiple proxy URLs. The scraper will rotate between these proxies for each request.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="c1">// Specify multiple HTTP(S) proxy URLs. 
+</span></span></span><span class="line"><span class="cl"><span class="c1"></span> <span class="nx">proxies</span><span class="o">:</span> <span class="p">[</span> +</span></span><span class="line"><span class="cl"> <span class="s2">&#34;http://someproxy.com:8043&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="s2">&#34;http://someotherproxy.com:8043&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="p">],</span> +</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-0"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> +<p>In this example, the scraper will randomly pick between the proxies at <code>http://someproxy.com:8043</code> and <code>http://someotherproxy.com:8043</code>.</p> +<p>Note: If both <code>proxy</code> and <code>proxies</code> are specified, all proxies will be respected.</p> + + </description> + </item> + + <item> + <title>Cookies</title> + <link>https://flyscrape.com/docs/configuration/cookies/</link> + <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate> + + <guid>https://flyscrape.com/docs/configuration/cookies/</guid> + <description> + + + <p>The Cookies configuration in the <code>flyscrape</code> script&rsquo;s configuration object allows you to specify the behavior of the cookie store during the scraping process. 
Cookies are often used for authentication and session management on websites.</p> +<h2>Cookies Configuration<span class="absolute -mt-20" id="cookies-configuration"></span> + <a href="#cookies-configuration" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>To configure the cookie store behavior, set the <code>cookies</code> field in your configuration. The <code>cookies</code> option supports three values: <code>&quot;chrome&quot;</code>, <code>&quot;edge&quot;</code>, and <code>&quot;firefox&quot;</code>. Each value corresponds to using the cookie store of the respective local browser.</p> +<p>When the <code>cookies</code> option is set to <code>&quot;chrome&quot;</code>, <code>&quot;edge&quot;</code>, or <code>&quot;firefox&quot;</code>, <code>flyscrape</code> utilizes the cookie store of the user&rsquo;s installed browser.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">cookies</span><span class="o">:</span> <span class="s2">&#34;chrome&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden 
copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> +<p>In the above example, the <code>cookies</code> option is set to <code>&quot;chrome&quot;</code>, indicating that <code>flyscrape</code> should use the cookie store of the local Chrome browser.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">cookies</span><span class="o">:</span> <span class="s2">&#34;firefox&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> +<p>In this example, the <code>cookies</code> option is set to <code>&quot;firefox&quot;</code>, instructing <code>flyscrape</code> to use the cookie store of the local Firefox browser.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre 
tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">cookies</span><span class="o">:</span> <span class="s2">&#34;edge&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> +<p>In this example, the <code>cookies</code> option is set to <code>&quot;edge&quot;</code>, indicating that <code>flyscrape</code> should use the cookie store of the local Edge browser.</p> + + </description> + </item> + + <item> + <title>Headers</title> + <link>https://flyscrape.com/docs/configuration/headers/</link> + <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate> + + <guid>https://flyscrape.com/docs/configuration/headers/</guid> + <description> + + + <p>The <code>headers</code> config option allows you to specify the custom HTTP headers sent with each request.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span 
class="kr">export</span> <span class="kr">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">headers</span><span class="o">:</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="s2">&#34;Authorization&#34;</span><span class="o">:</span> <span class="s2">&#34;Bearer ey....&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="s2">&#34;User-Agent&#34;</span><span class="o">:</span> <span class="s2">&#34;Mozilla/5.0 (Macintosh ...&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="p">},</span> +</span></span><span class="line"><span class="cl"> <span class="c1">// ... +</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> + + </description> + </item> + + <item> + <title>Browser Mode</title> + <link>https://flyscrape.com/docs/configuration/browser-mode/</link> + <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate> + + <guid>https://flyscrape.com/docs/configuration/browser-mode/</guid> + <description> + + + <p>The Browser Mode controls the interaction with a headless Chromium browser. 
Enabling the browser mode allows <code>flyscrape</code> to download a Chromium browser once and use it to render JavaScript-heavy pages.</p> +<h2>Browser Mode<span class="absolute -mt-20" id="browser-mode"></span> + <a href="#browser-mode" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>To enable Browser Mode, set the <code>browser</code> option to <code>true</code> in your configuration. This allows <code>flyscrape</code> to use a headless Chromium browser for rendering JavaScript during the scraping process.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">browser</span><span class="o">:</span> <span class="kc">true</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> +<p>In the above example, Browser Mode is enabled, allowing <code>flyscrape</code> to render pages that rely on JavaScript execution.</p> +<h2>Headless Option<span 
class="absolute -mt-20" id="headless-option"></span> + <a href="#headless-option" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>The <code>headless</code> option, when combined with Browser Mode, controls whether the Chromium browser should run in headless mode or not. Headless mode means the browser operates without a graphical user interface, which can be useful for background processes.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">browser</span><span class="o">:</span> <span class="kc">true</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="nx">headless</span><span class="o">:</span> <span class="kc">false</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> +<p>In this example, the Chromium browser will run in non-headless mode. 
If you set <code>headless</code> to <code>true</code>, the browser will run without a visible GUI.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">browser</span><span class="o">:</span> <span class="kc">true</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="nx">headless</span><span class="o">:</span> <span class="kc">true</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> +<p>In this example, the Chromium browser will run in headless mode, suitable for scenarios where graphical rendering is unnecessary.</p> + + </description> + </item> + + <item> + <title>Output File and Format</title> + <link>https://flyscrape.com/docs/configuration/output/</link> + <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate> + + <guid>https://flyscrape.com/docs/configuration/output/</guid> + <description> + + + <p>The output file and 
format are specified in the configuration object of your scraping script. They determine where the scraped data will be saved and in what format.</p> +<h2>Output File<span class="absolute -mt-20" id="output-file"></span> + <a href="#output-file" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>The output file is the file where the scraped data will be saved. If not specified, the data will be printed to the standard output (stdout).</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">output</span><span class="o">:</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="c1">// Specify the output file. 
+</span></span></span><span class="line"><span class="cl"><span class="c1"></span> <span class="nx">file</span><span class="o">:</span> <span class="s2">&#34;results.json&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="p">},</span> +</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> +<p>In the above example, the scraped data will be saved in a file named <code>results.json</code>.</p> +<h2>Output Format<span class="absolute -mt-20" id="output-format"></span> + <a href="#output-format" class="subheading-anchor" aria-label="Permalink for this section"></a></h2><p>The output format is the format in which the scraped data will be saved. 
The options are <code>json</code> and <code>ndjson</code>.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">output</span><span class="o">:</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="c1">// Specify the output format. +</span></span></span><span class="line"><span class="cl"><span class="c1"></span> <span class="nx">format</span><span class="o">:</span> <span class="s2">&#34;json&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="p">},</span> +</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> +<p>In the above example, the scraped data will be saved in JSON format.</p> +<div class="code-block relative mt-6 first:mt-0 group/code"><div class="filename">Configuration</div><div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span 
class="line"><span class="cl"><span class="kr">export</span> <span class="kr">const</span> <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="nx">output</span><span class="o">:</span> <span class="p">{</span> +</span></span><span class="line"><span class="cl"> <span class="c1">// Specify the output format. +</span></span></span><span class="line"><span class="cl"><span class="c1"></span> <span class="nx">format</span><span class="o">:</span> <span class="s2">&#34;ndjson&#34;</span><span class="p">,</span> +</span></span><span class="line"><span class="cl"> <span class="p">},</span> +</span></span><span class="line"><span class="cl"><span class="p">};</span></span></span></code></pre></div></div><div class="opacity-0 transition group-hover/code:opacity-100 flex gap-1 absolute m-[11px] right-0 top-8"> + <button + class="code-copy-btn group/copybtn transition-all active:opacity-50 bg-primary-700/5 border border-black/5 text-gray-600 hover:text-gray-900 rounded-md p-1.5 dark:bg-primary-300/10 dark:border-white/10 dark:text-gray-400 dark:hover:text-gray-50" + title="Copy code" + > + <div class="group-[.copied]/copybtn:hidden copy-icon pointer-events-none h-4 w-4"></div> + <div class="hidden group-[.copied]/copybtn:block success-icon pointer-events-none h-4 w-4"></div> + </button> + </div> +</div> +<p>In this example, the scraped data will be saved in newline-delimited JSON (NDJSON) format. Each line in the output file will be a separate JSON object.</p> + + </description> + </item> + + </channel> +</rss> |