From 0daefa86b400efe08245f4f2a386f7341b76b24e Mon Sep 17 00:00:00 2001
From: Philipp Tanlak
Date: Thu, 19 Oct 2023 17:54:18 +0200
Subject: docs: Add documentation
---
 docs/using-flyscrape/development-mode.md | 53 +++++++++++++++++++++
 docs/using-flyscrape/scraping-setup.md   | 80 ++++++++++++++++++++++++++++++++
 docs/using-flyscrape/start-scraping.md   | 51 ++++++++++++++++++++
 3 files changed, 184 insertions(+)
 create mode 100644 docs/using-flyscrape/development-mode.md
 create mode 100644 docs/using-flyscrape/scraping-setup.md
 create mode 100644 docs/using-flyscrape/start-scraping.md

diff --git a/docs/using-flyscrape/development-mode.md b/docs/using-flyscrape/development-mode.md
new file mode 100644
index 0000000..b2da076
--- /dev/null
+++ b/docs/using-flyscrape/development-mode.md
@@ -0,0 +1,53 @@

# Development Mode

Development Mode in flyscrape streamlines the process of creating and fine-tuning your scraping scripts. With the `flyscrape dev` command, you can watch your scraping script for changes and see the results in real time, making it easier to iterate on your data extraction logic during development.

## The `flyscrape dev` Command

The `flyscrape dev` command enhances your development workflow by automatically re-running your scraping script whenever changes are detected. This is useful for several reasons:

1. **Immediate Feedback**: You can make changes to your scraping script and instantly see their impact. There is no need to run the script manually after each modification.

2. **Efficiency**: It eliminates the need to repeatedly invoke `flyscrape run` while you fine-tune your scraping logic, which accelerates development.

3. **Real-time Debugging**: If you encounter issues or unexpected behavior in your scraping script, the immediate feedback makes it quick to identify and fix problems.

## Using the `flyscrape dev` Command

To activate Development Mode, use the `flyscrape dev` command followed by the name of your scraping script. For example:

```bash
flyscrape dev my_scraping_script.js
```

This command starts watching your scraping script file (`my_scraping_script.js` in this case) for changes. Whenever you save the script, flyscrape automatically re-runs it, allowing you to view the updated results in your terminal.

## Tips for Development Mode

Here are some tips to make the most of Development Mode:

1. **Keep Your Editor Open**: Keep your code editor open and edit your scraping script as needed. When you save your changes, flyscrape picks them up automatically.

2. **Console Output**: Use `console.log()` statements within your scraping script to print debugging information to the console. This can help diagnose issues.

3. **Iterate and Experiment**: Take advantage of Development Mode to experiment with different data extraction queries and strategies. The rapid feedback loop makes it easy to find the right approach.

## Example Workflow

Here's how a typical workflow might look in Development Mode:

1. Create a new scraping script using `flyscrape new`.

2. Use `flyscrape dev` to start watching the script.

3. Edit the script, add data extraction logic, and save the changes.

4. Observe the results in real time in the terminal.

5. If needed, make further changes and continue iterating until you achieve the desired extraction results.

Development Mode is an invaluable tool for scraping script development, enabling you to build and refine your scripts efficiently.
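As a concrete illustration of the console-output tip above, extraction logic can be temporarily instrumented with `console.log()` while iterating under `flyscrape dev`. The function below is a hypothetical sketch: the selectors and the `$` document handle are stand-ins for whatever your real script uses (see the Scraping Setup section for the full script shape).

```javascript
// Illustrative only: log intermediate values so each save under
// `flyscrape dev` prints what was matched. `$` stands in for the
// parsed document handle your script already has.
function extract($) {
  const title = $('h1').text();
  console.log('title ->', JSON.stringify(title)); // printed on every re-run

  const linkCount = $('a').length;
  console.log('links found ->', linkCount);

  return { title, linkCount };
}
```

Remove or comment out the log statements once the selectors behave as expected, so they don't clutter the JSON output.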
---

This concludes the "Development Mode" section, which demonstrates how to use the `flyscrape dev` command to streamline your scraping script development process. Next, you can explore how to initiate scraping with the "Start Scraping" section to gather data from websites.

diff --git a/docs/using-flyscrape/scraping-setup.md b/docs/using-flyscrape/scraping-setup.md
new file mode 100644
index 0000000..2b3183b
--- /dev/null
+++ b/docs/using-flyscrape/scraping-setup.md
@@ -0,0 +1,80 @@

# Scraping Setup

In this section, we'll walk through setting up your scraping script using the `flyscrape new` command. This command streamlines the process of creating a scraping script, providing you with a structured starting point for your web scraping projects.

## The `flyscrape new` Command

The `flyscrape new` command generates a new scraping script with a predefined structure and sample code, giving you a quick and easy way to begin a web scraping project.

## Creating a New Scraping Script

To create a new scraping script, use the `flyscrape new` command followed by the desired script filename. For example:

```bash
flyscrape new my_scraping_script.js
```

This command generates a file named `my_scraping_script.js` in the current directory. You can then open and edit this file with your preferred code editor.

## Script Overview

Let's take a closer look at the structure and components of the generated scraping script:

```javascript
import { parse } from 'flyscrape';

export const options = {
    url: 'https://example.com/', // Specify the URL to start scraping from.
    depth: 1,                    // Specify how deep links should be followed. (default = 0, no follow)
    allowedDomains: [],          // Specify the allowed domains. ['*'] for all. (default = domain from url)
    blockedDomains: [],          // Specify the blocked domains. (default = none)
    allowedURLs: [],             // Specify the allowed URLs as regex. (default = all allowed)
    blockedURLs: [],             // Specify the blocked URLs as regex. (default = none blocked)
    proxy: '',                   // Specify the HTTP(S) proxy to use. (default = no proxy)
    rate: 100                    // Specify the rate in requests per second. (default = 100)
};

export default function ({ html, url }) {
    const $ = parse(html);

    // Your data extraction logic goes here

    return {
        // Return the structured data you've extracted
    };
}
```

## Implementing the Data Extraction Logic

In the generated scraping script, you'll find the comment "// Your data extraction logic goes here". This is where you should implement your custom data extraction logic. You can use tools like [Cheerio](https://cheerio.js.org/) or other libraries to navigate and extract data from the parsed HTML.

Here's an example of how you might replace the comment with data extraction code:

```javascript
// Your data extraction logic goes here
const title = $('h1').text();
const description = $('p').text();

// You can extract more data as needed
```

## Returning the Extracted Data

After implementing your data extraction logic, structure the data you've extracted and return it from the scraping function, in place of the comment "// Return the structured data you've extracted".

Here's an example of how you might return the extracted data:

```javascript
return {
    title: title,
    description: description
    // Add more fields as needed
};
```

With this setup, you can effectively scrape and structure data from web pages to meet your specific requirements.

---

This concludes the "Scraping Setup" section, which covers creating scraping scripts with the `flyscrape new` command, implementing data extraction logic, and returning the extracted data.
Next, you can explore more advanced topics in the "Development Mode" section to streamline your web scraping workflow.

diff --git a/docs/using-flyscrape/start-scraping.md b/docs/using-flyscrape/start-scraping.md
new file mode 100644
index 0000000..97b92cc
--- /dev/null
+++ b/docs/using-flyscrape/start-scraping.md
@@ -0,0 +1,51 @@

# Start Scraping

In this section, we'll dive into the process of initiating web scraping using flyscrape. Now that you have created and fine-tuned your scraping script, it's time to run it and start gathering data from websites.

## The `flyscrape run` Command

The `flyscrape run` command executes your scraping script and retrieves data from the specified website, turning your scraping logic into actual results.

## Running Your Scraping Script

To run your scraping script, use the `flyscrape run` command followed by the name of your script file. For example:

```bash
flyscrape run my_scraping_script.js
```

This command initiates the scraping process as defined in your script. Flyscrape executes the script and streams the JSON output of the extracted data directly to your terminal.

## Saving Scraped Data to a File

You can save the JSON output of the scraped data to a file using standard shell redirection. For example, to save the scraped data to a file named `result.json`, use the following command:

```bash
flyscrape run my_scraping_script.js > result.json
```

This command executes your scraping script and saves the extracted data in the `result.json` file in the current directory.

## Example Workflow

Here's a simple workflow for starting web scraping with flyscrape, including saving the scraped data to a file:

1. Create a scraping script using `flyscrape new` and fine-tune it using `flyscrape dev`.

2. Save your script.

3. Run the script using `flyscrape run`.

4. Observe the terminal as flyscrape streams the JSON output of the extracted data in real time.

5. To save the data to a file, use redirection as shown above (`flyscrape run my_scraping_script.js > result.json`).

6. Store, process, or further analyze the extracted data as needed.

7. Continue scraping or iterate on your script for more complex scenarios.

With this workflow, you can efficiently gather and process data from websites using flyscrape, with the option to save the extracted data to a file for later use or analysis.

---

This concludes the "Start Scraping" section, which covers initiating web scraping with the `flyscrape run` command, including how to save the scraped data to a file. Next, you can explore various configuration options and advanced features in the "Options" section to further tailor your scraping experience.