| Field | Value | Date |
|---|---|---|
| author | Philipp Tanlak <philipp.tanlak@gmail.com> | 2023-08-17 20:49:49 +0200 |
| committer | Philipp Tanlak <philipp.tanlak@gmail.com> | 2023-08-17 20:49:49 +0200 |
| commit | 87c1136438b5f24fcb886b9771dd0245999c3e8a | |
| tree | 80eafc379ce431006d9a6d656b4b021289689eab | |
| parent | 1a9af21755a78bb8689bd1f3830239f81dadc324 | |

README

| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | README.md | 101 |
| -rw-r--r-- | cmd/flyscrape/run.go | 3 |

2 files changed, 101 insertions, 3 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..e1c416a
--- /dev/null
+++ b/README.md
@@ -0,0 +1,101 @@
+# flyscrape - Elegant Website Scraping Tool
+
+flyscrape is a powerful command-line tool designed to streamline the process of efficiently extracting data from websites. Whether you're a developer, data analyst, or researcher, flyscrape empowers you to effortlessly gather information from web pages and transform it into structured data. With its intuitive command-line interface and versatile capabilities, flyscrape simplifies the scraping process while delivering accurate and customizable results.
+
+## Features
+
+- **Simple and Intuitive**: **flyscrape** offers an easy-to-use command-line interface that allows you to interact with scraping scripts effortlessly.
+
+- **Create New Scripts**: The `new` command enables you to generate sample scraping scripts quickly, providing you with a solid starting point for your scraping endeavors.
+
+- **Run Scripts**: Execute your scraping script using the `run` command, and watch as **flyscrape** retrieves and processes data from the specified website.
+
+- **Watch for Development**: The `watch` command allows you to watch your scraping script for changes and quickly iterate during development, helping you find the right data extraction queries.
+
+## Installation
+
+To install **flyscrape**, follow these simple steps:
+
+1. Install Go: Make sure you have Go installed on your system. If not, you can download it from [https://golang.org/](https://golang.org/).
+
+2. Install **flyscrape**: Open a terminal and run the following command:
+
+    ```bash
+    go install github.com/philippta/flyscrape@latest
+    ```
+
+## Usage
+
+**flyscrape** offers several commands to assist you in your scraping journey:
+
+### Creating a New Script
+
+Use the `new` command to create a new scraping script:
+
+```bash
+flyscrape new example.js
+```
+
+### Running a Script
+
+Execute your scraping script using the `run` command:
+
+
+```bash
+flyscrape run example.js
+```
+
+### Watching for Development
+
+The `watch` command allows you to watch your scraping script for changes and quickly iterate during development:
+
+```bash
+flyscrape watch example.js
+```
+
+## Example Script
+
+Below is an example scraping script that showcases the capabilities of **flyscrape**:
+
+```javascript
+import { parse } from 'flyscrape';
+
+export const options = {
+    url: 'https://news.ycombinator.com/',
+    depth: 1,
+    allowedDomains: ['news.ycombinator.com'],
+    blockedDomains: [],
+    rate: 100,
+};
+
+export default function({ html, url }) {
+    const $ = parse(html);
+    const title = $('title');
+    const entries = $('.athing').toArray();
+
+    if (!entries.length) {
+        return null;
+    }
+
+    return {
+        title: title.text(),
+        entries: entries.map(entry => {
+            const link = $(entry).find('.titleline > a');
+            const rank = $(entry).find('.rank');
+            const points = $(entry).next().find('.score');
+
+            return {
+                title: link.text(),
+                url: link.attr('href'),
+                rank: parseInt(rank.text().slice(0, -1)),
+                points: parseInt(points.text().replace(' points', '')),
+            };
+        }),
+    };
+}
+```
+
+## Contributing
+
+We welcome contributions from the community! If you encounter any issues or have suggestions for improvement, please [submit an issue](https://github.com/philippta/flyscrape/issues).
+
diff --git a/cmd/flyscrape/run.go b/cmd/flyscrape/run.go
index 2d76a35..bd8541b 100644
--- a/cmd/flyscrape/run.go
+++ b/cmd/flyscrape/run.go
@@ -83,9 +83,6 @@ Examples:
     # Run the script.
     $ flyscrape run example.js
 
-    # Run the script with 10 concurrent requests.
-    $ flyscrape run -concurrent 10 example.js
-
     # Run the script with pretty printing disabled.
     $ flyscrape run -no-pretty-print example.js
 `[1:])
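
Note on the README's example script: the `rank` and `points` fields rely on two small string manipulations before `parseInt`. The sketch below isolates them so they can be tried without flyscrape. The helper names `parseRank` and `parsePoints` are hypothetical and not part of the commit, and the sample inputs (`'1.'`, `'123 points'`) are assumptions about what the selected elements contain, inferred from the slicing and replacing the script does.

```javascript
// Hypothetical helpers (not part of flyscrape) mirroring the expressions
// used in the README's example script.

// Assumes rank text of the form "1.", "2.", ...; drops the trailing character.
function parseRank(text) {
  return parseInt(text.slice(0, -1), 10);
}

// Assumes score text of the form "123 points"; strips the suffix before parsing.
function parsePoints(text) {
  return parseInt(text.replace(' points', ''), 10);
}

console.log(parseRank('1.'));                // 1
console.log(parsePoints('123 points'));      // 123
// An entry with no score element yields empty text, which parses to NaN.
console.log(Number.isNaN(parsePoints('')));  // true
```

Because an empty string parses to `NaN`, a script that serializes results may want a fallback value for entries without a score.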