1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
|
# flyscrape
flyscrape is an elegant scraping tool for efficiently extracting data from websites. Whether you're a developer, data analyst, or researcher, flyscrape empowers you to effortlessly gather information from web pages and transform it into structured data.
## Features
- **Simple and Intuitive**: **flyscrape** offers an easy-to-use command-line interface that allows you to interact with scraping scripts effortlessly.
- **Create New Scripts**: The `new` command enables you to generate sample scraping scripts quickly, providing you with a solid starting point for your scraping endeavors.
- **Run Scripts**: Execute your scraping script using the `run` command, and watch as **flyscrape** retrieves and processes data from the specified website.
- **Watch for Development**: The `dev` command allows you to watch your scraping script for changes and quickly iterate during development, helping you find the right data extraction queries.
## Installation
To install **flyscrape**, follow these simple steps:
1. Install Go: Make sure you have Go installed on your system. If not, you can download it from [https://golang.org/](https://golang.org/).
2. Install **flyscrape**: Open a terminal and run the following command:
```bash
go install github.com/philippta/flyscrape/cmd/flyscrape@latest
```
## Usage
**flyscrape** offers several commands to assist you in your scraping journey:
### Creating a New Script
Use the `new` command to create a new scraping script:
```bash
flyscrape new example.js
```
### Running a Script
Execute your scraping script using the `run` command:
```bash
flyscrape run example.js
```
### Watching for Development
The `dev` command allows you to watch your scraping script for changes and quickly iterate during development:
```bash
flyscrape dev example.js
```
## Example Script
Below is an example scraping script that showcases the capabilities of **flyscrape**:
```javascript
export const config = {
url: "https://news.ycombinator.com/", // Specify the URL to start scraping from.
// depth: 0, // Specify how deep links should be followed. (default = 0, no follow)
// allowedDomains: [], // Specify the allowed domains. ['*'] for all. (default = domain from url)
// blockedDomains: [], // Specify the blocked domains. (default = none)
// allowedURLs: [], // Specify the allowed URLs as regex. (default = all allowed)
// blockedURLs: [], // Specify the blocked URLs as regex. (default = non blocked)
// rate: 100, // Specify the rate in requests per second. (default = 100)
// cache: "file", // Enable file-based request caching. (default = no cache)
};
export default function({ doc, absoluteURL }) {
const title = doc.find("title");
const posts = doc.find(".athing");
return {
title: title.text(),
posts: posts.map((post) => {
const link = post.find(".titleline > a");
return {
title: link.text(),
url: absoluteURL(link.attr("href")),
};
}),
};
}
```
## Contributing
We welcome contributions from the community! If you encounter any issues or have suggestions for improvement, please [submit an issue](https://github.com/philippta/flyscrape/issues).
|