SearchEngineScrapy
Scrape data from Google.com, Bing.com, Baidu.com, Ask.com, Yahoo.com, Yandex.com
Project maintained by naqushab
Hosted on GitHub Pages — Theme by mattgraham
Intro
SearchEngineScrapy is a web crawler and scraper for scraping data off
various search engines such as Google.com, Bing.com, Yahoo.com,
Ask.com, Baidu.com, Yandex.com It is based on Python Scrapy project and is developed
using Python 2.7
Setup
virtualenv --python="2" env
env/bin/activate
git clone https://github.com/naqushab/SearchEngineScrapy.git
cd SearchEngineScrapy
pip install -r requirements.txt
Usage
Prefix : -a
Params
searchQuery=”" [Required Parameter]
searchEngine=”" [Options: Google/Bing/Ask/Yandex/Baidu/Yahoo] [Optional Parameter] [Default: Bing]
pages= [Number of pages to crawl] [Optional Parameter : Default- 3]
Prefix : -o
Params
[Output the resulta to a file] [Optional Parameter] [Supported:json/jsonl/csv/xml]
### Examples
```
scrapy crawl SearchEngineScrapy -a searchQuery="I'm Batman"
scrapy crawl SearchEngineScrapy -a searchQuery="I'm Batman" -o filename.json
scrapy crawl SearchEngineScrapy -a searchQuery="I'm Batman" -a searchEngine="Google" -o filename.xml
scrapy crawl SearchEngineScrapy -a searchQuery="I'm Batman" -a searchEngine="Google" -a pages=5 -o filename.csv
```
## TODO
- Add support for DDG
- Ability to provide parameter of what to save
- Ability to export to various formats (currently limited to JSON,
JSONLINES, CSV, XML)
- Contributing section