
# Contact Scanner
## What is this?
The project is a small Python web scraper built with Selenium and BeautifulSoup.

## What does it do?
The scraper opens the Impressum (legal notice) page of a given website and scans it for an email address and a name, using the keywords defined in a supplied file. After scraping the page, it writes the results to a CSV file.

**NOTE:** The scraper does **NOT** guarantee 100% correct email-name pairs. It returns the pairs that it can **build**, so you should always take the results with a grain of salt.
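To make the matching idea concrete, here is a minimal sketch of keyword-driven extraction on plain page text. It is not the code in `app.py`: the function names, the regular expressions, and the rule "a name is two or three capitalized words right after a keyword" are all assumptions made for illustration, and it uses only the standard library instead of BeautifulSoup.

```python
import re

# Deliberately loose email pattern (an assumption); it will miss
# obfuscated addresses such as "info (at) example.com".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Assumed heuristic: a name is two or three capitalized words on one line.
NAME = r"[A-ZÄÖÜ][a-zäöüß]+(?:[ \t]+[A-ZÄÖÜ][a-zäöüß]+){1,2}"


def find_emails(text):
    """Return every email-like string in the page text, in order."""
    return EMAIL_RE.findall(text)


def find_names(text, keywords):
    """Return name-like strings that directly follow one of the keywords."""
    names = []
    for keyword in keywords:
        # The keyword may be followed by an optional colon and whitespace.
        pattern = re.compile(re.escape(keyword) + r"\s*:?\s*(" + NAME + r")")
        names.extend(match.group(1) for match in pattern.finditer(text))
    return names


def build_pairs(text, keywords):
    """Pair emails and names by position; unmatched items are dropped."""
    return list(zip(find_emails(text), find_names(text, keywords)))
```

Pairing by position is the simplest possible strategy and is exactly why the results need a manual check: when a page lists several people or addresses, the pairs that can be built are not necessarily the right ones.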

## How to use it?
### Prerequisites
You need the following installed:
* Chrome
* Python 3
* pip3
* ChromeDriver (the Selenium driver for Chrome)

Once all four are installed, continue with the steps below.
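If you are unsure whether everything is in place, a quick check like the following can help. It is only a sketch: the executable names in `CANDIDATES` are assumptions that vary by platform and install method (on macOS, for example, Chrome is usually not on the `PATH` even when it is installed), so a `False` result is a hint, not proof.

```python
import shutil
import sys

# Assumed executable names; adjust them for your platform.
CANDIDATES = {
    "Chrome": ("google-chrome", "chrome", "chromium", "chromium-browser"),
    "pip3": ("pip3",),
    "ChromeDriver": ("chromedriver",),
}


def check_prerequisites(candidates=CANDIDATES):
    """Report which prerequisites can be located from this interpreter."""
    found = {"Python 3": sys.version_info.major == 3}
    for label, names in candidates.items():
        # A prerequisite counts as found if any candidate name is on PATH.
        found[label] = any(shutil.which(name) for name in names)
    return found


for label, ok in check_prerequisites().items():
    print(f"{label}: {'found' if ok else 'not found on PATH'}")
```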
### Dependencies
The dependencies are listed in [requirements.txt](requirements.txt). Install them with the following command:
```
pip3 install -r requirements.txt
```

### Usage
The application has the following synopsis:
```
SYNOPSIS

python3 app.py URL_FILE KEYWORD_FILE
```

where `URL_FILE` is a file listing the URLs to scan, one URL per line, and `KEYWORD_FILE` contains the keywords used to search for names. Both files use the same format: one entry per line (trim trailing whitespace for best results).
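As an illustration of the two inputs, the sketch below parses the same two positional arguments and reads a one-entry-per-line file. The helper names `parse_args` and `read_entries` are hypothetical and not taken from `app.py`; stripping whitespace and skipping blank lines is an assumption about what a tolerant reader would do.

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional arguments shown in the synopsis."""
    parser = argparse.ArgumentParser(prog="app.py")
    parser.add_argument("url_file", metavar="URL_FILE",
                        help="file with one URL per line")
    parser.add_argument("keyword_file", metavar="KEYWORD_FILE",
                        help="file with one keyword per line")
    return parser.parse_args(argv)


def read_entries(path):
    """Read one entry per line, trimming whitespace and skipping blanks."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

With these helpers, `read_entries(parse_args().url_file)` would return the list of URLs to scan. A keyword file for German sites might contain entries such as `Geschäftsführer` or `Inhaber`, one per line.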

### Usage constraints
You should **NOT**
1. use this scraper for generating spam lists
2. use this scraper without acknowledging the `robots.txt` of the target
3. use this scraper when you have explicitly agreed with the website not to scrape it
4. use this scraper if you're not using it under fair use
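One way to honor `robots.txt` before scanning a URL is Python's standard `urllib.robotparser` module. The sketch below is an assumption about how you might do this yourself and does not describe anything the scraper does on its own; the helper names and the `contact-scanner` user agent string are made up for the example.

```python
from urllib import robotparser


def build_parser(robots_txt):
    """Build a parser from the text of a robots.txt file."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_scrape(parser, url, user_agent="contact-scanner"):
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


# Example rules, invented for illustration.
rules = build_parser("User-agent: *\nDisallow: /private/\n")
print(may_scrape(rules, "https://example.com/impressum"))
print(may_scrape(rules, "https://example.com/private/team"))
```

To check a live site instead of a string, `RobotFileParser` also offers `set_url()` followed by `read()`, which downloads the file for you.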

## Fair use
The scraper falls under fair use because it is designed to search for *facts* in pages and not for *content*.