# Contact Scanner

## What is this?

The project is a small Python web scraper built with Selenium and BeautifulSoup.

## What does it do?

The scraper goes to the Impressum page of a given website and scans it for an email address and a name, following the keywords defined in a supplied file. After it scrapes the page, it writes the results to a CSV file.

**NOTE:** The scraper does **NOT** return 100% correct email-name pairs. It returns the pairs that it can **build**. This means that you should always take the results with a grain of salt.

## How to use it?

### Prerequisites

You are going to need the following things installed:

* Chrome
* Python 3
* Pip3
* Selenium Chrome driver (ChromeDriver)

Once all four are installed, continue with the steps below.

### Dependencies

The dependencies are listed in [requirements.txt](requirements.txt). Install them with the following command:

```
pip3 install -r requirements.txt
```

### Usage

The application has the following synopsis:

```
SYNOPSIS
    python3 app.py URL_FILE KEYWORD_FILE
```

where `URL_FILE` is a file listing the URLs that should be scanned, one URL per line, and `KEYWORD_FILE` contains the list of keywords used to search for names. The keyword file uses the same format: one keyword per line. Trim trailing whitespace in both files for best results.

### Usage constraints

You should **NOT**:

1. use this scraper for generating spam lists
2. use this scraper without acknowledging the `robots.txt` of the target
3. use this scraper when you have explicitly agreed with the website not to scrape it
4. use this scraper if you're not using it under fair use

## Fair use

The scraper falls under fair use because it is designed to search for *facts* in pages and not for *content*.
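
One of the usage constraints above is to acknowledge the target's `robots.txt`. The following is a minimal sketch of how such a check could look before a URL is scanned, using only Python's standard `urllib.robotparser` module. The helper name `is_allowed` and the example rules are assumptions made for illustration; they are not part of `app.py`.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching `url`.

    Hypothetical helper for illustration; not part of this project.
    `robots_txt` is the already-downloaded content of the site's robots.txt.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access happens here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Example rules (assumed): everything under /private/ is off limits.
    rules = "User-agent: *\nDisallow: /private/\n"
    print(is_allowed(rules, "https://example.com/impressum"))
    print(is_allowed(rules, "https://example.com/private/team"))
```

Passing the file content in as text keeps the check easy to test offline. To read the rules directly from a live site, `RobotFileParser` also offers `set_url()` followed by `read()`.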