Ivaylo Ivanov 00140c2b6b Add URL entry to CSV 5 years ago

Contact Scanner

What is this?

The project is a small Python web scraper built with Selenium and BeautifulSoup.

What does it do?

The scraper opens the Impressum (legal notice) page of a given website and scans it for an email address and a name, using the keywords defined in a supplied file. After scraping the page, it writes the results to a CSV file.

NOTE: The scraper does NOT guarantee correct email-name pairs. It returns only the pairs that it is able to build, so you should always take the results with a grain of salt.
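
To illustrate why the pairs are only best guesses, here is a minimal sketch of keyword-based pairing using just the Python standard library. It is not the project's actual code: the regular expression, the function names (find_emails, find_name, build_rows, rows_to_csv) and the CSV column order are all assumptions made for the example, and it works on already-extracted page text rather than on a live Selenium session.

```python
import csv
import io
import re

# Deliberately simple email pattern; an assumption, not the project's own regex.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Return the distinct email addresses in text, in order of appearance."""
    seen = []
    for match in EMAIL_RE.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


def find_name(text, keywords):
    """Return whatever follows the first keyword found on a line, or ''."""
    for line in text.splitlines():
        for keyword in keywords:
            idx = line.lower().find(keyword.lower())
            if idx != -1:
                rest = line[idx + len(keyword):].strip(" :\t")
                if rest:
                    return rest
    return ""


def build_rows(url, text, keywords):
    """Pair every email found with the single guessed name (a heuristic)."""
    name = find_name(text, keywords)
    return [(url, name, email) for email in find_emails(text)]


def rows_to_csv(rows):
    """Serialize (url, name, email) rows to CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["url", "name", "email"])  # hypothetical column order
    writer.writerows(rows)
    return out.getvalue()
```

Because the name is simply the text that follows a keyword, a page that lists several people or places the email far from the name will produce a wrong or empty pairing, which is the kind of inaccuracy the note above warns about.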

How to use it?


You are going to need the following things installed:

  • Chrome
  • Python 3
  • pip3
  • Selenium Chrome driver (ChromeDriver)

Once all four are installed, continue with the steps below.


The dependencies are listed in requirements.txt. Install them with the following command:

pip3 install -r requirements.txt


The application has the following synopsis:



where URL_FILE is a file listing the URLs that should be scanned, one URL per line, and KEYWORD_FILE is a file listing the keywords used to search for names. Both files use the same one-entry-per-line format (trim trailing whitespace for best results).
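
As a rough sketch of how files in this format could be read, the snippet below treats each non-empty line as one entry and strips trailing whitespace. The helper names parse_lines and load_entries are hypothetical and are not taken from the project.

```python
from pathlib import Path


def parse_lines(text):
    """One entry per line: drop trailing whitespace and skip blank lines."""
    return [line.rstrip() for line in text.splitlines() if line.strip()]


def load_entries(path):
    """Read a URL_FILE or KEYWORD_FILE style file into a list of entries."""
    return parse_lines(Path(path).read_text(encoding="utf-8"))
```

The same function serves both files, since they share one format.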

Usage constraints

You should NOT

  1. use this scraper for generating spam lists
  2. use this scraper without respecting the robots.txt of the target
  3. use this scraper when you have explicitly agreed with the website not to scrape it
  4. use this scraper for anything that does not qualify as fair use
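
One way to honor the robots.txt constraint before scanning a URL is a check with Python's standard urllib.robotparser module. This is only a sketch: the helper name is_allowed and the default user agent "*" are assumptions, and nothing here is part of the scraper itself.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Here the robots.txt content is passed in as text so the check can be tried offline. To check a live site, RobotFileParser can download the file itself through its set_url() and read() methods.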

Fair use

The scraper falls under fair use because it is designed to search pages for facts, not for content.