In this guide, we will be giving you a full walkthrough of how to use Email Extractor and Search Engine Scraper By Creative Bear Tech. This guide will be divided into sections and will follow in a logic sequence.
Full Tutorial of Search Engine Scraper and Email Extractor by Creative Bear Tech
1 How to Run the Search Engine Scraper By Creative Bear Tech
2 Activating your Licence for the Search Engine Scraper
When you have purchased your copy of the Email Extractor and Search Engine Scraper by Creative Bear Tech, you should have received a username and a licence key. This licence key will allow you to run the software on one machine. Your copy of the software will be tied to your MAC address.
Go to "More Settings" and at the bottom left hand side corner, click on "License" button. You will now need to enter your username and license key. Once the registration is successful, you will see a green message reading "The app is licensed". At the right hand side bottom of the main GUI, you will also see a writing that you are running a "Registered Version".
3 Name your Project
On the main GUI, at the top left hand side, just under "Search Settings", you will see a field called "Project Name". Please enter a name for your project. This name will be used to create a folder where your scraped data will be stored and will also be used as the name of the file. I usually like to have a representative project name. For example, if I am scraping cryptocurrency and blockchain data, I would have a project name along the lines of "Cryptocurrency and Blockchain Database".
4 Specify the Folder path where the Scraped Data Should be Saved
Click on the "More Settings" button and go to "Save & Login Details" tab. You will need to select a folder on your computer where the results should be exported. Usually, it is a good idea to create a folder inside the software folder. I usually like to create a folder called "Scraped Data". The software will automatically use the project name to create a separate folder (using the project name). Inside that folder, the results will be exported in an Excel .csv file. The Excel file will have the same name as the project name. For example, if my project name is "Cryptocurrency and Blockchain Database" then my folder and the file will be named
"Cryptocurrency and Blockchain Database".
5 Configure your Proxy Settings
The next step will be to configure your proxies. You can still run the website scraper without proxies. However, if you are planning to do a lot of scraping using multiple sources and threads, it is recommended that you get some proxies. Click on "More Settings" button on the main graphical user interface (GUI) and click on the first tab "Proxy Settings". Inside the input pane, you will need to add your proxies, one per line, in the following format: IP address:Port:Username:Password Once you have entered you proxies, you can use the in-built proxy tester tool by click on the button "Test the proxies and remove if not working". The software will automatically test your proxies and remove non-working ones. I highly recommend that you get your proxies from
https://stormproxies.com or https://hashcell.com/ Private dedicated proxies are best. Do not even waste your time with public proxies as they are quite unreliable for scraping. It is recommended that you rotate your proxies every minute so that they do not get blacklisted. You can paste the proxies directly in the text input pane or upload them from file.
5 (b) A timed out VPN is an alternative to proxies (not recommended)
Instead of using proxies, you can also use VPN software such as Hide My Ass VPN! You would need to use the previous version that has a timed out IP change. This means that the VPN software will change the IP address every given number of minutes and seconds. You can even choose your countries. However, the problem with the VPNs is that sometimes they disconnect and stop working. This can interrupt the scraping. VPN proxies tend to be quite overused and blacklisted with the popular search engines such as Google. I thought I would cover this alternative for the sake of completeness, but I would not recommend it.
6 Configure remote Captcha Solving Service
Sometimes, when running the search engine scraper for prolonged periods of time, certain IP addresses may get blacklisted and you would need to solve the captcha (Google image captchas and text captchas). The website scraper has an integrated remote captcha solving service called 2captcha. You will need to create an account on https://2captcha.com/ and get your API key and paste it into the "API Key" box. You can click on "Get balance" button to see if your software has connected to 2captcha successfully. Captcha is not vital if you have configured the delay settings properly, but it is recommended to have it to avoid IP bans and interruptions (especially if you are not using proxies).
6 (b) Configure XEvil by Botmaster Labs to Solve Captchas for Free
You can use Xrumer and XEvil to solve the captchas for free. It is one of the most advanced captcha solving software that can solve even Google image captchas. You can read more about XEvil at http://www.botmasterlabs.net/
6 (c) How to Connect XEvil to the Search Engine Scraper by Creative Bear Tech
Go to XEvil and under the "Settings" tab, select "2captcha" then go to the "Captcha Settings" tab in the Search Engine Scraper by Creative Bear Tech, enter a random key (any length) and hit the "check balance" button. You should see a success message saying that your balance is 100. This means that your software is connected to XEvil. Under the settings tab, you will also see a code with your API key in this format: "21/05/2019 12:32:58: GET /res.php?key=70902597a9c4b9c4232926ac63395c5d&action=getbalance&json=0". This basically means that the Search Engine Scraper has connected to XEvil.
7 Configuring your Speed Settings
Click on "More Settings" on the main GUI and then click on the "Speed Settings" tab. Under this tab, you will be able to set how deep the software should scrape, which will impact on the scraping speed, hence the name. The first option is the "Total number of search results (websites) to parse per keyword". This just means how many search results the software should scrape per search. For example, when you search for something on Bing or Google search engines, you can go all the way up to page 20 or even further. Usually, 200 results/websites per keyword search are enough. You also have the option to tell the software "Maximum number of emails to extract from the same website". Sometimes, a website will have more than one email address (i.e. [email protected], [email protected], [email protected], etc). You can tell the software how many emails to scrape. Usually, a couple is enough. "Do not show pictures in integrated web-browser". This option is meant to save time and processing power by not loading the pictures from websites as those are not needed for our scraping endeavours. You also have the option to "parse the search results (websites) using web browser" which just means that the scraper will work at a single thread and you will be able to view the live scraping. You will not be able to use multi-threading options or hide the browser. This option is ideal if you want to see how the software works. I do not use this option.
8 Choose what Search Engines Or Websites to Scrape: Google, Bing, DuckDuckGo!, AOL, Yahoo, Yandex, Google Maps, Yellow Pages, Yelp, Linked In, Trust Pilot
The next step is for you to choose what search engines or websites to scrape. Go to "More Settings" on the main GUI and then head to "Search Engines/Dictionaries" tab. On the left hand side, you will see a list of different search engines and websites that you can scrape. To add a search engine or a website simply check on each one and the selected search engines and/or websites will appear on the right hand side.
ASK Search Engine Scraping Software and Email Extractor by Creative Bear Tech
Yandex Russian Website Scraper and Email Extractor by Creative Bear Tech
Yahoo Website Scraping Software and Email Extractor by Creative Bear Tech
SO Chinese Search Engine Harvester and Email Extractor by Creative Bear Tech
Ecosia Search Engine Scraper and Email Extractor by Creative Bear Tech
AOL Search Engine Scraper and Email Extractor by Creative Bear Tech
YELP Business Directory Scraper Software and Email Extraction Tool by Creative Bear Tech
DuckDuckGo Search Engine Results Scraper and Email Extraction Software
UK Yellow Pages Scraper by Creative Bear Tech
Google Search Engine Scraping Software and Email Extractor Tool
Google Maps Scraping Software and Email Extractor
Bing Search Engine Scraper and Email Extraction Software
Website Scraper and Email Extraction Software
Yellow Pages Scraper and Extractor Software for USA (yell.com)
8 b) Local Scraping Settings for Local Lead Generation
Inside the same tab, "Search Engines/Dictionaries", on the left hand side, you can expand some websites by double clicking on the plus sign next to them. This is going to open up a list of countries/cities which will allow you to scrape local leads. For example, you can expand Google Maps and select the relevant country. Likewise, you can expand Google and Bing and select a local search engine such as Google.co.uk. Otherwise, if you do not select a local search engine, the software will run international search, which are still fine.
8 c) Special Instructions for Scraping Google Maps and Footprint Configuration
Google Maps scraping is slightly different to scraping the search engines and other sites. Google Maps contains a lot of local businesses and sometimes it is not enough to search for a business category in one city. For example, if I am searching for "beauty salon in London", this search will only return me just under a hundred results which is not representative of the total number of beauty salons in London. Google Maps provides data on the basis of very targeted post code / town searches. It is therefore very important to use proper footprints for local businesses in order to get the most comprehensive set of results. If you are only searching for all beauty salons in London, you would want to get a list of all the towns in London along with their post codes and then add your keyword to each town and post code. On the Main GUI, enter one keyword. In our case, it would be, "beauty salon". Then click on the "Add FootPrint" button. Inside, you need to "Add the footprints or sub-areas". Inside the software, there are some footprints for some countries that you can use. Once you have uploaded your footprints, select the sources on the right hand side. The software will take your root keywords and add it to every single footprint / area. In our case, we would be running 20,000+ searches for beauty salon in different areas in the UK. This is perhaps the most comprehensive way of running Google Maps scraping searches. It takes longer but it is definitely the mot effective method. Please also note that Google Maps can only run on one thread as Google bans proxies very fast. I also highly recommend that you run Google Maps searches separately from search engine and other website searches simply because Google maps is comprehensive enough and you would not want to run the same detailed search with thousands of footprints say on Google or Bing! TIP: You should only be using footprints for Google maps. You do not need to run such detailed searches with the search engines.
9 Scraping your own Website List
Perhaps you have your own list of websites that you have created using Scrapebox or any other type of software and you would like to parse them for contact details. You will need to go to "More Settings" on the main GUI and navigate to the tab titled "Website List". Make sure that your list of websites is saved locally in a .txt notepad file with one url per line (no separators). Select your website list source by specifying the location of the file. You will then need to split up the file. I recommend to split your master list of websites into files of 100 websites per file. The software will do all the splitting automatically. The reason why it is important to split up larger files is to allow the software to run at multiple threads and process all the websites much faster.
10 Configuring the Domain Filters
The next step is to configure the domain name filters. Go to "More Settings" on the main interface, then select the "Domain Filters" tab. The first column should contain a list of keywords that the url must contain and the second column should contain a list of keywords that the URL should NOT contain. You have to enter one keyword per line, no separators. In essence, what we are doing here is narrowing down the relevancy of the results. For example, if I am searching for cryptocurrency websites, then I would add the following keywords to the first column:
Most websites will contain these words in the url. However, the domain filter MUST CONTAIN column presupposes that you know your niche quite well. For some niches, it is fairly easy to come up with a list of keywords. Others may be more tricky. In the second column, you can enter the keywords and website extensions that the software should avoid. These are the keywords that are guaranteed to be spammy. We are constantly working on expanding our list of spam keywords. The third column contains a list of blacklisted sites that should not be scraped. Most of the time, this will include massive sites from which you cannot extract value. Some people prefer to add all the sites that are in the Majestic million. I think that it is enough to add the sites that will definitely not pass you any value. Ultimately, it is a judgement call as to what you want and do not want to scrape.
11 Configuring the Content Filters
The limitation with the domain filters discussed above is that not every website will necessarily contain your keywords. For example, there are many brands that do not necessarily contain the keywords in the domain. This is where the "Content Filter" comes in. The role of the content filter is to check a website's meta title, meta description and if you wish, the html code and the visible body text. By default, the software will only scan the meta title and meta description of each website and check whether it contains your keyword. Additionally, you can also get the software to check the body text and html code for your keywords as well. However, this will produce very expansive results which may be less relevant. You can also tell the software to check and scrape websites that contain a certain number of your keywords (you can specify it). The idea behind this content filter is that it will only scrape websites that contain your keywords in the meta title and description. Usually, all relevant websites will contain your keywords in the meta fields. So if you choose to search the meta title, meta description and the html code and visible text for your keywords, the software will scrape a website if it contains your keywords in either of the places. It is recommended that you spend some time thinking about your keywords. You should also decide whether you would like to use the domain filters and content filters. Generally, it is quite enough to use one set of filters. I usually go just for the content filters. This content filter is what makes this email extractor and search engine scraper the most powerful scraping tool on the market.
12 Configuring the Main Settings in the Main Interface
Enter your project name, keywords and then select "Crawl and Scrape E-Mails from Search Engines" or "Scrape E-Mails from your Website List". If you are doing both, you can select both options. Otherwise, most people would select the former option. Select "Use Proxies" if you are going to use proxies. You can select "Invisible Mode" if you do not want the software to open the browser windows. As the software performs data scraping inside browser windows, it would normally bring the browser windows up and you could see the entire scraping process in real time view. However, most people prefer to hide the browser windows as they tend to interfere with their work. You can run the software in "Fast Mode" and configure the number of threads. "Sub Scrapers" mean every source. For example, Google, Bing, Google Maps, etc are Sub Scrapers. Then you should select the number of "threads per scraper". This means how many keywords you would like to process at the same time per website/source. For example, if I select 3 sub scrapers and 2 threads per scraper, this would mean that the software would scrape Google, Bing and Google Maps at 2 keywords per website. So, the software would simultaneously scrape Google for 2 keywords, Bing for 2 keywords and Google Maps for 2 keywords. This scraper consumes a fair amount of CPU and processing power so it is recommended to keep your threads relatively low, unless you are running your software on a powerful Windows VPS or a dedicated server or maybe even a gaming laptop. You should really only be using the "integrated web browser" if you are using a VPN such as Nord VPN or Hide my Ass VPN (HMA VPN). I do not recommend VPNs as they tend to be unreliable. The "Delay Request in Milliseconds" helps to keep the scraping activity relatively "human" and helps to avoid IP bans. You can also "delete results without emails". The software will not save data for websites that do not have emails.
13 Running the Scraper (invisible mode)
Once your settings are configured, this is how the software should run.
13 Running the Scraper (visible mode)
This is how the scraper looks like when it is running in a visible mode.
14 Email List Cleaner
Once the software has finished scraping your data, the next step would be to clean the entire email list according to your filter. At the bottom of the software, click on the pink button titled "Email List Cleaner". Let me give you a quick run through of what each filter means:
- "Email Must match Domain" - this is a filter to filter out all the generic and non-company emails such as gmail, yandex, mail.ru, yahoo, protonmail, aol, virginmedia and so on. A lot of website owners put their personal emails on the website and social media. This filter is especially helpful for complying with the GDPR and similar data and privacy laws.
- "Only Save One Email per Domain Name" - some domains / websites have a couple of e-mails, one for customer service, one for marketing, one for returns and so on. This option will save only one email as you would not want to contact the same company many times. This is your spam reduction/control filter if you will.
- "Remove the Duplicated Emails" - by default, the scraper will remove all the duplicate emails. This is a precautionary filter.
- "Enter a list of keywords that part of the email must contain (either in the username or the domain name" - this should be your list of keywords that you would like to see in the email. For cryptocurrency sites, I would want to see keywords such as crypto, coin, chain, block, finance, tech, bit, etc. However, as was the case with the domain filter above, not all emails will necessarily contain your set of keywords.
- "Enter a list of keywords that the e-mail username must contain" - here our aim is to increase the relevancy of our emails and reduce spam at the same time. For example, I may want to contact all emails starting with info, hello, sayhi, etc.
- "Enter a list of keywords, symbols or characters that the e-mail MUST NOT contain" - this is a filter to weed out spam emails and honey traps. For example, it is obvious that we would have a non-functional email if we have any of these characters inside it: !"£$%^&*()_+=
14 b) Email List Cleaner - Export Data vs. Export Emails
Once you have cleaned your email list, you can export all the data and also Export Emails in a separate file. Usually, it is a good idea to "Export Emails" if you intend to use email addresses for e-mail or newsletter marketing. The emails will be saved in a .csv file, one email per row. This makes it very easy to copy and move emails. DO NOTE: YOU CAN ALWAYS IMPORTED YOUR ORIGINAL SCRAPED DATA AND CLEAN IT. THE SOFTWARE DOES NOT AUTOMATICALLY APPLY THESE EMAIL FILTERS JUST IN CASE YOU WANT TO CHANGE SOMETHING AT A LATER STAGE. MAKE SURE TO SAVE THE FILTERED EMAILS USING A SLIGHTLY DIFFERENT NAME WITHOUT REPLACING THE MAIN FILE.
15 a) Locating your data scraping results folder
As I mentioned previously, the software will automatically create a results folder for every project using the project name. Inside that folder, you will find the entire set of scraped results titled using your project name.
15 b) Opening your Excel Files
Sometimes, double clicking on an Excel .CSV file may not open it. Here is what you should do. Open Excel, click on "Data" tab and select "From text/CSV", navigate to the folder containing the scraped data.
15 c) Exporting Just Emails
As I mentioned earlier, when you click on the "Email List Cleaner" pink button at the bottom of the main GUI, you can export all the data in a standard Excel .csv file or you can export just all the emails (one email per row). It is helpful to export just the emails if you plan to use the scraped data just for newsletters and emails.
16 Resuming your previous search in case the Scraper closes or crashes unexpectedly
In case the software is closed unexpectedly or crashes for some reason, you will be able to resume the previous search before the app crashed. Just re-open the software and select the "Complete Previous Search" checkbox next to the green start button. The software will resume the previous search right before the software crashed or closed unexpectedly.
17 Adding your Facebook and LinkedIn login details
When the email extractor scrapes websites from the search engines, to get the contact data from a website, the software would check to determine if it can find all the information it is looking for directly on the website. However, if the website scraper cannot find some contact information on the website, it will proceed to check LinkedIn, Facebook, Twitter and Instagram for the missing contact details. LinkedIn and Facebook tend to block/disallow access to the company pages where all the data is stored, unless a person is logged in. It is therefore important that you enter your login details for LinkedIn and Facebook so that the software can scrape these two social media networks too as they contain a lot of valuable contact information. On the main GUI, click on the "More Settings" option, then navigate to "Save & Login Settings". Go down to the Accounts section and enter your login details. Do not forget to click on "Update". The software will now use your login details to access Facebook and LinkedIn.
Troubleshooting: Common Problems and how to resolve them
Search Engine Scrape by Creative Bear Tech has stopped working
If you are running your copy of the search engine scraper on a VPS or a dedicated server, it is not uncommon to see this error message. The software usually stops working due to low memory issues. It is recommended to either upgrade your VPS / dedicated server to higher specs or reduce the number of threads you are using.
Problem Event Name: CLR20r3
Problem Signature 01: scraping_Forms.exe
Problem Signature 02: 188.8.131.52
Problem Signature 03: 5ce3b892
Problem Signature 04: System.Drawing
Problem Signature 05: 4.7.3062.0
Problem Signature 06: 5ab9527e
Problem Signature 07: 1eb
Problem Signature 08: 18
Problem Signature 09: System.OutOfMemoryException
OS Version: 6.3.9600.2.0.0.272.7
Locale ID: 1033
Additional Information 1: 1ffa
Additional Information 2: 1ffa1eb9f76f3c3f0830add71925c64e
Additional Information 3: 37fc
Additional Information 4: 37fc2c7b82d67e997a6c9f4fc78dacce
Read our privacy statement online:
If the online privacy statement is not available, please read our privacy statement offline:
Random Pop Up Windows
Sometimes, you may notice random pop up windows. These are just website generated messages. Just click ok and close them all. They do not interfere with the scraping process. Our devs are working on making the pop ups close automatically.
The website scraper tries to save a file from a website
Sometimes, the website scraper will try to save a file from a website onto your local disk. Just click on cancel whenever this happens. Please note, this will not accept the scraping. Our devs are looking for a solution to get the website scraper to automatically close the windows.