
Please help extract emails from websites

Posted: Wed Sep 25, 2019 10:25 pm
by petelius
I need to extract email addresses from web pages. The results can be saved to a CSV file.
How can I do this with the Human Emulator?

Posted: Thu Sep 26, 2019 12:04 am
by support
It depends on the programming language you want to use in your script and on the HTML code of the web page from which you want to extract email addresses.

Posted: Thu Sep 26, 2019 12:16 am
by petelius
I'm using PHP.

Posted: Thu Sep 26, 2019 6:45 pm
by support
Here is an example script, "Scraping Email Address". The logic of the script:

1. Read the keywords from a file.
2. Enter each keyword into the Google search engine.
3. Collect the website links from the Google search results.
4. Open each website and look for its Contacts or About us page.
5. Save the email addresses found there to a TXT file.
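Steps 4 and 5 come down to searching the page source for addresses, and that part does not need a browser. Below is a minimal sketch in plain PHP, assuming the HTML is already available as a string (in the script it comes from $webpage->get_source()). The function name extract_emails is hypothetical, and treating addresses case-insensitively is an assumption. The hyphen is kept at the end of the character class on purpose: in a class such as [\w\d.-_], the sequence .-_ is read as a range from "." to "_", which also matches characters like "/", "<" and "@".

```php
<?php
// Hypothetical helper: extract unique email addresses from an HTML string.
// Plain PHP only; no Human Emulator objects are used here.
function extract_emails(string $html): array
{
    // Decode entities so that obfuscation like "&#64;" becomes "@".
    $text = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Local part: letters, digits, "_", ".", "+" and "-" ("-" last = literal).
    // Domain: one or more labels ending in "." followed by a TLD of 2+ letters.
    $pattern = '/[\w.+-]+@(?:[a-z0-9-]+\.)+[a-z]{2,}/i';
    preg_match_all($pattern, $text, $matches);

    $unique = array();
    foreach ($matches[0] as $mail) {
        // Lower-case so "Info@Example.com" and "info@example.com" count once;
        // using the address as an array key keeps only the first occurrence.
        $unique[strtolower($mail)] = true;
    }
    return array_keys($unique);
}

// Example: the same address twice (mailto link and text) plus an encoded "@".
$html = '<p>Write to <a href="mailto:Info@Example.com">Info@Example.com</a>'
      . ' or sales&#64;example.co.uk.</p>';
print_r(extract_emails($html));
```

On the sample string this returns info@example.com and sales@example.co.uk: the duplicate from the mailto link is dropped, and the encoded "@" is decoded before matching.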


$xhe_host = "";
// The following code is required to properly run XWeb Human Emulator
// ///////////////////////// settings /////////////////////////
// data file for the script: one search query per line
$keys = file("data/keys.txt");
// the results file (example name)
$file_res = "data/emails.txt";
// depth of passage in search results (number of pages)
$cnt_pages = 10;
// current page
$crnt_page = 1;
// the script runs in debug mode
$dbg = true;
// ///////////////////////// additional functions /////////////////////////
// ///////////////////////// script /////////////////////////
debug_mess(date("[d.m.y H:i:s] ") . "start script");
// go through all search queries
for ($ii = 0; $ii < count($keys); $ii++)
{
    // get search query
    $key = trim($keys[$ii]);
    if ($key == "")
        continue;
    // navigate to google
    // set the word to search
    // press the space bar to disable the google tooltip
    // press enter
    // wait
    // reset the page counter before the next pass
    $crnt_page = 1;
    while ($crnt_page <= $cnt_pages)
    {
        // get all links to sites enclosed in <cite> tags
        $sites = array(); // fill with the collected links
        // let's go through all the links received
        foreach ($sites as $site)
        {
            // output to debug panel
            debug_mess("link : " . $site);
            // open and make a new browser active
            // go to the website
            // go to the contacts page
            $anchor->click_by_inner_text("About us");
            $anchor->click_by_inner_text("about us");
            // look for all emails on the page
            // ("-" goes last in the class so it is not read as a range)
            preg_match_all('#[\w.+-]+@(?:[a-z0-9-]+\.)+[a-z]{2,}#i', $webpage->get_source(), $matches);
            // let's go through the results
            // ($mail is used so the search query in $key is not overwritten)
            foreach ($matches[0] as $mail)
            {
                // remove the excess and write to file
                $textfile->add_string_to_file($file_res, trim($mail) . "\n", 60);
            }
            // close and go back
        }
        // go to the next page of results (stop if it could not be opened)
        $crnt_page++;
    }
}
// remove duplicates from file
debug_mess(date("[d.m.y H:i:s] ") . "the script is finished<br>");
// Quit
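The question asked for a CSV file, while the script writes one address per line to a TXT file and then removes duplicates from it. Below is a minimal sketch of that last step in plain PHP, assuming one address per line in the results file: it drops duplicates (case-insensitively, which is an assumption) and writes a one-column CSV with a header row using fputcsv(). The function name txt_to_unique_csv and the file names are hypothetical.

```php
<?php
// Hypothetical helper: deduplicate a one-address-per-line TXT file into a CSV.
// Returns the number of unique addresses written (0 if a file can't be opened).
function txt_to_unique_csv(string $txtFile, string $csvFile): int
{
    $lines = file($txtFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return 0;  // could not read the input file
    }

    $unique = array();
    foreach ($lines as $line) {
        // trim() also removes a stray "\r" from Windows line endings
        $mail = strtolower(trim($line));
        if ($mail !== '') {
            $unique[$mail] = true;  // keep the first occurrence only
        }
    }

    $out = fopen($csvFile, 'w');
    if ($out === false) {
        return 0;  // could not create the output file
    }
    // separator, enclosure and escape are passed explicitly
    fputcsv($out, array('email'), ',', '"', '\\');  // header row
    foreach (array_keys($unique) as $mail) {
        fputcsv($out, array($mail), ',', '"', '\\');
    }
    fclose($out);

    return count($unique);
}
```

For example, txt_to_unique_csv("data/emails.txt", "data/emails.csv") would turn the results file into a CSV with an "email" header; both file names are only examples.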

Download the script (in Russian): ... ?script=23

Posted: Thu Sep 26, 2019 9:02 pm
by petelius
Excellent, thanks.