Hi,
I need to extract email addresses from web pages. The results can be saved to a CSV file.
How can I do this with Human Emulator?
Please help extract emails from sites
Re: Please help extract emails from sites
It depends on which programming language you want to use in your script and on the HTML code of the web pages you want to extract email addresses from.
Re: Please help extract emails from sites
I am using PHP.
Re: Please help extract emails from sites
Here is an example script, Scraping Email Address. The logic of the script:
1. Get keywords from a file.
2. Enter each keyword into the Google search engine.
3. Grab the websites from the Google search results.
4. Go to each website and look for its Contacts or About us page.
5. Extract the emails to a TXT file.
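Step 5 on its own, outside Human Emulator, might look roughly like this in plain PHP. This is only a sketch: extract_emails() is a hypothetical helper name, the sample HTML is made up, and the pattern is a simple approximation of an email address (it assumes a top-level domain of 2 to 6 letters), not a full validator.

```php
<?php
// Sketch only: extract_emails() is a hypothetical helper, not part of Human Emulator.
function extract_emails(string $html): array
{
    // "-" is placed last in each character class so it is a literal hyphen, not a range
    preg_match_all('#[\w.+-]+@(?:[\w-]+\.)+[a-z]{2,6}#i', $html, $matches);
    // lower-case the addresses and drop duplicates, keeping first-seen order
    return array_values(array_unique(array_map('strtolower', $matches[0])));
}

// made-up sample markup for illustration
$html = '<a href="mailto:Info@Example.com">Mail us</a> or sales@example.co.uk, info@example.com';
print_r(extract_emails($html));
```

Because ":" and ">" are not in the character class, the "mailto:" prefix and the surrounding tags are never part of a match, so no clean-up pass is needed afterwards.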
Download the script (in Russian): http://www.x-scripts.com/scripts/downlo ... ?script=23
Code:
<?php
$xhe_host ="127.0.0.1:7010";
// The following code is required to properly run XWeb Human Emulator
require("../../Templates/xweb_human_emulator.php");
// //////////////////////// settings /////////////////////////
// data file with the search keywords, one per line
$keys = file("data/keys.txt");
// the results file
$file_res="res/email.txt";
// depth of passage in search results
$cnt_pages = 10;
// current page
$crnt_page =1;
// the script runs in debug mode
$dbg = true;
// //////////////////////// additional functions ///////////////
require_once("functions.php");
// /////////////////////// script ///////////////////////////////////////////
debug_mess(date("\[ d.m.y H:i:s\] ")." start script");
// loop over the keywords
for($ii=0;$ii<count($keys);$ii++)
{
    // get search query
    $key = trim($keys[$ii]);
    // navigate to google
    $browser->navigate("google.com");
    // set the word to search
    $input->set_value_by_name("q",$key);
    $input->click_by_name("q");
    // press the space bar to disable the google tooltip
    $keyboard->send_key(32,true);
    // press enter
    $keyboard->send_key(13,true);
    // wait for the results to load
    sleep(3);
    // reset the page counter before the next pass
    $crnt_page=1;
    while(true)
    {
        // get all links to sites enclosed in <cite> tags
        $sites=$webpage->get_body_inter_prefix_all("<cite>","</cite>");
        $sites=explode("<br>",$sites);
        // go through all the links received
        for($i=0;$i<count($sites);$i++)
        {
            // strip the <b> highlighting from the link
            $site=str_replace("<b>","",trim($sites[$i]));
            $site=str_replace("</b>","",$site);
            if($site=="")
                continue;
            // output to debug panel
            debug_mess("link : ".$site);
            // open a second browser and make it active
            $browser->set_count(2);
            $browser->set_active_browser(1,true);
            // go to the website
            $browser->navigate($site);
            sleep(1);
            // try to open the contacts or about page
            $anchor->click_by_inner_text("contacts");
            $anchor->click_by_inner_text("Contacts");
            $anchor->click_by_inner_text("About us");
            $anchor->click_by_inner_text("about us");
            sleep(2);
            // find all emails on the page; "-" is last in each character class so it is
            // a literal hyphen (".-_" would be a range that also matches <, >, / and :)
            preg_match_all('#[\w.-]+@([\w-]+\.)+[a-zA-Z]{2,6}#i', $webpage->get_source(), $matches);
            // go through the results ($key is not reused here, it holds the search query)
            foreach ($matches[0] as $value)
            {
                // remove any leftover markup
                $str_mail=str_replace(">","",$value);
                $str_mail=str_replace("<","",$str_mail);
                $str_mail=str_replace("mailto:","",$str_mail);
                $str_mail=str_replace("/","",$str_mail);
                $str_mail=str_replace("mail:","",$str_mail);
                // write to file
                $textfile->add_string_to_file($file_res,trim($str_mail)."\n",60) ;
            }
            // close and go back to the search results
            $browser->set_active_browser(0,true);
            $browser->close_all_tabs();
            // remove duplicates from file
            dedupe($file_res);
        }
        // stop when there is no next page of results
        if(!next_page($crnt_page))
            break;
    }
}
debug_mess(date("\[ d.m.y H:i:s\] ")."the script is finished<br>");
// Quit
$app->quit();
?>
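The script relies on dedupe() from functions.php, which is not shown here, and it writes a TXT file while the original question asked for CSV. A rough sketch of how both could be done in plain PHP follows. The names dedupe_lines() and emails_to_csv() are hypothetical and are not the helpers from functions.php; the demo uses a temporary file with made-up addresses.

```php
<?php
// Hypothetical helper: the real dedupe() lives in functions.php, which is not shown.
// Rewrites the file with one unique, trimmed line per row and returns those lines.
function dedupe_lines(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return [];
    }
    $lines  = array_filter(array_map('trim', $lines), fn($l) => $l !== '');
    $unique = array_values(array_unique($lines));
    file_put_contents($path, implode("\n", $unique) . "\n");
    return $unique;
}

// Hypothetical helper: write one address per row under an "email" header.
function emails_to_csv(array $emails, string $csvPath): void
{
    $out = fopen($csvPath, 'w');
    // separator, enclosure and escape are passed explicitly
    fputcsv($out, ['email'], ',', '"', '\\');
    foreach ($emails as $email) {
        fputcsv($out, [$email], ',', '"', '\\');
    }
    fclose($out);
}

// Demo on a temporary file so nothing in res/ is touched
$txt = tempnam(sys_get_temp_dir(), 'email');
file_put_contents($txt, "a@example.com\nb@example.com\na@example.com\n");
$csv = $txt . '.csv';
emails_to_csv(dedupe_lines($txt), $csv);
echo file_get_contents($csv);
```

In the script itself, something like emails_to_csv(dedupe_lines($file_res), "res/email.csv") could be called once after the main loop; the res/email.csv path is only an example.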
Re: Please help extract emails from sites
Excellent, thanks.