SCRIPT: Parser of twitter.com posts

Examples of automation scripts (PHP, C#, JavaScript and Python) and tutorials.

Post by support » Thu Jun 18, 2020 2:25 pm

The script collects posts from the specified twitter.com accounts.

Program version: Human Emulator Studio 7.0.50.

Browser: Chromium.

Logic:
1. Take a twitter account from the file and navigate to its page.
2. Get the publication times of the posts on the account page.
3. If a saved date and time of the last post collected for this account exists, take it from the file; if not, take the start date and time from the script settings.
4. Save all posts published after that time to a file named after the account (channel).
5. Take the next twitter account and repeat the previous steps until the accounts in the file run out.
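
The selection by time in steps 2 to 4 can be sketched in plain PHP, outside Human Emulator Studio. This is only an illustration under assumptions: the function name select_new_posts and the sample data are hypothetical, and the posts are assumed to be ordered newest first, as they appear on an account page.

```php
<?php
// Hypothetical helper, not part of the original script: keep only the posts
// published after $start_time. $posts is assumed to be ordered newest first.
function select_new_posts(array $posts, $start_time)
{
    $selected = [];
    foreach ($posts as $post) {
        // the datetime attribute of a <time> tag is an ISO 8601 string
        $published = strtotime($post['datetime']);
        if ($published === false || $published <= $start_time) {
            // this post and everything after it in the list is skipped
            break;
        }
        $selected[] = $post['text'];
    }
    return $selected;
}

// Made-up sample data, for illustration only.
$posts = [
    ['datetime' => '2020-06-18T12:00:00.000Z', 'text' => 'third'],
    ['datetime' => '2020-06-18T11:00:00.000Z', 'text' => 'second'],
    ['datetime' => '2020-06-18T09:00:00.000Z', 'text' => 'first'],
];
$start_time = strtotime('2020-06-18T10:00:00Z');
$new_posts = select_new_posts($posts, $start_time);
print_r($new_posts);
```

With this sample data only the two posts published after 10:00 UTC are kept.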

The script consists of:
The launched file parser_twitter.php. The tools folder contains the functions.php file with additional commands.
The results of the collection are written to the res folder. The file <account name>.txt contains the messages collected from that twitter account, one post per line.
The data folder contains chanels.txt, the list of twitter accounts to collect posts from, one account per line. It also holds the files with the date and time of the last collected post, last_tme_<account name>.txt.
The log folder contains the script logs, one log file per day.
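
As a rough illustration of this layout, the per-account file names could be built as below. The function name account_paths and the account name are hypothetical; the folder names and the last_tme_ prefix come from the description above, and forward slashes are used for brevity.

```php
<?php
// Hypothetical helper, not part of the original script: build the paths of
// the files that belong to one account.
function account_paths($account)
{
    // lines read from chanels.txt may carry spaces or a trailing line break
    $name = trim($account);
    return [
        'result'    => 'res/' . $name . '.txt',            // collected posts, one per line
        'last_time' => 'data/last_tme_' . $name . '.txt',  // datetime of the last collected post
    ];
}

// Example with a made-up account name.
$paths = account_paths("example_account\n");
print_r($paths);
```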

Download the script:
parser_twitter_en.zip
(3.71 KiB)

Script settings:

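// list of twitter accounts, one per line (line breaks removed, empty lines skipped)
$arr_chanels = file("data/chanels.txt", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
// results folder
$path_to_res = $debug->get_cur_script_folder().'res\\';
// start time used when no last collected time is saved for an account
$start_time = strtotime('-1 hours');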
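
How the start time setting interacts with a saved time can be sketched in plain PHP, using standard file functions instead of the Human Emulator Studio objects. The function name resolve_start_time and the account name are hypothetical, and the saved file is assumed to hold one datetime string on its first line.

```php
<?php
// Hypothetical helper, not part of the original script: pick the start time
// for one account. $last_time_file holds the datetime of the last collected
// post; $default_start is the value from the script settings.
function resolve_start_time($last_time_file, $default_start)
{
    if (is_file($last_time_file)) {
        // the first line of the file is assumed to hold the saved datetime
        $lines = file($last_time_file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        if ($lines !== false && isset($lines[0])) {
            $saved = strtotime(trim($lines[0]));
            if ($saved !== false) {
                return $saved;
            }
        }
    }
    // no usable saved time: fall back to the settings value
    return $default_start;
}

// Example with a made-up account name.
$default_start = strtotime('-1 hours');
$start_time = resolve_start_time('data/last_tme_example_account.txt', $default_start);
echo date('d.m.Y H:i', $start_time), "\n";
```

Resolving the value separately for every account keeps one account's saved time from being reused for the next one.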

Script:

///////////////////////// script ///////////////////////////////////////////

debug_mess("start script");

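// remember the start time from the settings; each account falls back to it
$default_start_time = $start_time;

foreach ($arr_chanels as $twiter_chanel)
{
	// strip surrounding whitespace and line breaks from the account name
	$twiter_chanel = trim($twiter_chanel);
	debug_mess("work with account ".$twiter_chanel);
	// take the time of the last collected tweet from the file if it exists, otherwise the start time from the settings
	$start_time = $default_start_time;
	if ($file_os->is_exist("data//last_tme_".$twiter_chanel.".txt"))
		$start_time=strtotime(trim($textfile->get_line_from_file("data//last_tme_".$twiter_chanel.".txt",false,0)));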

	// navigate to twitter
	$browser->navigate("https://twitter.com/".$twiter_chanel);
	// pause
	sleep($wt);

	// wait until the tweet timestamps are present on the page
	$element->wait_element_exist_by_outer_html("<time datetime", false);

	// get all elements with tag time
	$arr_html=$element->get_all_by_tag("time");
	// show array in debug panel
	//print_r($arr_html->get_attribute("datetime"));
	// index of the first tweet to read
	$ii=0;
	// skip the pinned tweet
	if($span->is_exist_by_inner_text("Pinned tweet"))
		$ii=1;

	// to enter the loop, get the time of the first tweet
	$last_time=strtotime($arr_html[$ii]->get_attribute("datetime"));
	// save the time of the newest tweet as the starting point for the next run
	if($start_time<$last_time)
		$textfile->write_file('data/last_tme_'.trim($twiter_chanel).'.txt',$arr_html[$ii]->get_attribute("datetime"));

	debug_mess("start date :".date('d.m.Y H:i', $start_time));
	debug_mess("date of last tweet collected :".date('d.m.Y H:i', $last_time));
	// get the right amount of tweets by time
	while($start_time<$last_time)
	{
		// get the DOM interface object of the current "time" element
		$in_html = $arr_html[$ii];
		// no element at this index: read the list of tweets again (more may have loaded)
		if(!$in_html)
		{
			// get all "time" elements again
			$arr_html=$element->get_all_by_tag("time");
			//print_r($arr_html->get_attribute("datetime"));
			// find the position of the last processed tweet in the new array
			$ii = array_search($last_value, $arr_html->get_attribute("datetime"));
			// move to the next tweet
			$ii ++;
			continue;
		}

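		// tweet time
		$last_value = $arr_html[$ii]->get_attribute("datetime");
		// stop before saving a tweet that is not newer than the start time
		$last_time = strtotime($last_value);
		if($last_time<=$start_time)
			break;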
		// set focus
		$in_html->focus();
		// pause
		sleep($wt);
		
		// parent div
		$parent = $in_html->get_parent(6);
		// get post text
		$inner_html=$parent->get_child_by_number(1)->get_child_by_number(0)->get_inner_text();
		
		// log the publication time
		debug_mess("publication time : ".$last_value);
		debug_mess("Save the tweet text to a file");
		// save tweet text to channel file
		$textfile->add_string_to_file($path_to_res."//".trim($twiter_chanel).".txt", trim($inner_html)."\r\n", 60);

		// Tweet time to compare with start date
		$last_time = strtotime($last_value);

		// next tweet
		$ii++;
	}
}

debug_mess("script finished work");

// Quit
$app->quit();
