Dynamic content web scraping using the Human Emulator

Dynamic content is site content that is loaded after the main page has loaded. It can be any data added to the page later, for example by JavaScript. If you look at the source code of such a page, this data will not be there. Script-driven data loading is used by online stores, sites that display financial data, statistics sites, banking sites, and so on. Parsing such sites without a browser is time-consuming: you have to understand how the scripts work and reproduce their requests to simulate what the browser does. This takes some knowledge and a lot of time, and the work has to be repeated for every site you need to parse.

Parsing dynamic content in the Human Emulator is much easier: all content is loaded into the built-in browser, so you don't need to imitate anything, only parse the data you are interested in. The algorithm is very simple:

1. Go to the web page.
2. Wait for the data to load.
3. Parse the data.

As an example, we will get the Key Data from the Nasdaq website for several stocks: MSFT, BABA, AAPL and AMZN.

PHP example:

// stocks to collect data for
$stocks = array("MSFT", "BABA", "AAPL", "AMZN");

// go to the Nasdaq website
$browser->navigate("https://www.nasdaq.com/market-activity/stocks");

// for each stock
foreach ($stocks as $stock)
{
   // type the stock symbol into the search field and press Enter
   $input->send_keyboard_input_by_name("q", $stock."\n", "20:40");

   // wait for the data to load
   sleep(3);

   // get all rows with data
   $dts = $tr->get_all_by_attribute("class", "summary-data__row", true);

   // show the collected data
   print_r($dts->get_inner_text());
}
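
The fixed `sleep(3)` in the script above waits the same time whether the data arrives early or late. A possible alternative is to poll until the rows appear, with a timeout. The sketch below is plain PHP and does not call the Human Emulator: `wait_until` is a hypothetical helper, and the closure is a stand-in for the real check. In a real script the closure would wrap the `get_all_by_attribute(...)` and `get_inner_text()` calls shown above, assuming they return an empty result while the rows are not yet on the page.

```php
<?php
/**
 * Hypothetical helper: call $probe repeatedly until it returns a non-empty
 * value or $timeoutSec seconds have passed. Returns the last probe result,
 * which may still be empty if the timeout was reached.
 */
function wait_until(callable $probe, float $timeoutSec = 10.0, float $intervalSec = 0.5)
{
    $deadline = microtime(true) + $timeoutSec;
    do {
        $result = $probe();
        if (!empty($result)) {
            return $result;
        }
        // pause before the next attempt
        usleep((int) ($intervalSec * 1000000));
    } while (microtime(true) < $deadline);

    return $result;
}

// Stand-in probe for illustration only: pretends the data shows up on the third check.
$calls = 0;
$rows = wait_until(function () use (&$calls) {
    $calls++;
    return $calls < 3 ? array() : array("Exchange NASDAQ-GS");
}, 5.0, 0.01);

print_r($rows);
```

With a helper like this the script continues as soon as the data is present and gives up after the timeout instead of reading an empty page.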

The result is:

Array
(
[0] => Exchange NASDAQ-GS
[1] => Sector Technology
[2] => Industry Computer Software: Prepackaged Software
[3] => 1 Year Target $230.00
[4] => Today`s High/Low $210.99/$205.54
[5] => Share Volume 33,154,781
[6] => Average Volume 34,157,587
[7] => Previous Close $212.46
[8] => 52 Week High/Low $232.86/$132.52
[9] => Market Cap 1,560,374,358,668
[10] => P/E Ratio 35.86
[11] => Forward P/E 1 Yr. 32.27
[12] => Earnings Per Share(EPS) $5.75
[13] => Annualized Dividend $2.24
[14] => Ex Dividend Date Nov 18, 2020
[15] => Dividend Pay Date Dec 10, 2020
[16] => Current Yield 0.99%
[17] => Beta 1
)
Array
(
[0] => Exchange NYSE
[1] => Sector Miscellaneous
[2] => Industry Business Services
[3] => 1 Year Target $325.00
[4] => Today`s High/Low $291.98/$286.51
[5] => Share Volume 11,482,752
[6] => Average Volume 15,465,812
[7] => Previous Close $290.05
[8] => 52 Week High/Low $299.00/$161.68
[9] => Market Cap 779,683,299,022
[10] => P/E Ratio 31.05
[11] => Forward P/E 1 Yr. 38.02
[12] => Earnings Per Share(EPS) $9.28
[13] => Annualized Dividend N/A
[14] => Ex Dividend Date N/A
[15] => Dividend Pay Date N/A
[16] => Current Yield N/A
[17] => Beta 1
)
Array
(
[0] => Exchange NASDAQ-GS
[1] => Sector Technology
[2] => Industry Computer Manufacturing
[3] => 1 Year Target $123.12
[4] => Today`s High/Low $115.37/$112.22
[5] => Share Volume 144,711,986
[6] => Average Volume 184,737,725
[7] => Previous Close $116.79
[8] => 52 Week High/Low $137.98/$53.15
[9] => Market Cap 1,959,466,166,800
[10] => P/E Ratio 34.38
[11] => Forward P/E 1 Yr. 34.88
[12] => Earnings Per Share(EPS) $3.29
[13] => Annualized Dividend $0.82
[14] => Ex Dividend Date Aug 7, 2020
[15] => Dividend Pay Date Aug 13, 2020
[16] => Current Yield 0.73%
[17] => Beta 1
)
Array
(
[0] => Exchange NASDAQ-GS
[1] => Sector Consumer Services
[2] => Industry Catalog/Specialty Distribution
[3] => 1 Year Target $3,700.00
[4] => Today`s High/Low $3,195.80/$3,123.00
[5] => Share Volume 5,613,098
[6] => Average Volume 4,801,254
[7] => Previous Close $3221.26
[8] => 52 Week High/Low $3,552.25/$1,626.03
[9] => Market Cap 1,565,280,159,375
[10] => P/E Ratio 120.15
[11] => Forward P/E 1 Yr. 98.24
[12] => Earnings Per Share(EPS) $26.01
[13] => Annualized Dividend N/A
[14] => Ex Dividend Date N/A
[15] => Dividend Pay Date N/A
[16] => Current Yield N/A
[17] => Beta 1
)
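
Each row in this output is a single string in which the label and the value are separated only by a space, so it may be convenient to turn the rows into a label => value array before storing them. The following is a minimal sketch in plain PHP, not part of the Human Emulator API: `parse_key_data` is a hypothetical helper, and it assumes you pass in the list of labels you expect (taken here from the output above).

```php
<?php
/**
 * Hypothetical helper: split rows such as "Exchange NASDAQ-GS" into
 * label => value pairs, using a list of known labels as prefixes.
 * Rows that match no label are skipped.
 */
function parse_key_data(array $rows, array $labels): array
{
    // try longer labels first so a short label never shadows a longer one
    usort($labels, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    $parsed = array();
    foreach ($rows as $row) {
        $row = trim($row);
        foreach ($labels as $label) {
            if (strncmp($row, $label, strlen($label)) === 0) {
                // everything after the label is the value
                $parsed[$label] = trim((string) substr($row, strlen($label)));
                break;
            }
        }
    }
    return $parsed;
}

// a few labels and rows copied from the MSFT result above
$labels = array('Exchange', 'Sector', 'Previous Close', 'Market Cap', 'P/E Ratio', 'Forward P/E 1 Yr.');
$rows = array(
    'Exchange NASDAQ-GS',
    'Sector Technology',
    'Previous Close $212.46',
    'Market Cap 1,560,374,358,668',
    'P/E Ratio 35.86',
    'Forward P/E 1 Yr. 32.27',
);

$data = parse_key_data($rows, $labels);
print_r($data);
```

Matching on known labels avoids guessing where a multi-word label such as "Forward P/E 1 Yr." ends and its value begins.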

If you need to collect data in real time, you can simply run the script every 10 seconds and see how the results change.

For example, suppose we need to collect the Stock Activity data.

Then our PHP script will be:

// go to the Nasdaq website
$browser->navigate("https://www.nasdaq.com/market-activity/stocks");

// scroll down so that the data starts loading
$browser->set_vertical_scroll_pos(500);

// wait for the data to load
sleep(1);

// get all table rows of Stock Activity
$trs = $tr->get_all_by_attribute("class", "mini-asset-class-tables__row m", false);

// show the data in the debug panel
print_r($trs->get_inner_text());

The result is:

Array
(
[0] => MYOK MyoKardia, Inc.
$220.40
 80.80
 57.88%
[1] => EIDX Eidos Therapeutics, Inc.
$68.79
 16.87
 32.49%
[2] => CRVS Corvus Pharmaceuticals, Inc.
$5.28
 1.21
 29.93%
[3] => OIIM O2Micro International Limited
$3.95
 0.55
 16.35%
[4] => AOSL Alpha and Omega Semiconductor Limited
$14.95
 1.88
 14.42%
[5] => HMHC Houghton Mifflin Harcourt Company
$2.32
 -0.17
 -6.83%
[6] => BBIO BridgeBio Pharma, Inc.
$37.14
 -2.46
 -6.21%
[7] => NCMI National CineMedia, Inc.
$2.63
 -0.17
 -6.18%
[8] => LMNL Liminal BioSciences Inc.
$10.37
 -0.65
 -5.9%
[9] => GILT Gilat Satellite Networks Ltd.
$4.98
 -0.23
 -4.41%
[10] => AAPL Apple Inc.
$114.19
 1.17
 6,561,775
[11] => AAL American Airlines Group, Inc.
$12.85
 -0.14
 5,198,792
[12] => CRVS Corvus Pharmaceuticals, Inc.
$5.28
 1.21
 4,385,358
[13] => TSLA Tesla, Inc.
$427.50
 12.41
 3,874,375
[14] => NKLA Nikola Corporation
$24.91
 0.66
 3,482,596
[15] => TSLA Tesla, Inc.
$424.20
 9.11
 2.19%
[16] => AMZN Amazon.com, Inc.
$3142.70
 17.70
 0.57%
[17] => AAPL Apple Inc.
$114.06
 1.04
 0.92%
[18] => MYOK MyoKardia, Inc.
$220.54
 80.94
 57.98%
[19] => REGN Regeneron Pharmaceuticals, Inc.
$612.17
 47.37
 8.39%
)
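
Each element of this result is the inner text of one table row, so the symbol, company name, price and change values arrive as one multi-line string. Below is a minimal sketch of splitting such a string into fields in plain PHP. `parse_activity_row` is a hypothetical helper, not part of the Human Emulator API, and the field names are assumptions: the output above suggests that the first word is the symbol and that the last value is either a percentage or a volume, depending on the table.

```php
<?php
/**
 * Hypothetical helper: split the inner text of one Stock Activity row
 * into named fields. Returns an empty array for blank input.
 */
function parse_activity_row(string $text): array
{
    // split on line breaks, trim each piece, drop empty lines
    $lines = array_values(array_filter(
        array_map('trim', preg_split('/\R/', $text)),
        function ($line) {
            return $line !== '';
        }
    ));
    if (count($lines) === 0) {
        return array();
    }

    // first line: symbol, then the company name
    $head = explode(' ', $lines[0], 2);

    return array(
        'symbol' => $head[0],
        'name'   => isset($head[1]) ? $head[1] : '',
        'price'  => isset($lines[1]) ? $lines[1] : null,
        'change' => isset($lines[2]) ? $lines[2] : null,
        // percent change or volume, depending on which table the row came from
        'last'   => isset($lines[3]) ? $lines[3] : null,
    );
}

// one entry copied from the result above
$row = parse_activity_row("EIDX Eidos Therapeutics, Inc.\n\$68.79\n 16.87\n\n 32.49%");
print_r($row);
```

In a real script the function would be applied to every element returned by `get_inner_text()`.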

In this way you can collect data over any period of time. Using the script scheduler, you can start such a script before trading opens and collect data for as long as trading is in progress.
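
When the script runs repeatedly, most values stay the same between runs, so it can be useful to record only what has changed. The sketch below compares two snapshots stored as label => value arrays. It is plain PHP with a hypothetical helper name (`changed_fields`), and the second snapshot contains a made-up value purely for illustration.

```php
<?php
/**
 * Hypothetical helper: return the entries of $current whose value differs
 * from $previous, together with the old value (null if the key is new).
 */
function changed_fields(array $previous, array $current): array
{
    $changes = array();
    foreach ($current as $key => $value) {
        $old = array_key_exists($key, $previous) ? $previous[$key] : null;
        if ($old !== $value) {
            $changes[$key] = array('old' => $old, 'new' => $value);
        }
    }
    return $changes;
}

// first snapshot: values from the MSFT result in this article
$previous = array('Share Volume' => '33,154,781', 'Previous Close' => '$212.46');
// second snapshot: the share volume here is invented for the example
$current = array('Share Volume' => '33,160,002', 'Previous Close' => '$212.46');

$changes = changed_fields($previous, $current);
print_r($changes);
```

Storing only the differences keeps the collected log small when the script is run every few seconds.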
