Parsing a site with dynamic content

Web scraping dynamic content using the Human Emulator

Dynamic content is site content that is loaded after the main page has loaded. It can be any data added to the page, for example by JavaScript. If you look at the source code of such a page, this data will not be there. Script-based data loading is used by online stores, sites that display financial data, statistics sites, banking sites, and so on. Parsing such sites without a browser is rather time-consuming, because you have to understand how the scripts work and reproduce their requests in order to simulate the browser. This requires some knowledge and a lot of time. Moreover, the work has to be repeated for each site from which you need to parse such data.

Parsing dynamic content in the Human Emulator is much easier: all content is loaded into the built-in browser, so you don't need to imitate anything and only have to parse the data of interest. The algorithm is very simple:

  • go to the web page
  • wait for the data to load
  • parse the data
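
The wait-and-parse steps above can be sketched in plain PHP. This is only an illustration under stated assumptions, not the Human Emulator API: `waitForData` and `parseKeyData` are hypothetical helper names, and the `$fetch` callable stands in for whatever call reads text from the page in the built-in browser. The sample text is made up for the demonstration.

```php
<?php
// Hypothetical helper: call $fetch repeatedly until it returns a non-empty
// string or the timeout expires. Returns null if the data never appears.
function waitForData(callable $fetch, float $timeoutSec = 10.0, int $pollMs = 250): ?string
{
    $deadline = microtime(true) + $timeoutSec;
    do {
        $value = $fetch();
        if (is_string($value) && trim($value) !== '') {
            return $value;
        }
        usleep($pollMs * 1000); // pause before the next check
    } while (microtime(true) < $deadline);

    return null;
}

// Hypothetical parser: turn lines of the form "Label: value" into an array.
function parseKeyData(string $text): array
{
    $data = [];
    foreach (preg_split('/\R/', $text) as $line) {
        if (strpos($line, ':') === false) {
            continue; // skip lines without a label separator
        }
        // Split on the first colon only, so values such as "10:30" stay whole.
        [$label, $value] = explode(':', $line, 2);
        $data[trim($label)] = trim($value);
    }
    return $data;
}

// Stand-in for reading text from the browser: the data "loads" on the third call.
$calls = 0;
$fakeFetch = function () use (&$calls) {
    $calls++;
    return $calls < 3 ? '' : "Field A: 1\nField B: 10:30";
};

$text = waitForData($fakeFetch, 2.0, 10);
$keyData = $text === null ? [] : parseKeyData($text);
```

In a real script the stand-in closure would be replaced by a call that returns the text of the page element holding the data; the polling loop itself does not depend on how that text is obtained.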

As an example, we will get the Key Data from the Nasdaq website for several stocks: MSFT, BABA, AAPL and AMZN.

PHP example:

The result is:

If you need to collect data in real time, you can simply run the script every 10 seconds and record the results as they change.
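
A minimal sketch of such periodic collection, again in plain PHP rather than the Human Emulator API: `collectChanges` is a hypothetical helper, and `$getSnapshot` is assumed to be any callable that returns the currently parsed data as an array. Only snapshots that differ from the previous one are recorded.

```php
<?php
// Hypothetical helper: take a snapshot $iterations times, pausing
// $intervalSec seconds between attempts, and keep only the snapshots
// that differ from the previously recorded one.
function collectChanges(callable $getSnapshot, int $iterations, int $intervalSec = 10): array
{
    $log = [];
    $previous = null;

    for ($i = 0; $i < $iterations; $i++) {
        $current = $getSnapshot();

        if ($current !== $previous) {
            // Store the time of the change together with the new data.
            $log[] = ['time' => date('c'), 'data' => $current];
            $previous = $current;
        }

        // Do not sleep after the last iteration.
        if ($i < $iterations - 1 && $intervalSec > 0) {
            sleep($intervalSec);
        }
    }

    return $log;
}

// Stand-in data source with made-up values: the second snapshot repeats the first.
$samples = [['value' => '1'], ['value' => '1'], ['value' => '2']];
$index = 0;
$changes = collectChanges(function () use (&$index, $samples) {
    return $samples[$index++];
}, 3, 0); // interval 0 so the demonstration finishes immediately
```

With the default interval of 10 seconds, the same loop matches the polling rate mentioned above; the stand-in closure would be replaced by the code that loads and parses the page.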

For example, suppose we need to collect the Stock Activity data.

Then our PHP script will be:

The result is:

As you can guess, you can collect data for any period of time in this way. Moreover, using the script schedule, you can start such a script before trading opens and collect data for as long as trading continues.