Any Page Parsing Plugin

Introduction

Plugin for extracting data from websites. It's a great tool for marketers, store owners, and more. With its help, you can easily analyze the availability of products from competitors, monitor price changes, etc.
Using the web parser, you can quickly download product catalogs with the required characteristics. This feature will help you optimize your work with large amounts of data.
Note: A website made on Bubble.io (and some other websites) cannot be parsed with this plugin.

Important

❗
We cannot guarantee successful parsing of some pages. Many marketplaces or just large projects use protection against bots and parsing.
⚠️
Attention: Pages made on bubble.io (and possibly other no-code projects) cannot be processed due to the peculiarities of their system.

How To Setup

Place the Page Preview element on the page.

One Page Parsing

  1. Place an input element on the page. The link to the page to be processed is entered in this field.
  2. Place a PagePreview element on the page. The content of the parsed page is displayed in this element.
  3. Add a new workflow and pass the link to the plugin action "Web Page Parser - Get HTML From One URL".
  4. Set 2 states of the PagePreview element.
  5. Pass the states to the element.
  6. For further actions, we need 3 data tables.
  • Data table for templates with one field:
    • Title (type: Text)
  • Data table for template fields:
    • Template ID (type: Text) - stores the template ID from the Templates table.
    • Class List (type: Text) - stores the position of the field on the parsed page.
    • Title (type: Text) - stores the title of the field.
  • Data table for fields selected directly from the PagePreview element:
    • Title (type: Text) - stores the title of the selected field.
    • Text (type: Text) - stores the text content of the selected field.
  7. Add a new workflow.
For Collecting Data From Page
:::hint Note: You can pass any name you like for the data (for example, do everything through a popup with an input field). :::
For Creating Template
  1. Create a template.
  2. Add a field to the template.
:::hint We got the unique ID directly from the database. You can get it from wherever you like, for example from a dropdown, as on the demo page. :::

Optional Functions

Finding tags on the page

  1. Add a dropdown to the page.
  2. Add a new step in the workflow.
  3. Create a new state.
  4. Set the value of this state.
  5. Set the source of the dropdown to this state.
  6. Add a new workflow.
Now, when you change the tag in the dropdown, all elements of the parsed page with this tag are highlighted.

Multiple Page Parsing

⚠️
Even if a page opened successfully with single-page parsing, requests made during multiple parsing may still be blocked because of suspicious activity from the IP address.
❗
For multiple parsing of pages, you need at least 1 template and a CSV file with links. Due to differences in the HTML code structure, the same template can't be used for different sites.
✅
Example of correct .csv file: Download
  1. Place a file uploader on the page.
  2. Add a new workflow.
  3. Add an action from the plugin.
  5. Set the link from the file uploader.
  6. Add a step in the workflow.
  7. Set the data of the action.
❕
The template can be chosen in any way convenient for you (see demo page).
  8. Now return the data in a convenient format:
  • As a link to a .csv file with the data
  • As formatted text

Plugin Elements Properties

The plugin properties:
HTML Code Head - this field gets the HTML code of the "head" element from the parsed page. The value for this field is returned from the action "Get HTML From One URL".
HTML Code Body - this field gets the HTML code of the "body" element from the parsed page. The value for this field is returned from the action "Get HTML From One URL".

Page Preview Actions

  • Color Tags - highlights the elements on the parsed page that have the given tag name.
  • Unset selected fields - reloads the page and resets all selected elements to their default state.

Page Preview Events

  • Click - triggered when the Page Preview element is clicked.

Page Preview States

  • Field Text - returns the inner text of the element clicked in the Page Preview element.
  • Class List - returns the full path of the element in the DOM tree.

Plugin Actions

1. Get HTML From One URL

Input Fields

  • Url - Full URL (including http:// or https://) to the page to be parsed.
    • Type: Text

Returned Values

JSON Object with fields:
  • Head - returns the HTML code of the <head></head> tag of the parsed page as text.
    • Type: Text
  • Body - returns the HTML code of the <body></body> tag of the parsed page as text.
    • Type: Text
  • Tags - returns a list of strings containing all tags from the page.
    • Type: List of Text
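
As a rough illustration of these three returned values, the sketch below splits an HTML string into head markup, body markup, and a tag list using only Python's standard library. It is not the plugin's implementation. The names `split_html`, `inner_html`, and `TagCollector` are hypothetical, and two details are assumptions made for the example: the head and body are returned without their enclosing tags, and the tag list holds each tag name once, in order of first appearance.

```python
import re
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records each distinct tag name in order of first appearance."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag not in self.tags:
            self.tags.append(tag)


def inner_html(html, tag):
    """Return the markup between <tag ...> and </tag>, or "" if absent."""
    match = re.search(rf"<{tag}\b[^>]*>(.*?)</{tag}>", html, re.S | re.I)
    return match.group(1) if match else ""


def split_html(html):
    """Build a dict shaped like the Head / Body / Tags result (assumed layout)."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return {
        "Head": inner_html(html, "head"),
        "Body": inner_html(html, "body"),
        "Tags": collector.tags,
    }


page = "<html><head><title>Shop</title></head><body><h1>Sale</h1><p>Hi</p></body></html>"
result = split_html(page)
```

The regular expression is a shortcut that works for simple pages; a real parser would need to handle malformed or nested markup more carefully.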

2. Get Number Of Tags

Input Fields

  • HTML Code - the HTML code in which you need to find the tags.
    • Type: Text
  • Tag Name - name of the tag to find.
    • Type: Text

Returned Values

JSON Object with fields:
  • Num of entries - returns the number of tags with the given name found in HTML Code.
    • Type: Number
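
To make the counting concrete, here is a minimal sketch of the same idea in Python's standard `html.parser`. It is only an illustration, not the plugin's code. `TagCounter` and `count_tags` are hypothetical names, and counting opening tags case-insensitively is an assumption.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts opening tags whose name matches tag_name."""

    def __init__(self, tag_name):
        super().__init__()
        self.tag_name = tag_name.lower()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # The parser passes tag names in lowercase, so compare in lowercase.
        if tag == self.tag_name:
            self.count += 1


def count_tags(html_code, tag_name):
    """Return how many times tag_name opens in html_code."""
    counter = TagCounter(tag_name)
    counter.feed(html_code)
    counter.close()
    return counter.count


sample = "<ul><li>a</li><li>b</li><LI>c</LI></ul>"
li_count = count_tags(sample, "li")
```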

3. Generate Download Link From Data

This action is similar to "Generate Download Link From Multiple Parse"

Input Fields

  • Content - list of texts to write to the file.
    • Type: List of Text
  • Title - names of the rows.
    • Type: List of Text
  • File Title - Download file name.
    • Type: Text

Returned Values

JSON Object with fields:
  • Link - returns an HTML <a href="ContentInBase64String" download="File Title.csv">Download CSV</a> tag as text.
    • Type: Text
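
The sketch below shows one way such a link could be assembled in Python. It is a guess at the general technique, not the plugin's actual code. The function name `build_download_link` is hypothetical, and two things are assumptions: each CSV row pairs one Title entry with one Content entry, and the Base64 string is carried in a `data:text/csv;base64,` URI.

```python
import base64
import csv
import io
from html import escape


def build_download_link(content, titles, file_title):
    """Return an <a> tag whose href embeds the CSV as Base64 (assumed format)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    # Assumption: one row per entry, with the row name in the first cell.
    for title, value in zip(titles, content):
        writer.writerow([title, value])
    encoded = base64.b64encode(buffer.getvalue().encode("utf-8")).decode("ascii")
    return (
        f'<a href="data:text/csv;base64,{encoded}" '
        f'download="{escape(file_title)}.csv">Download CSV</a>'
    )


link = build_download_link(["19.99", "In stock"], ["Price", "Availability"], "report")
```

Escaping the file name keeps quotes in a title from breaking the `download` attribute.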

4. Extract Links From CSV

Input Fields

  • CSV File - CSV file (.csv format) with one column containing links. Links must be complete (include http:// or https://).
    • Type: File

Returned Values

JSON Object with fields:
  • URL's - returns a list of URLs from the uploaded file.
    • Type: List of Text
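
For illustration, reading a one-column CSV of links could look like the Python sketch below. It is not the plugin's implementation. `extract_links` is a hypothetical name, and silently skipping rows that do not start with http:// or https:// is an assumption; the plugin may treat incomplete links differently.

```python
import csv
import io


def extract_links(csv_text):
    """Return the complete links found in the first column of csv_text."""
    links = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        cell = row[0].strip()
        # Keep only complete links, as the input description requires.
        if cell.startswith(("http://", "https://")):
            links.append(cell)
    return links


sample_csv = "https://example.com/a\nhttp://example.com/b\nexample.com/c\n\n"
urls = extract_links(sample_csv)
```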

5. Get Data From Multiple URL

Input Fields

  • URL - list of links you want to parse. The action "Extract Links From CSV" returns the necessary value, or you can use a list of texts with URLs.
    • Type: List of Text
  • Classes - list of element paths in the DOM tree.
    • Type: List of Text

Returned Values

JSON Object with fields:
  • Data - returns a list of the inner texts of the elements whose paths are given in Classes.
    • Type: List of Text

6. Generate Download Link From Multiple Parse

Input Fields

  • Data - data to write to the CSV file. The action "Get Data From Multiple URL" returns the necessary values.
    • Type: List of Text
  • Fields - names of the columns.
    • Type: List of Text
  • File Title - name of the download file.
    • Type: Text
  • URLs - list of links from which the data was taken. The action "Extract Links From CSV" returns the necessary value. Each link is written in the first cell of a row to associate that row with its parsed page.
    • Type: List of Text

Returned Values

JSON Object with fields:
  • Link - returns an HTML <a href="DataInBase64String" download="File Title.csv">Download CSV</a> tag as text.
    • Type: List of Text
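
The CSV layout behind this link, with the source URL in the first cell of each row, might be built roughly as in the Python sketch below. This is an illustration under stated assumptions, not the plugin's code. `build_csv` is a hypothetical name, the "URL" header cell is invented for the example, and Data is assumed to be a flat list holding one value per field for each URL, in URL order.

```python
import csv
import io


def build_csv(urls, fields, data):
    """Return CSV text with one row per URL: the URL, then its field values."""
    per_page = len(fields)
    # Assumption: data is flat, with len(fields) consecutive values per URL.
    if len(data) != len(urls) * per_page:
        raise ValueError("expected one value per field for every URL")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["URL", *fields])  # hypothetical header row
    for i, url in enumerate(urls):
        writer.writerow([url, *data[i * per_page:(i + 1) * per_page]])
    return buffer.getvalue()


csv_text = build_csv(
    ["https://example.com/a", "https://example.com/b"],
    ["Title", "Price"],
    ["Lamp", "10", "Desk", "45"],
)
```

The resulting text could then be Base64-encoded and placed in the link's `href`.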

7. Generate Output

Input Fields

  • Fields - list of field names (e.g. from a template).
    • Type: List of Text
  • URLs Data - data from the parsed pages. The action "Get Data From Multiple URL" returns the necessary value.
    • Type: List of Text

Returned Values

JSON Object with fields:
  • Generated Text - returns a list of texts in the form "Field: URLs Data for this field". Each item in the list is the text for one field.
    • Type: List of Text
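
One possible reading of this output is sketched below in Python. It is not the plugin's implementation. `generate_output` is a hypothetical name, and the sketch assumes that URLs Data is a flat list with one value per field for each parsed page, and that the values for a field are joined with commas.

```python
def generate_output(fields, urls_data):
    """Return one "Field: values" text per field (assumed layout)."""
    per_page = len(fields)
    output = []
    for index, field in enumerate(fields):
        # Every per_page-th value, starting at index, belongs to this field.
        values = urls_data[index::per_page]
        output.append(f"{field}: {', '.join(values)}")
    return output


texts = generate_output(["Title", "Price"], ["Lamp", "10", "Desk", "45"])
```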

Troubleshooting

  1. Pages created on bubble.io (and possibly on other no-code platforms) are not supported.
  2. Many marketplaces and other large projects use protection against bots and parsing.
  3. The page may not render correctly in the Page Preview element.
  4. Error handling is still in progress.

Demo to preview the settings: