
Fetch HTML Node

The Fetch HTML Node in Fleak retrieves the HTML content of a specified web page. It can return the raw HTML as a string or only the page's visible text. This makes it useful for web content analysis and basic data extraction. Use it responsibly, in line with standard internet protocols and applicable legal requirements.

Key Functions

  • URL Fetching: Downloads the HTML content of a web page from a given URL.

  • Content Processing:

    • Full HTML: Returns the complete HTML document, including all tags and structure.
    • Text-Only: Strips all HTML tags, returning only the visible text content of the page.
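The two processing modes can be illustrated with a short Python sketch that uses only the standard library. It is a minimal illustration, not Fleak's implementation. The set of tags treated as non-visible, the `User-Agent` string, and the helper names `fetch_html`, `extract_text`, `fetch_page`, and `extract_only_text` are all assumptions.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Assumption: text inside these tags is never rendered as page content.
# Fleak's actual extraction rules are not documented.
SKIP_TAGS = {"script", "style", "noscript", "template"}


class VisibleTextParser(HTMLParser):
    """Collects trimmed text nodes, ignoring any text inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True decodes entities such as &amp;
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Full-HTML mode: download the page and decode it with the declared charset."""
    request = Request(url, headers={"User-Agent": "example-fetcher/0.1"})  # hypothetical UA
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_text(html: str) -> str:
    """Text-only mode: drop markup and keep one text node per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "\n".join(parser.parts)


def fetch_page(url: str, extract_only_text: bool = False) -> str:
    """Mirrors the node's two modes; extract_only_text is a hypothetical flag name."""
    html = fetch_html(url)
    return extract_text(html) if extract_only_text else html
```

A real extractor would also need to decide how to treat whitespace inside text nodes and elements hidden by CSS; this sketch only trims each node and skips the tags listed above.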

Usage Guidelines

  • Respect Internet Protocols: Ensure that your use of the Fetch HTML Node complies with the terms of service and robots.txt policies of the websites you are accessing.
  • No Proxy Services: The Fetch HTML Node does not provide proxy services, IP rotation, or other web scraping features that may be used to bypass site protections. Users are responsible for ensuring their activities are ethical and legal.
  • Permissions: Confirm that you have the appropriate permissions to access and extract content from the target website.
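One way to check robots.txt rules before fetching a page is Python's `urllib.robotparser`. Nothing here states that the Fetch HTML Node performs this check for you, so treat it as a check you might run on your side. The helper names `robots_url_for` and `is_allowed` are hypothetical, and robots.txt compliance does not replace reading a site's terms of service.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url: str) -> str:
    """robots.txt lives at the root of the page's scheme and host."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def is_allowed(robots_txt: str, page_url: str, user_agent: str = "*") -> bool:
    """Checks a page URL against robots.txt rules that were already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)
```

To download the rules directly instead of passing them in as a string, `RobotFileParser.set_url(robots_url_for(url))` followed by `read()` does the same job.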

Configurations

  • URL Template: The URL to fetch. Use JSONPath syntax inside double braces, such as {{$.path.to.value}}, to inject values from the incoming data.

  • Extract Only Text: When enabled, the node strips the HTML markup and returns only the text content.
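The following Python sketch shows how a placeholder such as {{$.path.to.value}} resolves against incoming JSON data. It supports only simple dotted paths (no array indexes, filters, or wildcards). The function names and the choice to raise `KeyError` on a missing path are assumptions, not documented Fleak behavior.

```python
import json
import re
from typing import Any

# Matches {{$.a.b.c}}; only dotted field paths are handled in this sketch.
_PLACEHOLDER = re.compile(r"\{\{\s*\$((?:\.[A-Za-z_][A-Za-z0-9_]*)+)\s*\}\}")


def _lookup(record: Any, dotted: str) -> Any:
    """Walks nested dicts along a path like '.path.to.value'."""
    value = record
    for key in dotted.lstrip(".").split("."):
        if not isinstance(value, dict) or key not in value:
            raise KeyError(f"path $.{dotted.lstrip('.')} not found in input")
        value = value[key]
    return value


def render_url_template(template: str, record: dict) -> str:
    """Replaces every {{$...}} placeholder with the matching value from record."""
    return _PLACEHOLDER.sub(lambda m: str(_lookup(record, m.group(1))), template)


if __name__ == "__main__":
    record = json.loads('{"url": "https://lu.ma/emr098l7"}')
    print(render_url_template("{{$.url}}", record))  # https://lu.ma/emr098l7
```

Substituted values are inserted as-is. The sketch does not percent-encode them, and whether Fleak does is not stated here.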

Example

This example extracts the visible text from a specific web page.

Step 1: Enter an example URL in the HTTP Data Input node.

Paste the URL into the HTTP Data Input node as JSON:

{"url":"https://lu.ma/emr098l7"}

Step 2: Add a Fetch HTML node under the HTTP Data Input node.

  • Add the Fetch HTML node.
  • Fill in the URL Template with a JSONPath placeholder that reads the url field from the input:

    {{$.url}}

  • Enable Extract Only Text.
  • Run the Fetch HTML node.
  • Review the results.