# Browser Automation with Node.js

You can run headless browser automations within Pipedream workflows to scrape the web, generate screenshots, or programmatically interact with websites, even those that rely heavily on frontend JavaScript.

Pipedream maintains a package that bundles Puppeteer and Playwright with a Chromium build that's compatible with Pipedream's Node.js Execution Environment.

All that's required is to import the @pipedream/browsers package into your Node.js code step and launch a browser. Pipedream will start Chromium and launch a Puppeteer or Playwright Browser instance for you.

# Usage

The @pipedream/browsers package exports two modules: puppeteer & playwright. Both modules share the same interface:

  • browser(opts?) - method to instantiate a new browser (returns a browser instance)
  • launch(opts?) - an alias to browser()
  • newPage() - creates a new page instance and returns both the page & browser
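As a minimal sketch of the `newPage()` shortcut, the helper below opens a page, reads its title, and closes the browser. It assumes `newPage()` resolves to an object with `page` and `browser` properties, as the list above describes; check the package's README for the exact return shape. `fetchTitle` and its `browsersModule` parameter are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: `browsersModule` is the `puppeteer` module
// imported from '@pipedream/browsers'.
// Assumption: newPage() resolves to an object shaped like { page, browser }.
async function fetchTitle(browsersModule, url) {
  const { page, browser } = await browsersModule.newPage();
  try {
    await page.goto(url);
    return await page.title();
  } finally {
    // Always close the browser so the step can finish
    await browser.close();
  }
}
```

Inside a code step's `run` method, this could be called as `await fetchTitle(puppeteer, 'https://pipedream.com/')`.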

# Puppeteer

First, import the puppeteer module from @pipedream/browsers and use the browser() or launch() method to instantiate a browser.

Then, using this browser, you can open new Pages, which have individual controls to open URLs:

import { puppeteer } from '@pipedream/browsers';

export default defineComponent({
  async run({steps, $}) {
    const browser = await puppeteer.browser();
    
    // Interact with the web page programmatically
    // See Puppeteer's Page documentation for available methods:
    // https://pptr.dev/api/puppeteer.page
    const page = await browser.newPage();

    await page.goto('https://pipedream.com/');
    const title = await page.title();
    const content = await page.content();

    $.export('title', title);
    $.export('content', content);

    // The browser needs to be closed, otherwise the step will hang
    await browser.close();
  },
})

# Screenshot a webpage

Puppeteer can take a full screenshot of a webpage rendered with Chromium. For full options, see the Puppeteer Screenshot method documentation.
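The following is a minimal sketch of a full-page screenshot, not Pipedream's official example. It assumes `browser` was created with `puppeteer.browser()`; the helper name `screenshotAsBase64` is hypothetical. The image is returned as a base64 string so it can be exported from the step.

```javascript
// Hypothetical helper: `browser` is the instance returned by puppeteer.browser().
async function screenshotAsBase64(browser, url) {
  const page = await browser.newPage();
  await page.goto(url);
  // fullPage: true captures the whole scrollable page, not only the viewport
  const image = await page.screenshot({ fullPage: true });
  // Buffer.from() accepts both a Buffer and a Uint8Array
  return Buffer.from(image).toString('base64');
}
```

In a code step, you could export the result with `$.export('screenshot', await screenshotAsBase64(browser, 'https://pipedream.com/'))` and then close the browser.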

# Generate a PDF of a webpage

Puppeteer can render a PDF of a webpage. For full options, see the Puppeteer PDF method documentation.
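As an illustrative sketch, the helper below renders a page to PDF with Puppeteer's `page.pdf()` and returns it as base64. The helper name `renderPdfAsBase64` is hypothetical, and the `A4` format and `printBackground` setting are assumptions chosen for the example, not Pipedream defaults.

```javascript
// Hypothetical helper: `browser` is the instance returned by puppeteer.browser().
async function renderPdfAsBase64(browser, url) {
  const page = await browser.newPage();
  await page.goto(url);
  // Assumed options: A4 paper, and keep CSS background colors and images
  const pdf = await page.pdf({ format: 'A4', printBackground: true });
  // Buffer.from() accepts both a Buffer and a Uint8Array
  return Buffer.from(pdf).toString('base64');
}
```

Remember to close the browser after calling the helper, as with any other Puppeteer step.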

# Scrape content from a webpage

Puppeteer can scrape individual elements or return all content of a webpage.
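A minimal sketch of scraping individual elements with Puppeteer's `page.$eval()` and `page.$$eval()` might look like this. The helper name `scrapeHeadingAndLinks` and the `h1` and `a` selectors are assumptions for illustration; adjust them to the page you are scraping.

```javascript
// Hypothetical helper: `browser` is the instance returned by puppeteer.browser().
async function scrapeHeadingAndLinks(browser, url) {
  const page = await browser.newPage();
  await page.goto(url);
  // $eval runs the callback in the page against the first matching element
  // and throws if nothing matches the selector
  const heading = await page.$eval('h1', (el) => el.textContent.trim());
  // $$eval runs the callback against an array of all matching elements
  const links = await page.$$eval('a', (anchors) => anchors.map((a) => a.href));
  return { heading, links };
}
```

To return all content instead, `page.content()` gives the full serialized HTML of the page.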

# Submit a form

Puppeteer can also programmatically click and type on a webpage.
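The sketch below types into a field and clicks a submit button with Puppeteer. The helper name `submitSearch` and both selectors are hypothetical; they assume a page with a text input named `q` and a standard submit button.

```javascript
// Hypothetical helper: `browser` is the instance returned by puppeteer.browser().
async function submitSearch(browser, url, query) {
  const page = await browser.newPage();
  await page.goto(url);
  // Hypothetical selectors: replace with the ones used by your target form
  await page.type('input[name="q"]', query);
  // Start waiting for the navigation before clicking, so the
  // navigation event is not missed
  await Promise.all([
    page.waitForNavigation(),
    page.click('button[type="submit"]'),
  ]);
  // URL of the page the form submission landed on
  return page.url();
}
```

Waiting for navigation and clicking in the same `Promise.all` is a common Puppeteer pattern for forms that trigger a page load.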

# Playwright

First, import the playwright module from @pipedream/browsers and use the browser() or launch() method to instantiate a browser.

Then, using this browser, you can open new Pages, which have individual controls to open URLs, click elements, generate screenshots, type, and more:

import { playwright } from '@pipedream/browsers';

export default defineComponent({
  async run({steps, $}) {
    const browser = await playwright.browser();

    // Interact with the web page programmatically
    // See Playwright's Page documentation for available methods:
    // https://playwright.dev/docs/api/class-page
    const page = await browser.newPage();

    await page.goto('https://pipedream.com/');
    const title = await page.title();
    const content = await page.content();

    $.export('title', title);
    $.export('content', content);

    // The browser context and the browser both need to be closed, otherwise the step will hang
    await page.context().close();
    await browser.close();
  },
})

Don't forget to close the Browser Context

Playwright differs from Puppeteer slightly in that you have to close the page's BrowserContext before closing the Browser itself.

// Close the context & browser before returning results
// Otherwise the step will hang
await page.context().close();
await browser.close();

# Screenshot a webpage

Playwright can take a full screenshot of a webpage rendered with Chromium. For full options, see the Playwright Screenshot method documentation.
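As a rough sketch, a full-page screenshot with Playwright could be captured as follows. It assumes `browser` came from `playwright.browser()`; the helper name `playwrightScreenshotAsBase64` is hypothetical. The helper closes the page's browser context itself, and the caller remains responsible for closing the browser.

```javascript
// Hypothetical helper: `browser` is the instance returned by playwright.browser().
async function playwrightScreenshotAsBase64(browser, url) {
  const page = await browser.newPage();
  try {
    await page.goto(url);
    // fullPage: true captures the whole scrollable page, not only the viewport
    const image = await page.screenshot({ fullPage: true });
    return Buffer.from(image).toString('base64');
  } finally {
    // Playwright needs the context closed before the browser is closed
    await page.context().close();
  }
}
```

After the helper returns, call `await browser.close()` so the step does not hang.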

# Generate a PDF of a webpage

Playwright can render a PDF of a webpage. For full options, see the Playwright PDF method documentation.
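A minimal sketch of PDF rendering with Playwright's `page.pdf()` is shown below. The helper name `playwrightPdfAsBase64` is hypothetical, and the `A4` format and `printBackground` setting are assumptions chosen for the example. Note that Playwright only supports PDF generation in headless Chromium.

```javascript
// Hypothetical helper: `browser` is the instance returned by playwright.browser().
async function playwrightPdfAsBase64(browser, url) {
  const page = await browser.newPage();
  try {
    await page.goto(url);
    // Assumed options: A4 paper, and keep CSS background colors and images
    const pdf = await page.pdf({ format: 'A4', printBackground: true });
    return Buffer.from(pdf).toString('base64');
  } finally {
    // Playwright needs the context closed before the browser is closed
    await page.context().close();
  }
}
```

The caller still needs to run `await browser.close()` once the PDF has been exported.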

# Scrape content from a webpage

Playwright can scrape individual elements or return all content of a webpage.
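One way to scrape individual elements with Playwright is sketched below, using `page.textContent()` for a single element and `page.$$eval()` for a list. The helper name `playwrightScrape` and the `h1` and `a` selectors are assumptions for illustration.

```javascript
// Hypothetical helper: `browser` is the instance returned by playwright.browser().
async function playwrightScrape(browser, url) {
  const page = await browser.newPage();
  try {
    await page.goto(url);
    // textContent() resolves to the text of the first match, or null if empty
    const heading = ((await page.textContent('h1')) || '').trim();
    // $$eval runs the callback in the page against all matching elements
    const links = await page.$$eval('a', (anchors) => anchors.map((a) => a.href));
    return { heading, links };
  } finally {
    // Playwright needs the context closed before the browser is closed
    await page.context().close();
  }
}
```

To return all content instead, `page.content()` gives the full serialized HTML of the page.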

# Submit a form

Playwright can also programmatically click and type on a webpage.
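The sketch below fills a field and clicks a submit button with Playwright's `page.fill()` and `page.click()`. The helper name `playwrightSubmitSearch` and both selectors are hypothetical; they assume a page with a text input named `q` and a standard submit button.

```javascript
// Hypothetical helper: `browser` is the instance returned by playwright.browser().
async function playwrightSubmitSearch(browser, url, query) {
  const page = await browser.newPage();
  try {
    await page.goto(url);
    // Hypothetical selectors: replace with the ones used by your target form
    await page.fill('input[name="q"]', query);
    await page.click('button[type="submit"]');
    // Wait until the resulting page has fired its load event
    await page.waitForLoadState('load');
    // URL of the page the form submission landed on
    return page.url();
  } finally {
    // Playwright needs the context closed before the browser is closed
    await page.context().close();
  }
}
```

As with the other Playwright helpers, close the browser with `await browser.close()` after the call.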

# Frequently Asked Questions

# Can I use this package in sources or actions?

Yes, the same @pipedream/browsers package can be used in actions as well as sources.

The steps are the same as usage in Node.js code. Open a browser, create a page, and close the browser at the end of the code step.

Memory limits

At this time, it's not possible to configure the memory allocated to a Source. Sources that use Puppeteer or Playwright may hit Out of Memory errors more often, because Chromium requires a large amount of memory.

# Workflow exited before step finished execution

Remember to close the browser instance before the step finishes. Otherwise, the browser will keep the step "open" and not transfer control to the next step.
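One way to make sure the browser is always closed, even when the page logic throws, is to wrap the work in `try`/`finally`. This is a sketch, not part of the @pipedream/browsers package; `withBrowser`, `launch`, and `task` are hypothetical names.

```javascript
// Hypothetical helper: `launch` is a function that resolves to a browser,
// for example () => puppeteer.browser(); `task` receives that browser.
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    // Runs on success and on error, so an open browser never blocks the step
    await browser.close();
  }
}
```

With Playwright, `task` should also close the page's browser context before it returns.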

# Out of memory errors or slow starts

For best results, we recommend increasing the amount of memory available to your workflow to 2 gigabytes. You can adjust the available memory in the workflow settings.

# Which browser are these packages using?

The @pipedream/browsers package includes a specific version of Chromium that is compatible with the Pipedream Node.js execution environments that run your code.

For details on the specific versions of Chromium, Puppeteer, and Playwright bundled in this package, visit the package's README.

# How to customize puppeteer.launch()?

To pass arguments to puppeteer.launch() to customize the browser instance, you can pass them directly to puppeteer.browser().

For example, you can alter the protocolTimeout length just by passing it as an argument:

import { puppeteer } from '@pipedream/browsers';

export default defineComponent({
  async run({steps, $}) {
    // Pass a `protocolTimeout` argument (in milliseconds) to increase the timeout for the Puppeteer instance
    const browser = await puppeteer.browser({ protocolTimeout: 480000 });
    // rest of code
  },
})

Please see the @pipedream/browsers source code for the default arguments that Pipedream provides.

# How do I use puppeteer.connect()?

To use puppeteer.connect() to connect to a remote browser instance, you can use the puppeteer-core package:

import puppeteer from "puppeteer-core";

puppeteer-core does not download Chrome when installed, which decreases the size of your deployment and can improve cold start times.
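As a hedged sketch of how `puppeteer.connect()` might be used, the helper below attaches to a remote browser over WebSocket, reads a page title, and detaches. `titleFromRemoteBrowser` is a hypothetical name, and `wsEndpoint` is a placeholder for the WebSocket URL supplied by whichever remote browser service you use.

```javascript
// Hypothetical helper: `puppeteerCore` is the default export of 'puppeteer-core'.
// `wsEndpoint` is a placeholder WebSocket URL from your remote browser provider.
async function titleFromRemoteBrowser(puppeteerCore, wsEndpoint, url) {
  const browser = await puppeteerCore.connect({ browserWSEndpoint: wsEndpoint });
  try {
    const page = await browser.newPage();
    await page.goto(url);
    const title = await page.title();
    await page.close();
    return title;
  } finally {
    // disconnect() detaches from the remote browser without shutting it down
    await browser.disconnect();
  }
}
```

Whether to call `disconnect()` or `close()` at the end depends on whether the remote browser should keep running for other clients.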

To connect to a remote browser instance using Playwright, you can use the playwright-core package, which is the no-browser Playwright package:

import playwright from "playwright-core";