getPageSource()

Overview

In Selenium WebDriver, the getPageSource() method returns the source of the page currently loaded in the browser. The result is a single String containing the page's HTML.

Syntax

				
String pageSource = driver.getPageSource();

Usage

Printing the Page Source:

// Assuming 'driver' is an instance of WebDriver
String pageSource = driver.getPageSource();
System.out.println("Page Source: " + pageSource);

Asserting Page Content:

				
// Assuming 'driver' is an instance of WebDriver
// and 'Assert' comes from your test framework (for example TestNG or JUnit 4)
String expectedContent = "<title>Example Title</title>";
String pageSource = driver.getPageSource();
Assert.assertTrue(pageSource.contains(expectedContent));
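When a test needs to check several fragments at once, the single contains() call above can be wrapped in a small helper. The sketch below is one possible shape, not part of Selenium: PageSourceChecks and missingFragments are hypothetical names, and the page source is passed in as a Supplier<String> so that driver::getPageSource can be supplied in a real test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper for illustration; not part of the Selenium API. */
final class PageSourceChecks {

    private PageSourceChecks() {}

    /**
     * Returns the expected fragments that do NOT occur in the page source.
     * With WebDriver, pass driver::getPageSource as the supplier.
     */
    static List<String> missingFragments(Supplier<String> pageSource, List<String> expected) {
        String source = pageSource.get(); // fetch the source once, then reuse it
        List<String> missing = new ArrayList<>();
        for (String fragment : expected) {
            if (!source.contains(fragment)) {
                missing.add(fragment);
            }
        }
        return missing;
    }
}
```

An empty result means every fragment was found, so a test could assert that the returned list is empty and print the list when it is not. Matching raw markup this way is sensitive to whitespace and attribute order, so short, stable fragments tend to work best.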

Example

				
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class GetPageSourceExample {
    public static void main(String[] args) {
        // Set path to the ChromeDriver executable
        // (usually optional on Selenium 4.6+, where Selenium Manager can locate the driver)
        System.setProperty("webdriver.chrome.driver",
                           "path/to/chromedriver");

        // Initialize ChromeDriver
        WebDriver driver = new ChromeDriver();

        try {
            // Open a webpage
            driver.get("https://www.selenium.dev");

            // Retrieve page source
            String pageSource = driver.getPageSource();
            System.out.println("Page Source: " + pageSource);
        } finally {
            // Close the browser even if an earlier step throws
            driver.quit();
        }
    }
}

Importance

1. Debugging:

getPageSource() is a useful debugging aid during automation. By retrieving the page's HTML at the point of failure, you can inspect its structure and content and see why a test is failing, for example because an element is missing or some text is wrong.
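One common way to use this for debugging is to write the source to a file when a test fails, so it can be opened later. The following is a minimal sketch under assumptions: PageSourceDump, save, and the file-naming scheme are hypothetical, and the string passed in is expected to be the result of driver.getPageSource().

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration; not part of the Selenium API. */
final class PageSourceDump {

    private PageSourceDump() {}

    /**
     * Writes the given page source to "<testName>-<timestamp>.html" inside
     * the given directory and returns the path of the written file.
     */
    static Path save(String pageSource, Path directory, String testName) throws IOException {
        Files.createDirectories(directory); // no-op if the directory already exists
        Path target = directory.resolve(testName + "-" + System.currentTimeMillis() + ".html");
        Files.write(target, pageSource.getBytes(StandardCharsets.UTF_8));
        return target;
    }
}
```

In a test's failure handler this could be called as PageSourceDump.save(driver.getPageSource(), someDirectory, "loginTest"), where the directory and test name are whatever suits your project.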

2. Verifying Content:

It allows you to verify the presence of specific content or elements on the page, such as checking if certain text exists in the HTML, which can be useful for assertions or validating that the page is rendered correctly.

3. Automating Dynamic Content Checks:

In dynamic web applications, where the content is frequently changing (such as SPAs or AJAX-driven pages), you can use getPageSource() to inspect the raw HTML to ensure that elements are present or updated correctly after specific interactions.
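Because dynamic content may arrive after the initial load, a single call can run too early. The sketch below polls the source until some expected text appears or a timeout passes. It is a plain-Java illustration with hypothetical names (PageSourcePoller, waitForText); in a real suite, Selenium's own WebDriverWait is usually the better fit. The supplier stands in for driver::getPageSource.

```java
import java.time.Duration;
import java.util.function.Supplier;

/** Hypothetical helper for illustration; not part of the Selenium API. */
final class PageSourcePoller {

    private PageSourcePoller() {}

    /**
     * Re-reads the page source until it contains the given text.
     * Returns true if the text appeared, false on timeout or interruption.
     */
    static boolean waitForText(Supplier<String> pageSource, String text,
                               Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (pageSource.get().contains(text)) {
                return true; // each call fetches a fresh snapshot
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

Each iteration transfers the whole document again, so a generous polling interval keeps the cost down.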

Limitations

1. Large Page Source:

For large or complex web pages, the page source returned by getPageSource() can be quite large, making it difficult to inspect manually. This might lead to performance issues or difficulty in finding the relevant content within the source code.
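One way to cope with a very large source is to log only the region around the text of interest rather than the whole string. The helper below is a hypothetical sketch (PageSourceSnippet and around are illustrative names) that works on the String returned by getPageSource().

```java
/** Hypothetical helper for illustration; not part of the Selenium API. */
final class PageSourceSnippet {

    private PageSourceSnippet() {}

    /**
     * Returns the first occurrence of marker together with up to
     * `context` characters on either side, or "" if marker is absent.
     */
    static String around(String pageSource, String marker, int context) {
        int index = pageSource.indexOf(marker);
        if (index < 0) {
            return "";
        }
        int start = Math.max(0, index - context);
        int end = Math.min(pageSource.length(), index + marker.length() + context);
        return pageSource.substring(start, end);
    }
}
```

For example, PageSourceSnippet.around("<div><p>Hello</p></div>", "Hello", 3) yields "<p>Hello</p", which is far easier to read in a log than the full document.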

2. Snapshot, Not a Live View:

The string returned by getPageSource() is a snapshot taken at the moment the method is called; it does not update as the page changes. The Selenium documentation also notes that if the page has been modified after loading (for example, by JavaScript), there is no guarantee that the returned text is that of the modified page. In practice, call getPageSource() only after the updates you care about have been applied to the DOM, and call it again whenever you need a fresh copy.

3. Performance Overhead:

In cases where you call getPageSource() repeatedly within your tests, especially for pages with heavy dynamic content, it can introduce performance overhead as the browser needs to process and return the entire HTML content each time.
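If several checks run against the same page state, the overhead can be reduced by fetching the source once and reusing it. The wrapper below is a minimal sketch with hypothetical names (CachedPageSource, invalidate); the delegate stands in for driver::getPageSource, and the cache has to be cleared by hand after any interaction that changes the page.

```java
import java.util.function.Supplier;

/** Hypothetical wrapper for illustration; not part of the Selenium API. */
final class CachedPageSource implements Supplier<String> {

    private final Supplier<String> delegate; // e.g. driver::getPageSource
    private String cached;

    CachedPageSource(Supplier<String> delegate) {
        this.delegate = delegate;
    }

    /** Fetches the source on first use and returns the cached copy afterwards. */
    @Override
    public String get() {
        if (cached == null) {
            cached = delegate.get();
        }
        return cached;
    }

    /** Discards the cached copy so the next get() fetches a fresh snapshot. */
    void invalidate() {
        cached = null;
    }
}
```

The trade-off is staleness: a cached copy is only as current as the last fetch, so call invalidate() after clicks, navigation, or anything else that may alter the page.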

4. Limited to HTML Content:

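The method returns markup only. Inline scripts and styles appear as part of that markup, but the contents of external resources such as images, stylesheets, and script files are not included. JavaScript state that is not reflected in the DOM (for example, variable values or console output) is not included either. Changes that JavaScript has made to the DOM are not guaranteed to appear; whether they do can depend on the browser and on whether they had been applied when the method was called.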

Conclusion

The getPageSource() method in Selenium WebDriver is a valuable tool for retrieving the HTML content of a page. It is especially useful for debugging, content verification, and inspecting dynamic pages. Keep its limitations in mind: the result is a point-in-time snapshot that may not reflect later JavaScript changes, it can be large, and calling it repeatedly adds overhead.