Extracting Dynamic Content from an iFrame with Selenium in Python

Accessing content inside an iFrame can be tricky, especially when the content is loaded dynamically. In this blog post, we’ll walk through an example of how to switch into an iFrame, click on an interactive tab, and save the loaded content to a file using Selenium in Python. The technique is useful for web applications that embed their core data or statistics in an iFrame, such as a sports stats dashboard.

Overview of the Problem

For our example, we’re looking at a page containing an iFrame with embedded player statistics. To fully capture this content, we’ll:

Access the main page containing the iFrame.
Switch into the iFrame to interact with its content.
Click on a tab within the iFrame to reveal the desired information.
Extract the HTML content and save it locally.

Implementing the Code

Here’s the code we’ll use, followed by an explanation of each part.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Function to initialize the Chrome WebDriver
def setup_driver():
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    driver.maximize_window()
    return driver

# Function to access iFrame content, click on a tab, and save it
def fetch_iframe_content():
    driver = setup_driver()

    try:
        # Step 1: Open the page containing the iFrame
        url = 'https://estadisticascabb.gesdeportiva.es/partido/jYYXLuyG3WYVnbWC_RER9A==?a=1'
        driver.get(url)

        # Step 2: Wait until the iFrame loads
        WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.TAG_NAME, 'iframe')))
        
        # Step 3: Switch to the iFrame
        iframe = driver.find_element(By.TAG_NAME, 'iframe')
        driver.switch_to.frame(iframe)

        # Step 4: Click on the 'Statistics' tab within the iFrame
        stats_tab = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, "li.pestana-partido.pestana-estadisticas"))
        )
        stats_tab.click()

        # Step 5: Wait for the stats content to load within the tab
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "table.tabla-estadisticas"))
        )

        # Step 6: Get the loaded page source within the iFrame
        page_source = driver.page_source
        
        # Step 7: Save the content to a file
        with open("output.txt", "w", encoding='utf-8') as file:
            file.write(page_source)

        print("The iFrame content has been saved in 'output.txt'.")

    finally:
        # Close the driver
        driver.quit()

# Run the function
if __name__ == "__main__":
    fetch_iframe_content()

Code Breakdown

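Set Up the Driver: setup_driver() initializes the Chrome WebDriver, using webdriver-manager to download a ChromeDriver binary that matches the installed Chrome. (Selenium 4.6 and later can also resolve the driver on their own through Selenium Manager, so the extra package is optional there.)
Access the iFrame: We load the main page with driver.get() and use WebDriverWait to wait up to 10 seconds for an iframe element to be present in the DOM. An explicit wait like this pauses the script only until the condition is met, rather than for a fixed amount of time.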
Switch to the iFrame: After locating the iFrame element, we switch the WebDriver’s focus into it, allowing us to interact with elements inside.
Interact with the Tab: We use WebDriverWait with expected_conditions to wait until the desired tab is clickable, then click it. A second wait pauses until the statistics table is present in the DOM, which signals that the tab’s content has started to render.
Extract and Save Content: Once we have switched into the iFrame, driver.page_source returns the iFrame’s HTML rather than the outer page’s. We write it to output.txt so we can inspect or parse it further.
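
As a follow-up to the last step, here is a minimal sketch of how the saved HTML could be parsed with Python’s standard library alone. The TableRowExtractor class and extract_table_rows function are hypothetical helpers, not part of the original script, and the sample markup is invented for illustration. Only the tabla-estadisticas class name comes from the selector used above. The sketch assumes flat tables with no nesting and with explicit closing tags.

```python
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Collect the cell text of every row inside tables with a given class."""

    def __init__(self, table_class):
        super().__init__()
        self.table_class = table_class
        self.rows = []      # finished rows, each a list of cell strings
        self._depth = 0     # > 0 while inside a matching table
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and (self._depth or self.table_class in classes):
            self._depth += 1
        elif self._depth and tag == "tr":
            self._row = []
        elif self._row is not None and tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table" and self._depth:
            self._depth -= 1


def extract_table_rows(html, table_class="tabla-estadisticas"):
    """Return the rows of all tables carrying `table_class` as lists of strings."""
    parser = TableRowExtractor(table_class)
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample markup standing in for the saved page source.
sample = (
    '<table class="tabla-estadisticas">'
    "<tr><th>Jugador</th><th>PTS</th></tr>"
    "<tr><td> Ana  Lopez </td><td>12</td></tr>"
    "</table>"
)
print(extract_table_rows(sample))
```

To run it on the real data, read output.txt with encoding='utf-8' and pass the text to extract_table_rows. For anything beyond simple tables, a dedicated HTML parsing library is likely a better fit.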

Additional Tips and Considerations

Avoid Using Time-Based Waits: time.sleep() pauses for a fixed duration whether or not the content is ready, so it either wastes time or gives up too early. WebDriverWait instead polls the condition (every 0.5 seconds by default) and returns as soon as it holds, raising a TimeoutException if the timeout expires first.
Dynamic Content: If the content still doesn’t load as expected, identify an element that only exists once the statistics have rendered, such as a data row rather than the table container, and wait for that instead. This makes it much more likely that the saved source contains the full data.
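
One way to wait for a more specific signal is a custom wait condition. WebDriverWait.until() accepts any callable that takes the driver and returns a truthy value when the wait should end. The sketch below is an illustration under assumptions: rows_present is a hypothetical helper, the selector "table.tabla-estadisticas tr" is a guess at the page’s markup, and FakeDriver is a stand-in so the example runs without a browser.

```python
def rows_present(css, minimum=1):
    """Build a wait condition that succeeds once `minimum` elements match `css`."""

    def condition(driver):
        # "css selector" is the string value of By.CSS_SELECTOR.
        rows = driver.find_elements("css selector", css)
        return rows if len(rows) >= minimum else False

    return condition


# With a real driver, the condition would be used like this
# (the selector is an assumption about the page's markup):
#     rows = WebDriverWait(driver, 10).until(
#         rows_present("table.tabla-estadisticas tr", minimum=2)
#     )


class FakeDriver:
    """Stand-in driver that 'loads' one more row on every lookup."""

    def __init__(self):
        self.lookups = 0

    def find_elements(self, by, value):
        self.lookups += 1
        return [f"row {i}" for i in range(self.lookups - 1)]


# Calling the condition repeatedly mimics what WebDriverWait does while polling.
condition = rows_present("table.tabla-estadisticas tr", minimum=2)
fake = FakeDriver()
results = [condition(fake) for _ in range(3)]
print(results)
```

The condition returns False until enough rows exist and then returns the rows themselves, so the value handed back by until() can be used directly.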

Potential Use Cases

This approach is ideal for scraping data from embedded content in web pages:

Sports Statistics: Websites often embed player stats and game results in iFrames.
Financial Dashboards: Interactive financial data or stock dashboards may be embedded and require specific access to extract.

Conclusion

Selenium makes it possible to interact with complex web structures like iFrames, allowing us to retrieve dynamic content. By integrating waits and switching between frames, we’ve created a solution that captures the data we need. This guide should serve as a strong foundation for similar projects involving embedded web content.