Debugging Spark DataFrame .show() Timeouts in PyCharm and VSCode

When working with PySpark, one of the first commands developers use to quickly inspect data is:

raw_df.show()

However, in certain environments (especially when running inside PyCharm or VSCode with a debugger), you may encounter a warning like the following:

Evaluating: raw_df.show() did not finish after 3.00 seconds.
This may mean a number of things:
- This evaluation is really slow and this is expected.
- The evaluation may need other threads running while it's running.
- The evaluation is deadlocked.

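At first glance, this message looks like an error in Spark itself, but it is actually emitted by the Python debugger (pydevd), which both PyCharm and VSCode rely on. The debugger expects expressions that you evaluate while inspecting variables to return quickly, and if one takes longer than 3 seconds, it prints this warning. The message is a warning, not a failure: on its own it does not mean the Spark job failed.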

Why does this happen?

There are several common scenarios:

  1. Lazy evaluation in Spark
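    Transformations are lazy, so nothing runs until an action is called. .show() is an action: it triggers a job that executes the plan built up so far. If that plan reads a large DataFrame or includes expensive transformations, this can take several seconds or more.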
  2. Heavy scans
    If the DataFrame comes from a source like S3, Hive, or Iceberg, Spark may need to scan many files, especially if no filters or partition pruning are applied.
  3. Spark initialization overhead
    The first action (show, collect, etc.) often triggers session initialization, job planning, and executor startup.
  4. Debugger thread interference
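    While the program is paused, the IDE’s debugger suspends and monitors threads. An evaluation that needs other threads to make progress can stall or slow down, which is what the second bullet of the warning refers to.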
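
One way to narrow down which of these scenarios applies is to time the action in ordinary code, outside the debugger's evaluation window. The sketch below is only an illustration: time_action is a hypothetical helper, not part of PySpark or pydevd, and it works with any zero-argument callable.

```python
import time


def time_action(action):
    """Run a zero-argument callable and return (result, elapsed_seconds).

    Hypothetical helper for illustration; not part of PySpark or pydevd.
    """
    start = time.perf_counter()
    result = action()
    elapsed = time.perf_counter() - start
    return result, elapsed


# Assumed usage with the raw_df from this post (needs a running Spark session):
#   _, first = time_action(lambda: raw_df.show(5))
#   _, second = time_action(lambda: raw_df.show(5))
#   print(f"first: {first:.2f}s, second: {second:.2f}s")
```

If the second call is much faster than the first, one-time startup cost is a likely contributor. If both are slow, the scan or the upstream transformations are the more probable cause.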

Solutions

1. Increase the debugger timeout

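You can set the environment variable PYDEVD_WARN_EVALUATION_TIMEOUT (in seconds) to raise the threshold at which the warning appears. This does not make the job any faster; it only gives the evaluation more time before the debugger complains. The variable must be present in the environment of the process being debugged, for example by exporting it before launching the IDE from a shell or by adding it to the run/debug configuration: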

export PYDEVD_WARN_EVALUATION_TIMEOUT=10
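
Since the setting only takes effect if it reaches the debugged process, it can be worth checking from inside that process. The following is a small sketch using only the standard library. The name debugger_warn_timeout is hypothetical, the 3-second fallback is an assumption taken from the warning text above, and the handling of invalid values is this sketch's choice rather than documented debugger behavior.

```python
import os


def debugger_warn_timeout(env=None, default=3.0):
    """Return PYDEVD_WARN_EVALUATION_TIMEOUT as seen by this process.

    env: mapping to read from; defaults to os.environ.
    default: value returned when the variable is unset or not a number
             (assumed to be 3 seconds, matching the warning message).
    """
    env = os.environ if env is None else env
    raw = env.get("PYDEVD_WARN_EVALUATION_TIMEOUT")
    if raw is None:
        return default
    try:
        return float(raw)
    except ValueError:
        # Unparseable value: fall back rather than fail (sketch's choice).
        return default


print(f"Warn timeout visible to this process: {debugger_warn_timeout()}s")
```

If this prints 3.0 after you exported a different value, the variable most likely did not reach the process the IDE started.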

2. Work with smaller samples

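Instead of evaluating .show() on the full DataFrame, inspect a small, explicitly limited slice. collect() returns the rows as a list of Row objects, which the debugger can display directly: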

raw_df.limit(5).collect()

3. Apply partition filters

If your dataset is partitioned (e.g., by event_date), filter on the partition column before calling .show() so that Spark can prune partitions instead of scanning every file:

raw_df.filter("event_date >= '2025-01-01'").show(5)
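
When the cutoff date comes from a variable instead of a literal, building the filter string by hand is easy to get wrong. The sketch below shows one possible way to generate it. partition_filter is a hypothetical helper, and it assumes the partition column holds ISO-formatted dates (YYYY-MM-DD) as in the example above.

```python
from datetime import date


def partition_filter(column, since):
    """Build a SQL filter string such as "event_date >= '2025-01-01'".

    column: partition column name; only simple identifiers are accepted.
    since: a datetime.date; isoformat() renders it as YYYY-MM-DD.
    Hypothetical helper for illustration; not part of PySpark.
    """
    if not column.isidentifier():
        raise ValueError(f"unexpected column name: {column!r}")
    return f"{column} >= '{since.isoformat()}'"


# Assumed usage with the raw_df from this post:
#   raw_df.filter(partition_filter("event_date", date(2025, 1, 1))).show(5)
```

Calling .explain() on the filtered DataFrame prints the physical plan, which is one way to check whether the filter reached the scan. What exactly the plan shows depends on the data source.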

Best Practices


Conclusion

The warning “Evaluating: raw_df.show() did not finish after 3.00 seconds” is not an error in Spark itself, but a side effect of how the debugger evaluates expressions. By raising the warning timeout, inspecting small samples, and filtering on partition columns, you can make your development workflow smoother and avoid confusion during debugging.

