My First Bad Databricks Execution Plan
Lately we’ve been tuning and refactoring several Databricks workflows. Nothing radical, mostly sensible engineering improvements: replacing count-based existence checks with is_empty, swapping singl
Experience is a great teacher. After several years working with Databricks, I’ve noticed patterns that made perfect sense at the start but eventually became costly. At small scale they work fine — we
Lately, I've been working for an enterprise customer with hundreds of data engineers in a chapter. They're early in their Databricks journey, and their language of choice is SQL with a sprinkling of ora
A seasoned data engineer is deliberate about their use of expensive operations such as counts. Misuse is fairly common, such as using a count to check whether a query returns any rows, or running a count query to figure out the number of ...
Optimizing Databricks Workflows with For Each Task Type
A Practical Approach to Planning Databricks Environments
Naming conventions are often a contentious topic among architects. In Databricks, these debates become even more pronounced due to the extensive platform work required to establish a Databricks implementation. In an AWS-based Databricks setup, naming...
Data Engineers often have to juggle both data processing and DevOps pipelines. I’ve worked with many skilled data professionals, but they tend to shy away from the DevOps side of things. Often, I get asked to take over this aspect due to my backgroun...
Schema changes can be a significant challenge for many ETL pipelines. In the past, changes were more manageable because systems operated independently. However, with the rise of SaaS products, out-of-the-box solutions, and frequent release cycles, ch...