kurdapyo does data

10 posts
My First Bad Databricks Execution Plan
Lately we’ve been tuning and refactoring several Databricks workflows. Nothing radical — mostly sensible engineering improvements: replacing count-based existence checks with is_empty, swapping singl...
May 29, 2026 · 5 min read
Databricks Antipatterns: Two Hard Lessons from the Field
Experience is a great teacher. After several years working with Databricks, I’ve noticed patterns that made perfect sense at the start but eventually became costly. At small scale they work fine — we...
May 15, 2026 · 6 min read
Kurdapyo's Newbie Guide to PySpark
Lately, I've been working for an enterprise customer with hundreds of data engineers in a chapter. They're early in their Databricks journey, and their language of choice is SQL with a sprinkling of ora...
May 5, 2026 · 6 min read
Counting when it really counts
A seasoned data engineer is deliberate in their use of expensive operations such as counts. Misuse is fairly common: using a count to check whether a query returns any rows, or running a count query to figure out the number of ...
Nov 3, 2025 · 4 min read
From For Loops to For Each
Optimizing Databricks Workflows with For Each Task Type
Sep 30, 2025 · 3 min read
Databricks Environment Splits
A Practical Approach to Planning Databricks Environments
Aug 5, 2025 · 4 min read
What's with a name (in Databricks)
Naming conventions are often a contentious topic among architects. In Databricks, these debates become even more pronounced due to the extensive platform work required to establish a Databricks implementation. In an AWS-based Databricks setup, naming...
Jul 11, 2025 · 3 min read
IAC Face-Off: Databricks Asset Bundles vs Terraform
Data engineers often have to juggle both data processing and DevOps pipelines. I’ve worked with many skilled data professionals, but they tend to shy away from the DevOps side of things. Often, I get asked to take over this aspect due to my backgroun...
Jul 6, 2025 · 3 min read
Fivetran Schema Change Handling
Schema changes can be a significant challenge for many ETL pipelines. In the past, changes were more manageable because systems operated independently. However, with the rise of SaaS products, out-of-the-box solutions, and frequent release cycles, ch...
Jul 4, 2025 · 3 min read