Data Engineer
Best Practices
- Data Contracts : https://medium.com/gocardless-tech/data-contracts-at-gocardless-6-months-on-bbf24a37206e
BigQuery
Dataflow
- Different Types of Joins : https://spotify.github.io/scio/examples/JoinExamples.scala.html
- Use case patterns : https://cloud.google.com/blog/products/data-analytics/guide-to-common-cloud-dataflow-use-case-patterns-part-1
Spark
- Spark Tips: DataFrame API : https://luminousmen.com/post/spark-tips-dataframe-api
- Spark Tips: Don’t collect data on driver : https://luminousmen.com/post/spark-tips-dont-collect-data-on-driver
- Spark Tips: Partition Tuning : https://luminousmen.com/post/spark-tips-partition-tuning
- Spark Tips: Caching : https://luminousmen.com/post/spark-tips-caching
Benchmark
- BigQuery, Spark, Dataflow : https://medium.com/cts-technologies/bigquery-spark-or-dataflow-a-story-of-speed-and-other-comparisons-fb1b8fea3619
Data Warehouse
- Medallion Architecture Best Practices : https://piethein.medium.com/medallion-architecture-best-practices-for-managing-bronze-silver-and-gold-486de7c90055
- Normalization vs Denormalization : https://www.linkedin.com/pulse/normalization-vs-denormalization-rohit-prasad