Versatile Flow | Blog

Data Engineer Resources

Data Engineer

Best Practices

  • Data Contracts : https://medium.com/gocardless-tech/data-contracts-at-gocardless-6-months-on-bbf24a37206e

BigQuery

  • Schema Evolution : https://indatawetrust.blog/2020/02/02/automated-schema-evolution-for-bigquery/

Dataflow

  • Different Types of Joins : https://spotify.github.io/scio/examples/JoinExamples.scala.html
  • Use case patterns : https://cloud.google.com/blog/products/data-analytics/guide-to-common-cloud-dataflow-use-case-patterns-part-1

Spark

  • Spark Tips (DataFrame API) : https://luminousmen.com/post/spark-tips-dataframe-api
  • Spark Tips (Don’t Collect Data on Driver) : https://luminousmen.com/post/spark-tips-dont-collect-data-on-driver
  • Spark Tips (Partition Tuning) : https://luminousmen.com/post/spark-tips-partition-tuning
  • Spark Tips (Caching) : https://luminousmen.com/post/spark-tips-caching

Benchmark

  • BigQuery, Spark, Dataflow : https://medium.com/cts-technologies/bigquery-spark-or-dataflow-a-story-of-speed-and-other-comparisons-fb1b8fea3619

Data Warehouse

  • Medallion Architecture Best Practices : https://piethein.medium.com/medallion-architecture-best-practices-for-managing-bronze-silver-and-gold-486de7c90055
  • Normalization vs Denormalization : https://www.linkedin.com/pulse/normalization-vs-denormalization-rohit-prasad
Versatile Flow - All Rights Reserved
2019 - 2025 Anthony SSI YAN KAI