The dbt community is growing quickly (109 companies as of today!), so I took some time to write up a post targeted at new users. If you’re not familiar with dbt, I hope this post will provide a good intro.
dbt is the T in ETL. It doesn’t extract or load data, but it’s extremely good at transforming data that’s already loaded into your warehouse. This “transform after load” architecture is becoming known as ELT.
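To make "transform after load" concrete, here is a minimal sketch of the pattern. It is not dbt itself: it uses Python's built-in `sqlite3` as a stand-in for a warehouse, and the table names (`raw_orders`, `customer_revenue`), the sample rows, and the function name `run_elt_demo` are all hypothetical. It only shows that the raw data is loaded first and then reshaped with SQL inside the same database.

```python
import sqlite3


def run_elt_demo():
    """Load raw rows first, then transform them with SQL inside the database."""
    # An in-memory SQLite database stands in for the warehouse (assumption).
    conn = sqlite3.connect(":memory:")

    # "Load": raw rows land in the warehouse untransformed.
    conn.execute(
        "CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [(1, "acme", 1250), (2, "acme", 750), (3, "globex", 3000)],
    )

    # "Transform": a SELECT over the loaded data, materialized as a new table.
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT customer, SUM(amount_cents) / 100.0 AS revenue_dollars
        FROM raw_orders
        GROUP BY customer
        """
    )

    rows = conn.execute(
        "SELECT customer, revenue_dollars FROM customer_revenue ORDER BY customer"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for customer, revenue in run_elt_demo():
        print(customer, revenue)
```

The point of the sketch is the ordering: nothing is reshaped before it is loaded, and the transformation is just a query run where the data already lives.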
This is an atypically thoughtful benchmark from Fivetran founder George Fraser. He found that the performance of each warehouse under most common scenarios is very similar:
These three warehouses all have excellent price and performance. We shouldn’t be surprised that they are similar: the basic techniques for making a fast columnar data warehouse have been well-known since the C-Store paper was published in 2005. These three data warehouses undoubtedly use the standard performance tricks: columnar storage, cost-based query planning, pipelined execution, and just-in-time compilation. We should be skeptical of any benchmark claiming that one of these warehouses is more than 2x faster than another.
This is insightful. This technology is approaching maturity, and as such, we should be less concerned with performance and more focused on user experience (query dialect, maintenance, ecosystem, etc.).
Data scientists are becoming popular within software teams, e.g., Facebook, LinkedIn and Microsoft are creating a new career path for data scientists. In this paper, we present a large-scale survey with 793 professional data scientists at Microsoft to understand their educational background, problem topics that they work on, tool usages, and activities.
Great paper. The most interesting part to me was the nine clusters of data scientists the researchers identified. Worth thinking about where you fit in.
The article presents two books that make Calculus and Linear Algebra accessible.
This author’s story is so common: learned math, had direct application for it, forgot it. While you can copy-paste your way through some data science without understanding the math you’re relying on, solidifying your math fundamentals is critical to taking the next step.
At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.