This is the single best post I’ve read on Redshift optimization. It’s written by the co-founder of Intermix, a company that specializes in (you guessed it) helping companies optimize their Redshift performance. It covers everything from table design to workload management config to query optimization. Very comprehensive advice you won’t find elsewhere.
My only critique of this post is that it doesn’t mention my favorite trick for optimizing Redshift performance: migrating to Snowflake! 😉
Last week I learned about an interesting JupyterCon talk given by Joel Grus titled “I Don’t Like Notebooks”.
And thus begins the “2018 Notebook War”. Hilary Parker and Roger Peng both weighed in on Twitter, and Hadley Wickham called it “the spaces vs tabs of data science.” Touché.
Notebooks have in turn been lionized as the next great scientific communication revolution and decried as a shit way to write code. Both are probably true to some extent. Netflix just wrote about their extensive infrastructure to support productionizing notebooks. Is this a good idea? The industry is still deciding how data analysis work should be conducted, and it’s fascinating to watch this conversation play out.
(…) the optimal pattern for collaboration relies on architecting and building systems where I (and the other data folks on my team) can write and deploy code / scripts without: a) needing to get that code approved by software engineers, b) having to deal with hosting or networking concerns, c) having to interface with non-familiar languages and paradigms.
Another great post by Michael Kaminsky. I absolutely 100% agree with his viewpoint: engineers build frameworks, analysts write code that runs in a framework and implements business logic. This allows both to do what they are best at, have direct knowledge of, and are most incentivized to do.
The catch? Building good frameworks is hard. Expect this pattern to continue to be deployed as frameworks get built, generalized to work across environments, and open sourced.
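To make the division of labor concrete, here’s a minimal sketch of the pattern (all names are hypothetical, not from Kaminsky’s post): engineers own the framework layer that handles orchestration, while analysts only write the transform that encodes business logic.

```python
from typing import Callable

# --- Framework layer (owned by engineers) ---
# Handles orchestration concerns: in a real system this is where
# scheduling, hosting, and I/O would live, so analysts never touch them.
def run_job(transform: Callable[[dict], dict], rows: list[dict]) -> list[dict]:
    return [transform(row) for row in rows]

# --- Business logic (owned by analysts) ---
# A hypothetical rule: flag customers with revenue above a threshold.
def label_high_value(row: dict) -> dict:
    return {**row, "high_value": float(row["revenue"]) > 1000}

rows = [
    {"customer": "a", "revenue": "1500"},
    {"customer": "b", "revenue": "200"},
]
result = run_job(label_high_value, rows)
print(result)
```

The analyst writes `label_high_value` in a language and paradigm they know; everything about where and how it runs is the framework’s problem.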
Or: “The Amazon Echo as an anatomical map of human labor, data and planetary resources.”
Just wow. Here’s how the authors set the context:
[in fulfilling a single Alexa request], a vast matrix of capacities is invoked: interlaced chains of resource extraction, human labor and algorithmic processing across networks of mining, logistics, distribution, prediction and optimization. The scale of this system is almost beyond human imagining. How can we begin to see it, to grasp its immensity and complexity as a connected form?
This one-of-a-kind microsite is perspective-changing. Click through to see what I mean.
Building effective machine learning (ML) systems means asking a lot of questions. It’s not enough to train a model and walk away. Instead, good practitioners act as detectives, probing to understand their model better: How would changes to a datapoint affect my model’s prediction? Does it perform differently for various groups–for example, historically marginalized people? How diverse is the dataset I am testing my model on?
Today, we are launching the What-If Tool, a new feature of the open-source TensorBoard web application, which lets users analyze an ML model without writing code. Given pointers to a TensorFlow model and a dataset, the What-If Tool offers an interactive visual interface for exploring model results.
This problem has gotten a lot of attention in recent years; it’s great to see Google investing resources in exploring solutions.
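The first question above (“how would changes to a datapoint affect my model’s prediction?”) can be probed by hand, too. A toy sketch, not the What-If Tool itself: a stand-in logistic model with fixed weights, and a counterfactual comparison after perturbing one feature. All names and values here are illustrative assumptions.

```python
import math

# Fixed weights standing in for a trained model (illustrative only).
WEIGHTS = {"income": 0.8, "age": -0.2}

def predict(datapoint: dict) -> float:
    """Logistic prediction from a linear score."""
    z = sum(WEIGHTS[f] * v for f, v in datapoint.items())
    return 1 / (1 + math.exp(-z))

def what_if(datapoint: dict, feature: str, new_value: float) -> tuple[float, float]:
    """Compare the original prediction to a counterfactual one
    where a single feature has been changed."""
    before = predict(datapoint)
    after = predict({**datapoint, feature: new_value})
    return before, after

point = {"income": 1.0, "age": 2.0}
before, after = what_if(point, "income", 3.0)
print(f"before={before:.3f} after={after:.3f}")
```

The What-If Tool does this interactively and at scale, but the underlying question is the same: hold everything else fixed, change one thing, and see how the prediction moves.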
The new Weather Channel storm surge visualization is really something—it’s a fascinating use case of how visualization can help people understand things that are outside their experience. A bar graph just wouldn’t have communicated the impact of the rising water.