Have you ever visited /r/The_Donald/? If not, you should take just a minute to do so before reading this. Try not to fall down the black hole.
In this post, 538 analyzed the comments on thousands of subreddits and then used an algorithm to “add” and “subtract” the various communities from one another. The results are compelling.
I’m genuinely impressed with the work that 538 put into this article: the analysis is sophisticated, the visualizations are high quality, and the storytelling is compelling. They’re setting a very high bar for what good data journalism looks like.
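If the idea of “adding” and “subtracting” communities sounds abstract, here is a minimal sketch of the general technique, assuming each subreddit has already been turned into a numeric vector. The subreddit names and numbers below are invented for illustration, and this is not 538's actual pipeline; their vectors come from real comment data.

```python
import math

# Hypothetical toy vectors: each community gets made-up scores on three
# invented dimensions. Real vectors would be derived from comment data.
VECTORS = {
    "politics_sub": [0.9, 0.1, 0.0],
    "sports_sub": [0.0, 0.9, 0.1],
    "sports_politics_sub": [0.7, 0.7, 0.1],
    "cooking_sub": [0.0, 0.1, 0.9],
}


def add(a, b):
    """Element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def subtract(a, b):
    """Element-wise difference of two vectors."""
    return [x - y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, exclude=()):
    """Name of the community whose vector is most similar to the query."""
    candidates = {k: v for k, v in VECTORS.items() if k not in exclude}
    return max(candidates, key=lambda name: cosine(query, candidates[name]))


# "politics + sports": which other community is closest to the sum?
combined = add(VECTORS["politics_sub"], VECTORS["sports_sub"])
print(nearest(combined, exclude={"politics_sub", "sports_sub"}))

# "sports_politics - politics": what is left once politics is removed?
remainder = subtract(VECTORS["sports_politics_sub"], VECTORS["politics_sub"])
print(nearest(remainder, exclude={"sports_politics_sub", "politics_sub"}))
```

The inputs to each query are excluded from the candidates so that a community cannot simply match itself.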
A joint launch from OpenAI, Google Brain, and Y Combinator, Distill aims to provide a better mechanism for disseminating research on ML. From the Google announcement:
Science isn’t just about discovering new results. It’s also about human understanding. Scientists need to develop notations, analogies, visualizations, and explanations of ideas. This human dimension of science isn’t a minor side project. It’s deeply tied to the heart of science.
That’s why, in collaboration with OpenAI, DeepMind, YC Research, and others, we’re excited to announce the launch of Distill, a new open science journal and ecosystem supporting human understanding of machine learning. Distill is an independent organization, dedicated to fostering a new segment of the research community.
If you’ve ever read an ML paper, you know it’s not a great experience. I’m excited to see how much traction Distill gets.
Data scientists know how to call libraries, but they often have less depth in important software engineering skills such as designing modular code, managing projects with git, and contributing to open source repos. This post focuses on how to write good code, collaboratively, within an ecosystem.
Ever wondered how a truly data-driven organization functions? This interactive website takes you on a tour through the entire operations of Stitch Fix and explains how data impacts every part of their org. I’ve never seen a company put together something quite like this before—unique and fascinating.
datashader makes points and pixels first-class entities in the graphics rendering pipeline. It admits they exist (many plotting systems render to an imaginary infinite-resolution abstract plane) and allows the user to specify scale-dependent calculations and re-calculations over them.
This paper aims to lay out the current state of Customer Lifetime Value calculation research. It is entirely practical, so mathematical descriptions will only be discussed where they are important from a practical perspective. It also aims to provide both code and spreadsheets to allow for usage of the models discussed.
This is the single best reference on calculating customer lifetime value I’ve ever seen. Bookmark this—you’ll need it at some point.
One academic’s argument that a purely theoretical mathematics education, at both the undergraduate and graduate level, is even more relevant now than it has ever been, despite the lack of funding for traditional academic careers.
Great post. I can’t imagine a more valuable undergrad degree today.
We recently used Markov chains to do some marketing attribution for a client. The approach was surprisingly straightforward and the results were compelling. If you’ve never used Markov chains, this is a great resource.
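For a sense of what Markov chain attribution can look like, here is a minimal sketch of one common formulation, the "removal effect": build a transition graph from customer paths, then measure how much the overall conversion probability drops when a channel is taken out. The channel names and paths are invented, the function names are my own, and this is not necessarily the method used on the client project or in the linked resource.

```python
from collections import defaultdict


def transition_probs(paths):
    """Estimate first-order transition probabilities from observed paths.

    Each path is (list_of_channels, converted_flag).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for channels, converted in paths:
        states = ["start", *channels, "conv" if converted else "null"]
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return {
        a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
        for a, nxt in counts.items()
    }


def conversion_prob(probs, removed=None, iterations=200):
    """Probability of reaching "conv" from "start".

    Traffic into a removed channel is treated as lost (no conversion).
    Solved by repeated sweeps rather than matrix algebra, for readability.
    """
    p = defaultdict(float)  # unknown states (e.g. "null") default to 0.0
    p["conv"] = 1.0
    for _ in range(iterations):
        for state, nxt in probs.items():
            if state == removed:
                continue
            p[state] = sum(w * p[t] for t, w in nxt.items() if t != removed)
    return p["start"]


def attribution(paths):
    """Share of conversions credited to each channel via removal effects."""
    probs = transition_probs(paths)
    base = conversion_prob(probs)
    channels = [s for s in probs if s != "start"]
    if base == 0:
        return {c: 0.0 for c in channels}
    effects = {c: 1 - conversion_prob(probs, removed=c) / base for c in channels}
    total = sum(effects.values())
    return {c: (e / total if total else 0.0) for c, e in effects.items()}


# Hypothetical customer journeys: (channels touched in order, converted?)
paths = [
    (["search", "email"], True),
    (["search"], False),
    (["email"], True),
    (["social"], False),
]
for channel, share in attribution(paths).items():
    print(f"{channel}: {share:.2f}")
```

In this toy data, every conversion passes through email, so removing it wipes out all conversions and it earns the largest share. Social never leads to a conversion, so it gets no credit.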
Carefully curated by The Data Science Roundup with Revue.