Data Science Roundup #47 - Text analysis, tabular design, and a massive Airflow Tutorial!

By The Data Science Roundup • Issue #47
Hi! Welcome to the redesigned Data Science Roundup! I would love to hear your feedback on the new format; I’ve switched to using a product called Revue, which seems pretty awesome so far.
- Tristan

This week's best data science articles
This is a strong piece of data journalism and includes all of the R code necessary to replicate the results. My favorite line: “I’d rather get inside the head of this anonymous staffer, whose job is to imitate Trump’s unique cadence.” (Can you imagine having that job?) Highly recommended to aspiring data journalists—someone should publish a follow-on article in a month!
Design Better Data Tables
We’ve all seen poor visual design of tables: left-aligned numbers? Tons of useless formatting? There’s a lot that goes into making tabular data easy to consume, and with all the attention that goes into data viz today, the UI of tabular data often gets overlooked. No longer.
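As a small illustration of the alignment point, here is a minimal Python sketch (not taken from the linked article) that renders a plain-text table with text left-aligned and numbers right-aligned at a fixed precision. The function name `render_table` and the sample rows are hypothetical, chosen only for this example.

```python
def render_table(headers, rows):
    """Render rows as plain text: text columns left-aligned, numeric columns right-aligned."""

    def fmt(value):
        # Fixed two decimals plus thousands separators so digits line up down the column.
        if isinstance(value, (int, float)):
            return f"{value:,.2f}"
        return str(value)

    cells = [[fmt(v) for v in row] for row in rows]
    n_cols = len(headers)
    # A column counts as numeric only if every value in it is a number.
    numeric = [all(isinstance(row[i], (int, float)) for row in rows) for i in range(n_cols)]
    widths = [max([len(headers[i])] + [len(r[i]) for r in cells]) for i in range(n_cols)]

    def align(text, i):
        return text.rjust(widths[i]) if numeric[i] else text.ljust(widths[i])

    lines = ["  ".join(align(h, i) for i, h in enumerate(headers))]
    lines.append("  ".join("-" * w for w in widths))
    for r in cells:
        lines.append("  ".join(align(c, i) for i, c in enumerate(r)))
    return "\n".join(lines)


# Illustrative sample data only.
print(render_table(["Item", "Revenue"], [("Widgets", 1234.5), ("Gears", 99)]))
```

Right-aligning the numeric column keeps ones, tens, and decimal points stacked, which is what makes magnitudes easy to compare at a glance.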
I’m deeply interested in how to run effective data science projects. I’ve written in the past about the workflow problem that data scientists and analysts have today, and this podcast goes deeper into the project methodology component. The guest recommends an Agile approach, focusing on minimizing the cycle time between questions and answers.
Clustering R packages based on Github Data in Google BigQuery
Still haven’t played with BigQuery? Now’s your chance. This post contains a detailed walkthrough of analyzing data about R, in R, using BigQuery to churn through the massive amounts of raw R code on GitHub. Just be careful to select from the data subset they provide, or you’ll find yourself querying more than a terabyte of data and racking up charges fast :)
Holy crap. I’ve linked to Mark’s stuff before, and this article doesn’t disappoint. In it, he walks through the complete process of setting up Airflow (now an Apache project) using a simple example of grabbing foreign exchange rates from an API, storing them in Postgres, and then caching them in Redis. It’s not simple to get Airflow up and running, but this article gives you everything you need.
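To give a feel for the shape of that pipeline (fetch, store, cache), here is a rough plain-Python sketch. It is not Airflow code and not the article's code: `sqlite3` stands in for Postgres, a dict stands in for Redis, and `fetch_rates` returns made-up placeholder values instead of calling a real API. All function names are hypothetical. In Airflow, each of these steps would typically become a task in a DAG.

```python
import sqlite3


def fetch_rates():
    # Hypothetical stand-in for the API call; the values are placeholders, not real rates.
    return {"EUR": 0.9, "GBP": 0.75}


def store_rates(conn, day, rates):
    # Persist one row per currency; sqlite3 stands in for Postgres here.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rates "
        "(day TEXT, currency TEXT, rate REAL, PRIMARY KEY (day, currency))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO rates VALUES (?, ?, ?)",
        [(day, currency, rate) for currency, rate in rates.items()],
    )
    conn.commit()


def cache_rates(conn, cache, day):
    # Copy the stored rates for one day into the cache; a dict stands in for Redis.
    query = "SELECT currency, rate FROM rates WHERE day = ?"
    for currency, rate in conn.execute(query, (day,)):
        cache[f"rate:{day}:{currency}"] = rate


def run_pipeline(day):
    # Run the three steps in order, each depending on the previous one.
    conn = sqlite3.connect(":memory:")
    cache = {}
    store_rates(conn, day, fetch_rates())
    cache_rates(conn, cache, day)
    conn.close()
    return cache
```

Keeping each step as its own function mirrors how a scheduler would treat them: separately runnable and separately retryable.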
Getting into Data Science: A Guide for Students and Parents
There are plenty of posts about “getting into data science”, but most target mid-career folks looking to acquire new skills. This is the first guide I’ve come across that answers the question from the perspective of a student (or that student’s parents). It’s a good start, but this area needs a lot more thinking: the data scientists of the future will be using these tools and mental models from a young age.
Data viz of the week
Immediate visual impact: Canada is a big fan of US oil.
Thanks to our sponsors!
Fishtown Analytics is a boutique analytics consultancy serving venture-funded startups. We partner with CEOs and senior execs to implement advanced analytics.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The Data Science Roundup
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.