Data Science Roundup #48: Databases, Scalability, and Shipping Routes

By The Data Science Roundup • Issue #48
This week is a deep dive into databases, programming, and scalability. Plus, a useful color palette tool and an impressive visualization of global shipping. Enjoy!

This week's best data science articles
This is a seriously intense discussion of database implementation details, breaking down the functionality provided by both the query engine and the storage engine. While much of this goes far beyond what you need to know to do your job, it’s incredibly helpful to understand what’s going on under the hood. This is a long post, but well worth it.
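If the query engine / storage engine split feels abstract, here is a deliberately tiny toy in Python to make it concrete. It is a hypothetical sketch, not how any real database is built: the class names (ToyStorageEngine, ToyQueryEngine) are made up. The storage engine only knows how to store, look up, and scan rows; the query engine turns a declarative "select these columns where this holds" into calls on storage.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, List, Optional

Row = Dict[str, Any]


@dataclass
class ToyStorageEngine:
    """Hypothetical storage engine: owns the rows and knows how to fetch them."""
    rows: Dict[int, Row] = field(default_factory=dict)  # primary key -> row

    def put(self, key: int, row: Row) -> None:
        self.rows[key] = row

    def get(self, key: int) -> Optional[Row]:
        # Point lookup by primary key (the kind of access an index makes cheap).
        return self.rows.get(key)

    def scan(self) -> Iterator[Row]:
        # Full scan, yielding rows in primary-key order.
        for key in sorted(self.rows):
            yield self.rows[key]


class ToyQueryEngine:
    """Hypothetical query engine: turns a declarative request into storage calls."""

    def __init__(self, storage: ToyStorageEngine) -> None:
        self.storage = storage

    def select(self, columns: List[str], where: Callable[[Row], bool]) -> List[Row]:
        # One fixed plan: scan -> filter -> project.
        # A real planner would choose among many plans (index vs. scan, join order).
        return [{c: row[c] for c in columns} for row in self.storage.scan() if where(row)]


storage = ToyStorageEngine()
storage.put(2, {"id": 2, "name": "bob", "age": 41})
storage.put(1, {"id": 1, "name": "ada", "age": 36})
engine = ToyQueryEngine(storage)
print(engine.select(["name"], lambda r: r["age"] > 40))
```

Everything interesting in a real database (indexes, caching, transactions, query planning) lives in one of those two layers, which is exactly the territory the post covers.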
With R, you’ll run into problems as soon as your dataset exceeds the memory on your local machine. At that point, you have three options: scale up, scale out, or use R as an abstraction layer. This post walks you through the decision.
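The "abstraction layer" option boils down to pushing the computation to where the data lives, so only a small result ever comes back into local memory. As a language-neutral sketch of that idea (not the post’s own R code), here it is in Python using the standard library’s sqlite3. The table, columns, and rows are made up, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3


def load_demo(conn: sqlite3.Connection) -> None:
    # Hypothetical table standing in for a dataset too big for local memory.
    conn.execute("CREATE TABLE trips (route TEXT, tons REAL)")
    conn.executemany(
        "INSERT INTO trips VALUES (?, ?)",
        [("Shanghai-Rotterdam", 120.0), ("Shanghai-Rotterdam", 80.0), ("Santos-Houston", 50.0)],
    )


def tons_by_route(conn: sqlite3.Connection) -> dict:
    # The GROUP BY runs inside the database; only one row per route
    # crosses into local memory, not every raw record.
    cur = conn.execute("SELECT route, SUM(tons) FROM trips GROUP BY route ORDER BY route")
    return dict(cur.fetchall())


conn = sqlite3.connect(":memory:")
load_demo(conn)
print(tons_by_route(conn))
```

The alternative, pulling every row into a local data frame and aggregating there, is exactly what breaks once the data outgrows your laptop.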
Want to incorporate content filtering in your new app? Now it’s easy. Google’s Cloud Vision API can detect inappropriate content in images using the same machine learning models that power Google SafeSearch. Another very hard problem solved and packaged up as an API.
If you use Python, you almost certainly use pip. But did you know the history behind Python package management? As someone who joined the Python community relatively recently, I found this an interesting history lesson. For others, it’ll be a fun walk down memory lane :)
Ever run into a package that exists in R but not in Python? Or vice versa? This article walks through a package development technique that is becoming increasingly common for major packages: write the underlying implementation in C, then develop language bindings for both Python and R. It won’t make you a C developer, but it’s a useful technique to understand.
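To get a feel for what a "language binding" is, here is a minimal Python-side sketch using the standard library’s ctypes. It is only one of several ways to bind C from Python (others include the C extension API, Cython, and cffi), and not necessarily the one the article uses. Since we don’t have the article’s C library, libc’s strlen stands in for "your" C core; this assumes a POSIX system where the C library can be loaded.

```python
import ctypes
import ctypes.util

# Load the C standard library. find_library("c") returns e.g. "libc.so.6" on Linux;
# if it finds nothing, CDLL(None) uses the symbols already loaded into the process (POSIX only).
_libc_path = ctypes.util.find_library("c")
_libc = ctypes.CDLL(_libc_path) if _libc_path else ctypes.CDLL(None)

# Declare the C signature so ctypes converts arguments correctly:
#   size_t strlen(const char *s);
_libc.strlen.argtypes = [ctypes.c_char_p]
_libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    """Thin Python wrapper: encode to bytes, call into C, return a Python int (byte count)."""
    return _libc.strlen(text.encode("utf-8"))


print(c_strlen("hello"))
```

The design payoff is that the logic lives once, in C; each language only needs a thin wrapper like this (R has its own mechanisms for the same job, such as `.Call`).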
This tool gives non-designers an easy way to come up with color palettes. It offers a ton of configuration options and outputs color values. Easy. Now please, stop making all of your charts in the same colors that Excel chose as defaults in 1997.
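We don’t know how the tool generates its palettes internally, but if you’d rather script a quick categorical palette yourself, one common approach is to space hues evenly around the color wheel at a fixed saturation and lightness. Here is a hypothetical sketch using Python’s built-in colorsys; the function name and default values are illustrative choices, not the tool’s settings.

```python
import colorsys
from typing import List


def hue_palette(n: int, saturation: float = 0.55, lightness: float = 0.5) -> List[str]:
    """Return n hex colors with evenly spaced hues at fixed saturation and lightness."""
    colors = []
    for i in range(n):
        hue = i / n  # evenly spaced around the color wheel, in [0, 1)
        # Note colorsys uses HLS argument order: hue, lightness, saturation.
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        colors.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return colors


print(hue_palette(4))
```

One caveat: equal steps in HLS hue are not equal steps in perceived color, so some neighbors will look closer than others. Tools built on perceptual color spaces such as CIELAB or HCL handle this better.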
Data viz of the week
Click on the image for an interactive world shipping map. Very impressive.
Thanks to our sponsors!
Fishtown Analytics is a boutique analytics consultancy serving high-growth, venture-funded startups. Have analytics questions? Let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The Data Science Roundup
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.
Carefully curated by The Data Science Roundup with Revue.