I got pretty tired of reading “guides to deep learning” a while ago, but am always on the lookout for ones that bring something new to the table. This is the first rundown I’ve seen on the major advances in network architecture over the past couple of years. Very digestible, very interesting. Even if you’re not going to put this to work tomorrow, highly recommended.
To combat the high expense of collecting thousands of training images, image augmentation was developed to generate training data from an existing dataset. Image augmentation is the process of taking images that are already in a training dataset and manipulating them to create many altered versions of the same image. This not only provides more images to train on, but also exposes our classifier to a wider variety of lighting and coloring conditions, making it more robust.
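To make the idea concrete, here's a minimal sketch of what an augmentation pipeline can look like. I'm assuming a Keras-style workflow here; the parameter values, `x_train`, and `y_train` are purely illustrative and not taken from the post:

```python
from keras.preprocessing.image import ImageDataGenerator

# Each parameter describes one way an original image may be perturbed:
# random rotations, shifts, zooms, and horizontal flips all produce
# "new" training images from the ones we already have.
augmenter = ImageDataGenerator(
    rotation_range=20,        # rotate up to +/- 20 degrees
    width_shift_range=0.1,    # shift horizontally by up to 10% of width
    height_shift_range=0.1,   # shift vertically by up to 10% of height
    zoom_range=0.15,          # zoom in or out by up to 15%
    horizontal_flip=True,     # mirror images left/right
    fill_mode='nearest',      # fill in pixels exposed by shifts/rotations
)

# x_train / y_train would be your existing labeled images; the generator
# yields batches of randomly altered copies on the fly during training:
# model.fit_generator(augmenter.flow(x_train, y_train, batch_size=32),
#                     steps_per_epoch=len(x_train) // 32, epochs=10)
```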
This is the single best post I’ve seen on the topic of image pre-processing, an increasingly critical skill in a wide range of use cases. The writeup and code for histogram normalization (pictured above) were particularly cool.
Whether or not you work with image data today, this is a must-read.
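For a rough sense of what histogram normalization does under the hood, here's a small sketch of histogram equalization on an 8-bit grayscale image using NumPy. This isn't the post's code; the function name and the 8-bit assumption are mine:

```python
import numpy as np

def equalize_histogram(img):
    """Spread an 8-bit grayscale image's intensity values across the full
    0-255 range so that dark or washed-out images gain contrast."""
    # Count how many pixels fall into each of the 256 intensity bins.
    hist, _ = np.histogram(img.flatten(), bins=256, range=(0, 256))

    # The cumulative distribution tells us, for each intensity, what
    # fraction of pixels are at or below it.
    cdf = hist.cumsum()
    cdf = cdf / cdf[-1]

    # Map each original intensity to its position in the CDF, stretched
    # back out to 0-255. Pixels keep their rank order but spread out evenly.
    equalized = np.interp(img.flatten(), np.arange(256), cdf * 255)
    return equalized.reshape(img.shape).astype(np.uint8)
```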
Histograms are a way to summarize a numeric variable. They use counts to aggregate similar values together and show you the overall distribution. However, they can be sensitive to parameter choices! We’re going to take you step by step through the considerations with lots of data visualizations.
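To see that sensitivity in action, here's a quick sketch with NumPy and matplotlib; the sample data and bin counts are placeholders, not from the post:

```python
import numpy as np
import matplotlib.pyplot as plt

# Fake bimodal data standing in for whatever numeric variable you're exploring.
np.random.seed(0)
values = np.concatenate([np.random.normal(0, 1, 500),
                         np.random.normal(4, 0.5, 200)])

# The same variable summarized with three different bin counts: too few bins
# hides the second mode, too many makes the plot noisy.
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, bins in zip(axes, [5, 30, 200]):
    ax.hist(values, bins=bins)
    ax.set_title('{} bins'.format(bins))
plt.tight_layout()
plt.show()
```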
The OpenAI article I linked last week churned up quite a storm in the geek community, where overlap in interests between gaming and AI is high. Apparently several pros were able to beat the bot consistently within six hours of its release. Here’s a lengthy thread on Hacker News about the topic.
OpenAI’s accomplishment is impressive, but its work in this type of real-time, collaborative, informationally-obscured environment is still very early.
Currently, the three primary cloud analytic database platforms (Redshift / Snowflake / BigQuery) use CPUs. Other data-intensive applications have made the switch to GPUs to take advantage of their superior parallel processing, but this change is only beginning in the world of analytic databases.
Several companies have begun to play in this space; my hope is that the tech gets incorporated into an offering from AWS or GCP. This represents a real opportunity to decrease query response times.
My aim with this research is to allow me to quickly find the most relevant and appreciated articles [on AI], so that I can improve my knowledge about the subject, without having to read 100 articles to find the 4–5 of them that are interesting…
This is a solid piece of data journalism, an interesting new open data set to play with, and a great index of content to peruse if you’re just getting into the space. Chris Dixon’s posts, in particular, are foundational.
It’s always satisfying when you find the very best version of a thing. This entire site is the most informative presentation of global arms trade data I’ve ever seen. Click around—it’s worth it.
At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.