Introducing KSQL, a streaming SQL engine for Apache Kafka. KSQL provides a simple and completely interactive SQL interface for processing data in Kafka.
This is incredibly cool. SQL-based analytics lives in a batch-based world today. Moving towards SQL syntax for querying real-time data streams is a major development. If you could write SQL queries on data and get answers with sub-second latency, that would unlock a completely new set of capabilities for data analysts and scientists.
It remains to be seen exactly what the performance characteristics of KSQL are, but this is worth paying attention to.
This post got forwarded around internally at Fishtown Analytics this past week; it goes through a SQL design pattern that none of us had ever considered before. I love that after spending thousands and thousands of hours writing SQL, there are still plenty of unexplored ideas.
The NYTimes made a very impressive visualization of Harvey this past week. This post walks through the process the author followed to get to the final version, including a dead end or two. Quite a lot of work went into making the image, and on a tight deadline.
Over the last three years, Storybench, a website from Northeastern University’s School of Journalism’s Media Innovation graduate program, has interviewed 72 data journalists, web developers, interactive graphics editors, and project managers from around the world to provide an “under the hood” look at the ingredients and best practices that go into today’s most compelling digital storytelling projects.
Michelangelo enables internal teams to seamlessly build, deploy, and operate machine learning solutions at Uber’s scale. It is designed to cover the end-to-end ML workflow: manage data, train, evaluate, and deploy models, make predictions, and monitor predictions. The system also supports traditional ML models, time series forecasting, and deep learning.
Cool piece of data journalism. Languages focused on web and mobile development are relatively overrepresented in developing economies, while languages focused on data processing are overrepresented in wealthy economies.
The government [is] extremely fond of amassing great quantities of statistics. These are raised to the nth degree, the cube roots are extracted, and the results are arranged into elaborate and impressive displays. What must be kept ever in mind, however, is that in every case, the figures are first put down by a village watchman, and he puts down anything he damn well pleases. - Sir Josiah Charles Stamp, 1880-1941
An excellent reminder to treat all data with skepticism.
At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.