Something I don’t hear talked about enough in the data space is delivering work in a timely way. Software engineers spend a tremendous amount of time attempting to predict and control delivery timelines; why don’t data professionals do the same? There seems to be a consensus that predicting the time required for a data project is inherently impossible because the work is fundamentally an act of discovery.
I think this is false. I’ve personally delivered hundreds of analytics sprints, and I’ve trained others to do the same. We deliver almost all of these sprints (95%+) on time. I think that setting clear timelines for the delivery of analytics work is critical in building the data team into a trusted advisor for business stakeholders, and in this post I share our thinking on how we do it. The core of our approach is:
Eliminate as much uncertainty as possible before writing stories.
This is not rocket science—you can easily apply it in your org.
We are happy to announce the open source release of Waltz. (…) Waltz is what we describe as a write-ahead log. This recorded log is neither the output of a change-data-capture from a database nor a secondary output from an application. It is the primary information of the system state transition. This is different from a typical transaction system built around a database system where the database is the source of truth. In the new model, the log is the source of truth (the primary information), and the database is derived from the log (the secondary information).
This is similar to Kafka in concept, but Waltz provides different guarantees and can therefore be used for different applications:
Waltz is similar to existing log systems like Kafka in that it accepts / persists / propagates transaction data produced / consumed by many services. However, unlike other systems, Waltz provides machinery that facilitates serializable consistency in distributed applications.
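To make the "log as source of truth" idea concrete, here is a toy sketch in Python. This is not Waltz's actual API (Waltz is a Java system); all names below are illustrative. The point it demonstrates is the inversion described above: the append-only log is the primary record of state transitions, and any "database" view is secondary, rebuilt by replaying the log.

```python
# Toy sketch of the log-as-source-of-truth model behind systems like
# Waltz. All class and function names here are hypothetical.

class TransactionLog:
    """Append-only log: the primary record of state transitions."""

    def __init__(self):
        self._entries = []

    def append(self, txn):
        """Commit a transaction; return its offset in the log."""
        self._entries.append(txn)
        return len(self._entries) - 1

    def replay(self):
        """Yield every committed transaction in commit order."""
        yield from self._entries


def derive_balances(log):
    """The 'database' is derived state, rebuilt by replaying the log."""
    balances = {}
    for account, delta in log.replay():
        balances[account] = balances.get(account, 0) + delta
    return balances


log = TransactionLog()
log.append(("alice", 100))
log.append(("alice", -30))
log.append(("bob", 50))

# Any replica holding the log can reconstruct identical state.
print(derive_balances(log))  # {'alice': 70, 'bob': 50}
```

Because every replica derives its state from the same ordered log, replicas agree without coordinating with each other, which is the property that makes this model attractive for distributed transaction systems.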
This podcast was started last year but I’m just now coming across it. The host is a research scientist @ MIT, and every episode features a guest you’ll likely have heard of (I’m a bit surprised just how consistently famous these folks are!). It’s queued up for my commutes for the coming week. Give it a try and let me know how you like it!
At Coursera, we’ve built data products whose missions range from facilitating better content discovery to scaling learner interventions to benchmarking learners’ performance of various skills. Each data product is a collaboration among product leaders, business leaders, data scientists, and engineers. Effective data products need effective collaborations between data scientists and engineers.
I think this is one of the hottest areas in all of data: getting the various members of the data team (analysts, engineers, and scientists) collaborating together. And this is one of the best posts I’ve seen at encouraging that collaboration.
Most young Data Scientists expect to spend most of their time tinkering with and building fancy ML models or presenting ground-breaking business insights with colorful visualizations. Sure, these are still part of the job.
But as enterprises have become more educated, they focus more on real operational value. This means enterprises want to deploy more ML systems; they care less about how many new models or fancy dashboards they have. As a result, Data Scientists are asked to do the non-ML work. This is where the boredom comes from.
I agree with this. I would expand on it a bit, though, to say that all jobs will seem boring if you expected you’d only have to do the fun parts. There’s plenty of drudgery in even the best jobs, and I’m saying that as someone with a pretty cool one ;)
We’re not doing junior data scientists any favors if we create unrealistic expectations for them (and there is now an entire commercial industry incentivized to do just that).