The popular job site Glassdoor published a list of 50 Best Jobs in America, and Data Scientist is again the no. 1 job in the US, with a job score of 4.8 out of 5, a $110,000 median base salary, and 4,000 job openings. What’s more, 5 of the top 10 US jobs are related to analytics, data engineering, and data science.
To those of us actively working in the space, this isn’t surprising: it’s clear that data is an amazing place to build a career right now. To those of you waiting to make the jump, there’s never been a better time.
I spend my days largely doing descriptive statistics. Sometimes it’s really important to build a predictive model, but frequently what your data consumers actually need is some really well-thought-out descriptive statistics. Which is why I love this post. Here’s the intro:
Statistics professors tend to gloss over basic descriptive statistics because they want to spend as much time as possible on margins of error and t-tests and regression. Fair enough, but the result is that it’s easier to find a machine learning expert than someone who can talk about numbers. Forget what you think you know about descriptives and let me give you a whirlwind tour of the real stuff.
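To make "descriptive statistics" concrete, here is a minimal sketch using only Python's standard library. It is not taken from the linked post: the `describe` helper and the `order_values` numbers are made up for illustration. It shows one thing a descriptive summary often has to communicate, which is how far a single outlier can pull the mean away from the median.

```python
import statistics


def describe(values):
    """Return a small descriptive summary of a list of numbers.

    Hypothetical helper for illustration; not from the linked post.
    """
    ordered = sorted(values)
    # quantiles(n=4) returns the three cut points between quartiles.
    q1, _, q3 = statistics.quantiles(ordered, n=4)
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "q1": q1,
        "q3": q3,
        "min": ordered[0],
        "max": ordered[-1],
    }


# Made-up order values with one large outlier.
order_values = [12, 15, 15, 18, 20, 22, 25, 400]
summary = describe(order_values)

for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

Reporting the median and quartiles next to the mean is often enough to tell a data consumer whether "the average order" is a meaningful number at all.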
This is an excellent article. It actually has very little to do with devops; rather, it talks about the challenges of integrating a data science team effectively into a larger organization. Here is a wonderful observation that will seem obvious once you have read it:
…[data science] needs to be deeply integrated into the business processes in order to be effective as a decision making system. This is by far the biggest source for the troubles created by data science efforts. In order to successfully integrate data science, one needs to transform and modify the core business processes, which is a difficult task.
This is a must-read for both data scientists and any manager who interacts with a data science team.
With machine learning successfully applied to new daunting problems almost every day, general AI starts looking like an attainable goal. However, most current research focuses instead on important but narrow applications, such as image classification or machine translation. We believe this to be largely due to the lack of objective ways to measure progress towards broad machine intelligence. In order to fill this gap, we propose here a set of concrete desiderata for general AI, together with a platform to test machines on how well they satisfy such desiderata, while keeping all further complexities to a minimum.
From the paper, their four desiderata:
Communication through natural language
Learning to learn
This paper is very readable / scannable. If you are at all interested in the topic of general-purpose AI, this is a must-read.
The SQL join operation is one of the most powerful and commonly used SQL operations, but little attention is paid to how the internal SQL engine breaks down the tasks of join operations.
I’ll be the first to admit that this article is really quite boring (it got literally 0 recommends on Feedly), but let me just say that really, truly understanding the query planner and knowing how to read an explain plan are just so unbelievably important in being day-to-day effective. Invest the time to understand this stuff. This post is the best resource I’ve found on the topic.
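If you have never looked at an explain plan, a low-stakes way to start is SQLite, which ships with Python. The sketch below is an assumption-laden illustration, not something from the linked post: the `customers` and `orders` tables, the index name, and the `join_plan` helper are all hypothetical. It asks SQLite for its plan for the same join with and without an index on the join column, so you can compare how the engine chooses to scan or search each table. The exact wording of the plan varies between SQLite versions, and other databases have their own `EXPLAIN` syntax and output.

```python
import sqlite3


def join_plan(create_index):
    """Return SQLite's query plan lines for a simple two-table join.

    Table, column, and index names here are hypothetical examples.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            total REAL
        );
        """
    )
    if create_index:
        conn.execute(
            "CREATE INDEX idx_orders_customer ON orders (customer_id)"
        )
    rows = conn.execute(
        """
        EXPLAIN QUERY PLAN
        SELECT c.name, o.total
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        """
    ).fetchall()
    conn.close()
    # The human-readable description is the last column of each row.
    return [row[-1] for row in rows]


plan_without_index = join_plan(create_index=False)
plan_with_index = join_plan(create_index=True)

print("Without index on orders.customer_id:")
for line in plan_without_index:
    print("  ", line)

print("With index on orders.customer_id:")
for line in plan_with_index:
    print("  ", line)
```

Each line describes one step of the join: a `SCAN` walks a whole table, while a `SEARCH` looks rows up through a key or index. Reading real plans is mostly a matter of spotting which tables get scanned when you expected them to be searched.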
Julia first appeared in 2012 and has since become popular in academic environments. While its inclusion in the Jupyter project in 2014 (it’s the Ju- in Jupyter) marked a significant increase in awareness and adoption, Julia still isn’t particularly common in commercial environments.
This post, written by one of the co-creators of the language, makes the case that core elements of Julia’s design make it a superior choice for performance-intensive numerical computing applications. Read this post, and put Julia on your list of things to play around with. It’s maturing quickly.