The ideal analytics tech stack, building election models and the inherent subjectivity of polls, the problems with pixellation, detecting logos with deep learning, and a mega-roundup of algorithms. Enjoy! 😂 😂 😂
Business intelligence tech has changed dramatically over the past 3–4 years, and the most common question I get from folks in the industry is “What’s your analytics tech stack?” This post lays out my recommendations, from ETL to data warehousing to data modeling to analysis. There are surprisingly few people doing this right.
Election forecasting has become a big deal since Nate Silver’s success in 2008. Not to be outdone, many major publications now not only report on polling data but also employ their own statisticians to build proprietary election forecast models. This post walks through the details of how to implement such an election model, step by step. Fascinating.
This is the best article on polling I’ve ever read. From the article: “Pollsters usually make statistical adjustments to make sure that their sample represents the population. They usually do so by giving more weight to respondents from underrepresented groups.” Read this to learn what’s behind the polling numbers in the news.
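To make the weighting idea concrete, here is a minimal sketch of one common approach, post-stratification on a single variable. It is not taken from the article: the function names, the two age groups, and the toy responses are all hypothetical, and real pollsters typically weight on several variables at once.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight each respondent by (population share) / (sample share) of their group."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


def weighted_mean(values, weights):
    """Average of values, with each value counted in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical toy poll: 8 respondents, with "young" underrepresented
# (25% of the sample versus an assumed 50% of the population).
groups = ["old"] * 6 + ["young"] * 2
supports = [1, 0, 0, 0, 0, 0, 1, 1]  # 1 = supports the candidate
population = {"old": 0.5, "young": 0.5}  # assumed population shares

weights = poststratification_weights(groups, population)
raw = sum(supports) / len(supports)
adjusted = weighted_mean(supports, weights)

print(f"raw support: {raw:.3f}, weighted support: {adjusted:.3f}")
```

In this toy example each young respondent gets a weight of 2.0 and each older respondent about 0.67, so the estimate moves from the raw 37.5% toward the young group's answers. The choice of which groups to weight on, and what population shares to assume, is exactly where the subjectivity comes in.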
KDNuggets recently did a poll where they asked “Which methods/algorithms you used in the past 12 months for an actual Data Science-related application?” The 844 respondents’ most often used algorithm? Regression, of course. Hard to beat it. This post is the follow-up article, where they walk through every top algorithm and provide amazing resources. Want to know more about PCA? Random forests? This is the post.
Pixellation is no longer an effective way to obscure visual information. Look at the image below, blurred with YouTube’s blur feature. Using deep learning, researchers were able to identify blurred faces like these at a shocking rate: “On an industry standard dataset where humans had 0.19% chance of identifying a face, the algorithm had 71% accuracy (or 83% if allowed to guess five times).” Wow.
What did you do for the final project in your data science boot camp? A recent Metis grad developed a DCNN that classified images scraped from Instagram into two classes: those with a Patagonia logo and those without. This is an excellent walkthrough of the process the author went through, including links to resources like this TensorFlow transfer learning retraining script. Valuable.