It’s good to see some awesome data content coming out of Squarespace. This post outlines how the data engineering team has been able to scale access control across their data warehouse in an environment with hundreds of users.
This problem—ensuring that the right people have access to the right data—can be surprisingly tricky. Often, organizations simply throw up their hands and give everyone superuser access, but this is not a good answer. Especially as compliance and data governance grow ever more important, this is stuff you need to care about. Do all users at your company need to see customer email addresses? It only takes one UNLOAD command by one user to create a lot of pain.
Word on the street is that the team at Squarespace that built this tool is close to releasing it as open source. I’ll be sure to include a link here if and when they do.
Updates from Google Research: they’ve successfully used an evolutionary algorithm to beat out reinforcement learning-based approaches to AutoML. The most fascinating result (to me) is that the evolutionary approach doesn’t just achieve higher performance; it requires far less computation to get there (graph on right, above).
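To make the idea concrete: the core loop of this kind of evolutionary search is surprisingly simple. Below is a toy sketch of aging ("regularized") evolution—tournament selection plus removal of the *oldest* population member rather than the worst—applied to a bit-string fitness function instead of neural architectures. This is not Google's implementation; every name here is illustrative.

```python
import random
from collections import deque

random.seed(0)

# Toy stand-in for an "architecture": a bit-string.
# Fitness counts how many bits match a target pattern.
TARGET = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]

def fitness(arch):
    return sum(a == t for a, t in zip(arch, TARGET))

def mutate(arch):
    # Flip one random bit (in real AutoML: perturb one architectural choice).
    child = arch[:]
    i = random.randrange(len(child))
    child[i] = 1 - child[i]
    return child

# Start with a random population.
population = deque([[random.randint(0, 1) for _ in TARGET] for _ in range(20)])

for _ in range(500):
    # Tournament selection: sample a few candidates, copy-and-mutate the best.
    tournament = random.sample(list(population), 5)
    parent = max(tournament, key=fitness)
    population.append(mutate(parent))
    # The "regularized" part: retire the oldest member, not the least fit.
    population.popleft()

best = max(population, key=fitness)
print(fitness(best))
```

The age-based removal is the interesting design choice: because even good individuals eventually die, a genotype only persists by being repeatedly re-selected and re-mutated, which favors architectures that are robustly good rather than one-off lucky evaluations.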
We’ve been running Kubernetes for deep learning research for over two years. While our largest-scale workloads manage bare cloud VMs directly, Kubernetes provides a fast iteration cycle, reasonable scalability, and a lack of boilerplate which makes it ideal for most of our experiments. We now operate several Kubernetes clusters (some in the cloud and some on physical hardware), the largest of which we’ve pushed to over 2,500 nodes.
The data infrastructure at your org probably doesn’t come anywhere close to a 2,500-node Kubernetes cluster(!), but it’s fascinating to know how one of the most bleeding-edge AI research organizations in the world sets up their experimental environments. This stuff is hard.
We’re releasing a new batch of seven unsolved problems which have come up in the course of our research at OpenAI.
Sometimes the hardest thing in research is coming up with unsolved, but solvable, problems. Here are the problems that OpenAI is looking at right now; these are great indicators of what the cutting edge looks like today. One of the hardest: regularization in reinforcement learning. 👍👍
In this blog post we describe a Listing Embedding technique we developed and deployed at Airbnb for the purpose of improving Similar Listing Recommendations and Real-Time Personalization in Search Ranking. The embeddings are vector representations of Airbnb homes learned from search sessions that allow us to measure similarities between listings. They effectively encode many listing features, such as location, price, listing type, architecture and listing style, all using only 32 float numbers. We believe that the embedding approach for personalization and recommendation is very powerful and useful for any type of online marketplace on the Web.
The approach is almost comically effective (the above pictures are of different locations!). Really interesting work, and a detailed writeup.
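The inference side of this technique is easy to sketch. Airbnb trains the embeddings skip-gram-style over clicked-listing sequences from search sessions; below I skip the training and just fabricate 32-float vectors to show how "similar listings" falls out of simple cosine similarity. All listing names and vectors here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 32-dimensional embeddings for a handful of listings.
# In the real system these are learned from search sessions; here
# we fabricate them, making the two lofts deliberately similar.
listings = ["loft_soho", "loft_tribeca", "cabin_tahoe", "villa_malibu"]
emb = {lid: rng.normal(size=32) for lid in listings}
emb["loft_tribeca"] = emb["loft_soho"] + 0.1 * rng.normal(size=32)

def cosine(a, b):
    # Cosine similarity: 1.0 = identical direction, ~0 = unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def similar_listings(query, k=2):
    # Rank all other listings by similarity to the query listing.
    scores = {l: cosine(emb[query], emb[l]) for l in listings if l != query}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(similar_listings("loft_soho"))
```

Because each listing is just 32 floats, nearest-neighbor lookups like this are cheap enough to run in real time, which is what makes the approach workable for search ranking and not just offline recommendations.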
At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.