In early 2015, Uber Engineering migrated its business entities from integer identifiers to UUID identifiers as part of an initiative towards using multiple active data centers.
To achieve this, our Data Warehouse team was tasked with identifying every foreign-key relationship between every table in the data warehouse to backfill all the ID columns with corresponding UUIDs.¹
Given the decentralized ownership of our tables, this was not a simple endeavor. The most promising solution was to crowdsource the information by scraping all the SQL queries submitted to the warehouse and observing which columns were joined together. To serve this need, we built and open sourced Queryparser, our tool for parsing and analyzing SQL queries.
Fascinating. So many users, so many tables, so many queries that it was infeasible to do standard data discovery. Very cool technology solution to a common problem. Wonder what other cool applications there are for this tool…