Spotify MLOps Meetup Notes
On 2023-06-22 I attended the NYC MLOps Community meetup at Spotify. Here are some photos and notes from the talks:
David Xia: How to Build a Frictionless ML Platform
- Spotify uses Ray.
- End-user teams manage their own Ray clusters on multi-tenant infra.
- Hendrix is a (beta) internal Spotify tool to save folks from needing to learn K8s.
- How to solve local development problems?
- Cloud developer environment (CDE).
- They built something like GitHub Codespaces, but with Ray and GPUs
- VSCode in the browser: Works the same on any device
- Everything runs as workloads on K8s
- They use Istio to get routing done right
- Lessons Learned:
- Must be HA: Don’t have a reverse proxy that is a SPOF (single point of failure)
- Needs to be customizable and extensible
- Needs telemetry to show that the CDE actually makes people more productive
- Use K8s etcd as your DB
- Use a K8s operator with a CRD; it’s neat
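The operator-plus-CRD lesson boils down to a reconcile loop: users declare the state they want, and a controller compares it with what actually exists and works out what to change. Here is a minimal sketch of that idea in plain Python. It does not use the Kubernetes API, and `WorkspaceSpec`, its fields, and the workspace and image names are all hypothetical, not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkspaceSpec:
    """Desired state of one dev workspace (hypothetical fields, for illustration)."""
    owner: str
    image: str
    gpus: int = 0


def reconcile(desired: dict, actual: dict) -> list:
    """Return the (action, name) steps that move actual state toward desired state."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions


# What users declared (in a real setup, custom resources stored via the K8s API).
desired = {
    "alice-dev": WorkspaceSpec(owner="alice", image="ray-vscode:1", gpus=1),
    "bob-dev": WorkspaceSpec(owner="bob", image="ray-vscode:1"),
}
# What is currently running.
actual = {
    "bob-dev": WorkspaceSpec(owner="bob", image="ray-vscode:0"),
    "carol-dev": WorkspaceSpec(owner="carol", image="ray-vscode:1"),
}

print(reconcile(desired, actual))
# [('create', 'alice-dev'), ('update', 'bob-dev'), ('delete', 'carol-dev')]
```

A real operator would run this comparison whenever a resource changes and apply the actions through the cluster API. The declared state lives in the cluster itself, which is presumably what the "etcd as your DB" lesson is getting at.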
Ryan Culbertson: Near Real-Time Features w/ Jukebox NRT
- Ryan:
- Spotify senior engineer
- ML Infra
- Feature Mgmt tooling
- Ideally, the Spotify app makes personalized recommendations right after the user does stuff
- Therefore, near real-time is desirable
- Cold-start problem is particularly challenging
- Near Real-Time (NRT) here just means streaming data that gets processed in minutes, not seconds
- (Spotify runs all its stuff on Google Cloud)
- Scale: Operating on the order of 3M messages/sec, with high-cardinality data too
- Their NRT tool Jukebox uses Flink, Bigtable
- Idea is to use SQL for experimentation and also prod
- Jukebox operates in 5-min windows of aggregation
- Small window: Fresh data
- Big window: Cheap data
- Had to find some arbitrary sweet spot
- 5-minute window aggregations get re-aggregated on reads later
- Some teams might want a 1-hour window, some might want 6-hour, etc.
- Lessons Learned:
- Flink integration w/ GCloud needed custom connectors! Probably smoother with the more mature Kafka integration
- Instrument everything to help find bottlenecks (e.g., in multi-threaded code)
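The 5-minute windows that get re-aggregated on reads can be sketched in a few lines of plain Python. This is an in-memory stand-in, not Jukebox: the real system uses Flink and Bigtable, and the function names, the event format, and the count aggregation here are assumptions for illustration. The write path stores one count per key per 5-minute bucket, and the read path sums whichever buckets fall inside the lookback a team wants.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # 5-minute tumbling windows, as in the talk


def window_start(ts: int) -> int:
    """Round a timestamp (in seconds) down to the start of its 5-minute window."""
    return ts - ts % WINDOW_SECONDS


def aggregate(events):
    """Write path: count events per (key, 5-minute window)."""
    counts = defaultdict(int)
    for key, ts in events:
        counts[(key, window_start(ts))] += 1
    return dict(counts)


def read_count(counts, key, now, lookback_seconds):
    """Read path: sum the stored buckets that start within [now - lookback, now)."""
    earliest = now - lookback_seconds
    return sum(
        count
        for (k, start), count in counts.items()
        if k == key and earliest <= start < now
    )


# Hypothetical (user, timestamp-in-seconds) events.
events = [("u1", 10), ("u1", 100), ("u1", 400), ("u1", 3700), ("u1", 7300), ("u2", 7300)]
counts = aggregate(events)

print(read_count(counts, "u1", now=7500, lookback_seconds=3600))      # 1-hour view: 1
print(read_count(counts, "u1", now=7500, lookback_seconds=6 * 3600))  # 6-hour view: 5
```

The results are exact only when `now` and the lookback line up with bucket boundaries; otherwise the answer is accurate to within one 5-minute bucket. That is the freshness-versus-cost trade-off behind the window size.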