🎬🎥💻 A computer vision (object detection) pet project 👨💻🍿🎤
A live demo should still be working if you’re reading this. It’s served by HuggingFace 🤗 Spaces. If it doesn’t work embedded below here, try clicking through the link.
Back in ~2016-2018 I used to spend many, many hours in a dark room in SoHo scrubbing through video files to see where the slate claps were, so that I could fix the audio sync. It got very boring, to the point where I quit to go learn how to code.
Why not have AI automate the boring stuff? Given enough labeled examples, a pre-trained image model should be able to spot slates after some fine-tuning. With just a few hundred labeled images, I got a COCOMetric score of roughly 88%. Under the hood, it’s just a fine-tuned Faster R-CNN model.
The natural next step would be to train it on two classes, “open” and “closed” slates, so that you could find the exact frame where the slate goes from open to closed. That might save some dailies technicians a lot of time.
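Here is a rough sketch of how that open-to-closed search could work once a two-class model exists. Everything in it is hypothetical: the function name `find_clap_frame`, the per-frame labels ("open", "closed", or None when no slate is detected), and the `min_run` parameter are my own illustration, not something the current model produces.

```python
from typing import Optional, Sequence


def find_clap_frame(
    labels: Sequence[Optional[str]], min_run: int = 2
) -> Optional[int]:
    """Return the index of the first frame where the slate closes.

    labels  -- hypothetical per-frame predictions: "open", "closed",
               or None when no slate was detected in that frame.
    min_run -- how many consecutive "closed" frames are required before
               we trust the transition (filters one-frame misfires).
    """
    seen_open = False   # have we seen an open slate yet?
    run_start = None    # first frame of the current "closed" run
    run_len = 0         # length of the current "closed" run

    for i, label in enumerate(labels):
        if label == "closed" and seen_open:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len >= min_run:
                return run_start
        else:
            # Any other label breaks the current "closed" run.
            run_len = 0
            if label == "open":
                seen_open = True

    return None  # no open-to-closed transition found


# Example: a one-frame false "closed" at index 3, real clap at index 6.
frames = [None, "open", "open", "closed", "open", "open",
          "closed", "closed", "closed"]
clap = find_clap_frame(frames)  # 6
```

Dividing the returned frame index by the clip's frame rate would give the timestamp to line the audio up against.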
My data science mentor pointed me to IceVision, fastai, and various timm models. I got set up with Label Studio and Weights & Biases. I wrote a long LinkedIn post about how I accomplished the fine-tuning.
In case you didn’t already know, it can be incredibly frustrating to do data-intensive machine learning work on an Apple Silicon laptop. Rather than swapping out my local hardware, I took this opportunity to get my hands really dirty with EC2 spot instances, and used Terraform to make it dead simple to spin up and tear down spot GPU instances.