AI-Enhanced Bluesky Bot with Multi-Modal GPT-4o
🤖 🧠 💻 A multi-modal AI bot for Bluesky social media 👨‍💻 🚴‍♂️ 🏙️
The Project
Monday and Tuesday, I build and deploy a computer-vision (“AI”) powered Bluesky bot, along with a web control panel. Today I shut it down, because I’m told its replies are indistinguishable from a human’s, and that’s annoying people. To be clear: I never type any replies myself. I just push a button, and the bot (powered by GPT-4o) generates and posts all the responses automatically.
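As a rough sketch of how thin that “push a button” layer is, a Flask endpoint like the one below is all the control panel needs to trigger a run. The route name and the stubbed pipeline function are hypothetical stand-ins for the real wiring in the repo.

```python
"""Sketch of the web control panel: one button, one endpoint."""
from flask import Flask, jsonify

app = Flask(__name__)


def generate_and_post_replies() -> int:
    """Stand-in for the real pipeline: fetch image posts from Bluesky,
    ask GPT-4o for a reply to each, and post the results.

    Stubbed here so the sketch runs on its own; returns a reply count."""
    return 0


@app.route("/run-bot", methods=["POST"])
def run_bot():
    # "Pushing the button" in the control panel hits this endpoint;
    # no reply text is ever typed by a human.
    count = generate_and_post_replies()
    return jsonify({"replies_posted": count})


if __name__ == "__main__":
    app.run(debug=True)
```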

The bot uses a multi-modal OpenAI model (GPT-4o) to analyze image posts and generate replies based on the actual image content. I give it a “personality” (Seattle-based cyclist) and infuse it with my own “values” (accessible urban infrastructure advocacy) via prompt engineering. I am impressed when it reads the slogan off a lady’s shirt in a photo and works that into a comedic reply, all without any manual input from me.
The idea is to spur conversation around topics I genuinely care about.
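Here is a minimal sketch of the kind of hybrid text-image call described above, using the OpenAI Python SDK. The draft_reply helper and the persona wording are illustrative stand-ins, not the repo’s actual prompt.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Paraphrase of the bot's prompt-engineered persona, not the repo's actual text.
PERSONA = (
    "You are a friendly Seattle-based cyclist who advocates for accessible "
    "urban infrastructure. Write short, light-hearted replies to posts."
)


def draft_reply(post_text: str, image_url: str) -> str:
    """Send one hybrid text+image prompt and return the generated reply."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": PERSONA},
            {
                "role": "user",
                "content": [
                    # The post's text and its image travel in a single message,
                    # so the model can riff on details like a slogan on a shirt.
                    {"type": "text", "text": f"Reply to this post: {post_text}"},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            },
        ],
        max_tokens=150,
    )
    return response.choices[0].message.content
```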
GitHub Repository: bluesky-image-reply-bot
The Reality Check
Ultimately, another user reads the profile description (which explicitly says “bot account”) and tells me what I need to hear: he doesn’t like it one bit. Social media platforms like Bluesky don’t need more AI slop.
Honestly, I know this in my heart of hearts. Nobody wants headless bots leaving dad jokes all over their feed, thematically relevant or not.
The Learning Experience
That said, I have great fun learning and experimenting with:
- Agentic programming in the Cursor IDE
- The OpenAI API + sending it hybrid text-image prompts
- Bluesky’s API (and the AT protocol); see the reply sketch after this list
- Python/Flask (for the API and GUI)
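For the Bluesky piece, here is roughly what posting a reply over the AT protocol looks like with the atproto Python SDK. The handle and app password are placeholders, and the sketch assumes the bot replies directly to top-level posts, so the thread’s root and parent references coincide.

```python
from atproto import Client, models

client = Client()
client.login("my-bot.bsky.social", "app-password")  # placeholder credentials


def post_reply(parent_uri: str, parent_cid: str, text: str):
    """Reply to a top-level Bluesky post identified by its uri and cid."""
    # AT protocol replies carry strong refs (uri + cid) to both the thread
    # root and the immediate parent; for a reply to a top-level post the
    # two refs point at the same record.
    ref = models.ComAtprotoRepoStrongRef.Main(uri=parent_uri, cid=parent_cid)
    return client.send_post(
        text=text,
        reply_to=models.AppBskyFeedPost.ReplyRef(root=ref, parent=ref),
    )
```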
Moving Forward
These are powerful tools, so I want my next project to be more genuinely useful to real folks in my actual community. Well, what I really want is a job leveraging these sorts of tools. But you’ll see me doing these hackathon-y portfolio projects until then!