Building a WhatsApp Food Bot with Qwen on Alibaba Cloud
My journey using Qwen for chat, vision, and embeddings — all behind one API key

Am a software developer with lots of experience in designing and building high-quality, scalable web applications, I am confident in my ability to contribute to a team and deliver projects on time and on budget. My skills in problem-solving and my passion for writing clean, efficient code have allowed me to thrive in fast-paced environments. I am currently seeking employment opportunities and am open to discussing potential roles.
A few months ago I set out to build something that bugged me about every food app I'd ever used: they hand you a menu and leave you guessing. Will this fit my fitness goals? Is this too many calories? What's actually good here? So I built Foodie Robot — a WhatsApp bot that recommends meals tailored to your fitness goals and lets you order them directly. No new app. No new account. You just text it like a friend who happens to know every restaurant in town.
The brain behind it is Qwen, Alibaba Cloud's family of models, served through Model Studio (DashScope). What surprised me most was how far a single API key went. Here's the journey.
Why Qwen
I didn't want to stitch together three different providers — one for chat, one for vision, one for embeddings. Qwen covers all three, and the DashScope endpoint is OpenAI-compatible, which meant I could use the official openai Python SDK and just point it at a different base URL:
from openai import OpenAI
client = OpenAI(
api_key=settings.AI_API_KEY,
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
That one detail saved me a lot of time. Any tutorial, snippet, or library written for the OpenAI SDK basically just works.
The three jobs I gave Qwen
1. Chat + tool calling — qwen-max
The bot doesn't follow a script. When you message it, qwen-max decides which action to run — recommend a meal, save your location, place an order, check your balance — using tool calling. I describe each tool, the model picks the right one and fills in the arguments, and my backend does the rest.
This was the single biggest "aha" of the project: tool calling beats menus. Instead of building rigid button trees, I let the model interpret free-form language and map it to real actions. The bot feels like a conversation, not a form.
2. Vision — qwen-vl-max
Send the bot a photo of a meal and it estimates the calories and nutrition. qwen-vl-max reads the image and returns a structured breakdown.
One thing to know: the vision model fetches image URLs server-side, so your images have to live somewhere reliably reachable. I host meal photos on a CDN (Cloudinary) so the model can always pull them.
3. Embeddings — text-embedding-v4
This is what makes recommendations feel smart. Every meal becomes a vector, and so does each user request. I rank meals by cosine similarity:
$$\text{sim}(\mathbf{q}, \mathbf{m}) = \frac{\mathbf{q} \cdot \mathbf{m}}{\lVert \mathbf{q} \rVert , \lVert \mathbf{m} \rVert}$$
So "Jollof Rice" and "Party Rice" land close together even though the words differ. Keyword search could never. To keep costs down, I compute each meal's embedding once and cache it, only recomputing when the meal actually changes (tracked with a content hash).
Challenges I ran into
Structured JSON from vision. Getting clean, schema-shaped JSON out of the image analysis took prompt tuning plus a forgiving parser that strips code fences and stray text — so one odd response never breaks the flow.
Embedding dimensions. My database assumed 1536-dim vectors, so I pinned the embedding size to keep everything consistent.
Keeping the bill small. Caching embeddings and filtering the tool list before each call kept token usage low without hurting quality.
What I'd tell someone starting out
Use the OpenAI-compatible endpoint. You get the whole OpenAI ecosystem for free and only change the base URL.
Lean on tool calling. If your app has actions, let the model orchestrate them. It's less code and a far better UX than menus.
Cache your embeddings. They rarely change — compute once, store, and only refresh on content changes.
One key, three modalities. Chat, vision, and embeddings under one Qwen account kept my stack refreshingly simple.
What's next
Foodie Robot is live in Lagos, Nigeria today. Next up: scaling to more regions, smarter personalization as taste signals accumulate, and multi-language support so the bot can chat the way each user actually talks.
Qwen turned out to be a genuinely capable, all-in-one backbone for an AI product — and the OpenAI compatibility meant I spent my time building features, not fighting SDKs. If you've been curious about Alibaba Cloud's models, this was a great place to start.
Thanks for reading! If you build something with Qwen, I'd love to hear about it.


