Using FastAPI for an OpenAI chat backend
When building web APIs that make calls to OpenAI servers, we really want a backend that supports concurrency, so that it can handle a new user request while waiting for the OpenAI server response....
Developer relations & motherhood: Will they blend?
My very first job out of college was in developer relations at Google, and it was absolutely perfect for me; a way to combine my love for programming with my interest in teaching. I got to code, write...
Evaluating a RAG chat app: Approach, SDKs, and Tools
When we’re programming user-facing experiences, we want to feel confident that we’re creating a functional user experience - not a broken one! How do we do that? We write tests, like unit tests,...
Converting HTML pages to PDFs with Playwright
In this post, I'll share a fairly easy way to convert HTML pages to PDF files using the Playwright E2E testing library. Background: I am working on a RAG chat app solution that has a PDF ingestion...
RAG techniques: Cleaning user questions with an LLM
When I introduce app developers to the concept of RAG (Retrieval Augmented Generation), I often present a diagram like this: The app receives a user question, uses the user question to search a...
RAG techniques: Function calling for more structured retrieval
Retrieval Augmented Generation (RAG) is a popular technique to get LLMs to provide answers that are grounded in a data source. When we use RAG, we use the user's question to search a knowledge base...
Evaluating RAG chat apps: Can your app say "I don't know"?
In a recent blog post, I talked about the importance of evaluating the answer quality from any RAG-powered chat app, and I shared my ai-rag-chat-evaluator repo for running bulk evaluations. In that...
Truncating conversation history for OpenAI chat completions
When I build chat applications using the OpenAI chat completions API, I often want to send a user's previous messages to the model so that the model has more context for a user's question. However,...
Doing RAG? Vector search is *not* enough
I'm concerned by the number of times I've heard, "oh, we can do RAG with retriever X, here's the vector search query." Yes, your retriever for a RAG flow should definitely support vector search, since...
RAG on a database table with PostgreSQL
RAG (Retrieval Augmented Generation) is one of the most promising uses for large language models. Instead of asking an LLM a question and hoping the answer lies somewhere in its weights, we instead...
Using SLMs in GitHub Codespaces
Today I went on a quest to figure out the best way to use SLMs (small language models) like Phi-3 in a GitHub Codespace, so that I can provide a browser-only way for anyone to start working with...
pgvector for Python developers
Lately, I've been digging into vector embeddings, since they're such an important part of the RAG (Retrieval Augmented Generation) pattern that we use in our most popular AI samples. I think that when...
Should you use Quart or FastAPI for an AI app?
As I have discussed previously, it is very important to use an async framework when developing apps that make calls to generative AI APIs, so that your backend processes can concurrently handle other...
Playwright and Pytest parametrization for responsive E2E tests
I am a big fan of Playwright, a tool for end-to-end testing that was originally built for Node.js but is also available in Python and other languages. Playwright 101: For example, here's a simplified test...
Making an Ollama-compatible RAG app
Ollama is a tool that makes it easy to run small language models (SLMs) locally on your own machine - Mac, Windows, or Linux - regardless of whether you have a powerful GPU. It builds on top of...
Integrating vision into RAG applications
Retrieval Augmented Generation (RAG) is a popular technique to get LLMs to provide answers that are grounded in a data source. What do you do when your knowledge base includes images, like graphs or...
My parenting strategy: earn enough $ to outsource
Two kids are a lot. I know, it's really not a lot in comparison to the many kids that women have had to birth and care for over the history of humanity. But still, it feels like a lot to me. My partner...
Entity extraction using OpenAI structured outputs mode
The relatively new structured outputs mode from the OpenAI gpt-4o model makes it easy for us to define an object schema and get a response from the LLM that conforms to that schema. Here's the most...
My first PyBay: Playing improv with Python
A few months ago in September, I attended my very first PyBay: an annual conference in San Francisco bringing together Pythonistas from across the Bay Area. It was a 2-track single-day conference, with...
Making a dev container with multiple data services
A dev container is a specification that describes how to open up a project in VS Code, GitHub Codespaces, or any other IDE supporting dev containers, in a consistent and repeatable manner. It builds on...
Running Azurite inside a Dev Container
I recently worked on an improvement to the flask-admin extension to upgrade the Azure Blob Storage SDK from v2 (an old legacy SDK) to v12 (the latest). To make it easy for me to test out the change...
Add browser speech input & output to your app
One of the amazing benefits of modern machine learning is that computers can reliably turn text into speech, or transcribe speech into text, across multiple languages and accents. We can then use those...
Observations: Using Python with DeepSeek-R1
Everyone's going ga-ga for DeepSeek-R1, so I thought I'd try it out in a live stream today. I'll summarize my experience in this post. I tried Python through two different hosts, via the OpenAI Python...
Safety evaluations for LLM-powered apps
When we build apps on top of Large Language Models, we need to evaluate the app responses for quality and safety. When we evaluate the quality of an app, we're making sure that it provides answers that...
Evaluating gpt-4o-mini vs. gpt-3.5-turbo for RAG applications
The azure-search-openai-demo repository was first created in March 2023 and is now the most popular RAG sample solution for Azure. Since the world of generative AI changes so rapidly, we've made many...