Browsing latest articles
Browse All 188 View Live

Using FastAPI for an OpenAI chat backend

When building web APIs that make calls to OpenAI servers, we really want a backend that supports concurrency, so that it can handle a new user request while waiting for the OpenAI server response....



Developer relations & motherhood: Will they blend?

My very first job out of college was in developer relations at Google, and it was absolutely perfect for me; a way to combine my love for programming with my interest in teaching. I got to code, write...


Evaluating a RAG chat app: Approach, SDKs, and Tools

When we’re programming user-facing experiences, we want to feel confident that we’re creating a functional user experience - not a broken one! How do we do that? We write tests, like unit tests,...


Converting HTML pages to PDFs with Playwright

In this post, I'll share a fairly easy way to convert HTML pages to PDF files using the Playwright E2E testing library. Background: I am working on a RAG chat app solution that has a PDF ingestion...


RAG techniques: Cleaning user questions with an LLM

When I introduce app developers to the concept of RAG (Retrieval Augmented Generation), I often present a diagram like this: The app receives a user question, uses the user question to search a...



RAG techniques: Function calling for more structured retrieval

Retrieval Augmented Generation (RAG) is a popular technique to get LLMs to provide answers that are grounded in a data source. When we use RAG, we use the user's question to search a knowledge base...


Evaluating RAG chat apps: Can your app say "I don't know"?

In a recent blog post, I talked about the importance of evaluating the answer quality from any RAG-powered chat app, and I shared my ai-rag-chat-evaluator repo for running bulk evaluations. In that...


Truncating conversation history for OpenAI chat completions

When I build chat applications using the OpenAI chat completions API, I often want to send a user's previous messages to the model so that the model has more context for a user's question. However,...


Doing RAG? Vector search is *not* enough

I'm concerned by the number of times I've heard, "oh, we can do RAG with retriever X, here's the vector search query." Yes, your retriever for a RAG flow should definitely support vector search, since...


RAG on a database table with PostgreSQL

RAG (Retrieval Augmented Generation) is one of the most promising uses for large language models. Instead of asking an LLM a question and hoping the answer lies somewhere in its weights, we instead...


Using SLMs in GitHub Codespaces

Today I went on a quest to figure out the best way to use SLMs (small language models) like Phi-3 in a GitHub Codespace, so that I can provide a browser-only way for anyone to start working with...


pgvector for Python developers

Lately, I've been digging into vector embeddings, since they're such an important part of the RAG (Retrieval Augmented Generation) pattern that we use in our most popular AI samples. I think that when...


Should you use Quart or FastAPI for an AI app?

As I have discussed previously, it is very important to use an async framework when developing apps that make calls to generative AI APIs, so that your backend processes can concurrently handle other...


Playwright and Pytest parametrization for responsive E2E tests

I am a big fan of Playwright, a tool for end-to-end testing that was originally built for Node.js but is also available in Python and other languages. Playwright 101: For example, here's a simplified test...


Making an Ollama-compatible RAG app

Ollama is a tool that makes it easy to run small language models (SLMs) locally on your own machine - Mac, Windows, or Linux - regardless of whether you have a powerful GPU. It builds on top of...


Integrating vision into RAG applications

Retrieval Augmented Generation (RAG) is a popular technique to get LLMs to provide answers that are grounded in a data source. What do you do when your knowledge base includes images, like graphs or...


My parenting strategy: earn enough $ to outsource

Two kids are a lot. I know, it's really not a lot in comparison to the many kids that women have had to birth and care for over the history of humanity. But still, it feels like a lot to me. My partner...



Entity extraction using OpenAI structured outputs mode

The relatively new structured outputs mode from the OpenAI gpt-4o model makes it easy for us to define an object schema and get a response from the LLM that conforms to that schema. Here's the most...


My first PyBay: Playing improv with Python

A few months ago in September, I attended my very first PyBay: an annual conference in San Francisco bringing together Pythonistas from across the Bay Area. It was a 2-track single-day conference, with...


Making a dev container with multiple data services

A dev container is a specification that describes how to open up a project in VS Code, GitHub Codespaces, or any other IDE supporting dev containers, in a consistent and repeatable manner. It builds on...


Running Azurite inside a Dev Container

I recently worked on an improvement to the flask-admin extension to upgrade the Azure Blob Storage SDK from v2 (a legacy SDK) to v12 (the latest). To make it easy for me to test out the change...


Add browser speech input & output to your app

One of the amazing benefits of modern machine learning is that computers can reliably turn text into speech, or transcribe speech into text, across multiple languages and accents. We can then use those...



Observations: Using Python with DeepSeek-R1

Everyone's going ga-ga for DeepSeek-R1, so I thought I'd try it out in a live stream today. I'll summarize my experience in this post. I tried Python through two different hosts, via the OpenAI Python...


Safety evaluations for LLM-powered apps

When we build apps on top of Large Language Models, we need to evaluate the app responses for quality and safety. When we evaluate the quality of an app, we're making sure that it provides answers that...


Evaluating gpt-4o-mini vs. gpt-3.5-turbo for RAG applications

The azure-search-openai-demo repository was first created in March 2023 and is now the most popular RAG sample solution for Azure. Since the world of generative AI changes so rapidly, we've made many...
