TL;DR

Build your own APIs for serving AI models, covering everything from basic server setup and AI model integration to authentication, database-backed user management, and rate limiting—transforming you from an API consumer to an API producer.

In the previous two modules we’ve seen many industry-standard API techniques and practices. Through the power of APIs we’ve also played with AI systems run by someone else. The problem is that doing so always costs money. It’s time to serve your own APIs, turning yourself from a consumer into a producer (and maybe earning some money by letting other people use your APIs).

Example

This blog site is a fully self-hosted website with basic HTTP-based APIs for handling GET requests from browsers. When you visit this post, your browser essentially sends a GET request to my server and the server responds with the HTML body for the browser to render. Knowing how to implement your own APIs enables you to do lots of cool stuff that you can control however you want!

APIs are served by API servers, a type of application that listens for API requests and produces the corresponding responses. They are like kitchens that maintain order and delivery windows for accepting and fulfilling orders, but usually keep how an order is prepared behind closed doors. The publicly accessible APIs you’ve been playing with in previous modules are nothing magical: they are served by API servers run by providers on one or more machines identified by the APIs’ corresponding domains. We will compare a few Python frameworks for implementing API servers, and focus on one of them to demonstrate how to put the API fundamentals you learned in previous modules into practice.

Python API Servers

Nowadays Python is the de facto language for implementing AI models. When we wrap our AI models with APIs, it is most convenient if the API server is also implemented in Python, so that models and server can live in one Python program. We will therefore take a look at three popular Python frameworks commonly used to implement API servers: FastAPI, Django, and Flask.

FastAPI is a modern, high-performance framework for building APIs quickly and efficiently. It is a relatively new player, but has quickly become one of the fastest-growing frameworks in the Python ecosystem. It has built-in support for essential API components such as authentication and input validation. FastAPI is also well suited to high-performance API servers thanks to its asynchronous support: think of a kitchen that won’t be tied up by a few in-progress orders and can always take and process new requests. Below is a barebones API server implemented in FastAPI:

from fastapi import FastAPI
 
app = FastAPI()
 
@app.get("/")
def read_root():
    return {"message": "Hello, World!"}

Django is a comprehensive web framework designed for building complex web applications (websites) rather than focusing on APIs. Django follows the classic MVT (Model-View-Template) design pattern, where the model represents the data you want to display, typically sourced from a database; the view handles incoming requests and returns the appropriate template and content based on the user’s request; and the template is an HTML file that defines the structure of the web page and includes logic for displaying the data. It also comes with many built-in modules for building web apps, such as database connectors and authentication. Below is a minimal Django implementation.

# urls.py
from django.urls import path
from . import views
 
urlpatterns = [
    path('', views.hello_world, name='hello_world'),
]
 
# views.py
from django.http import JsonResponse
 
def hello_world(request):
    return JsonResponse({"message": "Hello, World!"})

Flask is a web framework similar to Django, but designed to be lightweight and modular. Its built-in functionality is basic, but it can be extended through additional packages and is well suited to smaller-scale applications and prototyping. It is also usually considered the least performant of the three, largely because it historically lacked asynchronous support. Below is a barebones implementation with Flask.

from flask import Flask, jsonify
 
app = Flask(__name__)
 
@app.route('/')
def hello_world():
    return jsonify({"message": "Hello, World!"})

The three examples above all achieve similar results: an API server that takes GET requests and returns a hello-world message. You can see that the FastAPI and Flask implementations are simpler than the Django one. We will use FastAPI as the primary example for demonstrating how to build your own API servers in the rest of this module.

FastAPI Fundamentals

We will start by implementing an API server with the essential functionality: accepting requests to specific routes (specified by the URL) with the GET and POST methods.

Basic Setup

We will need both the fastapi and uvicorn packages, where uvicorn is the ASGI server that actually runs our application. Essentially, fastapi handles the definition of the server, while uvicorn does the heavy lifting of serving requests. Extending the above example, we can start from a minimal implementation, but this time with some customization so it feels more like our own:

# main.py
from fastapi import FastAPI
 
app = FastAPI(title="My AI API Server", version="1.0.0")
 
@app.get("/")
def read_root():
    return {"message": "Welcome to my AI API server!"}

And to start our server, run:

uvicorn main:app --reload --host 127.0.0.1 --port 8000

Where main:app points to the app object we implemented in the main program. --reload tells the server to automatically restart itself after we modify main.py for ease of development. 127.0.0.1 is the IP of “localhost”—the computer we run the server on, and --host 127.0.0.1 means the server will only accept requests sent from the same computer. 8000 is the port our server listens on, in other words, the port used to identify our server application. You can now try to send a GET request to http://127.0.0.1:8000 with another Python application and the requests library, or by accessing the URL in your browser, and you should be able to see the message.
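For example, here is a minimal client sketch using the requests library (assuming the server above is running):

import requests

response = requests.get("http://127.0.0.1:8000")
print(response.json())  # {'message': 'Welcome to my AI API server!'}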

You will also be able to see the log messages from your server:

INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [52851] using StatReload
INFO:     Started server process [52853]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     127.0.0.1:56835 - "GET / HTTP/1.1" 200 OK

This shows that a GET request from 127.0.0.1:56835 (the address of the client application you used to send the request; the port you see might differ) for the route / was answered with 200 OK. Now try editing main.py and you will see the reload functionality working:

WARNING:  StatReload detected changes in 'main.py'. Reloading...
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [54804]
INFO:     Started server process [54811]
INFO:     Waiting for application startup.
INFO:     Application startup complete.

Routes and URL Variables

Remember the OpenAI and Anthropic APIs you’ve played with? They have lots of routes under the same domain, for example api.openai.com/v1/responses and api.openai.com/v1/chat/completions. You can also easily define specific routes in FastAPI using the @app.get decorator’s parameter. For example, in main.py we add another route:

@app.get("/secret")
def get_secret():
    return {"message": "You find my secret!"}

And you will get the corresponding responses from / and /secret routes by accessing http://127.0.0.1:8000 and http://127.0.0.1:8000/secret, respectively.

There are also occasions where you want users to pass variables through URLs. For example, YouTube channel URLs are given as https://www.youtube.com/@DigitalFoundry or https://www.youtube.com/channel/UCm22FAXZMw1BaWeFszZxUKw. Implementing a separate route for each unique variable is clearly not practical. Luckily, you have two ways to pass and parse variables in FastAPI routes. One is through a URL template:

@app.get("/parrot/{message}")
def repeat_message(message: str):
    return {"message": message}

Try accessing http://127.0.0.1:8000/parrot/ + any message. You can mix multiple fixed paths and variables in one route, for example:

@app.get("/parrot/{message}-{date}/secret/{user}")
def repeat_message(message: str, user: int, date: str):
    return {"message": f"A secret message {message} sent by user {user} on {date}."}
# Try access http://localhost:8000/parrot/random-July26/secret/21

Another way is through URL parameters. These are variables specified after ? at the end of a URL, in <key>=<value> format, with & separating multiple variables. For example, https://www.youtube.com/watch?v=5tdsZwlWXAc&t=2s. In FastAPI, URL parameters are caught by function parameters that are not covered by the URL template. Updating our earlier /secret route:

@app.get("/secret")
def get_secret(user: int = 0):
    return {"message": f"User {user} find my secret!"}

Try accessing http://127.0.0.1:8000/secret?user=2. Needless to say, you can mix the above two approaches in one route.
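For example, here is a sketch of a route mixing both, using a hypothetical /echo path so it doesn’t clash with the routes above:

@app.get("/echo/{message}")
def echo_message(message: str, user: int = 0):
    # 'message' comes from the URL path, 'user' from the ?user=... parameter
    return {"message": f"User {user} says: {message}"}
# Try accessing http://127.0.0.1:8000/echo/hello?user=2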

Note

Strictly speaking, URL parameters are part of the URL syntax, and URL structures can get quite complicated. But in practice, following REST principles, you should keep the URLs you serve intuitive and straightforward.

Handle POST Requests

You’ve probably noticed that the routes we implemented above can only handle GET requests. To handle POST requests we additionally have to read the incoming data (the request body). Since incoming and response data are often more complicated and structured for POST routes, it’s also worth bringing up pydantic for building data models and integrating it into our FastAPI server.

Let’s say you expect users to send a request body with the following format:

{
    "message": "This is a very secret message!",
    "date": "2025-07-30",
    "user": 20
}

You can, technically, simply use an @app.post decorated function and catch the request body as a plain Python dict:

@app.post("/receiver")
def receiver(data: dict):
    user = data['user']
    message = data['message']
    date = data['date']
    return {"message": f"User {user} send a secret message '{message}' on {date}."}

The problem is, you cannot ensure that the data sent by users actually complies with the data type you want. For example, a request body with user as a string will also be accepted by the above route, but can cause problems in later processing. Manually implementing type checks can be tedious.

Fortunately, FastAPI can incorporate pydantic, a data validation library, to abstract data models and perform automatic type checking. Extending the above example, we first define a data class and require the request body to follow the definition:

from pydantic import BaseModel
 
class ReceivedData(BaseModel):
    user: int
    message: str
    date: str
 
@app.post("/receiver")
def receiver(data: ReceivedData):
    user = data.user
    message = data.message
    date = data.date
    return {"message": f"User {user} sent a secret message '{message}' on {date}."}

Now if the request body contains invalid data types, FastAPI will reject the request and return 422 Unprocessable Content.
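We can verify this with a quick client sketch that sends a user value that cannot be parsed as an integer (assuming the server is running locally):

import requests

bad_body = {"message": "This is a very secret message!", "date": "2025-07-30", "user": "not-a-number"}
response = requests.post("http://127.0.0.1:8000/receiver", json=bad_body)
print(response.status_code)  # 422: 'user' failed integer validation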

Extended Reading

There are many more benefits and functionalities of integrating pydantic to define and abstract data models in FastAPI, including reusability and automatic documentation generation. Generally speaking, when implementing API servers, data model abstraction is a preferred practice. Take a look at more things you can do with both libraries combined:

API Versioning

As we covered in Advanced APIs in the Era of AI, API versioning allows you to introduce changes without breaking existing integrations. This is particularly important for AI APIs where models and features are constantly evolving. FastAPI makes implementing URL path versioning straightforward using APIRouter with prefixes.

from fastapi import APIRouter
from datetime import datetime
 
v1_router = APIRouter(prefix="/v1")
v2_router = APIRouter(prefix="/v2")
 
@v1_router.post("/receiver")
def receiver_v1(data: ReceivedData):
    return {"message": f"User {data.user} sent '{data.message}' on {data.date}"}
 
@v2_router.post("/receiver") 
def receiver_v2(data: ReceivedData):
    return {
        "message": f"User {data.user} sent '{data.message}' on {data.date}",
        "version": "2.0",
        "timestamp": datetime.now().isoformat()
    }
 
app.include_router(v1_router)
app.include_router(v2_router)

Now your API supports both versions simultaneously: users can access /v1/receiver for the original functionality while /v2/receiver provides enhanced features.
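A quick client sketch calling both versions (assuming the server runs locally and ReceivedData is defined as above):

import requests

body = {"message": "hello", "date": "2025-07-30", "user": 7}
print(requests.post("http://127.0.0.1:8000/v1/receiver", json=body).json())
print(requests.post("http://127.0.0.1:8000/v2/receiver", json=body).json())  # adds "version" and "timestamp"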

Extended Reading

Examples of implementing advanced API techniques introduced in Advanced APIs in the Era of AI with FastAPI:

Build APIs for AI Models

With the fundamentals of FastAPI servers in place, we can proceed to integrate AI models and build AI API servers. As an example, we will build an API server exposing image classification APIs.

Barebones Implementation

First and foremost we need an image classification model to support the AI pipeline of our API server. You might already have a model implemented in previous or parallel courses lying around. For demonstration purposes we will use an off-the-shelf model from HuggingFace.

from PIL import Image
import torch
from transformers import AutoImageProcessor, AutoModelForImageClassification
 
class ImageClassifier:
    def __init__(self):
        self.model = None
        self.processor = None
        self.model_name = "microsoft/resnet-18"
        
    async def load_model(self):
        """Load the image classification model asynchronously"""
        if self.model is None:
            print(f"Loading model: {self.model_name}")
            self.model = AutoModelForImageClassification.from_pretrained(self.model_name)
            self.processor = AutoImageProcessor.from_pretrained(self.model_name)
            print("Model loaded successfully")
    
    async def classify_image(self, image: Image.Image) -> dict:
        """Classify a single image"""
        if self.model is None:
            await self.load_model()
        
        # Process image
        inputs = self.processor(image, return_tensors="pt")
        
        # Run inference
        with torch.no_grad():
            outputs = self.model(**inputs)
            predictions = torch.nn.functional.softmax(outputs.logits[0], dim=0)
        
        # Get top 5 predictions
        top_predictions = torch.topk(predictions, 5)
        
        results = []
        for score, idx in zip(top_predictions.values, top_predictions.indices):
            label = self.model.config.id2label[idx.item()]
            confidence = score.item()
            results.append({
                "label": label,
                "confidence": round(confidence, 4)
            })
        
        return {
            "predictions": results,
            "model": self.model_name
        }

We are using a very lightweight image classification model, microsoft/resnet-18, which should be able to run on most PCs. Notice the async declarations on the model loading and inference functions. They are there so that while your server is loading the model or processing an incoming image, it can still respond to other requests. Think of the server as assigning a dedicated person (a worker) to each incoming request, so that if another request arrives during processing, it can assign another person rather than waiting for the previous one to finish.
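One caveat worth knowing: CPU-bound work inside an async function still occupies the event loop while it runs. A common pattern is to offload the blocking call to a worker thread; below is a hedged sketch (not part of the module’s code) where _run_inference is a hypothetical synchronous helper wrapping the processor and model calls above:

import asyncio

# Inside ImageClassifier:
async def classify_image(self, image: Image.Image) -> dict:
    # Run the blocking inference in a worker thread so the event loop
    # stays free to accept other requests (asyncio.to_thread needs Python 3.9+)
    return await asyncio.to_thread(self._run_inference, image)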

Next, we define the server app, which loads the model on startup.

from fastapi import FastAPI, HTTPException
from contextlib import asynccontextmanager
 
# Initialize classifier
classifier = ImageClassifier()
 
@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup
    await classifier.load_model()
    yield
 
app = FastAPI(title="AI Image Classification API", version="1.0.0", lifespan=lifespan)

We also define a few data models for the incoming request and responses, and a utility function for reading image data. Note that we use base64-encoded images so that the request body stays in the JSON format we are familiar with.

import base64
import io
from typing import List, Optional
from pydantic import BaseModel
 
class ImageRequest(BaseModel):
    image: str  # base64 encoded image
    filename: Optional[str] = None
 
class ClassificationResponse(BaseModel):
    predictions: List[dict]
    model: str
 
class ModelInfo(BaseModel):
    name: str
    status: str
    num_labels: Optional[int] = None
 
def decode_base64_image(base64_string: str) -> Image.Image:
    """Decode base64 string to PIL Image"""
    try:
        # Remove data URL prefix if present
        if base64_string.startswith('data:image'):
            base64_string = base64_string.split(',')[1]
        
        # Decode base64
        image_data = base64.b64decode(base64_string)
        image = Image.open(io.BytesIO(image_data)).convert("RGB")
        return image
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Invalid base64 image: {str(e)}")

Finally, we implement two routes: one for users to fetch information about the model, and one to perform image classification.

@app.get("/model/info", response_model=ModelInfo)
async def model_info():
    """Get model information"""
    if classifier.model is None:
        return ModelInfo(
            name=classifier.model_name,
            status="not_loaded"
        )
    
    return ModelInfo(
        name=classifier.model_name,
        status="loaded",
        num_labels=len(classifier.model.config.id2label)
    )
 
@app.post("/classify", response_model=ClassificationResponse)
async def classify_image(request: ImageRequest):
    """Classify a single base64 encoded image"""
    try:
        # Decode base64 image
        image = decode_base64_image(request.image)
        
        # Classify image
        result = await classifier.classify_image(image)
        
        return ClassificationResponse(**result)
        
    except HTTPException:
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Error processing image: {str(e)}")

Now you have a little image classification API server! I sent it a picture of a Spanish-style seafood casserole I made yesterday (it’s delicious, by the way) by encoding the image to base64.
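Roughly, the client side looks like this (a sketch; casserole.jpg stands in for your own image file):

import base64
import requests

with open("casserole.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "http://127.0.0.1:8000/classify",
    json={"image": encoded, "filename": "casserole.jpg"},
)
print(response.json())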

And I got the classification result from the server:

{
    "model": "microsoft/resnet-18",
    "predictions": [
        {
            "confidence": 0.5749,
            "label": "soup bowl"
        },
        {
            "confidence": 0.2213,
            "label": "consomme"
        },
        {
            "confidence": 0.1637,
            "label": "hot pot, hotpot"
        },
        {
            "confidence": 0.0107,
            "label": "mortar"
        },
        {
            "confidence": 0.0097,
            "label": "potpie"
        }
    ]
}

API Key Authentication

Right now our API server is unprotected: anyone who can reach your PC can send requests and overload your delicate machine, and you have no idea who is doing so. That’s why most APIs are protected with authentication (typically API keys), and we should implement a similar system.

FastAPI has built-in authentication support, and to implement basic API key authentication, we can use a verification function:

from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
 
security = HTTPBearer()
 
async def verify_api_key(credentials: HTTPAuthorizationCredentials = Depends(security)):
    # API key validation logic
    if credentials.credentials != "your-secret-api-key":
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid API key"
        )
    return credentials.credentials

And for routes that need to be protected, we add the requirement for API keys:

@app.post("/classify", response_model=ClassificationResponse)
async def classify_image(request: ImageRequest, api_key: str = Depends(verify_api_key)):
  # Remaining code

Now only requests with the authentication header Authorization: Bearer your-secret-api-key will be accepted by the /classify route; requests with an invalid key are rejected with 401 Unauthorized.
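For example, a client sketch of an authenticated call (where encoded is a base64 image string as in the earlier example):

import requests

response = requests.post(
    "http://127.0.0.1:8000/classify",
    json={"image": encoded},  # 'encoded' as in the base64 sketch above
    headers={"Authorization": "Bearer your-secret-api-key"},
)
print(response.status_code)  # 200 with a valid key, 401 with an invalid one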

The limitation of the above implementation is that the API key is hardcoded. In practice you will want a dynamic list of API keys, one per user, which also enables you to identify each request and keep track of each user’s usage.

Database Integration

Continuing the above topic, one common practice for recording the list of API keys, their respective users, and other information is to use a database. In the previous AI and data course we already got hands-on with database concepts and interacted with databases through SQL queries. You can directly use database connectors and integrate SQL queries into your API server, but just as pydantic manages data models for requests and responses, we also have sqlalchemy for managing data models for databases.

sqlalchemy provides a high-level interface for interacting with databases, so you do not have to write SQL queries yourself and can focus on the abstract definition and manipulation of data models. Similar to pydantic providing automatic type verification, sqlalchemy provides automatic database initialization and SQL injection protection. For dynamic API key and user management and usage tracking, we define the following two data models, one for users and one for processed API requests:

from sqlalchemy import create_engine, Column, Integer, String, DateTime, Float, ForeignKey
from sqlalchemy.orm import declarative_base, sessionmaker, Session, relationship
from datetime import datetime
import time
 
Base = declarative_base()
 
class User(Base):
    __tablename__ = "users"
 
    id = Column(Integer, primary_key=True)
    api_key = Column(String, unique=True, nullable=False)
    email = Column(String)
    created_at = Column(DateTime, default=datetime.utcnow)
 
    requests = relationship("APIRequest", back_populates="user")
 
class APIRequest(Base):
    __tablename__ = "api_requests"
 
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("users.id"))
    endpoint = Column(String)
    timestamp = Column(DateTime, default=datetime.utcnow)
    response_time_ms = Column(Float)
    status_code = Column(Integer)
 
    user = relationship("User", back_populates="requests")

Here we use SQLite as a simple database for demonstration, and the following code will create the database and tables when you start the server for the first time. You can see one of the benefits of sqlalchemy here: if you later want to move to a more performant database, most of the time you just have to replace the database URL and reuse the data models, and sqlalchemy will handle the differences between databases for you.

engine = create_engine("sqlite:///ai_api.db")
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
Base.metadata.create_all(bind=engine)
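For example, moving to PostgreSQL would mostly mean changing the URL to something like "postgresql+psycopg2://user:password@localhost/ai_api". You will also need at least one user row before authentication can succeed; below is a hedged sketch of a helper (not part of the module’s code) that generates and stores a random API key:

import secrets

def create_user(email: str) -> str:
    """Create a user with a randomly generated API key and return the key."""
    db = SessionLocal()
    try:
        api_key = secrets.token_hex(16)  # 32-character random hex string
        db.add(User(api_key=api_key, email=email))
        db.commit()
        return api_key
    finally:
        db.close()

print(create_user("alice@example.com"))  # hand this key out to the user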

To upgrade our API key authentication, we fetch the user using the given API key from the database:

security = HTTPBearer()
 
def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()
 
async def get_current_user(
    credentials: HTTPAuthorizationCredentials = Depends(security),
    db: Session = Depends(get_db)
):
    user = db.query(User).filter(User.api_key == credentials.credentials).first()
    if not user:
        raise HTTPException(status_code=401, detail="Invalid API key")
    return user
 
@app.post("/classify", response_model=ClassificationResponse)
async def classify_image(
    request: ImageRequest, 
    user: User = Depends(get_current_user), 
    db: Session = Depends(get_db)
):
    # Remaining code

API Key Identification & Tracking

Now that we can identify specific users by their API keys, we can also track their usage of our API server. Utilizing the APIRequest data model we defined earlier, we update our /classify route to additionally record each request:

@app.post("/classify", response_model=ClassificationResponse)
async def classify_image(
    request: ImageRequest, 
    user: User = Depends(get_current_user), 
    db: Session = Depends(get_db)
):
    """Classify a single base64 encoded image"""
    start_time = time.time()
 
    try:
        # Classify image (your existing logic)
        image = decode_base64_image(request.image)
        result = await classifier.classify_image(image)
 
        # Log request
        api_request = APIRequest(
            user_id=user.id,
            endpoint="/classify",
            response_time_ms=(time.time() - start_time) * 1000,
            status_code=200
        )
        db.add(api_request)
        db.commit()
 
        return ClassificationResponse(**result)
 
    except Exception as e:
        # Log failed request
        api_request = APIRequest(
            user_id=user.id,
            endpoint="/classify",
            response_time_ms=(time.time() - start_time) * 1000,
            status_code=500
        )
        db.add(api_request)
        db.commit()
        raise HTTPException(status_code=500, detail=str(e))

Now whenever users send requests to our server, records will be stored in the api_requests table of our database:

1|1|/classify|2025-07-27 12:16:27.610650|21.8780040740967|200
2|1|/classify|2025-07-27 12:24:43.704042|22.1047401428223|200
3|1|/classify|2025-07-27 12:24:46.572790|16.6518688201904|200
4|1|/classify|2025-07-27 12:24:48.011679|16.9012546539307|200
5|1|/classify|2025-07-27 12:24:48.978239|16.8101787567139|200

We can also create a route for users to check their own usage status.

@app.get("/usage")
async def get_usage(
    user: User = Depends(get_current_user),
    db: Session = Depends(get_db)
):
    requests = db.query(APIRequest).filter(APIRequest.user_id == user.id).all()
 
    total = len(requests)
    successful = len([r for r in requests if r.status_code == 200])
    avg_time = sum(r.response_time_ms for r in requests) / total if total > 0 else 0
 
    return {
        "total_requests": total,
        "successful_requests": successful,
        "success_rate": round(successful / total * 100, 2) if total > 0 else 0,
        "avg_response_time_ms": round(avg_time, 2)
    }

Users can send a GET request to this route with their API keys and get a report of their usage:

{
    "avg_response_time_ms": 18.87,
    "success_rate": 100.0,
    "successful_requests": 5,
    "total_requests": 5
}

Rate Limiting

With API key-based user tracking in place, we can now implement rate limiting for each user to prevent bad actors from overloading our API server. Below is a simple DIY implementation that limits each user to 5 requests per minute, following the “sliding window” approach we introduced in Advanced APIs in the Era of AI.

from datetime import datetime, timedelta
 
async def check_rate_limit(user: User, db: Session):
    """Check if user has exceeded their rate limits"""
    now = datetime.utcnow()
 
    # Check requests in the last minute
    minute_ago = now - timedelta(minutes=1)
    recent_requests = db.query(APIRequest).filter(
        APIRequest.user_id == user.id,
        APIRequest.timestamp >= minute_ago
    ).count()
 
    if recent_requests >= 5:
        raise HTTPException(
            status_code=429,
            detail=f"Rate limit exceeded: 5 requests per minute"
        )

And in our /classify route, add one line of code at the start of the function:

    await check_rate_limit(user, db)

Now if a user sends more than 5 requests within one minute, their requests will be rejected with 429 Too Many Requests. In practice you might also want to record each user’s rate-limit threshold in the User data model instead of hardcoding it, as sketched below.
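A sketch of what that could look like, assuming a hypothetical extra column on the User model:

# Hypothetical extra column on the User model:
#     rate_limit_per_minute = Column(Integer, default=5)

async def check_rate_limit(user: User, db: Session):
    """Same sliding-window check as above, but with a per-user threshold."""
    minute_ago = datetime.utcnow() - timedelta(minutes=1)
    recent_requests = db.query(APIRequest).filter(
        APIRequest.user_id == user.id,
        APIRequest.timestamp >= minute_ago
    ).count()

    if recent_requests >= user.rate_limit_per_minute:
        raise HTTPException(
            status_code=429,
            detail=f"Rate limit exceeded: {user.rate_limit_per_minute} requests per minute"
        )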

Extended Reading

A few libraries for easier implementation of rate limiting:

Exercise

Build an image classification API server that demonstrates knowledge covered in this module, reversing your role from API consumer to producer.

Exercise: Image Classification API Server

Develop an API server that integrates the concepts covered throughout this module:

  • FastAPI Implementation: Use the FastAPI fundamentals covered in FastAPI Fundamentals, including proper route definition, request handling, and Pydantic data models
  • AI Model Integration: Integrate an image classification model following the patterns shown in Build APIs for AI Models, using an appropriate open-source model that can run on your system
  • Authentication System: Implement API key authentication as demonstrated in API Key Authentication to protect your endpoints
  • Database Integration: Use database integration techniques from Database Integration for user management and usage tracking
  • Rate Limiting: Apply rate limiting concepts from Rate Limiting to prevent server overload
  • API Versioning: Support API versioning using the approach shown in API Versioning

Client Integration:

Modify your image analysis program from API Fundamentals to connect to your server instead of third-party APIs.

Technical Guidelines:

  • Model Choice: Use lightweight models appropriate for your system capabilities (examples like ResNet models are mentioned in the module)
  • Data Format: Choose appropriate data formats for image handling
  • Database: Use suitable database solutions (examples like SQLite used in the module)