How to Boost Workflow with LLM Pair Programming in Jupyter AI

Install Jupyter AI, configure LLM providers, and use the %ai and %%ai magics to write Python, debug faster, and speed up your data science notebooks.

Paco Awissi

11 min read • November 24, 2025

I'll be honest: when I first discovered Jupyter AI, it felt like someone finally understood what I actually needed. You know how it goes. You're deep in a notebook, debugging some pandas transformation, and you have to switch tabs to ChatGPT or copy-paste error messages into Claude. It's annoying. Jupyter AI just... stays where you are. It brings LLM-powered code generation directly into your notebook cells: no browser tabs, no context switching. You can ask it to write functions, explain that cryptic error you've been staring at for 20 minutes, or clean up the messy code you wrote at 2 AM. This tutorial walks you through setting it up, configuring a provider (I mostly use OpenAI, but Anthropic works great too), and using the %ai and %%ai magics to generate and debug Python code without leaving your workflow. If you're curious about the tech behind these models, check out how transformer models power modern LLMs.

Prerequisites

Okay, before we jump in, let me tell you what you'll need:

Python 3.8 or later on your machine. Nothing fancy, just a regular install.
JupyterLab 4.x or Jupyter Notebook 7.x (JupyterLab 3.x only works with the older Jupyter AI 1.x line). And here's the thing: the chat UI is a JupyterLab extension, so don't expect it in Google Colab. I tried. Trust me.
An API key from at least one provider. OpenAI, Anthropic, Google, and Mistral all work fine.
You should know your way around Jupyter notebooks and basic Python. Nothing crazy.

Install Jupyter AI and Dependencies

Run the following in your terminal to get the magics working, plus some common data science packages you'll probably want anyway. Oh, and if you're using JupyterLab and want the chat UI (which is actually pretty cool), grab the optional package too.

# Create or activate your environment first if needed

# Core magics and helpful packages
pip install --upgrade pip
pip install jupyter-ai-magics python-dotenv pandas matplotlib

# Optional. Install the JupyterLab chat UI extension if you use JupyterLab.
pip install jupyter-ai

# Optional. Install provider SDKs so you can use their latest models.
# Install only what you plan to use.
pip install openai anthropic google-generativeai mistralai

Once that's done, fire up JupyterLab:

jupyter lab

Open up a fresh notebook and let's keep going.
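Before touching any magics, it's worth confirming which of those packages actually landed in the environment your notebook kernel uses. Here's a small sketch using the standard library's importlib.metadata; installed_versions is just a hypothetical helper name, and the package list mirrors the pip commands above, so adjust it to whatever you installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Package names assumed from the pip commands in the install step
report = installed_versions(
    ["jupyter-ai-magics", "jupyter-ai", "python-dotenv", "pandas", "matplotlib"]
)
for pkg, ver in report.items():
    print(f"{pkg:20} {ver or 'not installed'}")
```

If something shows up as "not installed", you're probably running the notebook on a different kernel than the environment you pip-installed into.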

Configure API Keys Securely

Here's something I learned the hard way. Jupyter AI needs to read your provider API keys from environment variables. Create a .env file in your project folder and drop your keys in there:

# .env
OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here
GOOGLE_API_KEY=your_google_key_here
MISTRAL_API_KEY=your_mistral_key_here

Then at the very start of your notebook, run this cell to load them:

from dotenv import load_dotenv
_ = load_dotenv()  # Loads variables from .env into the environment

This way your keys are ready before the Jupyter AI extension even loads. Simple but important.
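If a magic later complains about authentication, the first thing I'd check is whether the variables were actually loaded. This sketch (missing_api_keys is a hypothetical helper, and the key names assume the .env file above) reports which variables are unset or empty without ever printing the secret values:

```python
import os


def missing_api_keys(names):
    """Return the names of environment variables that are unset or empty."""
    return [name for name in names if not os.environ.get(name)]


# Only list the providers you actually plan to use
expected = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]
missing = missing_api_keys(expected)
print("Missing keys:", missing or "none")
```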

Load Jupyter AI Magics

Now let's actually load the extension so we can use those %ai and %%ai magics:

%load_ext jupyter_ai_magics

Quick sanity check to make sure it's working:

%ai openai/gpt-4o-mini Say hello in one short sentence.

If you see a response from the model, you're golden.

Define a Default Model

This is just a quality-of-life thing: set a default model so you don't have to type it every single time. I usually create a Python variable and then interpolate it into my prompts.

# Pick a model you have access to. Run %ai list to see the provider and model IDs your installed version accepts.
# Examples: "openai/gpt-4o-mini", "anthropic/claude-3-5-sonnet", "google/gemini-1.5-pro", "mistral/mistral-large"
DEFAULT_MODEL = "openai/gpt-4o-mini"
DEFAULT_MODEL

Now you can just use {DEFAULT_MODEL} in your prompts. Saves typing, keeps things consistent.

Generate a Data Cleaning Function

Alright, let's actually do something useful. We'll use the %%ai cell magic to generate a function that cleans up a pandas DataFrame. Remember, the magic has to be the very first line of the cell or it won't work.

%%ai {DEFAULT_MODEL}
You are a Python expert. Write a function named clean_dataframe(df, inplace=False) that performs these steps:
- Strip whitespace from column names.
- Drop exact duplicate rows.
- Trim leading and trailing whitespace in string columns.
- Convert obvious numeric-like columns to numeric where safe.
- Fill missing values in numeric columns with the column median.
- If inplace is True, modify df in place and return df. Otherwise, return a new cleaned DataFrame.
Return only valid Python code for the function definition. Do not include any extra text.

Copy whatever it generates into a new cell and run it. Now you've got that function available in your notebook.
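Model output varies from run to run, so it helps to have a baseline to compare against. Here's a minimal hand-written sketch of the same steps, assuming pandas 2.x or later. It's my own illustration of one reasonable answer, not what the model will return, and the "convert only if every non-missing value parses" rule is just one possible reading of "where safe".

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def clean_dataframe(df, inplace=False):
    """Hand-written baseline for the cleaning steps in the prompt above."""
    out = df if inplace else df.copy()

    # Strip whitespace from string column names, leave other labels alone
    out.columns = [c.strip() if isinstance(c, str) else c for c in out.columns]

    # Drop exact duplicate rows
    out.drop_duplicates(inplace=True)

    for col in out.columns:
        if not (is_object_dtype(out[col]) or is_string_dtype(out[col])):
            continue
        # Trim leading/trailing whitespace, leaving non-strings (e.g. None) untouched
        out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
        # "Where safe": convert only if every non-missing value parses as a number
        converted = pd.to_numeric(out[col], errors="coerce")
        if converted.notna().sum() == out[col].notna().sum():
            out[col] = converted

    # Fill missing numeric values with each column's median
    for col in out.select_dtypes(include="number").columns:
        out[col] = out[col].fillna(out[col].median())

    return out
```

If the generated version differs, that's fine; what matters is that it passes checks like the ones in the Validate Generated Code section.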

Refine the Function with Additional Requirements

Actually, wait. Let's make that function better. We'll add some error handling and let the caller choose which columns get trimmed:

%%ai {DEFAULT_MODEL}
You previously wrote clean_dataframe(df, inplace=False).
Refine it with:
- Defensive checks for non-DataFrame inputs. Raise a clear TypeError.
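- More careful numeric conversion: try pd.to_numeric on each column and keep the original values if conversion fails. Do not use errors='ignore', which is deprecated in pandas 2.2+.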
- A parameter columns_to_trim that accepts a list of column names to trim. Default trims all string columns.
- Docstring with args, returns, and examples.
Return only the updated Python function definition. No extra commentary.

Copy the updated version and run it to replace what we had before.
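One caveat: the prompt above relies on the model remembering its earlier answer, and depending on your version and configuration the magic may send little or no conversation history. A sturdier option is to put the current source right into the prompt. This sketch uses inspect.getsource, which works for functions defined in notebook cells because IPython keeps cell source around; build_refine_prompt is a hypothetical helper name.

```python
import inspect


def build_refine_prompt(func, requirements):
    """Build a refinement prompt that embeds func's current source code."""
    try:
        source = inspect.getsource(func)
    except (OSError, TypeError):
        # Source isn't available (e.g. the function was created via exec())
        source = f"# Source for {func.__name__} is unavailable."
    bullets = "\n".join(f"- {req}" for req in requirements)
    return (
        f"Here is the current version of {func.__name__}:\n\n"
        f"{source}\n"
        f"Refine it with:\n{bullets}\n"
        "Return only the updated Python function definition. No extra commentary."
    )
```

Then run refine_prompt = build_refine_prompt(clean_dataframe, [...]) in a cell and put {refine_prompt} alone in the body of a %%ai {DEFAULT_MODEL} cell.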

Use Prompt Interpolation for Context-Aware Code

This is where it gets really interesting. And honestly, this feature alone makes Jupyter AI worth it. Prompt interpolation lets you shove live data, error messages, schema info, whatever, directly into your %%ai prompts. The model gets way more context and generates much better code. It's like the difference between asking someone to cook dinner versus showing them what's in your fridge. If you want to understand why this works so well, check out our explainer on the magic of in-context learning. For more practical stuff, look at techniques for prompting reasoning models to get clear, accurate answers.

Let's load some sample data and pass its schema to the model:

import pandas as pd
import numpy as np

# Create a small, reproducible dataset
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "total_bill": rng.normal(20, 8, 200).round(2),
    "tip": rng.normal(3, 1, 200).round(2),
    "size": rng.integers(1, 6, 200)
}).clip(lower=0)

schema = df.dtypes.to_string()
schema
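Dtypes are a good start, but sometimes the model also benefits from seeing missing-value counts and a couple of example rows. Here's a sketch of one way to bundle that. describe_for_prompt is a hypothetical helper, and keep in mind that sample rows send real values to your provider, which is exactly what the Avoid Leaking Sensitive Data section below deals with.

```python
import pandas as pd


def describe_for_prompt(df, n_rows=3):
    """Bundle dtypes, missing-value counts, and a few sample rows into one string."""
    parts = [
        "Column dtypes:",
        df.dtypes.to_string(),
        "",
        "Missing values per column:",
        df.isna().sum().to_string(),
        "",
        f"First {n_rows} rows:",
        df.head(n_rows).to_string(),
    ]
    return "\n".join(parts)
```

You'd interpolate it the same way as schema, e.g. context = describe_for_prompt(df) and then {context} in the prompt. The rest of this walkthrough sticks with plain schema.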

Now generate a transformation function using that schema as context:

%%ai {DEFAULT_MODEL}
You are given this pandas DataFrame schema:
{schema}

Write a function transform_data(df) that:
- Adds a tip_pct column as tip / total_bill. Handle division by zero safely.
- Buckets size into small (1-2), medium (3-4), large (5+).
- Returns a new DataFrame with the new columns.
Return only valid Python code for the function definition.

Copy the function it generates into a new cell and run it to apply your transformation.
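To judge what comes back, it helps to know what one correct answer looks like. Here's a hand-written sketch that assumes the column names from the sample data above; size_bucket is a name I made up for the new bucket column, and mapping zero bills to NaN is just one way to "handle division by zero safely".

```python
import numpy as np
import pandas as pd


def transform_data(df):
    """Add tip_pct and a size bucket; returns a new DataFrame."""
    out = df.copy()
    bill = out["total_bill"].astype(float)
    # Safe division: a zero bill becomes NaN, so tip_pct is NaN instead of inf
    out["tip_pct"] = out["tip"] / bill.where(bill != 0)
    # Right-closed bins: (0, 2] small, (2, 4] medium, (4, inf) large
    out["size_bucket"] = pd.cut(
        out["size"], bins=[0, 2, 4, np.inf], labels=["small", "medium", "large"]
    )
    return out
```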

Debug Errors with AI Assistance

Let me show you something that's saved me countless hours. We'll deliberately break something to demonstrate:

# Deliberate typo in the column name to trigger a KeyError
bad_df = df.copy()
bad_df["tip_pct"] = bad_df["tip"] / bad_df["total_billl"]  # incorrect column name

Now capture the traceback as a string so you can feed it to the model:

import traceback

error_trace = ""
try:
    # Re-run the broken line to capture its traceback as a string
    bad_df["tip_pct"] = bad_df["tip"] / bad_df["total_billl"]
except KeyError:
    error_trace = traceback.format_exc()

error_trace[:600]  # Preview the first 600 characters

%%ai {DEFAULT_MODEL}
You are a Python debugging assistant.
Here is the traceback:
{error_trace}

Given this code that caused the error:
bad_df["tip_pct"] = bad_df["tip"] / bad_df["total_billl"]

Explain the root cause in one sentence, then provide a single corrected line of Python code.

Apply whatever fix it suggests and check if it works:

# Apply the correct code. If the model suggested something equivalent, use that suggestion.
bad_df["tip_pct"] = bad_df["tip"] / bad_df["total_bill"]

# Quick validation
bad_df["tip_pct"].describe()
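Not every KeyError needs a round trip to an LLM, though. For plain typos like this one, the standard library's difflib can often point at the column you meant. suggest_column is a hypothetical helper built on difflib.get_close_matches:

```python
import difflib


def suggest_column(columns, missing_name, n=3, cutoff=0.6):
    """Return existing column names that look like a mistyped one."""
    return difflib.get_close_matches(
        missing_name, [str(c) for c in columns], n=n, cutoff=cutoff
    )
```

Here, suggest_column(bad_df.columns, "total_billl") should return ['total_bill'], no API call required.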

If this kind of AI orchestration gets you excited, you might enjoy building advanced multi-agent chatbots in Python notebooks.

Generate a Plotting Helper

Let's have it write us a reusable plotting function:

%%ai {DEFAULT_MODEL}
Write a function plot_histogram(df, column, bins=30, title=None, figsize=(6, 4)):
- Use matplotlib only.
- Validate inputs and raise a ValueError if column is missing or non-numeric.
- Show grid lines and a tight layout.
- Return the matplotlib Axes object.
Return only valid Python code for the function definition.

Copy that function into a new cell and use it to visualize your data:

import matplotlib.pyplot as plt

ax = plot_histogram(df, "total_bill", bins=25, title="Total Bill")
plt.show()
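For comparison, here's a minimal hand-written version that meets the same spec, using only matplotlib plus pandas' dtype check. It's a sketch of one reasonable answer, not the model's output; the axis labels are my own choice.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_histogram(df, column, bins=30, title=None, figsize=(6, 4)):
    """Plot a histogram of df[column] and return the Axes."""
    if column not in df.columns:
        raise ValueError(f"Column {column!r} not found in DataFrame")
    if not pd.api.types.is_numeric_dtype(df[column]):
        raise ValueError(f"Column {column!r} is not numeric")
    fig, ax = plt.subplots(figsize=figsize)
    ax.hist(df[column].dropna(), bins=bins)  # NaNs would break the bin edges
    ax.set_xlabel(column)
    ax.set_ylabel("Count")
    if title:
        ax.set_title(title)
    ax.grid(True)
    fig.tight_layout()
    return ax
```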

Validate Generated Code

Look, I've learned not to trust generated code blindly. Always add some basic sanity checks:

# Sanity checks for clean_dataframe
import inspect
assert "clean_dataframe" in globals() and inspect.isfunction(clean_dataframe)

toy = pd.DataFrame({"A": [1, 1, None], "B": [" x ", " y", " z "]})
out = clean_dataframe(toy)
assert isinstance(out, pd.DataFrame)
assert "A" in out.columns and "B" in out.columns
assert out.shape[0] <= toy.shape[0]
print("clean_dataframe sanity checks passed.")
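One check I'd add for any generated transformation: make sure the default call doesn't quietly modify its input. assert_does_not_mutate is a hypothetical helper that snapshots the DataFrame, calls the function, and compares the two with pandas' own testing utility:

```python
import pandas as pd


def assert_does_not_mutate(func, df, *args, **kwargs):
    """Call func(df, ...) and raise AssertionError if df was changed."""
    snapshot = df.copy(deep=True)
    result = func(df, *args, **kwargs)
    pd.testing.assert_frame_equal(df, snapshot)
    return result
```

For example, cleaned = assert_does_not_mutate(clean_dataframe, toy) both runs the function and verifies that toy came through untouched.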

These simple checks catch the obvious issues and give you more confidence in what the model generated.

Handle Provider Errors Gracefully

API calls fail. Rate limits, expired keys, network issues, it happens. Wrap your magic calls in try-except blocks:

import logging
import time

from IPython import get_ipython

body = "Reply with 'ok' if you received this request."
try:
    get_ipython().run_cell_magic("ai", DEFAULT_MODEL, body)
except Exception:
    logging.exception("AI request failed")
    # Simple retry strategy: wait briefly, then try once more
    time.sleep(1.5)
    try:
        get_ipython().run_cell_magic("ai", DEFAULT_MODEL, body)
    except Exception:
        logging.exception("Second attempt failed")

For anything production-ish, you'll want proper logging and maybe exponential backoff for retries. But this gets you started.
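Here's a minimal sketch of exponential backoff with jitter, written as a generic wrapper so it doesn't depend on any Jupyter AI internals. call_with_backoff and its defaults are my own assumptions; tune them to your provider's rate limits. Note that it can only retry failures that surface as exceptions. If the magic displays an error instead of raising, there's nothing to catch.

```python
import logging
import random
import time


def call_with_backoff(fn, max_attempts=4, base_delay=1.0, max_delay=30.0):
    """Call fn(), retrying on exceptions with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # Out of retries: surface the last error
            # Delay doubles each attempt (capped), then gets randomized jitter
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay *= random.uniform(0.5, 1.0)
            logging.warning("Attempt %d failed; retrying in %.1fs", attempt, delay)
            time.sleep(delay)
```

With the magic from above, that looks like call_with_backoff(lambda: get_ipython().run_cell_magic("ai", DEFAULT_MODEL, body)).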

Avoid Leaking Sensitive Data

This is important. When you're interpolating data into prompts, be careful about sensitive information. Redact or truncate columns with PII:

def safe_sample(df, cols_to_redact=None, max_rows=5, truncate=40):
    """
    Return a safe preview of df for prompts.
    Redact specified columns and truncate long strings.
    """
    preview = df.sample(min(len(df), max_rows), random_state=42).copy()
    if cols_to_redact:
        for c in cols_to_redact:
            if c in preview.columns:
                preview[c] = "[REDACTED]"

    # Truncate long string values
    def _truncate(x):
        if isinstance(x, str) and len(x) > truncate:
            return x[:truncate] + "..."
        return x

    # DataFrame.map replaced the deprecated applymap in pandas 2.1; fall back on older versions
    mapper = preview.map if hasattr(preview, "map") else preview.applymap
    return mapper(_truncate)

# Example usage
safe_preview = safe_sample(df, cols_to_redact=["email", "ssn"], max_rows=5)  # Columns not in df are skipped
safe_preview

Use that safe_sample in your prompts instead of the full dataset. Better safe than sorry.
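To actually get the preview into a prompt, you need it as text. Here's a small sketch (preview_to_prompt_text and the 2,000-character cap are my own assumptions) that renders the preview with DataFrame.to_string and trims it so a wide table can't blow up your prompt:

```python
import pandas as pd


def preview_to_prompt_text(preview, max_chars=2000):
    """Render a DataFrame preview as plain text, capped at max_chars."""
    text = preview.to_string(index=False)
    if len(text) > max_chars:
        text = text[:max_chars] + "\n... [truncated]"
    return text
```

Then set preview_text = preview_to_prompt_text(safe_preview) and reference {preview_text} in your %%ai prompt, just like {schema} earlier.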

End-to-End Runnable Example

Here's a complete workflow you can actually run from start to finish. I use something like this as a template for new projects:

# Environment and setup
from dotenv import load_dotenv
_ = load_dotenv()

%load_ext jupyter_ai_magics

import pandas as pd
import numpy as np

# Choose a model you have access to
DEFAULT_MODEL = "openai/gpt-4o-mini"

# Create a simple dataset
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "total_bill": rng.normal(20, 7, 120).round(2),
    "tip": rng.normal(3, 1, 120).round(2),
    "size": rng.integers(1, 6, 120)
}).clip(lower=0)

df.head()

Generate your cleaning function:

%%ai {DEFAULT_MODEL}
Write a function clean_dataframe(df, inplace=False) that:
- Validates df is a pandas DataFrame.
- Strips whitespace from column names.
- Drops duplicate rows.
- Trims whitespace in string columns.
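- Converts numeric-like columns with pd.to_numeric, keeping the original values if conversion fails (no errors='ignore', which is deprecated in pandas 2.2+).
- Fills NaNs in numeric columns with the column median.
- If inplace is True, modify df in place and return it. Otherwise, return a new DataFrame.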
Return only valid Python code for the function definition.

Copy it, run it, make sure it works:

# Example usage after you paste the generated function
cleaned = clean_dataframe(df)
cleaned.info()

# Basic checks
assert not cleaned.isna().sum().sum()
assert cleaned.shape[0] <= df.shape[0]

Generate a plot:

%%ai {DEFAULT_MODEL}
Write a function plot_histogram(df, column, bins=30, title=None, figsize=(6, 4)):
- Use matplotlib to plot a histogram of df[column].
- Validate the column exists and is numeric.
- Label axes and add a title if provided.
- Return the Axes object.
Return only valid Python code for the function definition.

Copy and run that too:

import matplotlib.pyplot as plt

ax = plot_histogram(cleaned, "total_bill", bins=25, title="Total Bill Distribution")
plt.show()

Next Steps

The thing about %ai and %%ai magics is that your prompts really matter. Bad prompt, bad code. It's that simple. If you want to get better at this, check out our guide on prompt engineering with LLM APIs.

And if you want to go beyond using these tools and actually build with AI, our practical roadmap for aspiring GenAI developers lays out the skills and projects that'll get you there. It's what I wish I'd had when I started.