Harness AI to Build AI:
An LLM-powered co-pilot that provides in-session support for tutors

Overview

Project Info

001

Timeline

6 months

Role

Lead Product Designer

Team

Zhiyuan Chen (Lead Product Designer)
Tina Chen (Product Designer)
Shivang Gupta (Head of Product)
Bill Guo (Product Manager)
Zach Levonian (ML Engineer)

Methods

Focus Group
Ideate w/ AI & Prompt Engineering
Data Analysis
Rapid Prototyping
Participatory Design

Company Info

002

Who is PLUS?

Led by Carnegie Mellon University and Stanford University, PLUS is a tutoring platform that combines human and AI tutoring to bridge opportunity gaps in math education.

3000 +

Middle School Students

500 +

Math Tutors

3000 +

Tutoring Hours Per Week

Case Summary

003

Problem: In a race against time, tutors struggle to make math stick and engage students

Tutors at PLUS conduct 30-minute sessions with about 5 students, giving them only 6 minutes per student on average. At the same time, they often struggle to explain math concepts clearly and efficiently, or to keep students engaged. In this race against time, tutors must maximize both clarity and engagement to ensure each student grasps the material and feels more motivated.

Requirement: Embracing the trend of AI

Following the wave of AI, the Head of Product wanted us to design an LLM-powered solution to help tutors with this issue.

Solution: Empowering tutors with a co-pilot

We developed an LLM-powered co-pilot that helps tutors explain math problems clearly, provide effective encouragement, and ask strategic leading questions, so they can make the most of their limited time with each student and improve engagement and learning outcomes.

Impact: Improving session efficiency

300 +

MAU

20%

decrease in time spent explaining math concepts

38%

increase in student engagement

End Results

004

Enter or upload a math problem and Co-pilot generates a step-by-step guide, encouraging phrases, and leading questions to assist tutors with explaining problems and engaging students.

Ask follow-up questions, or ask Co-pilot to extend or reduce the number of steps.

Provide quick and impactful feedback for developers in no time!

Detailed Process

For the curious minds only 🤓

Research

001

Leading the effort to identify user pain points

12 Video Analyses

Understand session structure
Observe tutor and student behaviors and interaction patterns
Identify challenges and frictions

2 Focus Group Interviews

Understand tutors' pain points, including when they occur, their frequency, and severity
Understand how they manage challenges and the support available to tutors

After synthesizing the results into a user journey map, I decided to focus on the in-session phase, where inadequate existing support and numerous pain points pointed to a significant opportunity for intervention.

Mapping user pain points and the level of existing support onto various phases

How might we empower PLUS tutors to make tutoring sessions effective and engaging by providing in-session support that addresses their most critical needs?

Ideation

002

From 0 to 200: A wild exploration with AI-driven brainstorming

To quickly generate a wide range of creative ideas, I decided to leverage AI-driven brainstorming.

After 3 iterations of prompt engineering, in which I evaluated the input and output together to find the prompt that led to the most reliable ideas, I generated over 200 ideas in just 1 hour.

Iteration 1: Pilot Run
I entered a brief prompt and found that the output quality was poor: the ideas lacked context, practicality, and relevance to our prompt.
Iteration 2: Dumping Information and Requirements
To make the prompt more detailed, I provided as much information as possible and introduced a template for the desired format. However, it soon became clear that this restricted the model's creativity and produced repetitive outputs.
Iteration 3: Everything in Moderation
Realizing that more information would not necessarily improve the output, I shifted to providing only critical information while maintaining a clear structure and conciseness.

Before the prompt iterations, the ideas were out of scope and impractical; afterward, they were applicable and innovative.
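
As a rough illustration of the third iteration, the sketch below assembles a short prompt from a few labeled sections instead of a long information dump. The section names, the wording, the word budget, and the `build_prompt` helper are all hypothetical; this is not the prompt actually used in the project.

```python
# Hypothetical sketch: a concise, structured brainstorming prompt.
# Section names and wording are illustrative assumptions, not the original prompt.

def build_prompt(sections: dict[str, str], max_words: int = 120) -> str:
    """Join labeled sections into one prompt and enforce a word budget."""
    prompt = "\n".join(f"{label}: {text.strip()}" for label, text in sections.items())
    word_count = len(prompt.split())
    if word_count > max_words:
        raise ValueError(f"Prompt has {word_count} words; trim it to {max_words} or fewer.")
    return prompt


sections = {
    "Context": "Tutors run 30-minute math sessions with about 5 middle school students.",
    "Problem": "Tutors struggle to explain concepts quickly and keep students engaged.",
    "Task": "Brainstorm ideas for in-session support tools for tutors.",
    "Format": "One idea per line, each under 25 words.",
}

prompt = build_prompt(sections)
print(prompt)
```

The word budget is one simple way to keep a prompt from drifting back toward the "information dump" of the second iteration.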

Synthesis

003

From 200 to 2: Identifying a deliverable MVP

After several rounds of initial filtering, we trimmed our pool of ideas from 200 to 10. I then used a multi-method approach to gather input from different stakeholders, narrowed the ideas further, and eventually landed on 2 solutions that are high-impact and low-effort.

Step 1. Evaluate Technical Difficulty with Dev
I facilitated a workshop with the Head of Product and the ML engineer to assess the technical difficulty of each idea. At this point, we treated their input as a reference to guide our design direction rather than as a strict yes-or-no decision.
Step 2. Validate with End-Users
I also designed a survey in which tutors rated the relevance (validating needs) and helpfulness (validating solutions) of each idea on a Likert scale. To help tutors understand and relate to the ideas, we created textual storyboards in a "Problem-Solution-Resolution" format to provide context. Finally, we plotted the average scores of all ideas on a Relevance vs. Helpfulness matrix.
Step 3. Cross-reference Developer and User Feedback
I cross-referenced the technical difficulty assessed earlier with each idea's relevance and helpfulness to identify the low-hanging fruit: the ideas with the highest impact and lowest technical difficulty. This process led us to focus on the 2 most valuable ideas:

➡️ Solution I
A step-by-step guide to math problems so tutors can provide explanations efficiently

➡️ Solution II
Strategic leading questions for tutors to ask students instead of offering answers directly
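
The cross-referencing described in the synthesis steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the idea names, ratings, difficulty estimates, and thresholds are made-up placeholders, and combining relevance and helpfulness into a single averaged "impact" score is an assumption of this sketch, not necessarily how the scores were combined in the project.

```python
# Hypothetical sketch: cross-referencing survey scores with technical difficulty.
# All idea names and numbers are made-up placeholders, not project data.
from dataclasses import dataclass
from statistics import mean


@dataclass
class Idea:
    name: str
    relevance: list[int]     # Likert ratings from tutors (1-5)
    helpfulness: list[int]   # Likert ratings from tutors (1-5)
    difficulty: int          # workshop estimate, 1 (easy) to 5 (hard)

    @property
    def impact(self) -> float:
        """Average of the mean relevance and mean helpfulness scores."""
        return (mean(self.relevance) + mean(self.helpfulness)) / 2


def low_hanging_fruit(ideas, min_impact=4.0, max_difficulty=2):
    """Keep high-impact, low-difficulty ideas, best impact first."""
    picked = [i for i in ideas if i.impact >= min_impact and i.difficulty <= max_difficulty]
    return sorted(picked, key=lambda i: i.impact, reverse=True)


ideas = [
    Idea("Idea A", relevance=[5, 4, 5], helpfulness=[4, 5, 5], difficulty=2),
    Idea("Idea B", relevance=[5, 5, 4], helpfulness=[5, 4, 4], difficulty=5),
    Idea("Idea C", relevance=[2, 3, 2], helpfulness=[3, 2, 3], difficulty=1),
]

for idea in low_hanging_fruit(ideas):
    print(f"{idea.name}: impact={idea.impact:.2f}, difficulty={idea.difficulty}")
```

With these placeholder numbers, only "Idea A" passes both filters: "Idea B" scores well with tutors but is too hard to build, and "Idea C" is easy to build but scores low.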

Rapid Prototyping

004

Co-create solutions for effective and desirable LLM output

Although the design direction was clear, we were unsure what kind of output would be most helpful and meet tutors' needs. To develop a tool that is useful for them, we conducted 5 participatory design sessions, inviting tutors to build the co-pilot with us.

Step 1. Prime the model
First, I entered a prepared initial prompt into GPT to prime the model.
Step 2. Test the model with pre-selected math problems
We devised math problems that reflect the types students frequently find challenging and entered them into the model to observe the output, using this to guide further improvements.
Step 3. Solicit feedback & offer ideas
The key to soliciting feedback effectively is being adaptable and asking insightful questions. I asked tutors what they liked or disliked about the model, the reasons behind their opinions, and their suggestions for improvement.
Step 4. Synthesize, re-prompt & iterate
As tutors provided feedback, I compiled it and typed it into the left panel on the fly to update the model's instructions. I then repeated Steps 3 and 4 until the output was desirable.
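
The four steps above form a loop that can be sketched roughly as follows. This sketch assumes that updating the model during a session meant revising its instructions (the prompt) rather than changing the model itself. `fake_model`, `co_design_session`, and all sample text are hypothetical stand-ins so the example runs offline.

```python
# Hypothetical sketch of the co-design loop: prompt, test, collect feedback, revise.
# `fake_model` stands in for a real LLM call so the example runs offline.

def fake_model(instructions: str, problem: str) -> str:
    """Placeholder for an LLM call; reports how many instruction lines it received."""
    n_rules = len(instructions.splitlines())
    return f"[output for '{problem}' using {n_rules} instruction line(s)]"


def co_design_session(initial_prompt, problems, feedback_rounds):
    """Append each round of tutor feedback to the instructions and re-run the problems."""
    instructions = initial_prompt
    history = []
    for feedback in [None, *feedback_rounds]:
        if feedback is not None:
            instructions += f"\n- {feedback}"  # revise the prompt; no model weights change
        outputs = [fake_model(instructions, p) for p in problems]
        history.append((instructions, outputs))
    return history


history = co_design_session(
    initial_prompt="You help tutors explain middle school math step by step.",
    problems=["Solve 3x + 5 = 20"],
    feedback_rounds=["Use fewer steps.", "Add one leading question per step."],
)
final_instructions, final_outputs = history[-1]
print(final_instructions)
print(final_outputs[0])
```

Keeping the full history makes it easy to compare the output before and after each piece of feedback.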

Final Design

005

Convergence of Two Solutions

Combining the two models into one saves time and reduces unnecessary effort for tutors

Table Instead of Bullets

Table format enhances readability and avoids the overwhelm of paragraph-based content

Motivational Boost

Adding specific words of encouragement that tutors can easily share with students

Emoji for a Human Touch

Evoking positive emotions, subtly encouraging tutors to maintain a warm and positive tone
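
One way to picture the combined, table-based output described above is a small data structure with one row per step. The field names, the `render_table` helper, and the sample content are assumptions for illustration only; they do not reflect the co-pilot's actual output schema.

```python
# Hypothetical sketch: one combined output rendered as a table.
# Field names and sample content are assumptions for illustration only.
from dataclasses import dataclass


@dataclass
class GuideStep:
    explanation: str       # step-by-step guidance for the tutor
    leading_question: str  # question to ask instead of giving the answer
    encouragement: str     # phrase the tutor can share with the student


def render_table(steps: list[GuideStep]) -> str:
    """Render the steps as a pipe-delimited text table, one row per step."""
    header = "| Step | Explanation | Leading question | Encouragement |"
    divider = "|---|---|---|---|"
    rows = [
        f"| {n} | {s.explanation} | {s.leading_question} | {s.encouragement} |"
        for n, s in enumerate(steps, start=1)
    ]
    return "\n".join([header, divider, *rows])


steps = [
    GuideStep("Subtract 5 from both sides.", "What undoes adding 5?", "Great start! 🎉"),
    GuideStep("Divide both sides by 3.", "What undoes multiplying by 3?", "You're almost there! 💪"),
]
print(render_table(steps))
```

Putting the explanation, leading question, and encouragement in the same row reflects the decision to merge the two solutions, so a tutor can read across one line per step.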

Model Iterations

006

Transform SME feedback into clear action steps for the ML engineer

To continuously enhance model output, I conducted two rounds of feedback collection from tutor supervisors. I synthesized this feedback into concrete and actionable steps for developers to iterate on.

Initial feedback, unstructured and wordy

Synthesized feedback, contextual, actionable and with examples

Reflection

007

Embracing AI to drive innovation and efficiency

As an early adopter, I’m always drawn to emerging trends, and embracing AI has been a pivotal step in my growth as a designer. The GPT co-design process, for instance, allowed me to prototype rapidly, with improved efficiency and at no cost. Reflecting on this experience, I’m proud of how AI has sharpened my adaptability and strengthened my ability to innovate in a rapidly changing landscape.

Over-communication for success

Throughout this process, I learned the importance of clear communication in fostering strong collaboration. By regularly engaging with the cross-functional team and actively listening to feedback from all stakeholders, I ensured everyone was aligned and involved. This approach not only built trust but also kept the project moving forward smoothly, reinforcing how essential open communication is in navigating complex projects.

Thanks for visiting my portfolio

Linkedin ↗

Resume ↗

zhiyuanchen.zc@gmail.com
