In this conversation, Nina discusses how non-experts can teach robots by breaking complex tasks into reusable subtasks, and why this hierarchical approach matters for real-world deployment. She shares insights from her studies with older adults and care providers, exploring how personality traits, physical proximity, and teaching interfaces shape human-robot interaction.

Democratizing robot teaching. The central challenge in robot learning is making it accessible to everyone, not just engineers. Nina emphasizes that the first step is evaluating how accessible current technology actually is with real end users. While many teaching interfaces have been developed, few are tested with the populations who will actually use them—like older adults, people with mild cognitive impairment, or family caregivers. Her approach involves continuously checking in with these groups to understand their pain points and preferences, then incorporating that feedback into the design process. The barriers to home robot teaching are numerous. Robots can arrive with pre-programmed functionality or learn from scratch in the home. The teaching methods vary widely: kinesthetic teaching involves physically moving the robot’s arms to demonstrate tasks, while observational learning means the robot watches you perform the action yourself—though this creates challenges since human and robot bodies are shaped differently. There’s also teleoperation with joysticks, but this can be cognitively demanding when it’s unclear what buttons do, especially with multi-armed robots. Each approach presents different accessibility opportunities and challenges. Making robot teaching truly democratic means evaluating these methods with diverse populations and designing systems that work for people regardless of their technical background or physical capabilities.

Making the teaching process natural. People have distinct preferences for how they teach robots, and these preferences vary across multiple dimensions. Physical workload matters—kinesthetic teaching where you manually move the robot can be exhausting. But cognitive workload is equally important. Ironically, physically moving the robot is often less mentally demanding because it’s more intuitive than using a teleoperation interface that requires translating your intentions into button presses. Nina’s work aims to make teaching easier through two complementary approaches: improving the human’s understanding and improving the robot’s understanding. On the human side, this means developing training curriculums that teach demonstrators how to effectively teach robots, even without robotics expertise. On the robot side, it involves injecting domain knowledge that makes learning easier in specific contexts. Her research sits firmly in human-robot interaction, examining the full bidirectional interaction between human and robot. While much of her work focuses on how humans perceive robot learning and behavior, she’s increasingly interested in improving the feedback flowing in both directions—from robot to human and human to robot. This holistic view recognizes that naturalness emerges not from perfecting one side of the interaction, but from designing systems where both agents can effectively communicate and learn from each other in ways that feel intuitive rather than forced.

Hierarchical abstraction in robot learning. Most prior research in learning from demonstrations focuses on single, short tasks that can be evaluated quickly in a lab. But real-world deployment is messier. Tasks in unstructured home environments have many components that must be repeated over time. Consider setting a dinner table: you might normally set for two people, but when guests arrive, you need settings for three or four. The core actions remain the same—placing forks left of plates, filling water glasses—but they repeat for different numbers of people. Hierarchical abstraction is about breaking tasks into reusable subtasks. You can optimize for different goals when teaching. If you just want the robot to set the table for two people right now, you might demonstrate the whole task at once. But if you optimize for long-term efficiency, you break it down so components can be reused later. For table setting, you could teach the complete two-person setup, or teach one person’s setting and repeat it twice, or teach individual actions like “place the fork to the left of the plate” that can be sequenced flexibly. The same principle applies to making omelets or any complex household task. How you break it up influences what the robot can generalize to and how much human involvement is needed when the environment or task changes slightly. Hierarchical learning isn’t just about task efficiency—it’s about creating a library of reusable skills that make robots genuinely adaptable rather than brittle systems that fail the moment conditions shift.
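To make the idea of a reusable subtask library concrete, here is a minimal Python sketch. It is not Nina's actual system, and the class names and subtask labels are illustrative assumptions; it only shows how low-level actions might be grouped into one place setting and repeated for any number of guests.

```python
# Minimal sketch (not Nina's actual system): representing taught subtasks as
# reusable, named units that can be composed into larger tasks.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Subtask:
    """A named, reusable unit learned from one or more demonstrations."""
    name: str
    execute: Callable[[], None]          # stand-in for a learned policy


@dataclass
class Task:
    """A task is an ordered composition of subtasks (possibly nested)."""
    name: str
    steps: List["Subtask | Task"] = field(default_factory=list)

    def execute(self) -> None:
        for step in self.steps:
            step.execute()


# Low-level subtasks the user might teach once.
place_plate = Subtask("place plate", lambda: print("placing plate"))
place_fork = Subtask("place fork left of plate", lambda: print("placing fork"))
fill_glass = Subtask("fill water glass", lambda: print("filling glass"))

# A mid-level subtask reuses the low-level ones: one person's place setting.
one_setting = Task("one place setting", [place_plate, place_fork, fill_glass])

# The full task repeats the mid-level subtask, so adding a guest is one more
# repetition rather than a brand-new end-to-end demonstration.
def set_table(num_people: int) -> Task:
    return Task(f"set table for {num_people}", [one_setting] * num_people)


set_table(3).execute()
```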

Do humans naturally break down tasks? Nina and collaborator Nakul Gopalan (now at ASU) investigated whether people naturally break complex tasks into reusable subtasks when teaching robots. They designed intentionally complex scenarios, like making soil mixtures for plants with different ratios of lime, sand, and manure—tasks requiring nine or more repeated scoops. The deliberately tedious nature was meant to reveal whether people would naturally decompose the task or power through demonstrating everything at once. The surprising finding: people don’t naturally break tasks into effectively reusable subtasks. The researchers evaluated multiple metrics—whether people broke tasks down at all, whether they broke them to the appropriate granularity, and whether different teaching methods for the human demonstrator would encourage better decomposition. This reveals something important, and often confusing, about robot teaching research: you’re teaching the human to teach the robot. There are two layers of learning happening simultaneously. The study compared different teaching modalities to help humans become better robot teachers. Videos showing examples weren’t sufficient when there was too much difference between the example task and the actual teaching scenario. What worked best was showing exactly how to break up the specific task, essentially having demonstrators copy the experimenter’s approach. But this creates a scalability problem—you can’t create personalized demonstrations for every possible task someone might encounter in their home. The challenge becomes bridging what works immediately versus what works long-term, preparing humans to transfer teaching principles across novel situations rather than memorizing task-specific demonstrations.

Designing a good curriculum for robot teaching. Since showing people exactly how to break down each specific task isn’t scalable, Nina’s research explores whether people can accumulate generalizable teaching experience. Can humans learn to become better robot teachers in ways that transfer to completely new tasks they’ve never seen before? Her curriculum design follows a practice-feedback cycle rather than upfront instruction. Instead of showing demonstrators exactly how to teach before they start, the approach lets them try teaching naturally first, then shows them how an expert would do it, then moves on to a different task. This forces demonstrators to extract and apply principles from the comparison between their approach and the expert’s approach. Critically, they don’t get to retry the same task—they must transfer what they learned to the next domain. The research randomized the order of domains to ensure results weren’t artifacts of one particular sequence. The findings were promising: teaching duration decreased over time, and people incorporated more abstractions as they progressed through the curriculum. This suggests that generalizable robot teaching is a skill that can be developed through structured experience. The open question remains: what makes a truly optimal curriculum? How many examples are needed? What diversity of tasks? How should feedback be delivered? Nina notes these are still actively researched questions, but the fundamental principle is clear—humans can learn to be better robot teachers if given appropriate practice opportunities with comparative feedback that highlights effective decomposition strategies.
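As a rough illustration of this practice-feedback loop, the sketch below shows the shape of the curriculum: randomized domain order, natural teaching first, comparative feedback after, and simple logged metrics like duration and subtask count. The domain names, the stubbed teach_task() and show_expert_breakdown() hooks, and the metrics are illustrative assumptions, not Nina's actual protocol.

```python
# Rough sketch of a practice-feedback curriculum loop (illustrative only).
import random
import time
from typing import Dict, List


def teach_task(domain: str) -> List[str]:
    """Stub: collect the participant's demonstration, returned as subtask labels."""
    return [f"{domain} subtask {i}" for i in range(2)]


def show_expert_breakdown(domain: str) -> None:
    """Stub: show how an expert would decompose the same task."""
    print(f"[feedback] expert decomposition for: {domain}")


domains = ["soil mixing", "table setting", "omelet prep"]  # example domains
random.shuffle(domains)   # randomize order so results aren't sequence artifacts

log: List[Dict] = []
for domain in domains:
    start = time.time()
    subtasks = teach_task(domain)       # 1) participant teaches naturally first
    duration = time.time() - start
    show_expert_breakdown(domain)       # 2) comparative feedback, no retry
    # 3) the next iteration is a *different* task, so any improvement
    #    reflects transferred teaching principles rather than memorization
    log.append({"domain": domain,
                "teaching_duration_s": duration,
                "num_subtasks": len(subtasks)})  # proxy for abstraction use
```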

Goldilocks zone for subtasks. There are two extreme ends of task abstraction that don’t work. At one extreme, you could break down a task into movements of two inches to the right—utterly impractical because it requires precise measurement and creates an unmanageable number of unusable micro-actions. At the other extreme, you demonstrate the entire complex task as one monolithic unit, eliminating any possibility of reuse. Somewhere between these extremes lies a Goldilocks zone of optimal abstraction. Nina’s papers define this based on minimal changes that create symbolic state transitions—essentially, grouping actions together until something fundamentally changes in the world. For example, when reaching for a water bottle, all the movements can be grouped together because they’re functionally the same until you make contact with the bottle. That contact represents a symbolic change in the world state and marks a natural transition point. The optimal granularity likely depends on the specific task. To evaluate demonstration quality, Nina’s team created rubrics that could consistently assess whether people were providing good abstractions. Importantly, these were human-annotated rather than automatically computed, reflecting the current limitations in having robots discover optimal abstractions independently. While machine learning approaches exist for discovering task hierarchies, Nina emphasizes the importance of keeping humans in the loop—not just because possible tasks are unlimited, but because people have preferences for how they want things broken down, especially in care contexts where older adults are aging in place. The key is empowering people to make changes to robot behavior when desired while automating whatever they’re not interested in controlling.
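The symbolic-transition idea can be shown with a short sketch, assuming a demonstration is available as a sequence of low-level actions, each annotated with the symbolic predicates that hold afterward. The predicate names and the segmentation function are invented for illustration, not taken from Nina's papers.

```python
# Minimal sketch: group consecutive low-level steps into one subtask until the
# symbolic (discrete) world state changes. Predicate names are illustrative.
from typing import FrozenSet, List, Optional, Tuple

# Each step: (low-level action label, set of symbolic predicates true afterward)
Step = Tuple[str, FrozenSet[str]]


def segment_by_symbolic_change(trajectory: List[Step]) -> List[List[str]]:
    """Start a new segment whenever the symbolic state differs from the previous step."""
    segments: List[List[str]] = []
    prev_state: Optional[FrozenSet[str]] = None
    for action, state in trajectory:
        if prev_state is None or state != prev_state:
            segments.append([])          # symbolic change => new subtask boundary
        segments[-1].append(action)
        prev_state = state
    return segments


demo = [
    ("move arm 2cm", frozenset()),                      # reaching: nothing changes
    ("move arm 2cm", frozenset()),
    ("close gripper", frozenset({"holding_bottle"})),   # contact: symbolic change
    ("lift arm",      frozenset({"holding_bottle"})),
]
print(segment_by_symbolic_change(demo))
# [['move arm 2cm', 'move arm 2cm'], ['close gripper', 'lift arm']]
```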

Personality traits in HRI studies. Personality significantly shapes human-robot interaction, though it’s typically evaluated as a post-hoc analysis rather than a recruitment criterion. The field uses the “Big Five” personality traits established in psychology research: openness, extroversion, neuroticism, agreeableness, and conscientiousness. Studies incorporate personality surveys during the research process, then analyze afterward whether correlations exist between personality traits and interaction outcomes. Certain traits repeatedly show impact. Extroversion and agreeableness frequently influence how positively robots are perceived, how detail-oriented people are during interaction, and how open-minded they remain to the technology. But there’s no “ideal” personality type for robot teaching because it depends entirely on what you’re optimizing for. Agreeable people might take longer to teach, but that doesn’t necessarily mean their demonstrations are worse—they might be more deliberate, producing higher quality teaching despite slower pace. The bigger question is calibration. You don’t want people to be universally positive toward robots because uncritical acceptance can lead to dangerous over-reliance. There’s a Goldilocks zone for trust and perception too. Nina found that agreeable people did take longer to teach robots, but the extra time didn’t translate to improved teaching quality. The additional time might reflect people enjoying the process rather than rushing, trying to be agreeable to the experimenter, or moving slowly to avoid damaging the robot. Feeling rushed creates stress, so slower teaching isn’t inherently problematic. Personality traits function as covariates—important to characterize and understand, but not as targets for preferential recruitment or as indicators of teaching success.
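As a concrete picture of what that post-hoc analysis can look like, here is a small sketch with made-up numbers, correlating an agreeableness score with teaching time. Real studies would use validated Big Five instruments and correct for multiple comparisons.

```python
# Illustrative post-hoc correlation between a personality trait and an
# interaction outcome. Data values are invented for demonstration only.
from scipy import stats

# Per-participant trait composites (e.g., 1-5 Likert) and outcome measures.
agreeableness    = [3.8, 4.5, 2.9, 4.1, 3.2, 4.8, 3.5, 2.7]
teaching_minutes = [22,  31,  18,  27,  20,  33,  24,  17]

r, p = stats.pearsonr(agreeableness, teaching_minutes)
print(f"agreeableness vs. teaching time: r={r:.2f}, p={p:.3f}")
# A significant positive r would echo the finding that more agreeable
# participants tend to take longer, without implying better demonstrations.
```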

Challenges with studying older adults. A persistent tension in HRI research is the gap between who you want to study and who you can actually recruit. Best practice would be regularly evaluating with end users, but there are significant tradeoffs. Older adults—a key population for care robotics—are harder to recruit and schedule than college students who can participate the next day. Some of Nina’s studies run upwards of two and a half hours, which is physically and emotionally demanding for any participant but especially for older adults. Nina’s solution is recruiting from both target populations and student populations. This allows comparison while acknowledging the practical realities of research timelines. But “target population” for elder care isn’t just older adults—it’s actually a network of stakeholders. There are care recipients (people aging in place) and their care network, which divides into formal and informal care providers. Formal providers include doctors, occupational therapists, physical therapists, nurses, and hired in-home care workers with professional training. Informal care providers—who often do the heavy lifting because they’re present more regularly—include family, friends, and spouses. Ideally, you’d design with all stakeholders in mind and recruit representative samples from each group. Nina has worked with both older adults and care providers across her research. One study examined perceived performance of medicine dispensing and breakfast preparation tasks. Care providers rated robot success lower than students for identical behaviors, likely because their professional experience gives them grounded understanding of medication risks—they know the consequences of dispensing wrong medication, which shapes their perception of success. Despite lower perceived success, care providers didn’t intervene more frequently than students, raising questions about whether they felt empowered to stop the robot or whether their lower technology experience created hesitation.

Confounders with in-person studies. Nina’s research during COVID illustrates the complex experimental design required when working with vulnerable populations. She wanted to study care providers, but because they work with older adults, bringing them into the lab risked exposure. The solution was a multi-layered comparison study designed to isolate different factors. Care providers participated remotely. To separate the effects of being a care provider versus being remote, the study compared remote care providers to remote students. To isolate the impact of remote versus in-person participation, it compared remote students to in-person students. The findings were counterintuitive. In-person participants had more favorable perceptions of robot safety and usability compared to remote participants. This surprised Nina because the in-person condition involved watching a robot cut a banana with a real, sharp knife from eight feet away. She expected physical presence with a knife-wielding robot might make the interaction feel riskier and more frightening. Instead, perceptions were much more positive in person. This has interesting implications for research conducted primarily virtually during COVID—it suggests findings might actually be more positive once robots are deployed in person with users. One critical question this raises is whether physical presence creates appropriate trust calibration or dangerous over-reliance. If people don’t intervene when they should because they’re comfortable with the robot’s presence, that’s problematic. Nina speculates the experimenter’s physical presence may also be a confounding factor—participants might assume the robot can’t go too wrong because smart people designed it and a researcher is supervising. Rerunning all studies in people’s homes without experimenters present would reveal what holds, what changes, and what becomes stronger. But such studies face enormous practical, financial, and safety challenges.

Designing effective experiments in HRI. Designing human-robot interaction studies is complex enough that entire courses are dedicated to it. Nina’s approach follows the scientific method: start by identifying what you’re interested in measuring and formulating clear research questions. Then determine the minimum interaction with humans needed to representatively capture what you’re trying to measure. Studies easily become unwieldy. You need to introduce participants to relevant concepts, ensure they understand what’s required, then conduct the actual interaction. Being cognizant of study length and its impact on participant perception is crucial. When researchers get greedy—wanting to measure many things simultaneously—study design suffers. A key decision is whether to use within-subjects design (same person experiences all conditions) or between-subjects design (different people experience different conditions). Each has appropriate use cases and affects statistical analysis differently. The IRB (Institutional Review Board) protocol process is actually helpful for design quality. Writing up everything you’ll tell participants, every metric you’ll collect, every survey you’ll administer creates consistency across the many hours of repetitive study runs. Running user studies is a unique experience—you execute the same process over and over, requiring patience and discipline. Yet despite following identical protocols, every session is fascinatingly different because people are incredibly diverse. The same instructions are perceived in widely varying ways. This diversity is what makes HRI research compelling but also why careful experimental design is essential. Without rigorous structure, the natural variation in human behavior creates uncontrollable noise that obscures the factors you’re actually trying to measure.

Scoping research questions. The challenge of scoping research questions isn’t unique to HRI—it’s the universal plight of researchers. Interactions with end users are especially rich with potential questions. When you have precious time with participants, it’s tempting to think “I could collect so many more surveys and do so much more analysis” beyond just the core measurements. Nina’s approach involves two strategies. First, she writes early in the research process. Rather than following the traditional sequence of idea generation, execution, then writing, she drafts the paper introduction and methods while still developing the study. This forces her to check whether each decision in the user study is supported by literature or whether she’s just taking shortcuts. Writing early identifies gaps and holes before running expensive studies. Second, she talks to lots of people. While she jokes that she hates elevator pitches, she acknowledges that explaining ideas briefly forces synthesis of what’s most interesting about a study, which helps focus on just that core element. Research in isolation isn’t her style—she’s impressed by single-author publications but notes that different fields have different norms. For her, the best part of research is connection with other people who bring diverse ideas and create surprising insights. Nina came to her PhD convinced she’d work on foster care algorithms, having discussed network analysis of California foster care data with Professor Rediet Abebe (now at Berkeley). But when her now-advisor Matthew Gombolay announced a student needed help running a study, she tried it and was hooked. She’s a believer in not knowing everything at the start—interests evolve. She’s also a believer in enjoying the day-to-day process. Running studies, designing experiments, and reading papers felt genuinely enjoyable, which sustained her through the PhD journey.

Self-reflection and interface design. Nina hasn’t evaluated whether verbalizing self-explanations makes people better robot teachers, though part of her suspects reflection must help. However, she’s not convinced verbalization is necessary or always optimal. Literature shows people often struggle to explain what they want robots to do—showing is frequently much easier than telling, especially for physical manipulation tasks. But showing comes with a significant caveat: you must physically be able to show. If the task is impossible to demonstrate, or if you have mobility limitations or aren’t strong enough to lift objects, showing becomes impractical or impossible. This raises questions about whether interfaces that force people to break down and label demonstrations actually make their teaching better. Nina’s studies use interfaces that require participants to label every demonstration with a name, which forces summarization of “what am I doing” and displays all labeled subtasks together in one place. This creates both verbal and demonstrative components that might influence the teaching process. But she’s careful not to claim her interface design definitively makes teaching better—there’s enormous room for improvement, and interface design is an entire field unto itself. What the interface does provide is pauses in the teaching process for reflection, and it removes the experimenter from constant involvement, which could enable home deployment without researchers present. The interface includes three phases: planning (listing subtasks before demonstrating anything), teaching (physically demonstrating each piece), and composition (chaining subtasks and specifying how they should be used to accomplish full tasks). Each phase needs its own page, and navigating between phases while maintaining clear understanding of “where am I in the teaching process” is a core design challenge.
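To make the three-phase flow concrete, here is a simplified sketch of how a teaching session might be represented behind such an interface. The class and field names are assumptions for illustration, not the lab's actual implementation.

```python
# Simplified sketch of interface-side state for a plan / teach / compose flow.
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, List


class Phase(Enum):
    PLANNING = auto()      # list the subtasks before demonstrating anything
    TEACHING = auto()      # physically demonstrate each planned subtask
    COMPOSITION = auto()   # chain labeled subtasks into full tasks


@dataclass
class TeachingSession:
    phase: Phase = Phase.PLANNING
    planned_subtasks: List[str] = field(default_factory=list)
    demonstrations: Dict[str, list] = field(default_factory=dict)  # label -> recorded data
    compositions: List[List[str]] = field(default_factory=list)    # ordered subtask labels

    def plan(self, label: str) -> None:
        self.planned_subtasks.append(label)

    def record_demo(self, label: str, trajectory: list) -> None:
        # Requiring a label per demonstration forces the user to summarize
        # "what am I doing" and keeps all subtasks visible in one place.
        self.demonstrations[label] = trajectory

    def compose(self, ordered_labels: List[str]) -> None:
        missing = [lab for lab in ordered_labels if lab not in self.demonstrations]
        if missing:
            raise ValueError(f"subtasks not yet demonstrated: {missing}")
        self.compositions.append(ordered_labels)


session = TeachingSession()
session.plan("place fork")
session.phase = Phase.TEACHING
session.record_demo("place fork", trajectory=[...])   # joint states, camera frames, etc.
session.phase = Phase.COMPOSITION
session.compose(["place fork"])
```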

Characterizing good interfaces. When asked what features make interfaces easy for humans to teach robots, Nina clarifies an important distinction: the interface is a middleman between demonstration collection and data packaging. It doesn’t substantially influence the robot learning process as long as you’re saving the right data. But from the human side, interface design is critically important. Her ideal interface streamlines the three teaching phases—planning, demonstrating, and composing—making it intuitive where users should be at each point in the process and how to evaluate what they’ve done so far. Her own interface design has focused on streamlining these phases, but critically, she hasn’t compared it against prior work, and there isn’t really a standard interface for learning from demonstration that everyone uses. She’s considering publishing her lab’s next interface iteration so there’s something the community can build on and improve. Measuring interface quality is inherently subjective. The gold standard would be having the same person use multiple interfaces for the same or very similar tasks, then evaluating both subjective perceptions (how they felt using it) and objective behaviors. Objective metrics include teaching duration, number of questions asked, number of undo/redo actions, and response to interface feedback. When participants request evaluation and receive feedback, does it prompt changes? What type of changes? What was the feedback indicating? Nina’s studies log every button press, capturing ordering and interface usage patterns for analysis. The question of whether one interface can work for all robot morphologies is increasingly relevant as Nina’s lab shifts from single-arm robots to bimanual (two-armed) systems. The shift may require interface changes, depending on demonstration methods. If both arms move concurrently using a dual-arm rig, that’s one design. If participants freeze one arm, move it, then move the other sequentially, the interface should make that clear. But fundamentally, she believes a shared backbone interface could work across morphologies because the recording process can be separate from interface design—just press record and capture joint movements and camera feeds independently.
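The logging side is straightforward to picture: a minimal sketch, assuming hypothetical event names, might record every button press with a timestamp and derive objective metrics like duration and undo/redo counts afterward.

```python
# Minimal sketch of timestamped interaction logging (event names are invented).
import time
from collections import Counter
from typing import List, Tuple

Event = Tuple[float, str]          # (timestamp, button/event name)
event_log: List[Event] = []


def log_event(name: str) -> None:
    event_log.append((time.time(), name))


# Simulated session: every button press gets logged in order.
for name in ["start_teaching", "record", "undo", "record", "redo", "finish_teaching"]:
    log_event(name)

counts = Counter(name for _, name in event_log)
duration = event_log[-1][0] - event_log[0][0]
print(f"teaching duration: {duration:.1f}s, "
      f"undo/redo actions: {counts['undo'] + counts['redo']}")
```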

Challenges in longitudinal studies. Lab studies last a few hours at most, sometimes with repeat visits. But longitudinal deployment—actually placing a robot in a home and leaving it—is far less common because it’s costly and complex. Definitions of “longitudinal” vary: some researchers mean a week, some mean months, and some mean a year or more. Literature shows that interaction patterns depend on deployment duration. There are distinct phases people go through when adopting technology. Initially, users don’t know how to use the system. Then they show it off to friends and visitors. Gradually it becomes incorporated into daily routines, both socially and operationally. Because deployment length has a measurable impact, more studies should evaluate how it interacts with the variables they are studying. Nina emphasizes that longitudinal studies are already happening, particularly with simpler robots like Roombas. Other robots have been deployed longitudinally too, though their functionality might not include wielding kitchen knives. It’s not as far off as it seems, but it’s definitely task-dependent. Longitudinal deployment also changes study scale dramatically—you might run just five homes for many months, involving all family members, rather than dozens of participants for short sessions. Research is often driven by what conferences and journals value. Whether the field prioritizes fewer participants with longitudinal deployment versus higher volume in-person studies depends on the research question. There are also practical constraints: Can the lab operate without that robot for months? Not every lab has multiple identical robots available for parallel studies. Longitudinal deployment requires investment in resources, regular functionality checks to ensure the robot hasn’t been abandoned in a corner, and careful study design around check-in frequency that doesn’t disturb natural use patterns. Nina looks forward to running longitudinal studies and discovering how study design must adapt for this very different research paradigm.

Personal journey in HRI research. Nina is a believer in not necessarily knowing everything at the start. When she began her PhD, she was convinced she’d write algorithms for foster care, working with data from California to improve network outcomes. But when her now-advisor Matthew Gombolay put out an announcement that a student needed help running a study, she thought “what the heck, that sounds interesting”—it was like something from movies where people come to labs and get their behavior studied. She helped with that study and was hooked. She’s also a big believer in enjoying the process. There’s high-level motivation—do I value this work, is it important for the world—and there’s day-to-day experience: am I enjoying what I’m doing? For Nina, every day in HRI felt rewarding. Running studies, designing experiments, reading papers—the daily work itself was enjoyable in a way that’s difficult to find in most jobs. That sustained her through the PhD. Looking ahead to finishing soon, she hopes to go into academia where responsibilities will shift toward mentorship. Both her parents are teachers, so there’s strong emphasis in her family on teaching and helping younger generations. She’s excited about the collaborative aspects—working with students, other labs, other schools. But she acknowledges the transition from PhD to professorship feels like a dramatic jump. Responsibilities change substantially, and many people feel unprepared for aspects like grant writing or leading PhD teams that they had limited exposure to as students.

Her hoped-for impact centers on running studies in people’s actual habitats, evaluating whether systems work in the real world over long periods with real users. She wants to improve systems so people can age in place with less loneliness and more physical and cognitive support, alleviating burnout on care providers. And her motivation extends beyond aging in place—she’s driven by meeting people where they’re at in terms of their understanding of robots and abilities with tasks. That philosophy of starting from user capabilities rather than imposing technical requirements is what makes Nina’s research both rigorous and deeply human-centered.

On the go? There’s an audio-only version too. Click here.

Nina Moorman is a PhD student at the CORE Robotics Lab at Georgia Tech, advised by Professor Matthew Gombolay. Her research interests are in human-interactive robot learning and care robotics. She develops algorithms that enable robots to learn in situ from non-expert demonstrators.