Research – UX for AI

🤖 Human vs. Machine 👥

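UX research experts Jeff Sauro, Will Schiavone, and James R. Lewis ran a head-to-head experiment: they compared real users' card-sorting results with ChatGPT's attempt to categorize the same information.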

(If you're unfamiliar with card sorting, Nielsen Norman Group has an authoritative explainer: https://lnkd.in/efYpzm22)

🔎 Study design

The MeasuringU team asked ChatGPT to categorize Best Buy products and compared its scheme with the one human participants produced in a card sort.

📄 Key findings

Impressive results!

The number of categories and their names were highly similar
ChatGPT made no obvious misclassifications

🧐 What it means

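ChatGPT's strong performance makes sense in this scenario:

The items were everyday consumer products
The task rewards following common conventions (e.g., putting a refrigerator under “Kitchen Appliances” rather than inventing an obscure category like “Refrigeration Equipment” 🥴)
Large language models (LLMs), trained on massive amounts of web data, are naturally good at picking up the standardized categorization patterns shared across websites
The same advantage applies to choosing standard terms in UX copy (e.g., “Log in” vs. “Sign in”)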

⚠️ Key limitations

As the MeasuringU team points out,

ChatGPT may perform less well in specialized domains.

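This is where card sorting shows its real value: when you design a specialized system for rocket scientists (and most of us aren't aerospace experts 😉), you need a card sort with real users to capture that group's mental models.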

(For a deep dive on mental models: https://lnkd.in/efPRB6bE)

Takeaway: AI can be a powerful research aid, but specialized domains still need human insight. Human-AI collaboration is the future of UX research! 🚀

Full study

Comparing ChatGPT with Card Sorting Results

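ChatGPT is unlikely to take UX researchers' jobs, but researchers who are adept with AI tools such as ChatGPT may become strong competitors to those who are not. In earlier research, we explored how ChatGPT-4 could help researchers classify open-ended comments and found that its results, though imperfect, were comparable to human performance. This article continues exploring AI's potential in other UX research methods.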

I. Overview of card sorting

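Card sorting is a popular UX research method, commonly used to restructure the navigation (information architecture) of a website or software product. It traditionally comes in two forms: closed (the researcher specifies the groups) and open (participants create their own groups). Modern open card sorts are run in dedicated software that produces both quantitative output (similarity matrices and dendrograms) and qualitative output (lists of groups and the reasons behind their names). Because participants' backgrounds differ, however, the results are not deterministic, and the researcher has to apply expert judgment.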
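The quantitative side of an open card sort can be sketched in a few lines. The sketch below assumes each participant's sort is stored as a list of sets of item names (a hypothetical layout, not the export format of any particular card-sorting tool) and computes the usual co-occurrence similarity: the share of participants who put each pair of items in the same group. Dendrograms are typically built by hierarchical clustering on 1 minus this similarity; that step is left out here.

```python
from itertools import combinations

# Hypothetical layout: one entry per participant; each entry is a list of
# groups, and each group is a set of item names.
Sort = list[set[str]]


def similarity_matrix(sorts: list[Sort]) -> dict[tuple[str, str], float]:
    """Share of participants who placed each pair of items in the same group."""
    together: dict[tuple[str, str], int] = {}
    items: set[str] = set()
    for groups in sorts:
        for group in groups:
            items |= group
            # Sort names so each pair always uses the same key order.
            for pair in combinations(sorted(group), 2):
                together[pair] = together.get(pair, 0) + 1
    n = len(sorts)
    # Include pairs that were never grouped together (similarity 0).
    return {pair: together.get(pair, 0) / n for pair in combinations(sorted(items), 2)}


if __name__ == "__main__":
    sorts = [
        [{"Blender", "Toaster"}, {"Laptop", "Mouse"}],
        [{"Blender", "Toaster", "Mouse"}, {"Laptop"}],
        [{"Blender"}, {"Toaster"}, {"Laptop", "Mouse"}],
    ]
    for pair, share in similarity_matrix(sorts).items():
        print(pair, round(share, 2))
```

In this toy example, “Blender” and “Toaster” end up with a similarity of 2/3 because two of the three participants grouped them together.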

II. Study design and procedure

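The study used 40 products from the Best Buy website. We compared card-sort data from 623 participants (focusing on the 200 results for the text-only version of the items) with ChatGPT-4's sorting of the same items. ChatGPT received the same product list in random order and was asked to create and name its own groups (full prompts in Appendix A).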
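One way to present the list in random order is to shuffle it afresh for every run. The sketch below is an assumption about the mechanics, since the study does not describe how it randomized or formatted the list: it reuses the Prompt 2 wording from Appendix A, puts one item per line, and leaves out the call to the model. `build_card_sort_prompt` is a hypothetical helper.

```python
import random
from typing import Optional

# Instruction text taken verbatim from Prompt 2 in Appendix A.
INSTRUCTIONS = (
    "Consider the following list of items and how you would best group them. "
    "Create between 3 to 5 groups as you deem appropriate and give them "
    "descriptive names. Then, give me the groups and how you would categorize "
    "each item."
)


def build_card_sort_prompt(items: list[str], seed: Optional[int] = None) -> str:
    """Return the instructions followed by the items in a random order.

    Assumption: one item per line; pass a seed to make a run reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(items)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    return INSTRUCTIONS + "\n\n" + "\n".join(shuffled)


if __name__ == "__main__":
    print(build_card_sort_prompt(["Blender", "Laptop", "Refrigerator", "Mouse"], seed=7))
```

Using a private `random.Random` instance keeps each run's ordering independent of any other code that uses the global random state.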

III. Key findings

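Table 1. Category names synthesized from participants' most common group names vs. ChatGPT's category names
Synthesized from Most Common User Group Names	ChatGPT Category Names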
Kitchen Appliances	Kitchen Appliances
Home Appliances	Household Utilities
Bathroom & Personal Care	Personal Gadgets & Accessories
Electronics & Entertainment	Home Electronics & Entertainment
Computer & Electronics	Office & Computing Accessories

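Table 2. Items participants placed in the fifth group (Computer & Electronics) vs. ChatGPT's categories (Category Match: 1 = ChatGPT placed the item in its corresponding category, 0 = it did not)
Category Match	Item in 5th Group	Participant Category Name 1	Participant Category Name 2	ChatGPT Category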
1	Flash drive	Electronics	Computer	Office & Computing Accessories
0	Laptop	Electronics	Computer	Personal Gadgets & Accessories
1	Keyboard	Electronics	Computer	Office & Computing Accessories
0	Headphones	Electronics	Entertainment	Personal Gadgets & Accessories
1	Mouse	Electronics	Office	Office & Computing Accessories
0	Tablet	Electronics	Technology	Personal Gadgets & Accessories
1	Printer	Electronics	Computer	Office & Computing Accessories
0	Cell phone	Electronics	Technology	Personal Gadgets & Accessories
0	Fitness Tracker	Electronics	Personal Care	Personal Gadgets & Accessories

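Number of categories
ChatGPT created 5 categories, the same number chosen by 25% of participants and the most common choice (Figure 2 shows 5 groups as the most frequent pattern).
Category names
ChatGPT's names (e.g., “Kitchen Appliances,” “Home Electronics & Entertainment”) closely matched the names the research team synthesized from participant data (Table 1 compares the two naming schemes).
Item-level agreement
Overall, 63% of items (25/40) matched, well above the 21% expected by chance. Agreement varied by group: 90% for Kitchen Appliances but only 44% for the computer accessories group (Table 2 details the fifth group).
Reliability
Across three repeated runs, the average kappa was 0.60, on the boundary between moderate and substantial agreement (Landis & Koch). The AI's grouping contained no obvious errors (such as filing “refrigerator” under office supplies).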
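To read a kappa value against the Landis & Koch benchmarks cited above, the bands can be written as a small lookup. The band edges below (0.20, 0.40, 0.60, 0.80) are the commonly cited ones; `landis_koch_label` is a hypothetical helper, not part of any library.

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to its Landis & Koch (1977) agreement band."""
    if kappa < 0:
        return "poor"
    # Upper edge of each band, checked in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


if __name__ == "__main__":
    for k in (0.53, 0.60):
        print(k, landis_koch_label(k))
```

Both the 0.53 in Appendix B and the three-run average of 0.60 fall in the “moderate” band; 0.60 sits exactly at its upper edge.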

IV. Value and limitations

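Strengths:

Quickly generates a first-draft information architecture
Yields a grouping scheme at almost no labor cost (no participants to recruit)
Reflects a “mental model” distilled from massive amounts of text
Limitations:
Performance on items from specialized domains is still untested
Custom models would need sufficient training data
Cannot yet fully replace human judgment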

V. Future research

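We suggest exploring:

Comparison with tree-test results
Evaluation of performance in specialized domains
Design of hybrid (human + AI) research workflows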

Appendix A: ChatGPT prompts

Prompt 1

Prompt (We asked ChatGPT):

“I want you to take the perspective of a shopper on a consumer website. You should follow my instructions as if you were fulfilling this role. For context, I’m seeing how you would complete this task, which is a card sort task.

“Consider the following list of items and how you would best group them. Create as many groups as you deem appropriate, and give them descriptive names. Then, give me the groups and how you would categorize each item.”

[list of 40 items from Figure 1a dendrogram]

ChatGPT (Responded):

“For this card sort task, I would create groups based on the primary function or usage of each item. Here are the groups I propose along with the categorization of each item:”

[grouped item list shown in Figure 1b]

“Each item is placed in a group where its primary use or functionality aligns best. For example, the ‘Kitchen Appliances’ group contains items predominantly used in the kitchen for cooking or food preparation. ‘Home Electronics & Entertainment’ includes items related to household comfort and entertainment. ‘Personal Gadgets & Accessories’ are items used on a personal level, often portable. ‘Office & Computing Accessories’ are items commonly used in conjunction with computers or in office settings. Finally, ‘Household Utilities’ includes larger home appliances and items for general household use.”

Prompt 2

Prompt (We asked ChatGPT):

“Consider the following list of items and how you would best group them. Create between 3 to 5 groups as you deem appropriate and give them descriptive names. Then, give me the groups and how you would categorize each item.”

[list of 40 items from Figure 1a dendrogram]

ChatGPT (Responded):

“To organize these items into groups, I will consider their primary function and usage context. Here’s how I would categorize them into five groups:”

[list of items placed in groups]

“This categorization considers the primary use of each item, grouping them into intuitive and contextually relevant categories.”

Prompt 3

Prompt (We asked ChatGPT):

“Group the following items into categories.”

[list of 40 items from Figure 1a dendrogram]

ChatGPT (Responded):

“The items you’ve listed can be categorized based on their primary use or function. Here’s a possible grouping:”

[list of items placed in groups]

“This categorization is based on the primary function of each item. Some items might fit into multiple categories depending on their features or how they are used.”

Appendix B: Kappa calculation

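Kappa corrects the observed match rate for agreement expected by chance: $\kappa = \frac{p_o - p_e}{1 - p_e}$. Summing the expected chance matches gives 8.4 items, so the expected match rate is $p_e = 8.4/40 = 0.21$ (21%); the observed rate is $p_o = 25/40 = 0.625 \approx 0.63$ (63%), giving $\kappa = (0.625 - 0.21)/(1 - 0.21) \approx 0.53$. In the table below, Size is the number of items participants placed in each group, p_e = Size/40 is the per-group chance probability, n_e = Size × p_e is the expected number of chance matches, n_o is the observed number of matches, and p_o = n_o/Size.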

Group	Size	p_e	n_e	n_o	p_o
A	10	0.25	2.5	9	0.90
B	9	0.23	2.025	4	0.44
C	5	0.13	0.625	5	1.00
D	7	0.18	1.225	3	0.43
E	9	0.23	2.025	4	0.44
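The Appendix B figures can be recomputed from the Size and n_o columns alone. The chance model here is inferred from the table columns rather than stated in the text: an item's chance of landing in a group is taken as that group's share of the 40 items (Size/40). The table's p_e column is rounded, so the sketch works from unrounded values; `chance_corrected_agreement` is a hypothetical helper.

```python
# (group, size, observed matches n_o) from the Appendix B table.
GROUPS = [("A", 10, 9), ("B", 9, 4), ("C", 5, 5), ("D", 7, 3), ("E", 9, 4)]


def chance_corrected_agreement(
    groups: list[tuple[str, int, int]],
) -> tuple[float, float, float]:
    """Return (p_o, p_e, kappa) for item-level agreement between two sorts."""
    total = sum(size for _, size, _ in groups)
    # Expected chance matches per group: n_e = Size * (Size / total).
    n_e = sum(size * size / total for _, size, _ in groups)
    n_o = sum(observed for _, _, observed in groups)
    p_e = n_e / total
    p_o = n_o / total
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, p_e, kappa


if __name__ == "__main__":
    p_o, p_e, kappa = chance_corrected_agreement(GROUPS)
    print(f"p_o={p_o:.3f} p_e={p_e:.2f} kappa={kappa:.2f}")  # p_o=0.625 p_e=0.21 kappa=0.53
```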

Conclusion

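ChatGPT showed categorization ability comparable to humans and can serve as an effective aid in UX research. Because it quickly produces reasonable grouping schemes, it offers an efficient way to do preliminary work on information architecture, though its output should be combined with expert judgment and further research. As AI advances, human-AI collaboration may reshape how UX research is done.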