PrISM-Q&A: Step-Aware Voice Assistant on a Smartwatch Enabled by Multimodal Procedure Tracking and Large Language Models
Cited by: 0
Authors:
Arakawa, Riku [1]; Lehman, Jill Fain [1]; Goel, Mayank [1]
Affiliation:
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source:
PROCEEDINGS OF THE ACM ON INTERACTIVE MOBILE WEARABLE AND UBIQUITOUS TECHNOLOGIES (IMWUT) | 2024, Vol. 8, Issue 4
Funding:
Andrew W. Mellon Foundation (USA);
Keywords:
context-aware;
procedure tracking;
task assistance;
large language models;
question answering;
DOI:
10.1145/3699759
Chinese Library Classification (CLC):
TP [automation technology, computer technology];
Discipline code:
0812;
Abstract:
Voice assistants capable of answering user queries during various physical tasks have shown promise in guiding users through complex procedures. However, users often find it challenging to articulate their queries precisely, especially when unfamiliar with the specific terminologies required for machine-oriented tasks. We introduce PrISM-Q&A, a novel question-answering (Q&A) interaction termed step-aware Q&A, which enhances the functionality of voice assistants on smartwatches by incorporating Human Activity Recognition (HAR) and providing the system with user context. It continuously monitors user behavior during procedural tasks via audio and motion sensors on the watch and estimates which step the user is performing. When a question is posed, this contextual information is supplied to Large Language Models (LLMs) as part of the context used to generate a response, even in the case of inherently vague questions like "What should I do next with this?" Our studies confirmed that users preferred the convenience of our approach compared to existing voice assistants. Our real-time assistant represents the first Q&A system that provides contextually situated support during tasks without camera use, paving the way for ubiquitous, intelligent assistants.
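To make the step-aware Q&A idea concrete, the sketch below illustrates one way the HAR-estimated current step could be combined with the procedure and the user's question into a single LLM prompt. This is a minimal illustration, not the authors' implementation; the procedure content, the step estimate, and the prompt wording are assumptions for demonstration, and the prompt would then be sent to whatever LLM backend the system uses.

```python
# Minimal sketch (hypothetical, not the paper's code): injecting an
# HAR-estimated step into an LLM prompt so that a vague question such as
# "What should I do next with this?" can be answered in context.

from dataclasses import dataclass

@dataclass
class Step:
    index: int          # 1-based position in the procedure
    description: str    # human-readable instruction for this step

# Example procedure (assumed content, purely illustrative).
PROCEDURE = [
    Step(1, "Grind 20 g of coffee beans to a medium-fine consistency."),
    Step(2, "Rinse the paper filter and discard the rinse water."),
    Step(3, "Pour hot water in slow circles until the scale reads 300 g."),
]

def build_step_aware_prompt(question: str, current: Step, procedure: list[Step]) -> str:
    """Combine the full procedure, the estimated current step, and the
    user's question into one context block for the language model."""
    steps_text = "\n".join(f"{s.index}. {s.description}" for s in procedure)
    return (
        "You are a voice assistant guiding a user through a procedure.\n"
        f"Procedure:\n{steps_text}\n"
        f"The user is currently on step {current.index}: {current.description}\n"
        f"User question: {question}\n"
        "Answer briefly, using the current step to resolve vague references."
    )

if __name__ == "__main__":
    # Assume the on-watch HAR model estimated that the user is on step 2.
    current_step = PROCEDURE[1]
    prompt = build_step_aware_prompt(
        "What should I do next with this?", current_step, PROCEDURE
    )
    print(prompt)  # this context-enriched prompt is what the LLM would receive
```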