Initial Clarissa test on ISS: summary of results
Clarissa was first used by Expedition 11 Science Officer and Flight
Engineer John Phillips
on June 27, 2005. To the best of our knowledge, this is the first ever
use of a spoken dialogue system in space. During the test, Phillips
completed the interactive Clarissa training procedure, which exercises
all the main system functionality; this procedure contains 50 steps,
and took 25 minutes to complete. Table 1 summarises performance per
step:
Condition                                                      | #steps
No problems                                                    | 45
Bad recognition due to background speech                       | 4
Bad recognition due to misunderstandings about command syntax  | 2
Total steps                                                    | 50

Table 1: Performance per step in training procedure. One step had
problems both with background speech and misunderstandings about
command syntax.
Of the 50 procedure steps, 45 were completed without incident. In four
steps, Clarissa suffered from speech recognition problems, apparently
because fellow crew member Sergei Krikalev was talking near the
microphone. In all but one of these steps, the system
simply failed to respond, and Phillips was able to correct the problem
by repeating himself. In two steps, the training procedure did not
explain the correct command syntax clearly enough, and Phillips
attempted to phrase requests in ways not acceptable to the
recogniser. In the first of these steps (entry of numbers into a
table), Phillips quickly ascertained that the recogniser did not permit
negative numbers, and completed the step. In the second (setting an
alarm), Phillips was unable to find the correct syntax to define the
alarm time. This was the only step that was not completed successfully.
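The step counts in Table 1 can be cross-checked with a little arithmetic. The following is a minimal sketch in Python, which is not part of the Clarissa system; the variable names are illustrative only, and the figures are taken directly from Table 1, its note, and the paragraph above.

```python
# Step counts from Table 1 (variable names are illustrative, not from Clarissa).
total_steps = 50
background_speech = 4   # steps with bad recognition due to background speech
syntax_confusion = 2    # steps with misunderstandings about command syntax
both_problems = 1       # one step had both kinds of problem (note to Table 1)

# Inclusion-exclusion: count the step with both problems only once.
problem_steps = background_speech + syntax_confusion - both_problems
clean_steps = total_steps - problem_steps

# Only the alarm-setting step was not completed successfully.
failed_steps = 1
completion_rate = (total_steps - failed_steps) / total_steps

print(problem_steps, clean_steps, f"{completion_rate:.0%}")
```

The overlap of one step is why the three condition rows of Table 1 sum to 51 rather than 50.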
While Phillips was navigating the training procedure, the recogniser
recorded 113 separate audio files: most of these contained spoken
commands, but some were just background noise. Table 2 breaks down
performance by files:
Condition                                                      | #files
Recognised exactly                                             | 84
At least one word different, but correctly understood          | 4
Non-command, correctly ignored                                 | 11
Appropriate responses                                          | 99
No recognition                                                 | 9
Incorrect recognition                                          | 5
Inappropriate responses                                        | 14
Total responses                                                | 113

Table 2: Performance per audio file in training procedure.
99 of the 113 files produced appropriate responses. In 84 cases, the
file contained a command, and all the words were recognised correctly.
In another four cases, the file was again a command, at least one word
was misrecognised, but the system still understood and responded
correctly. In 11 cases, the file contained non-command content (usually
background noise), which was correctly ignored.
14 files produced inappropriate responses. In nine cases, the system
failed to respond at all to a command, and in another five it responded
incorrectly. The reasons for these problems are described above.
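The per-file figures can be turned into an overall rate in the same way. The following is a rough sketch in Python, with illustrative dictionary keys that are not names used by Clarissa; it only restates the counts from Table 2 and derives the share of files that received an appropriate response.

```python
# File counts from Table 2 (keys are illustrative labels, not Clarissa names).
appropriate = {
    "recognised_exactly": 84,
    "word_error_but_understood": 4,
    "non_command_ignored": 11,
}
inappropriate = {
    "no_recognition": 9,
    "incorrect_recognition": 5,
}

n_appropriate = sum(appropriate.values())
n_inappropriate = sum(inappropriate.values())
n_total = n_appropriate + n_inappropriate

# Share of all recorded audio files that got an appropriate response.
appropriate_rate = n_appropriate / n_total

print(n_appropriate, n_inappropriate, n_total, f"{appropriate_rate:.1%}")
```

On these figures, roughly 88% of the recorded files were handled appropriately. The counts do not say how many of the five incorrectly recognised files were genuine commands, so no command-only accuracy is computed here.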
Both Phillips and the Clarissa team considered that the system
performed very creditably during its first test. We hope to carry out a
second test later in Expedition 11, using a real water sampling
procedure.