Classifying document categories based on physiological measures of analyst responses

Abstract

Improvements in the collection and analysis of physiological signals have increased the potential for computer systems to assist human analysts in various workplace tasks. We have constructed a data set of documents in three main categories: national security, natural disasters and computer science, ranging from stressful to non-stressful. Some documents belong to more than one of these categories and some to none of them. The collection is designed to mimic the range of documents an intelligence analyst would need to read quickly and categorize in the few days after the seizure of computers from suspects in a national security investigation. Our participants were university students, primarily our own computer science students, hence the inclusion of the computer science category. On our data set, our participants were 79% correct on average, which we could replicate with 88% accuracy, that is, with 70% correctness on the underlying task. Our participants' worst results were on the computer science task, which was surprising, but this did not reduce our performance in replicating their results using AI techniques.


12 Figures and Tables
