Analyzing Before Solving: Which Parameters Influence Low-Level Surgical Activity Recognition

Olga Dergachyova, Xavier Morandi, Pierre Jannin

arXiv:1711.06259v1 [cs.HC] 15 Nov 2017

Abstract: Automatic low-level surgical activity recognition is today a well-known technical bottleneck for smart and situation-aware assistance for the operating room of the future. Our study sought to discover which sensors and signals could facilitate this recognition. Low-level surgical activity represents semantic information about a surgical procedure that is usually expressed by the following elements: an action verb, surgical instrument, and operated anatomical structure. We hypothesized that activity recognition does not require sensors for all three elements. We conducted a large-scale study using deep learning on semantic data from 154 operations from four different surgeries. The results demonstrated that the instrument and verb encode similar information, meaning only one needs to be tracked, preferably the instrument. The anatomical structure, however, provides some unique cues, and it is thus crucial to recognize it. For all the studied surgeries, a combination of two elements, always including the structure, proved sufficient to confidently recognize the activities. We also found that in the presence of noise, combining the information about the instrument, structure, and historical context produced better results than a simple composition of all three elements. Several relevant observations about surgical practices were also made in this paper. Such findings provide cues for designing a new generation of operating rooms.

Index Terms: Low-level surgical activity recognition, surgical process model, semantic analysis, OR sensors, deep learning, LSTM.

I. INTRODUCTION

Today, an overwhelming flood of new technologies and equipment threatens to overrun operating rooms, adding even more complexity to the surgical workflow. A large amount of research is focused on smart and situation-aware intraoperative assistance to help alleviate the surgeon's stress and facilitate procedures.
Automatic recognition of surgical processes represents a substantial part of this effort. The surgical process can be described in different hierarchical semantic levels: phases, steps and activities [1]. A surgical phase, the highest granularity level, is a procedure period with several steps and interactions between the surgeon and surgical staff. A surgical step is defined as a sequence of activities with a specific surgical objective. A surgical activity, the lowest semantic level, is a physical activity of the surgeon consisting of the performed action (also called verb), the surgical instrument used, and an anatomical structure on which the action is performed. Many research groups have studied the recognition of high-level surgical phases [2], [3], [4], [5] and steps [6], [7]. A large amount of research has also been devoted to how to recognize surgical gestures [8], [9], [10], which offer much lower granularity yet no semantics, from training sessions (e.g., on JIGSAWS or MISTIC datasets). Yet, only a few works exist studying the automatic recognition of low-level semantic activities from real complex clinical procedures [11], [12]. This automatic recognition is not only useful for in-depth situation awareness; analyzing semantic activities also enables better understanding, learning, and teaching of surgical procedures [13]. Surgical skills can be objectively evaluated based on a sequence of performed actions [14], [15]. Several other applications include detection of deviations from a standard procedure flow [16], [17], accurate estimation of remaining time, and resource management [18]. Due to the lack of automatic recognition, most applications use manually annotated surgical activities, which is a terribly tedious and time-consuming process.

Olga Dergachyova, Xavier Morandi and Pierre Jannin are affiliated with INSERM, U1099, Rennes, France and Université de Rennes 1, LTSI, Rennes, France (email: pierre.jannin@univ-rennes1.fr). Xavier Morandi is affiliated with Service de Neurochirurgie, CHU Rennes, Rennes, France.
However, the automatic recognition of low-level activities is an extremely challenging task. Unlike phases and steps, activities are of shorter duration (seconds vs. minutes) and higher diversity in terms of number (hundreds of distinct items vs. a dozen), execution order (great multitude of possible paths vs. simple sequencing) and surgeon/practice-specific characteristics. To facilitate the recognition process, the approaches proposed in the literature break down the activity into its meaningful elements (e.g., verb, instrument, and structure) then proceed with one-by-one detection. The activity is then deduced from one or a combination of elements. The elements to be detected are chosen depending solely on available signals, without any analysis of their relevance. The instrument is often considered a good indicator of the on-going task [19], [16], [18], even though it has been shown to have multiple functions that vary depending on the situation and surgeon [20]. The verb, which provides pertinent information about the activity context, is difficult to recognize due to a high variability of action execution [12], and often requires additional sensors. The anatomical structure, on the other hand, can be recognized from usually available image-based signals [11], without extra sensors needing to be brought to the operating room. However, no study as yet exists justifying the choice of elements to detect.

In this paper, we propose to approach the problem from the opposite direction. Assuming information on all three elements is available, we assess their impact on the performance of low-level activity recognition with the aim of defining a minimum set of required sensors and signals. This is the first large-scale multi-site study of the elements' importance to activity recognition, conducted on complex clinical data comprising four different types of surgery. This work's unique contribution consists in its original approach to information analysis for the optimization of operating room sensors, as well as its study results.

II. METHODS

A. Clinical data

In order to assess the impact of the aforementioned elements on activity recognition, we analyzed four different surgical procedures performed by junior and senior surgeons: anterior cervical discectomy and fusion (ACDF) [15], lumbar disc herniation (LDH) [14], pituitary adenoma (PA) [21], and cataract surgery (CS) [11]. The first three neurosurgical procedures were studied via data collected from two university hospitals: Rennes (France) and Leipzig (Germany). Cataract surgery data was taken from the university hospital of Munich (Germany). A total of 154 interventions were studied representing intra- and inter-domain diversity, variety of practices, and a range of skill levels. Table I contains additional information about each of the seven datasets.

The data consist of manual annotations of the surgical process, i.e., phases and activities. The neurosurgeries were annotated by the same senior surgeon in real time. The cataract surgeries were annotated by the same PhD student based on video recordings. Both annotators had previously been carefully trained to use the annotation software. The first five annotations for each procedure were considered as tests and not taken into account. All annotations were carefully reviewed afterwards using the same software. The phase annotation is a 3-tuple containing its name, start and end time, e.g., (discectomy, start, end). The activity annotation is represented by a 7-tuple consisting of actor, body part, verb, instrument, structure, and start and end times, e.g., (surgeon, left hand, hold, classic forceps, muscle, start, end).

B. Semantic analysis

Having information about both of the surgeon's hands simultaneously is relevant for a more complete understanding of the situation. We thus modified the definition of a low-level activity in this study to include six items: verb, instrument, and structure defined for both left and right hands of the surgeon at the same time.
An example of an ACDF activity could be (hold, forceps, disc, cut, scalpel, ligament). A particular value of an element, e.g., cut for the verb, is called an instance. Using the start and end timestamps, each annotated procedure was then transformed into a temporally ordered sequence of 6-tuple activities.

In order to assess the impact of each semantic element on the overall recognition process we have used the following scheme. If we want to know how well a whole activity can be recognized solely knowing the used instruments, we can apply a 010010 mask to the activity tuple, which gives us (unknown, forceps, unknown, unknown, scalpel, unknown) for the example above. The element is considered known for both left and right hands, as in practice the same type of sensor is needed to recognize both. We also take into consideration a temporal context, meaning the n activities having taken place before. The same mask is also applied to the n previous activities, ensuring that only available information is involved in the analysis. The problem then consists of mapping a sequence of partially hidden tuples to a full tuple (i.e., masked with 111111) of a current activity. This problem is resolved using a deep neural network, described in the next subsection. This is performed for all other configurations, meaning the solely known elements, as well as their combinations. Finally, the neural network model of each configuration is tested on the activity recognition task, and their performances are compared, as described in Section II-D.

The relevance of the temporal context has previously been discussed in [2], [18]. We found out that working with deep learning and a fairly small dataset requires a careful choice of parameters, n in our case, in order to enable an effective learning process. We will talk about our choice of this parameter, defined as a function of factors like dataset size, number of unique activities, average number of activities per intervention, and complexity level, in Section III.

C. Deep neural networks for analysis

Today, deep learning methods are successfully applied to many different problems, ranging from image labeling to natural language modeling and text generation. In the majority of cases, they outperform classical machine learning methods. Long Short-Term Memory (LSTM) recurrent neural networks enable analysis of long sequences with complex temporal dependences. We used LSTM in this study, hypothesizing that any hidden elements of an activity depend on currently known elements, as well as on the temporal context. A variant of classic LSTM [22] including three gates (i.e., input, forget, output), an output activation function, no peephole connections, dropouts, and a full gradient training was used. While many LSTM models exist, all produce similar state-of-the-art results [23], [24]. We also tested different sets of LSTM parameters, each time varying the number of layers, number of hidden neurons, batch size, learning rate, optimizer, activation and loss functions, and different data representations. All of the tested models generated similar results, with less than 5% difference. In order to recreate the same analysis conditions for all activity elements, the same LSTM model (that which provided the best results on preliminary tests) was used for all configurations and experiments, described as follows. The model had two stacked temporal layers with dropouts of 0.2, each containing 256 hidden neurons. It was trained during 50 epochs with a learning rate of 0.001 by 128-size batches. Categorical cross entropy was used as the loss function, together with the Adam optimizer.

TABLE I. INFORMATION ABOUT DATASETS USED FOR ANALYSIS

Procedure                               ACDF         ACDF         LDH          LDH          PA           PA           CS
Hospital                                Leipzig (L)  Rennes (R)   Leipzig (L)  Rennes (R)   Leipzig (L)  Rennes (R)   Munich (M)
Number of surgeons                      4            5            6            5            2            1            2
Number of interventions                 16           48           25           20           15           11           19
Duration (min)                          156±61       85±26        80±26        34±14        78±21        58±22        12±3
Number of activities per intervention   367±149      244±76       242±72       148±49       266±77       213±46       29±5
Number of unique activities             377          379          413          243          282          255          45
Number of unique phases                 5            5            4            4            5            6            7
Number of unique verbs                  12           12           12           11           15           15           12
Number of unique instruments            25           30           27           23           32           30           15
Number of unique structures             11           7            10           8            7            9            7

D. Study design

In order to confirm our hypothesis and determine the essential sensors and signals that are necessary for activity recognition, we conducted a series of several experiments assessing the impact of each element and their combinations on recognition performance, as described below. The performances were compared based on an accuracy score. An activity was considered well recognized when all tuple items were correctly discovered. All the experiments assumed the presence of element information used as input at each moment of surgery. This information was assumed to originate from underlying distinct processing and recognition algorithms, each taking care of its own element. Given the relatively small amount of data we had available for deep learning, we performed a full cross-validation for each dataset in a leave-one-intervention-out manner. Moreover, since LSTM uses non-deterministic algorithms for training, we performed three runs for each fold, calculating an average recognition score for the three models.

a) Experiment 1: One-element configuration: The first experiment was designed to compare the activity recognition performances achieved when using each individual element as the only input. We also examined one-to-one relationships between the elements to assess how well one element can be recognized when another is known. This experiment focused on the sequential aspect only, omitting timestamps and duration of activities. The model thus had only to predict the correct activity tuples in the correct order, without indicating moments of transition. Each recognized activity is supposed to start when all three elements appear in the operating scene (in theory, they have to be recognized at the same time), and end when they disappear, which then implicitly provides the activity duration.
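The masking scheme from the semantic analysis subsection and the all-items-correct scoring rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the names `apply_mask`, `build_input` and `exact_match_accuracy` are hypothetical, and the sketch assumes that the network input is the masked current activity preceded by up to n masked previous activities.

```python
from typing import List, Sequence, Tuple

# (verb, instrument, structure) for the left hand, then for the right hand.
Activity = Tuple[str, ...]

UNKNOWN = "unknown"  # placeholder for a hidden item, as in the text's example


def apply_mask(activity: Sequence[str], mask: str) -> Activity:
    """Keep items whose mask bit is '1' and hide items whose bit is '0'."""
    if len(mask) != len(activity):
        raise ValueError("mask and activity must have the same length")
    return tuple(item if bit == "1" else UNKNOWN
                 for item, bit in zip(activity, mask))


def build_input(sequence: List[Activity], index: int, n: int, mask: str) -> List[Activity]:
    """Masked current activity preceded by up to n masked previous activities."""
    start = max(0, index - n)
    return [apply_mask(a, mask) for a in sequence[start:index + 1]]


def exact_match_accuracy(predicted: List[Activity], reference: List[Activity]) -> float:
    """An activity counts as recognized only if all six items are correct."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


example = ("hold", "forceps", "disc", "cut", "scalpel", "ligament")
print(apply_mask(example, "010010"))
# ('unknown', 'forceps', 'unknown', 'unknown', 'scalpel', 'unknown')
```

With this representation, a one-element configuration corresponds to the masks 100100 (verb), 010010 (instrument) and 001001 (structure), and a two-element configuration to masks with four bits set.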
b) Experiment 2: Two-element configuration: The second experiment compared combinations of elements, meaning that a pair of known elements was used to infer a complete activity. The same no-time condition was used, requiring a sequence of activities as output only.

c) Experiment 3: Activity duration: Contrary to the two previous experiments where the workflow was considered as a sequence of activities only, in this experiment we added the duration of the activity (in seconds) at the end of its input tuple. We then analyzed how knowledge of activity duration impacts the inference process. However, the model was still required to predict a 6-tuple only, and no timing was taken into account when computing final accuracy. The constraint of providing duration for input restricts the recognition process, as one has to wait for the on-going activity to finish. This could negatively reflect on on-line applications, yet it is still well adapted to cases where no immediate reaction is needed during the activity, or when only the order of activities is relevant.

d) Experiment 4: Noise in input data: The previous experiments were conducted with the assumption that all the input information was correct. In reality, raw signals coming from sensors may have a certain amount of noise, or some elements may be mislabeled by corresponding recognition algorithms. In this experiment, some element instances in activity tuples were randomly corrupted in order to simulate noise and create more realistic conditions for the analysis. For example, in one input tuple, the value of the right verb cut could be replaced by another existing verb coagulate, the left instrument needle-holders by classic forceps or the right structure ligament by fascia. In the simulation, the elements, as well as their left and right counterparts, were independently corrupted, meaning that noise occurred at different moments in time for all six items of the tuple. Four types of noise were simulated:

Uniform distribution noise. Using this kind of noise, a corrupted element instance is replaced by another instance of the same element group chosen randomly from a uniform distribution.
Frequency distribution noise. Often, recognition algorithms tend to assign the labels of the most prevalent classes to incorrectly recognized samples. In our case, if an underlying recognition algorithm was trained over the entire procedure, the most commonly represented element instances (in terms of number of samples) would be those that appear in the operational scene for longer than the others. To simulate the behavior of this type of noise, each instance has a chance to be randomly selected proportional to the frequency of its appearance in the dataset computed by duration.

Pairwise noise. Another potential occurrence is when the samples of two major classes are mutually mislabeled (i.e., their labels are switched); this is known as pairwise noise.

No signal. Sometimes recognition algorithms fail to identify a performed action or an object present in the scene, providing no label at all. In this experiment, a temporary absence of the sensor signal or the algorithm's inability to recognize an element is simulated by simply replacing the corrupted instance with the word none.
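The four noise types can be illustrated with a small simulation over one column of labels (for example, all right-hand verbs of an intervention). The sketch below is a simplified reading of the description, not the authors' implementation: the function names, the `durations` and `pair` parameters, and the choice to exclude the original label from the candidates are all assumptions made here so that a corrupted item is always wrong.

```python
import random
from typing import Dict, List, Optional, Tuple

NO_SIGNAL = "none"


def corrupt(value: str, kind: str, vocabulary: List[str], rng: random.Random,
            durations: Optional[Dict[str, float]] = None,
            pair: Optional[Tuple[str, str]] = None) -> str:
    """Return a corrupted replacement for one label."""
    others = [v for v in vocabulary if v != value]
    if kind == "uniform":
        # Any other instance of the same element group, equally likely.
        return rng.choice(others)
    if kind == "frequency":
        # Other instances, weighted by their total duration in the dataset.
        if durations is None:
            raise ValueError("frequency noise needs per-instance durations")
        weights = [durations[v] for v in others]
        return rng.choices(others, weights=weights, k=1)[0]
    if kind == "pairwise":
        # The two major classes swap labels; other labels are left as they are.
        if pair is None:
            raise ValueError("pairwise noise needs the two classes to swap")
        first, second = pair
        if value == first:
            return second
        if value == second:
            return first
        return value
    if kind == "no_signal":
        return NO_SIGNAL
    raise ValueError(f"unknown noise kind: {kind}")


def add_noise(labels: List[str], rate: float, kind: str,
              rng: random.Random, **kwargs) -> List[str]:
    """Corrupt a fixed fraction `rate` of the labels at random positions."""
    vocabulary = sorted(set(labels))
    count = round(rate * len(labels))
    noisy = list(labels)
    for i in rng.sample(range(len(labels)), count):
        noisy[i] = corrupt(labels[i], kind, vocabulary, rng, **kwargs)
    return noisy


rng = random.Random(0)
right_verbs = ["cut", "cut", "coagulate", "hold", "cut", "hold", "coagulate", "cut"]
print(add_noise(right_verbs, 0.25, "uniform", rng))
```

Calling `add_noise` separately for each of the six tuple positions, each with its own random positions, reproduces the independent corruption of left and right counterparts described above.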

Three possible configurations were compared in this experiment: 1) input with one known element, 2) with two elements, and 3) with all three elements. For the first two configurations, the LSTM models from experiments #1 and #2 trained on noise-free data were tested on corrupted data. The third configuration played the role of baseline, where all three elements were available with no need of an LSTM model, and the activity was simply defined as their composition. For all configurations, the noise was simulated at different rates: 5, 10, 15, 20, 25, 50 and 75% of corrupted data in the procedure. For instance, for an algorithm recognizing instruments that provides a correct label 95% of the time, 5% of all instrument instances in the intervention will have a wrong label. The same is applicable to other elements. Given that a correct activity is one where all tuple items are correctly recognized, with 5% noise for each element, the total amount of corrupted activity tuples may vary from 5 to 10% for one-element configurations, to 20% for two-element configurations and up to 30% for the baseline. In the best case scenario, all items in all corrupted tuples have wrong labels, and in the worst case, no more than one item is wrongly labeled in each corrupted tuple. Given this great variation, noise at each rate was simulated five times, and an average was calculated. As for experiments #1 and #2, no time aspect was involved in the analysis.

e) Experiment 5: Temporal delay: For the previous experiments, we mostly worked with only the sequencing aspect (with the exception of experiment #3). No time was taken into account when computing accuracy, which was strictly based on the order and correctness of activities, not on their durations. The elements were supposed to be recognized at the same time as they appeared in the scene. However, the underlying recognition algorithm may experience a certain delay before providing a label. In this experiment, we simulated such a temporal delay. As in the third experiment, we first added the duration of each activity to its tuple, and then simulated delay for all available elements.
It is important to mention that a change in activity duration is not the only consequence of the delay. On some occasions, this may also cause a shift in the order of activities by creating new tuples and deleting or altering existing ones. This changes the workflow of the intervention in terms of sequencing and number of activities. The goal of the LSTM model was to discover a sequence of activity tuples without giving their correct durations. However, the durations were accounted for when calculating final accuracy, which was computed as the sum of durations of all correctly discovered tuples divided by the total duration of all activities in the intervention. Delays of 1, 5, 10, 15, 20, 25 and 30 seconds were simulated. Again, we did not retrain the LSTM models on delayed data and no model was used for the baseline.

III. RESULTS

a) Experiment 1: One-element configuration: First we assessed the impact of each individual element (i.e., verb V, instrument I, and structure S) on activity recognition performance. The estimated scores are indicated in the upper part of Table II.

[Fig. 1. Average recognition accuracy (in %) for one element knowing another. Panels: ACDF L., ACDF R., LDH L., LDH R., PA L., PA R., CS; known element: verb, instrument, structure.]

[Fig. 2. Activity recognition accuracy scores (in %) for element combinations. Center lines of the box plot show the medians, limits indicate the 25th and 75th percentiles, whiskers extend to minimum and maximum values. For each dataset, VI is on the left, VS in the middle, and IS on the right.]

The experiment demonstrated that one element is not enough to confidently recognize activity. The instrument provided the best results for four out of seven datasets, yet no element is exclusively preferable for all procedures. The instrument and verb are tightly connected (Fig. 1) and have a statistically significant (p-value < 0.05) correlation, according to Spearman's rho two-tailed test. While both elements provide a lot of information about each other, they contribute little regarding the structure, and vice versa.
Cataract surgery, however, happens to be an exception: it is a short, highly standardized procedure with a minimum of deviations and a small number of unique activities. Here, one element almost explicitly defined the others, which explains the exceptionally high scores.

TABLE II. AVERAGE ACTIVITY RECOGNITION ACCURACY (IN %) ACHIEVED IN EXPERIMENTS 1 AND 2. VALUES MARKED WITH * (BOLD IN THE ORIGINAL) INDICATE THE ELEMENT(S) THAT PROVIDED THE BEST SCORE FOR EACH DATASET

        ACDF L.  ACDF R.  LDH L.   LDH R.   PA L.    PA R.    CS
V       49.72    66.29    52.64    62.32    48.63    68.23    92.79*
I       59.08*   79.06*   59.69    74.89*   58.17    80.25*   90.40
S       54.50    63.27    64.91*   68.29    60.52*   60.08    90.18

VI      64.56    81.73    63.47    75.62    60.23    82.79    96.96
VS      84.99    83.33    91.12    90.11    85.16    87.73    97.06
IS      94.18*   96.54*   97.30*   97.40*   97.10*   96.81*   99.82*

[Fig. 3. Line diagrams showing the loss in activity recognition accuracy with the growth of frequency distribution noise in data for two- and three-element configurations. x-axis: noise amount; y-axis: accuracy loss.]

[Fig. 4. Impact of frequency distribution noise on activity recognition accuracy. x-axis: noise amount; y-axis: recognition accuracy.]

b) Experiment 2: Two-element configuration: As one element was clearly not enough to provide satisfactory recognition results, we conducted an experiment to assess their combinations (i.e., verb-instrument VI, verb-structure VS, instrument-structure IS). The estimated accuracy scores can be found in the lower part of Table II. We also performed a Wilcoxon signed-rank statistical test. For all datasets, the scores given by these three combinations significantly differed, with a medium to large effect size (p-value < 0.01 for a two-tailed test, except for a few pairs, among them two on the CS dataset, which reached p-value < 0.05, and one pair for which no significant difference was found). A noticeable difference in scores can also be observed in Fig. 2. As expected, the VI combination, providing redundant information with fewer clues about the structure, generated low performance results, insufficient for correct stable recognition. The VS combination produced relatively good results of approximately 85%, which would probably be acceptable for some purposes. For all procedures and sites, the IS combination was sufficient to confidently recognize activities, producing a score of approximately 95% and higher. This demonstrates that only two types of sensors are necessary for low-level activity recognition. Above all, the structure, present in the two leading combinations, is an essential piece of information.

During the first two experiments, different values of the number n, defining the size of the temporal context, were tested. We observed that as the temporal window increased, the recognition scores tended to grow quickly until reaching a plateau. With a further increase of n, the recognition performances began to decrease. This behavior is due to the working mechanism of LSTM. In order to obtain a clear picture of the relationship between the elements and activities, the network needs to consider a larger portion of the context. However, in order to clarify these connections, a larger set of training examples is necessary. Its size should increase with the problem's complexity. Without a sufficient amount of examples, the learning process becomes much less effective. That is why the optimal size of the temporal window depends on many aspects and differs for each presented dataset. The best results presented here correspond to n = 50 for ACDF procedures, n = 20 for LDH and PA, and n = 5 for CS.

c) Experiment 3: Activity duration: The experiment analyzed the importance of activity duration and revealed that using it as additional input information only slightly improved the activity inference results, while having a greater effect on configurations with one known element than on those with two. Across all datasets, adding the duration resulted, on average, in a gain in accuracy of 3.7% for V, 4.3% for I, 4.1% for S, 2.8% for VI, 1.3% for VS, and 1.5% for IS. Nevertheless, it resulted in the IS combination achieving a recognition accuracy approaching 98-99%, which corresponds to our hypothesis.

d) Experiment 4: Noise in input data: During this experiment, noise was added to input information in order to simulate possible data mislabeling by underlying element recognition algorithms. As expected, all the configurations had reduced ability to predict on-going activity when subjected to noise. Generally, those previously providing higher recognition scores (i.e., having more useful information in them) were the most significantly affected (see Fig. 3). While the ranking of one-element configurations was slightly altered for some datasets, the two-element combinations kept their order: IS was still the most informative combination, followed by VS then finally VI.
The highest activity recognition accuracy for the IS combination ranged from 79% to 84.4% with 5% noise, and decreased to an average of 6.6% with 75% noise. However, using an LSTM model encoding procedure history enables the effect of noise to be attenuated and input tuples to be corrected, especially for smaller amounts of noise. The detailed results for each dataset and noise level can be viewed in Fig. 4. We chose to present the results for the example of frequency distribution noise as this is the most common type of noise related to data recognition and classification. It is interesting to see that the baseline combination of all three elements (VIS) always concedes to IS, and that it yields to all two-element combinations even with relatively small amounts of noise (starting from 10-15% noise). VIS quickly decreases, reaching almost zero accuracy at 75% noise, and exhibits the most significant accuracy loss (see Fig. 3), assuming that a perfect VIS combination would attain 100% recognition. VIS represents a naive approach of simply putting three elements together with no temporal model, and is thus unable to correct itself. Unlike in other configurations, a tuple corrupted by noise is automatically recognized incorrectly. The rapid drop in its performance quality can also be explained by the fact that an additional element in an activity tuple leads to a higher risk of its corruption, especially with greater noise. Thus, having less information is better than having a lot of information with noise.

Continuing our analysis of different types of noise, we found that at lower levels (up to 20%), the results for all noise types were similar with just a minor difference in accuracy (see Fig. 5). This difference grows noticeable at higher noise rates, resulting in steeper or flatter curves. However, no statistically significant correlation between configurations and noise types suitable for all datasets was found at higher noise levels. The curve of the VIS baseline is nearly the same for all noise types and datasets.
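A back-of-the-envelope calculation helps explain why a plain composition of all three elements degrades so quickly and why its curve barely depends on the noise type. The sketch below assumes, as a simplification, that each observed tuple item is corrupted independently with the same probability and that a corrupted item is always wrong; the function names are illustrative. Under these assumptions a tuple survives only if none of its k observed items is hit, so the expected accuracy of a pure composition is (1 - r)^k, and the share of corrupted tuples lies between r (corruptions always coincide) and min(1, k·r) (corruptions never coincide).

```python
def corrupted_tuple_bounds(rate: float, items: int) -> tuple:
    """Lower and upper bound on the share of tuples with at least one wrong item."""
    return rate, min(1.0, items * rate)


def expected_composition_accuracy(rate: float, items: int = 6) -> float:
    """Expected share of fully correct tuples under independent corruption."""
    return (1.0 - rate) ** items


# Noise rates used in the experiment; six items for the three-element baseline.
for rate in (0.05, 0.10, 0.15, 0.20, 0.25, 0.50, 0.75):
    low, high = corrupted_tuple_bounds(rate, 6)
    accuracy = expected_composition_accuracy(rate)
    print(f"noise {rate:.2f}: expected accuracy {accuracy:.4f}, "
          f"corrupted tuples between {low:.2f} and {high:.2f}")
```

With two, four and six observed items, the bounds at 5% noise are 5-10%, 5-20% and 5-30%, matching the ranges quoted in the study design. At 75% noise the expected accuracy of the composition is 0.25^6, roughly 0.0002, which is in line with the almost-zero accuracy reported for the baseline.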
In the case of the baseline configuration, the quality of activity recognition on noisy data depends neither on the semantic content of the surgery nor on the type of noise, but only on the randomness of the corruption. This is evident, as an altered tuple item is wrong anyway, no matter its initial or received value. Previous experiments demonstrated that, within certain limits, a wider temporal window is better for perfectly correct data. This experiment, however, showed that for noisy data a small value of n generates better results, as a larger temporal window offers a greater chance of making a prediction based on false information. With the exception of one-element configurations of and LDH. datasets containing uniform noise, n = 5 is the best option for all other cases. Nevertheless, most of the time, the difference in accuracy scores given by different values of n was not statistically significant.

e) Experiment 5: Temporal delay: This experiment assessed the impact of temporal delay on the quality of activity recognition. As in the previous experiment, we found that a delay caused all configurations to progressively lose their recognition ability. Nevertheless, as before, the relationship between configurations remained the same, with the instrument-structure combination achieving the best results. This combination keeps very high scores for a 1 s delay, ranging from 91% to 97.9%, with an average of 94.1%. Even if some divergence is observable between the curves of two of the configurations for several datasets, as seen in Fig. 6, their average curves are very similar, with less than 1% difference at each delay point, except at the 30 s point, where one surpasses the other by 1.8%. As was the case with noise, the performance of the baseline configuration is progressively impaired as the delay increases, as it has no temporal context of the procedure and no opportunity to correct tuple values. Two-element combinations, on the other hand, were still able to discover the on-going activity thanks to the history of the procedure represented by the LSTM. They nevertheless suffer from altered activity sequencing, which makes the procedure difficult for the LSTM to follow.
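As a rough illustration of how a temporal delay disturbs the input, the sketch below shifts one element stream later in time and measures how often it then disagrees with the undelayed stream. It assumes, purely for illustration, that the element labels are sampled once per second, so a delay of k seconds is a shift by k samples; the function names, the `fill` placeholder, and the example labels are hypothetical and are not taken from the study.

```python
def delay_labels(labels, delay_steps, fill=None):
    """Return the sequence as seen `delay_steps` samples late; early slots get `fill`."""
    if delay_steps < 0:
        raise ValueError("delay_steps must be non-negative")
    shift = min(delay_steps, len(labels))
    return [fill] * shift + list(labels[: len(labels) - shift])


def mismatch_rate(reference, observed):
    """Fraction of time steps where the observed label differs from the reference."""
    if len(reference) != len(observed):
        raise ValueError("sequences must have the same length")
    if not reference:
        return 0.0
    return sum(r != o for r, o in zip(reference, observed)) / len(reference)


# Hypothetical anatomical-structure labels, one per second (assumed for illustration).
structures = ["skin", "skin", "muscle", "muscle", "muscle", "disc"]
delayed = delay_labels(structures, 2)
print(delayed)
print(mismatch_rate(structures, delayed))
```

Short activities are the ones most affected by such a shift, since a delay longer than the activity itself moves its labels entirely outside its true time span.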
We can observe that the biggest losses for two-element combinations occurred in the intervals from 1 to 10 seconds (a loss of approximately 15-20% each time). This can be explained by the fact that the most significant changes in workflows (i.e., creation and deletion of activities) are made during these intervals.

Fig. 5. Influence of different types of noise on activity recognition performance for the LDH. dataset. Uniform distribution noise is shown in panels a) and e), frequency distribution noise in b) and f), pairwise noise in c) and g), and no-signal noise in d) and h). The top row shows the results for one-element configurations, the bottom row those for two- and three-element ones. (x-axis: noise amount; y-axis: recognition accuracy.)

Fig. 6. Impact of temporal delay on activity recognition accuracy. (x-axis: recognition delay (s); y-axis: recognition accuracy.)

IV. DISCUSSION

This study confirmed our hypothesis that, for accurate recognition of low-level surgical activity, not all of the activity elements need sensors to track them. Though this sort of analysis should also be conducted for other surgical domains, the best choice for neurosurgery is a combination of sensors recognizing the instrument and the anatomical structure. In the case of standardized simple procedures, such as cataract operations, where one element is necessarily tightly bound to the two others, searching for the most informative elements is not worthwhile. However, two sensors are sufficient for activity recognition in these procedures as well. The experiments with noise and temporal delay also demonstrated the advantage of the instrument-structure combination over other configurations, including those uniting all three elements, suggesting that the three-element combination can be safely replaced by the instrument-structure pair with no significant impairment.

However, in order to verify these conclusions, further analysis in certain directions must still be undertaken. First of all, during our fourth experiment, for the sake of simplicity we assumed that all elements contained the same amount of noise, which is, of course, not necessarily the case in real-life procedures. The amount and type of noise in each element depend on the underlying algorithm used for its recognition. The best way to get realistic estimations of scores is to use confusion matrices from these algorithms to simulate noise in the data. We also proceeded under the assumption that the perturbation in the data was uniform in time, making each time point equally likely to be corrupted. This aspect should be explored more carefully, as it may not be valid in real surgeries. The same applies to the delay. It may also vary from one element to another in real-life situations, as well as between different element instances. The combinations of different noises and delays must also be evaluated.

Secondly, in the last two experiments, the conditions under which the studied configurations and the baseline were compared differed. With the LSTM model, the configurations with one and two known elements benefited from the temporal context of the procedure, enabling input tuples to be corrected when needed. If this were possible for the baseline, it would probably generate better results.

We demonstrated that, in terms of recognition scores, the instrument-structure combination is capable of providing very accurate results. Nevertheless, a considerable drop in performance was observed in the presence of noise and delay (about 80% accuracy on average at 5% noise vs. 97% with no noise). This work sought neither to achieve a high recognition performance nor to propose an original, efficient LSTM architecture. Nevertheless, in order to truly prove that two types of sensor are enough for surgical activity recognition, the overall performance should be enhanced. First, our main focus was on discovering relationships between activity elements using simple LSTM models. However, there are always subtle connections between the elements that influence the recognition process, regardless of which method is used. Thus, the conclusions drawn from the analysis would not change considerably with any other method or LSTM model. It should still be possible to find other, more suitable deep models that could provide greater accuracy and maintain a strong performance even in the presence of noise. In our experiments, the chosen LSTM model was trained on noise-free data only. One can therefore imagine that retraining the network on simulated noisy data, or using some preprocessing methods as well as noise reduction techniques, could be beneficial.
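The suggestion of simulating noise from the confusion matrices of the element recognition algorithms, which could also produce noisy data for retraining, can be sketched as follows. This is a minimal illustration under stated assumptions: the confusion counts, the label names, and the helper functions `sample_prediction` and `simulate_recognizer` are hypothetical and do not come from any real recognizer or from the study.

```python
import random


def sample_prediction(true_label, confusion, rng):
    """Draw a predicted label from the confusion-matrix row of `true_label`."""
    row = confusion[true_label]
    predicted = list(row.keys())
    weights = [row[p] for p in predicted]
    return rng.choices(predicted, weights=weights, k=1)[0]


def simulate_recognizer(labels, confusion, seed=0):
    """Replace every ground-truth label by a label sampled from its confusion row."""
    rng = random.Random(seed)
    return [sample_prediction(label, confusion, rng) for label in labels]


# Hypothetical confusion counts (rows: true label, columns: predicted label).
confusion = {
    "scalpel": {"scalpel": 90, "forceps": 5, "scissors": 5},
    "forceps": {"scalpel": 10, "forceps": 80, "scissors": 10},
    "scissors": {"scalpel": 2, "forceps": 18, "scissors": 80},
}
noisy = simulate_recognizer(["scalpel", "forceps", "scissors", "forceps"], confusion)
```

Using a separate confusion matrix for each element (verb, instrument, structure) would give every element its own amount and type of noise, which is the more realistic setting described above.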
Secondly, the problem with delay can also be avoided. The procedural workflows used in our study were annotated manually in real time. Most of the very short activities are due to an annotator's late reaction or a surgeon's complex hand coordination. For most applications, such an extremely detailed annotation is unnecessary. Eliminating these very brief activities, which cause a major change in activity sequencing in the experiment with delay, would make the recognition scores increase again. A larger amount of available data would also provide better results and make the network more robust. In addition, it should be noted that not all clinical applications require absolute recognition accuracy. Certain errors or delays cause no harm and can be tolerated, consequently reducing the gap between the development of activity recognition techniques and their actual realization and use in operating theaters. Application-dependent metrics similar to [25] may be used to re-estimate this gap. Finally, one thing remains clear: placing more sensors in the operating theater is not a solution. The way forward is to enhance the underlying algorithms that recognize verbs, instruments and anatomical structures.

In addition to confirming our hypothesis about sensors, this study led to some interesting observations about surgical practices. During the experiments, we noticed that the elements of the surgeon's right hand clearly contributed more to the correct identification of the activity. However, despite the correlation between the surgeon's hand movements, neither the information about the right hand nor that about the left hand alone was enough to attain acceptable recognition results. This demonstrates how important both hands are in activity execution. The first two experiments also revealed a difference between practices in the Leipzig and Rennes hospitals. When using one-element configurations to discover the activity, the results for the procedures performed in Rennes were always significantly better than for those performed in Leipzig. At the same time, the instrument was clearly a better choice than the other individual elements for Rennes, yet the same was not true for Leipzig.
Moreover, in all of the procedures conducted in Rennes, we found a stronger bond between the instrument and the structure, as well as between the verb and the structure. Our resulting hypothesis is that, unlike in Rennes, the surgical instruments in Leipzig are more often used for new functions rather than for their initially intended application. This indicates that the procedures performed in Rennes are more standardized and have less variability in surgical workflow. Such observations are important for the analysis and understanding of surgical processes.

V. CONCLUSION

In this work we analyzed the relationships between the essential elements of low-level surgical activity and their impact on the recognition process. By performing a semantic analysis using deep learning, we demonstrated that two out of three elements are enough to confidently recognize an activity. The operated anatomical structure is a crucial element. The combined structure-instrument pair enables very confident activity recognition, followed by the structure-verb combination, which provides slightly worse yet still acceptable results. This knowledge should facilitate the choice of the right sensors to install in the operating room of the future for situation awareness. We also made some interesting observations about surgical practices that improve the understanding of the surgical process.

ACKNOWLEDGEMENT

This work was partially supported by French state funds managed by the ANR within the Investissements d'Avenir program (Labex CAMI) under the reference ANR-11-LABX-0004.

REFERENCES

[1] F. Lalys and P. Jannin, "Surgical process modelling: a review," International Journal of Computer Assisted Radiology and Surgery, vol. 9, no. 3, pp. 495-511, 2014.
[2] G. Forestier, L. Riffaud, and P. Jannin, "Automatic phase prediction from low-level surgical activities," International Journal of Computer Assisted Radiology and Surgery, vol. 10, no. 6, pp. 833-841, 2015.
[3] D. Katić, J. Schuck, A.-L. Wekerle, H. Kenngott, B. P. Müller-Stich, R. Dillmann, and S. Speidel, "Bridging the gap between formal and experience-based knowledge for context-aware laparoscopy," International Journal of Computer Assisted Radiology and Surgery, vol. 11, no. 6, pp. 881-888, 2016.

[4] A. P. Twinanda, S. Shehata, D. Mutter, J. Marescaux, M. de Mathelin, and N. Padoy, "EndoNet: A deep architecture for recognition tasks on laparoscopic videos," IEEE Transactions on Medical Imaging, vol. 36, no. 1, pp. 86-97, 2017.
[5] S. Bodenstedt, M. Wagner, D. Katić, P. Mietkowski, B. Mayer, H. Kenngott, B. Müller-Stich, R. Dillmann, and S. Speidel, "Unsupervised temporal context learning using convolutional neural networks for laparoscopic workflow analysis," arXiv preprint arXiv:1702.03684, 2017.
[6] J. E. Bardram, A. Doryab, R. M. Jensen, P. M. Lange, K. L. Nielsen, and S. T. Petersen, "Phase recognition during surgical procedures using embedded and body-worn sensors," in Pervasive Computing and Communications (PerCom), 2011 IEEE International Conference on. IEEE, 2011, pp. 45-53.
[7] A. P. Twinanda, E. O. Alkan, A. Gangi, M. de Mathelin, and N. Padoy, "Data-driven spatio-temporal RGBD feature encoding for action recognition in operating rooms," International Journal of Computer Assisted Radiology and Surgery, vol. 10, no. 6, pp. 737-747, 2015.
[8] B. B. Haro, L. Zappella, and R. Vidal, "Surgical gesture classification from video data," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2012, pp. 34-41.
[9] Y. Gao, S. S. Vedula, G. I. Lee, M. R. Lee, S. Khudanpur, and G. D. Hager, "Query-by-example surgical activity detection," International Journal of Computer Assisted Radiology and Surgery, vol. 11, no. 6, pp. 987-996, 2016.
[10] R. DiPietro, C. Lea, A. Malpani, N. Ahmidi, S. Vedula, G. Lee, M. Lee, and G. Hager, "Automatic data-driven real-time segmentation and recognition of surgical workflow," in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). Springer, 2016, pp. 551-558.
[11] F. Lalys, D. Bouget, L. Riffaud, and P. Jannin, "Automatic knowledge-based recognition of low-level tasks in ophthalmological procedures," International Journal of Computer Assisted Radiology and Surgery, vol. 8, no. 1, pp. 39-49, 2013.
[12] C. Meißner, J. Meixensberger, A. Pretschner, and T. Neumuth, "Sensor-based surgical activity recognition in unconstrained environments," Minimally Invasive Therapy & Allied Technologies, vol. 23, no. 4, pp. 198-205, 2014.
[13] G. Forestier, F. Lalys, L. Riffaud, B. Trelhu, and P. Jannin, "Classification of surgical processes using dynamic time warping," Journal of Biomedical Informatics, vol. 45, no. 2, pp. 255-264, 2012.
[14] L. Riffaud, T. Neumuth, X. Morandi, C. Trantakis, J. Meixensberger, O. Burgert, B. Trelhu, and P. Jannin, "Recording of surgical processes: a study comparing senior and junior neurosurgeons during lumbar disc herniation surgery," Neurosurgery, vol. 67, pp. 325-332, 2010.
[15] G. Forestier, F. Lalys, L. Riffaud, D. L. Collins, J. Meixensberger, S. N. Wassef, T. Neumuth, B. Goulet, and P. Jannin, "Multi-site study of surgical practice in neurosurgery based on surgical process models," Journal of Biomedical Informatics, vol. 46, no. 5, pp. 822-829, 2013.
[16] L. Bouarfa and J. Dankelman, "Workflow mining and outlier detection from clinical activity logs," Journal of Biomedical Informatics, vol. 45, no. 6, pp. 1185-1190, 2012.
[17] A. Huaulmé, S. Voros, L. Riffaud, G. Forestier, A. Moreau-Gaudry, and P. Jannin, "Distinguishing surgical behavior by sequential pattern discovery," Journal of Biomedical Informatics, vol. 67, pp. 34-41, 2017.
[18] M. Maktabi and T. Neumuth, "Online time and resource management based on surgical workflow time series analysis," International Journal of Computer Assisted Radiology and Surgery, vol. 12, no. 2, pp. 325-338, 2017.
[19] M. Kranzfelder, A. Schneider, S. Gillen, and H. Feussner, "New technologies for information retrieval to achieve situational awareness and higher patient safety in the surgical operating room: the MRI institutional approach and review of the literature," Surgical Endoscopy, vol. 25, no. 3, pp. 696-705, 2011.
[20] N. Mehta, R. Haluck, M. Frecker, and A. Snyder, "Sequence and task analysis of instrument use in common laparoscopic procedures," Surgical Endoscopy, vol. 16, no. 2, pp. 280-285, 2002.
[21] F. Lalys, L. Riffaud, X. Morandi, and P. Jannin, "Automatic phases recognition in pituitary surgeries by microscope images classification," in International Conference on Information Processing in Computer-Assisted Interventions, 2010, pp. 34-44.
[22] A. Graves, Supervised Sequence Labelling with Recurrent Neural Networks. Springer Berlin Heidelberg, 2012.
[23] R. Jozefowicz, W. Zaremba, and I. Sutskever, "An empirical exploration of recurrent network architectures," Journal of Machine Learning Research, 2015.
[24] K. Greff, R. Srivastava, J. Koutník, B. Steunebrink, and J. Schmidhuber, "LSTM: A search space odyssey," IEEE Transactions on Neural Networks and Learning Systems, 2017.
[25] O. Dergachyova, D. Bouget, A. Huaulmé, X. Morandi, and P. Jannin, "Automatic data-driven real-time segmentation and recognition of surgical workflow," International Journal of Computer Assisted Radiology and Surgery, vol. 11, no. 6, pp. 1081-1089, 2016.