Backend Features in the TechLauncher Common Assessment Process Platform
James Volis (u5370515)
Supervisor: Shayne Flint
Structure
1. Background
2. Task
3. Approach
4. Results
5. Conclusions
Background
TechLauncher
Also known as...
COMP3100, COMP3500, COMP3550, COMP4500, COMP8715
~37 different projects
~240 students enrolled
Big course.
So what? It's a big course.
The issues?
Subjectivity.
No two students are the same!
No two pieces of work are the same!
(hopefully not)
A huge amount of data.
A lot of marking is required!
Effort is required for marking!
Problems remain...
How do you mark different pieces of work to the same standard?
How do you give timely feedback, given the large amount of marking?
Task
Simple solution to the marking problem...
Get the students to do the work for you!
Otherwise known as Peer Assessment
However,
this brings up other issues, namely...
Using students to mark themselves!
How can you assess if they are giving good feedback?
How can you assess what is good feedback?
And then there is the challenge,
How do you filter good feedback from poor feedback?
How do you filter all good feedback and give it back to the students in time?
Approach
Data can be sorted into two classes (Jin 2016):
Actionable Feedback
Contains an executable suggestion, e.g. "You should try doing this"
Negative tone that suggests improvement, e.g. "No risk assessment plan"
In short: feedback that suggests improvement.
Descriptive Feedback
Detailing what has been done, e.g. "You have done X, Y and Z well"
Compliments, e.g. "Good job dude! Great work!"
Anything else.
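To make the two classes concrete for training, labelled data only needs a response and its class. A minimal sketch of such labelled examples (the texts here are illustrative, not real tag report data):

```python
# Hypothetical labelled rationale responses for the two feedback classes.
# These strings are illustrative only, not taken from real TechLauncher tag reports.
labelled_responses = [
    ("You should try doing a risk assessment plan for the next sprint.", "actionable"),
    ("There is no risk assessment plan yet.", "actionable"),
    ("You have documented the requirements and prioritised the work well.", "descriptive"),
    ("Good job! Great work!", "descriptive"),
]
```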
But there is still too much data to classify by hand.
Solution?
Use a machine learning algorithm to do it for us.
What type of machine learning algorithm should we use?
Support Vector Machine
[SVM illustration by Alisneaky, SVG version by User:Zirguezi, CC BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0), via Wikimedia Commons]
Found in a previous report to be the most effective for this problem (Jin 2016)
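The classifier itself is the KNIME SVM workflow from Jin (2016). As a rough sketch of the same idea outside KNIME, an equivalent text-classification SVM could be built in Python with scikit-learn (the example texts and labels below are made up for illustration):

```python
# Minimal text-classification SVM sketch: bag-of-words (TF-IDF) features
# fed into a linear SVM, mirroring the idea behind the KNIME workflow.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = [
    "You should try adding a risk assessment plan.",
    "There is no risk assessment plan yet.",
    "You have documented the requirements and prioritised the work well.",
    "Good job! Great work!",
]
labels = ["actionable", "actionable", "descriptive", "descriptive"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

print(model.predict(["You should collect more user feedback next sprint."]))
```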
This year in TechLauncher,
Using Tag Reports
Students select tags that match deliverables,
Examples: Collecting user feedback, Collecting evidence, Requirements managed, Prioritising work
They also leave a rationale for their tags,
My focus is on the rationale responses.
A large number of responses:
473 reports * 5 responses each = 2365 responses to sort fortnightly
Tools Used
Utilising the SVM workflow created last year (Jin 2016)
Why continue with this?
Open-source
Easy to use
Sufficient performance
Data Storage
Why .xlsx over a database?
The tag report forms are exported as .xlsx
Can be used directly as an input
GUI for less technical people (tutors)
Data Manipulation
Why Python?
Easy to set up
Easy to use
Easy to document
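As an illustration of the Python + .xlsx combination, a minimal pandas sketch for loading the exported tag reports (the file name and column name are assumptions):

```python
import pandas as pd

# Load the exported tag report spreadsheet.
# "tag_reports.xlsx" and the "rationale" column are assumed names for illustration.
reports = pd.read_excel("tag_reports.xlsx")

# Collect the rationale responses that need to be classified.
rationales = reports["rationale"].dropna().tolist()
print(f"{len(rationales)} rationale responses loaded")
```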
Results
How did this go?
Training sets were created based on a random selection of the raw data,
Data was sorted and put into Excel (training sets and raw data),
KNIME created a new spreadsheet as an output with predictions,
The output was then matched with the initial tag report,
End result: Tag report with predictions.
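A sketch of that final matching step, joining the KNIME prediction spreadsheet back onto the original tag report (the file names, the shared response_id column and the prediction column are assumptions):

```python
import pandas as pd

# Match KNIME's predictions back onto the original tag report by a shared
# response identifier, then write out the combined spreadsheet.
# All file and column names here are assumed for illustration.
tag_report = pd.read_excel("tag_reports.xlsx")
predictions = pd.read_excel("knime_predictions.xlsx")

merged = tag_report.merge(predictions[["response_id", "prediction"]],
                          on="response_id", how="left")
merged.to_excel("tag_reports_with_predictions.xlsx", index=False)
```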
How effective was the SVM?
Using a Confusion Matrix
n = total sample size
                         Predicted: True         Predicted: False
Actual: True             True Positives (TP)     False Negatives (FN)
Actual: False            False Positives (FP)    True Negatives (TN)
Adapting this to TechLauncher data
Suggested Confusion Matrix for Peer Assessment Responses
n = number of Peer Assessment responses
                         Predicted: Actionable                  Predicted: Descriptive
Actual: Actionable       Actionable correctly predicted         Actionable predicted as Descriptive
Actual: Descriptive      Descriptive predicted as Actionable    Descriptive correctly predicted
Actual Confusion Matrix for Peer Assessment Responses
n = 2110
                         Predicted: Actionable    Predicted: Descriptive
Actual: Actionable       527                      108
Actual: Descriptive      118                      1357
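In practice, a matrix like this can be produced directly from the actual and predicted labels; a minimal sketch with scikit-learn (toy labels only):

```python
from sklearn.metrics import confusion_matrix

# Toy labels only; in practice these come from the hand-sorted data and the
# KNIME predictions. Rows are actual classes, columns are predicted classes.
actual    = ["actionable", "actionable", "descriptive", "descriptive", "descriptive"]
predicted = ["actionable", "descriptive", "descriptive", "descriptive", "actionable"]

print(confusion_matrix(actual, predicted, labels=["actionable", "descriptive"]))
# [[1 1]
#  [1 2]]
```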
Metrics for Confusion Matrix performance
Accuracy
How often did it predict correctly?
Accuracy = (TP + TN) / total sample size
Precision
What proportion of positive predictions were correct?
Precision = TP / (TP + FP)
Recall
What proportion of actual positives were correctly predicted?
Recall = TP / (TP + FN)
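Applying these formulas to the actual confusion matrix above (TP = 527, FN = 108, FP = 118, TN = 1357, with Actionable as the positive class) reproduces the reported figures:

```python
# Metrics computed from the actual confusion matrix reported above,
# treating "Actionable" as the positive class.
tp, fn, fp, tn = 527, 108, 118, 1357
total = tp + fn + fp + tn                 # 2110

accuracy  = (tp + tn) / total             # ~0.893
precision = tp / (tp + fp)                # ~0.817
recall    = tp / (tp + fn)                # ~0.830

print(f"Accuracy {accuracy:.1%}, Precision {precision:.1%}, Recall {recall:.1%}")
```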
Performance
Accuracy ~ 89%
Precision ~ 82%
Recall ~ 83%
The classifier is relatively good.
The models may need slight improvement before being used for grading.
Conclusions
Where do we go from here?
Students will receive an extra field in their Peer Assessment feedback,
An AI Feedback field will state whether each piece of feedback is actionable or descriptive,
Students will be given a form to check whether the AI has classified feedback correctly,
Students' corrections will help improve the AI's performance.
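One possible way those corrections could flow back into training is simply appending the corrected labels to the training set before the next training run. A sketch under assumed file and column names, not an implemented feature:

```python
import pandas as pd

# Append student-corrected labels to the training set so the next training
# run can learn from them. File and column names are assumptions.
training = pd.read_excel("training_set.xlsx")
corrections = pd.read_excel("student_corrections.xlsx")

# Keep only the responses the students flagged as misclassified.
fixed = corrections[corrections["classified_correctly"] == "no"]
new_rows = fixed[["rationale", "corrected_label"]].rename(
    columns={"corrected_label": "label"})

updated = pd.concat([training, new_rows], ignore_index=True)
updated.to_excel("training_set.xlsx", index=False)
```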
Other uses
Giving statistics on types of feedback given
A student would be able to see how much good feedback they have given,
Making it part of the marking scheme.
More iterations are needed at this stage.
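A sketch of the kind of per-student statistics mentioned above that the predictions could support (column names are assumptions about the merged prediction spreadsheet):

```python
import pandas as pd

# Count actionable vs. descriptive responses per student.
# File and column names are assumed for illustration.
merged = pd.read_excel("tag_reports_with_predictions.xlsx")
stats = merged.groupby(["student", "prediction"]).size().unstack(fill_value=0)
print(stats)
```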
Summary
Background: TechLauncher is really big and hard to mark. How do we mark it in time?
Approach: Classify the feedback given (Actionable & Descriptive) using a machine learning algorithm (Support Vector Machine).
Results: Performance metrics are relatively good.
Conclusions: Students will help correct classifications; statistics from predictions in future; more iterations needed to improve performance.
Questions?
References
Jin, Z. (2016). Using Peer Assessment Data to Help Improve Teaching and Learning Outcomes. B. Advanced Computing (Honours) thesis, The Australian National University.