Backend Features in the TechLauncher Common Assessment Process Platform
James Volis (u5370515)
Supervisor: Shayne Flint
Structure
1. Background
2. Task
3. Approach
4. Results
5. Conclusions
Background
TechLauncher
Also known as...
COMP3100, COMP3500, COMP3550, COMP4500, COMP8715
~37 different projects
~240 students enrolled
Big course.
So what? It's a big course.
The issues?
Subjectivity.
No two students are the same!
No two pieces of work are the same!
(hopefully not)
A huge amount of data.
A lot of marking is required!
Effort is required for marking!
Problems remain...
How do you mark different pieces of work to the same standard?
How do you give timely feedback, given the large amount of marking?
Task
Simple solution to the marking problem...
Get the students to do the work for you!
Otherwise known as Peer Assessment
However,
this brings up other issues, namely...
Using students to mark themselves!
How can you assess if they are giving good feedback?
How can you assess what is good feedback?
And then there is the challenge,
How do you filter good feedback from poor feedback?
How do you filter all good feedback and give it back to the students in time?
Approach
Data can be sorted into two classes (Jin 2016):
Actionable Feedback
Contains an executable suggestion, e.g. "You should try doing this"
Negative tone that suggests improvement, e.g. "No risk assessment plan"
In short: feedback that suggests improvement.
Descriptive Feedback
Detailing what has been done, e.g. "You have done X, Y and Z well"
Compliments, e.g. "Good job dude! Great work!"
Anything else.
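To make the two classes concrete for training, labelled data only needs a response and its class. A minimal sketch of such labelled examples (the texts here are illustrative, not real tag report data):

```python
# Hypothetical labelled rationale responses for the two feedback classes.
# These strings are illustrative only, not taken from real TechLauncher tag reports.
labelled_responses = [
    ("You should try doing a risk assessment plan for the next sprint.", "actionable"),
    ("There is no risk assessment plan yet.", "actionable"),
    ("You have documented the requirements and prioritised the work well.", "descriptive"),
    ("Good job! Great work!", "descriptive"),
]
```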
But there is still too much data to classify by hand.
Solution?
Use a machine learning algorithm to do it for us.
What type of machine learning algorithm should we use?
Support Vector Machine
[SVM illustration by Alisneaky, SVG version by User:Zirguezi, CC BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0), via Wikimedia Commons]
Found in a previous report to be the most effective for this problem (Jin 2016)
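The classifier itself is the KNIME SVM workflow from Jin (2016). As a rough sketch of the same idea outside KNIME, an equivalent text-classification SVM could be built in Python with scikit-learn (the example texts and labels below are made up for illustration):

```python
# Minimal text-classification SVM sketch: bag-of-words (TF-IDF) features
# fed into a linear SVM, mirroring the idea behind the KNIME workflow.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = [
    "You should try adding a risk assessment plan.",
    "There is no risk assessment plan yet.",
    "You have documented the requirements and prioritised the work well.",
    "Good job! Great work!",
]
labels = ["actionable", "actionable", "descriptive", "descriptive"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

print(model.predict(["You should collect more user feedback next sprint."]))
```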
This year in TechLauncher,
Using Tag Reports
Students select tags that match deliverables,
Examples: Collecting user feedback, Collecting evidence, Requirements managed, Prioritising work
They also leave a rationale for their tags,
My focus is on the rationale responses.
A large number of responses:
473 reports * 5 responses each = 2365 responses to sort fortnightly
Tools Used
Utilising the SVM workflow created last year (Jin 2016)
Why continue with this?
Open-source
Easy to use
Sufficient performance
Data Storage
Why .xlsx over a database?
The tag report forms are exported as .xlsx
Can be used directly as an input
GUI for less technical people (tutors)
Data Manipulation
Why Python?
Easy to set up
Easy to use
Easy to document
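As an illustration of the Python + .xlsx combination, a minimal pandas sketch for loading the exported tag reports (the file name and column name are assumptions):

```python
import pandas as pd

# Load the exported tag report spreadsheet.
# "tag_reports.xlsx" and the "rationale" column are assumed names for illustration.
reports = pd.read_excel("tag_reports.xlsx")

# Collect the rationale responses that need to be classified.
rationales = reports["rationale"].dropna().tolist()
print(f"{len(rationales)} rationale responses loaded")
```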
Results
How did this go?
Training sets were created based on a random selection of the raw data,
Data was sorted and put into Excel (training sets and raw data),
KNIME created a new spreadsheet as an output with predictions,
The output was then matched with the initial tag report,
End result: Tag report with predictions.
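A sketch of that final matching step, joining the KNIME prediction spreadsheet back onto the original tag report (the file names, the shared response_id column and the prediction column are assumptions):

```python
import pandas as pd

# Match KNIME's predictions back onto the original tag report by a shared
# response identifier, then write out the combined spreadsheet.
# All file and column names here are assumed for illustration.
tag_report = pd.read_excel("tag_reports.xlsx")
predictions = pd.read_excel("knime_predictions.xlsx")

merged = tag_report.merge(predictions[["response_id", "prediction"]],
                          on="response_id", how="left")
merged.to_excel("tag_reports_with_predictions.xlsx", index=False)
```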
How effective was the SVM?
Using a Confusion Matrix
n = total sample size
                         Predicted: True         Predicted: False
Actual: True             True Positives (TP)     False Negatives (FN)
Actual: False            False Positives (FP)    True Negatives (TN)
Adapting this to TechLauncher data
Suggested Confusion Matrix for Peer Assessment Responses
n = number of Peer Assessment responses
                         Predicted: Actionable                  Predicted: Descriptive
Actual: Actionable       Actionable correctly predicted         Actionable predicted as Descriptive
Actual: Descriptive      Descriptive predicted as Actionable    Descriptive correctly predicted
Actual Confusion Matrix for Peer Assessment Responses
n = 2110
                         Predicted: Actionable    Predicted: Descriptive
Actual: Actionable       527                      108
Actual: Descriptive      118                      1357
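In practice, a matrix like this can be produced directly from the actual and predicted labels; a minimal sketch with scikit-learn (toy labels only):

```python
from sklearn.metrics import confusion_matrix

# Toy labels only; in practice these come from the hand-sorted data and the
# KNIME predictions. Rows are actual classes, columns are predicted classes.
actual    = ["actionable", "actionable", "descriptive", "descriptive", "descriptive"]
predicted = ["actionable", "descriptive", "descriptive", "descriptive", "actionable"]

print(confusion_matrix(actual, predicted, labels=["actionable", "descriptive"]))
# [[1 1]
#  [1 2]]
```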
Metrics for Confusion Matrix performance
Accuracy
How often did it predict correctly?
Accuracy = (TP + TN) / total sample size
Precision
What proportion of positive predictions were correct?
Precision = TP / (TP + FP)
Recall
What proportion of actual positives were correctly predicted?
Recall = TP / (TP + FN)
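Applying these formulas to the actual confusion matrix above (TP = 527, FN = 108, FP = 118, TN = 1357, with Actionable as the positive class) reproduces the reported figures:

```python
# Metrics computed from the actual confusion matrix reported above,
# treating "Actionable" as the positive class.
tp, fn, fp, tn = 527, 108, 118, 1357
total = tp + fn + fp + tn                 # 2110

accuracy  = (tp + tn) / total             # ~0.893
precision = tp / (tp + fp)                # ~0.817
recall    = tp / (tp + fn)                # ~0.830

print(f"Accuracy {accuracy:.1%}, Precision {precision:.1%}, Recall {recall:.1%}")
```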
Performance
Accuracy ~ 89%
Precision ~ 82%
Recall ~ 83%
The classifier is relatively good.
The models may need slight improvement before being used for grading.
Conclusions
Where do we go from here?
Students will receive an extra field in their Peer Assessment feedback,
An AI Feedback field will state whether each piece of feedback is actionable or descriptive,
Students will be given a form to check whether the AI has classified feedback correctly,
Students' corrections will help improve the AI's performance.
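One possible way those corrections could flow back into training is simply appending the corrected labels to the training set before the next training run. A sketch under assumed file and column names, not an implemented feature:

```python
import pandas as pd

# Append student-corrected labels to the training set so the next training
# run can learn from them. File and column names are assumptions.
training = pd.read_excel("training_set.xlsx")
corrections = pd.read_excel("student_corrections.xlsx")

# Keep only the responses the students flagged as misclassified.
fixed = corrections[corrections["classified_correctly"] == "no"]
new_rows = fixed[["rationale", "corrected_label"]].rename(
    columns={"corrected_label": "label"})

updated = pd.concat([training, new_rows], ignore_index=True)
updated.to_excel("training_set.xlsx", index=False)
```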
Other uses
Giving statistics on types of feedback given
A student would be able to see how much good feedback they have given,
Making it part of the marking scheme.
More iterations are needed at this stage.
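A sketch of the kind of per-student statistics mentioned above that the predictions could support (column names are assumptions about the merged prediction spreadsheet):

```python
import pandas as pd

# Count actionable vs. descriptive responses per student.
# File and column names are assumed for illustration.
merged = pd.read_excel("tag_reports_with_predictions.xlsx")
stats = merged.groupby(["student", "prediction"]).size().unstack(fill_value=0)
print(stats)
```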
Summary
Background: TechLauncher is really big and hard to mark. How do we mark it in time?
Approach: Classify the feedback given (Actionable & Descriptive) using a machine learning algorithm (Support Vector Machine).
Results: Performance metrics are relatively good.
Conclusions: Students will help correct classifications; statistics from predictions in future; more iterations needed to improve performance.
Questions?
References
Jin, Z. (2016). Using Peer Assessment Data to Help Improve Teaching and Learning Outcomes. B. Advanced Computing (Honours) thesis, The Australian National University.