Adaptive clustering of codes for assessment in
introductory programming courses
Alexandre de A. Barbosa – UFAL / UFCG
Evandro de B. Costa – UFAL / UFCG
Topics
– Context and problem
– Related work
– Adaptive Clustering of codes
  ● Background
  ● The clustering approach
  ● Evaluation of the clustering approach
  ● Results
Context and problem
● Programming is one of the basic competences in computer science
  – taught in the algorithms and introductory programming courses
    ● it is easy to find unmotivated students who have doubts and do not understand basic programming concepts
    ● students who pass do not have the competencies needed for the degree program and for professional life [1]
  – Many contributing factors are described in the scientific literature
    ● individualized help for each student can mitigate some of these factors
● Practical coding activities are typically adopted in programming courses
  – Assessing the submitted solutions is quite difficult
    ● number of students × number of exercises × number of solutions (codes/submissions)
    ● a large number of parameters can be observed
      – inputs/outputs
      – code structure
      – efficiency
  – Evaluating code solutions is time-consuming and subject to each evaluator's bias and errors
Related work
● Online judges [10]
  – a set of tests determines the success or failure of a solution
● Analysis of similarities [11, 12, 13]
  – code similarities are explored for different purposes
● Clustering or classification of codes [14, 15, 16]
  – a different set of techniques is used in each study
  – the same set of criteria is applied regardless of the evaluator
Adaptive Clustering of codes
Background
● Clustering algorithm: K-means [7]
  – K centroids
  – a data vector (set of properties) for each element
  – distance between elements
● Software metrics [18]
  – properties extracted from the codes
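For reference, the standard K-means formulation (Lloyd's algorithm, Euclidean distance) is summarized below. The slides cite [7] for the algorithm but do not state the initialization or stopping rule used, so treat this as the textbook version rather than the authors' exact configuration. Here $x_i$ is the metric vector of code $i$, $\mu_j$ is the $j$-th of the $K$ centroids, and $c_i$ is the cluster assigned to code $i$.

```latex
\begin{aligned}
&\text{objective:}  && \min_{\mu_1,\dots,\mu_K} \sum_{i=1}^{n} \min_{1 \le j \le K} \lVert x_i - \mu_j \rVert^2 \\
&\text{assignment:} && c_i \leftarrow \arg\min_{1 \le j \le K} \lVert x_i - \mu_j \rVert \\
&\text{update:}     && \mu_j \leftarrow \frac{1}{|C_j|} \sum_{i \in C_j} x_i, \qquad C_j = \{\, i : c_i = j \,\}
\end{aligned}
```

The assignment and update steps alternate until the assignments stop changing.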
● Euclidean distance
  – distance between two points in an n-dimensional space
● Cohen's Kappa
  – degree of agreement between two lists of classifications, beyond what would be expected by chance
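Both measures are later used to compare lists of grades. The minimal sketch below treats each grade (0 to 10) as a nominal category and computes the unweighted kappa, (p_o - p_e) / (1 - p_e); this is an assumption, since the slides do not say whether a weighted variant was used. The function names and sample grades are illustrative only.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two points in n-dimensional space."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cohens_kappa(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> float:
    """Unweighted (nominal) Cohen's kappa between two raters' labels."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(r1)
    # Observed agreement p_o: share of items that received the same label.
    p_o = sum(x == y for x, y in zip(r1, r2)) / n
    # Chance agreement p_e: computed from each rater's marginal label frequencies.
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[label] * c2[label] for label in set(c1) | set(c2)) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label everywhere, so kappa is 0/0;
        # this sketch reports that case as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Illustrative grades from two hypothetical specialists for six codes.
specialist_a = [10, 8, 8, 5, 0, 7]
specialist_b = [10, 8, 7, 5, 0, 7]
print(euclidean_distance(specialist_a, specialist_b))  # 1.0
print(round(cohens_kappa(specialist_a, specialist_b), 3))  # 0.793
```

With a weighted kappa, a 7 versus an 8 would count as partial agreement; the unweighted form above counts it as plain disagreement.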
The clustering approach
● The main idea:
  1. Select one element to represent each cluster
  2. Evaluate that element (grade + text)
  3. Generalize the evaluation to the other elements of the cluster
● The steps
  – (1) code metrics extraction
  – (2) identification of the criteria adopted by the specialist
  – (3) clustering generation
  – (4) evaluation
● The steps: (1) code metrics extraction
  – Each code has a vector of properties (metrics)
  – Some metrics depend only on the code itself (e.g., number of operators)
  – Similarity metrics consider the relation between a code and a
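A minimal sketch of step (1) follows. It assumes, for illustration only, that submissions are Python programs (the slides do not state the language) and uses a hypothetical set of four code-only metrics counted with the standard `ast` module. The similarity metrics are not reproduced, since the slide does not say what a code is compared against.

```python
import ast
from typing import Dict, List

# Hypothetical metric names; the actual metric set is not listed on the slides.
METRICS = ("operators", "conditionals", "loops", "calls")


def extract_metrics(source: str) -> Dict[str, int]:
    """Count a few code-only properties of a Python submission."""
    counts = dict.fromkeys(METRICS, 0)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.BinOp, ast.UnaryOp, ast.AugAssign)):
            counts["operators"] += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` is a single node holding two operators.
            counts["operators"] += len(node.values) - 1
        elif isinstance(node, ast.Compare):
            # `a < b < c` is a single node holding two comparison operators.
            counts["operators"] += len(node.ops)
        elif isinstance(node, (ast.If, ast.IfExp)):
            counts["conditionals"] += 1
        elif isinstance(node, (ast.For, ast.While)):
            counts["loops"] += 1
        elif isinstance(node, ast.Call):
            counts["calls"] += 1
    return counts


def to_vector(counts: Dict[str, int]) -> List[float]:
    """Property vector in a fixed metric order, ready for clustering."""
    return [float(counts[name]) for name in METRICS]


submission = "x = int(input())\nif x % 2 == 1:\n    print(x + 1)\n"
print(to_vector(extract_metrics(submission)))  # [3.0, 1.0, 0.0, 3.0]
```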
● The steps: (2) identification of the criteria adopted by the specialist
  – Generate all possible combinations of metrics (used to create all possible sets of clusters)
  – The specialist grades 10 codes (used to select one set of clusters)
  – Brute-force method (not efficient)
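The brute-force enumeration in step (2) can be sketched with `itertools.combinations`: every non-empty subset of the metrics is one candidate basis for clustering, so m metrics give 2^m - 1 candidates. The metric names are placeholders.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def metric_subsets(metrics: Sequence[str]) -> List[Tuple[str, ...]]:
    """Every non-empty combination of metrics, smallest subsets first."""
    subsets: List[Tuple[str, ...]] = []
    for size in range(1, len(metrics) + 1):
        subsets.extend(combinations(metrics, size))
    return subsets


names = ["operators", "conditionals", "loops"]  # placeholder metric names
for subset in metric_subsets(names):
    print(subset)
print(len(metric_subsets(names)))  # 7 == 2**3 - 1
```

The count doubles with every extra metric (20 metrics already give 1,048,575 subsets, each needing its own clustering run), which is consistent with the slide's remark that the method is not efficient.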
● The steps: (3) clustering generation
● K-means with K = 10, applied to a set of software metrics*
  * all possible combinations of metrics
  – All possible sets of clusters are
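Step (3) can be sketched as one K-means run per metric combination: each submission's property vector is restricted to the chosen metrics and clustered with K = 10. This is a plain Lloyd's-algorithm sketch in standard-library Python; the random initialization, fixed seed, iteration cap, and lack of metric normalization are assumptions, not details taken from the slides. The sample data is hypothetical.

```python
import math
import random
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = List[float]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's algorithm; returns the cluster index of every point.

    Assumptions: random initial centroids (fixed seed), no normalization.
    """
    k = min(k, len(points))
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]

    def nearest(p: Point) -> int:
        return min(range(k), key=lambda j: math.dist(p, centroids[j]))

    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [nearest(p) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_every_combination(
    vectors: Sequence[Dict[str, float]], metrics: Sequence[str], k: int = 10
) -> Dict[Tuple[str, ...], List[int]]:
    """One set of clusters per non-empty subset of metrics."""
    results: Dict[Tuple[str, ...], List[int]] = {}
    for size in range(1, len(metrics) + 1):
        for subset in combinations(metrics, size):
            projected = [[v[m] for m in subset] for v in vectors]
            results[subset] = kmeans(projected, k)
    return results


# Hypothetical metric values for 12 submissions.
subs = [{"operators": float(i), "loops": float(i % 3)} for i in range(12)]
sets_of_clusters = cluster_every_combination(subs, ["operators", "loops"])
print(len(sets_of_clusters))  # 3
```

When metrics live on very different scales, the larger ones dominate the Euclidean distance, so a real pipeline would likely normalize them first.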
● The steps: (4) evaluation
  – a specialist assigns 10 grades
  – one set of clusters is selected
  – for each cluster in the set, the grade already given is generalized to all the other elements of the cluster
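Step (4) can be sketched as below, given one selected set of clusters (cluster labels) and one specialist-graded submission per cluster. The slides do not state how each cluster's representative is chosen; picking the member closest to the cluster mean is a hypothetical rule used here only for illustration, as are the function names and sample values.

```python
import math
from typing import Dict, List, Sequence


def pick_representatives(points: Sequence[Sequence[float]], labels: Sequence[int]) -> Dict[int, int]:
    """Hypothetical rule: per cluster, the member closest to the cluster mean."""
    reps: Dict[int, int] = {}
    dims = len(points[0])
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        mean = [sum(points[i][d] for i in members) / len(members) for d in range(dims)]
        reps[c] = min(members, key=lambda i: math.dist(points[i], mean))
    return reps


def generalize_grades(labels: Sequence[int], graded: Dict[int, float]) -> List[float]:
    """Give every submission the specialist grade of its cluster's graded member.

    labels: cluster index of each submission.
    graded: submission index -> grade given by the specialist.
    """
    cluster_grade: Dict[int, float] = {}
    for idx, grade in graded.items():
        if labels[idx] in cluster_grade:
            raise ValueError(f"cluster {labels[idx]} has more than one graded submission")
        cluster_grade[labels[idx]] = grade
    missing = set(labels) - set(cluster_grade)
    if missing:
        raise ValueError(f"no graded submission in clusters {sorted(missing)}")
    return [cluster_grade[c] for c in labels]


points = [[0.0], [1.0], [2.0], [10.0], [11.0]]
labels = [0, 0, 0, 1, 1]
reps = pick_representatives(points, labels)  # {0: 1, 1: 3}
specialist = {reps[0]: 7.0, reps[1]: 4.0}  # grades for the representatives only
print(generalize_grades(labels, specialist))  # [7.0, 7.0, 7.0, 4.0, 4.0]
```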
Evaluation of the clustering approach
● The dataset
  – set of programming problems
  – set of codes
● The dataset: set of programming problems (exercises)
  – ‘salary bonus’ and ‘points distance’ (basic problems)
  – ‘student situation’ and ‘elections’ (decision problems)
  – ‘odd loop’ and ‘divisible by 3’ (loop problems)
● The dataset: set of codes submitted by students as solutions to the exercises
  – ‘salary bonus’ (32 submissions)
  – ‘points distance’ (23 submissions)
  – ‘student situation’ (43 submissions)
  – ‘elections’ (40 submissions)
  – ‘odd loop’ (41 submissions)
● The dataset: set of evaluations (grades from 0 to 10) provided by specialists (teachers and teaching assistants)
Results
● Compute Cohen's Kappa
  – specialist list vs. specialist list
  – specialist list vs. cluster-generated list
● Compute Euclidean distance
  – specialist list vs. specialist list
● Cohen's Kappa
  – mean of 0.76: strong agreement
  – “The specialists have a strong agreement with the cluster-generated list of grades”
● Euclidean distance*
  – mean of 5.95
  – “Interpreting the lists of grades as point coordinates in a
Adaptive clustering of codes for assessment in introductory programming courses – ITS 2018 – Alexandre A. Barbosa
Conclusions
● We have proposed the use of a clustering algorithm to reduce the effort spent evaluating codes in introductory programming courses
● The results suggest that it is possible to reduce the evaluation effort (strong agreement between the specialists and the clustering approach)