Adaptive clustering of codes for assessment in introductory programming courses

Alexandre de A. Barbosa – UFAL / UFCG

Evandro de B. Costa – UFAL / UFCG

Topics

– Context and problem
– Related work
– Adaptive Clustering of codes
  ● Background
  ● The clustering approach
  ● Evaluation of the clustering approach
  ● Results

Context and problem

● Programming is one of the basic competences in computer science
  – In algorithms and introductory programming courses:
    ● it is easy to find unmotivated students who have doubts and do not understand basic programming concepts
    ● students who pass do not have the competencies needed for the rest of the course and for professional life [1]
  – Many contributing factors are described in the scientific literature
    ● individualized help for each student can minimize some of these factors

Context and problem

● Practical coding activities are typically adopted in programming courses
  – Assessment of the proposed solutions is quite difficult
    ● no. of students × no. of exercises × no. of solutions (code submissions)
    ● A large number of parameters can be observed
      – inputs/outputs
      – code structure
      – efficiency
  – The evaluation of code solutions is time-consuming and subject to the bias and errors of each evaluator.

Related work

Online judges [10]
  – a set of tests determines the success or failure of a solution
Analysis of similarities [11, 12, 13]
  – explore code similarities for different purposes
Clustering or classification of codes [14, 15, 16]
  – a different set of techniques is used in each study
  – the same set of criteria is adopted for every evaluator

Adaptive Clustering of codes

Background

● Clustering algorithm: K-means [7]
  – K centroids
  – Set of data from each element
  – Distance between elements
● Software metrics [18]
  – Properties extracted from codes
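The three ingredients listed for K-means (K centroids, a data vector per element, and a distance between elements) can be put together in a few lines. The sketch below is a minimal illustration, not the implementation used in this work: the function name `kmeans`, the deterministic seeding with the first K points, and the example metric vectors are all assumptions made for the example.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain K-means on a list of equal-length numeric tuples.

    Assumption: the first k points seed the centroids so the run is
    deterministic; real implementations usually pick seeds at random.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each element joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return labels, centroids

# Hypothetical metric vectors: (number of operators, number of lines).
vectors = [(2, 5), (3, 6), (20, 40), (2, 6), (21, 42), (19, 39)]
labels, centroids = kmeans(vectors, k=2)
```

With these made-up vectors the three short programs end up in one cluster and the three long ones in the other.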

Adaptive Clustering of codes

Background

● Euclidean distance
  – distance between two points in an n-dimensional space
● Cohen’s Kappa
  – degree of agreement between two lists of classifications beyond what would be expected by chance
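Both measures are short enough to write out directly. The sketch below assumes the unweighted form of Cohen's Kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance; whether the study used a weighted variant is not stated on the slides. The function names and the two grade lists are hypothetical.

```python
import math
from collections import Counter

def euclidean_distance(a, b):
    """Distance between two points given as equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cohen_kappa(a, b):
    """Unweighted Cohen's Kappa for two equal-length lists of labels."""
    n = len(a)
    # Observed agreement: fraction of positions where both lists match.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the two marginal frequencies per label.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1:  # both lists use one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical grades given by two evaluators to the same five codes.
grades_a = [10, 8, 8, 5, 0]
grades_b = [10, 8, 5, 5, 0]
distance = euclidean_distance(grades_a, grades_b)  # 3.0
kappa = cohen_kappa(grades_a, grades_b)            # 0.56 / 0.76, about 0.74
```

Here the lists differ in one position by 3 points, so the distance is 3.0; the observed agreement is 0.8 and the chance agreement 0.24.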

Adaptive Clustering of codes

The clustering approach

● The main idea...
  1. Select one element to represent the cluster
  2. Evaluate the element (grade + text)
  3. Generalize the

Adaptive Clustering of codes

The clustering approach

● The steps
  – (1) Code metrics extraction
  – (2) Identification of the criteria adopted by the specialist
  – (3) Clustering generation
  – (4) Evaluation

Adaptive Clustering of codes

The clustering approach

● The steps: (1) code metrics extraction
  – Each code has a vector of properties
  – Some metrics are restricted to the code itself (e.g. number of operators)
  – Similarity metrics consider the relation of a code and a
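As an illustration of metrics that are restricted to the code itself, the sketch below turns one submission into a vector of properties. It assumes the submissions are written in Python and uses the standard `ast` module; the slides do not state the students' language or the exact metric set, so the function name `extract_metrics` and the four metrics chosen here are hypothetical.

```python
import ast

def extract_metrics(source):
    """Return (operators, branches, loops, non-blank lines) for one code.

    Assumption: the submission is valid Python source text.
    """
    nodes = list(ast.walk(ast.parse(source)))
    operator_nodes = (ast.BinOp, ast.BoolOp, ast.UnaryOp,
                      ast.Compare, ast.AugAssign)
    return (
        sum(isinstance(n, operator_nodes) for n in nodes),          # operators
        sum(isinstance(n, ast.If) for n in nodes),                  # branches
        sum(isinstance(n, (ast.For, ast.While)) for n in nodes),    # loops
        len([line for line in source.splitlines() if line.strip()]),
    )

# Hypothetical student answer to an 'odd loop' style exercise.
submission = """
n = int(input())
for i in range(1, n + 1):
    if i % 2 == 1:
        print(i)
"""
vector = extract_metrics(submission)  # (3, 1, 1, 4)
```

The operator count of 3 comes from `n + 1`, `i % 2`, and the `==` comparison.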

Adaptive Clustering of codes

The clustering approach

● The steps: (2) identification of the criteria adopted by the specialist
  – Generate all possible combinations of metrics (they will be used to create all possible sets of clusters)
  – Specialist grades 10 codes (used to select one set of clusters)
  – Brute force method (not efficient)
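Enumerating all combinations of metrics can be sketched with `itertools.combinations`. The metric names and helper functions below are hypothetical. With n metrics there are 2^n - 1 non-empty subsets, which is why the brute-force search becomes expensive as metrics are added.

```python
from itertools import combinations

def metric_subsets(metric_names):
    """Yield every non-empty subset of the metric names, smallest first."""
    for size in range(1, len(metric_names) + 1):
        yield from combinations(metric_names, size)

def project(metrics_of_code, subset):
    """Keep only the chosen metrics of one code, as a tuple for clustering."""
    return tuple(metrics_of_code[name] for name in subset)

# Hypothetical metric names and values for one submission.
names = ["operators", "branches", "loops"]
subsets = list(metric_subsets(names))  # 2**3 - 1 = 7 subsets
code = {"operators": 3, "branches": 1, "loops": 1}
reduced = project(code, ("operators", "loops"))  # (3, 1)
```

Each subset defines one projection of the metric vectors, and therefore one candidate set of clusters.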

Adaptive Clustering of codes

The clustering approach

● The steps: (3) Clustering generation
● Using K-means, with K = 10, based on a set of software metrics*
  * all possible combinations
  – All possible sets of clusters are

Adaptive Clustering of codes

The clustering approach

● The steps: (4) Evaluation
  – a specialist assigns 10 grades
  – one set of clusters is selected
  – for each cluster in the set, the grades are generalized to all the other cluster elements, using the already given grades
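The generalization step can be sketched as follows. The slides do not say how a cluster with several graded members, or with none, is handled, so this example makes two assumptions: several grades in one cluster are averaged, and a cluster without any graded member is left ungraded (`None`). The function name and the data are hypothetical.

```python
def generalize_grades(labels, graded):
    """Spread the specialist's grades to the other members of each cluster.

    labels: cluster index of every submission, in submission order.
    graded: {submission index: grade} for the codes the specialist graded.
    """
    grades_by_cluster = {}
    for index, grade in graded.items():
        grades_by_cluster.setdefault(labels[index], []).append(grade)

    result = []
    for index, label in enumerate(labels):
        if index in graded:
            result.append(graded[index])            # keep the given grade
        elif label in grades_by_cluster:
            known = grades_by_cluster[label]
            result.append(sum(known) / len(known))  # assumption: average
        else:
            result.append(None)                     # assumption: ungraded
    return result

# Hypothetical example: five submissions in three clusters, two graded.
labels = [0, 0, 1, 1, 2]
graded = {0: 9.0, 2: 4.0}
all_grades = generalize_grades(labels, graded)
# [9.0, 9.0, 4.0, 4.0, None]
```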

Adaptive Clustering of codes

Evaluation of the clustering approach

● The dataset
  – set of programming problems
  – set of codes

Adaptive Clustering of codes

Evaluation of the clustering approach

● The dataset: set of programming problems (exercises)
  – ‘salary bonus’ and ‘points distance’ (basic problems)
  – ‘student situation’ and ‘elections’ (decision problems)
  – ‘odd loop’ and ‘divisible by 3’ (loop problems)

Adaptive Clustering of codes

Evaluation of the clustering approach

● The dataset: set of codes submitted by students as solutions to the exercises
  – ‘salary bonus’ (32 submissions)
  – ‘points distance’ (23 submissions)
  – ‘student situation’ (43 submissions)
  – ‘elections’ (40 submissions)
  – ‘odd loop’ (41 submissions)

Adaptive Clustering of codes

Evaluation of the clustering approach

● The dataset: set of evaluations (grades varying from 0 to 10) provided by specialists (teachers and teaching assistants)

Adaptive Clustering of codes

Results

● Compute Cohen’s Kappa
  – Specialist list vs. specialist list
  – Specialist list vs. cluster-generated list
● Compute Euclidean distance
  – Specialist list vs. specialist list

Adaptive Clustering of codes

Results

● Cohen’s Kappa
  – Mean of 0.76 - Strong Agreement
  – “The specialists have a strong agreement with the cluster generated list of grades”

Adaptive Clustering of codes

Results

● Euclidean distance*
  – Mean of 5.95
  – “Interpreting the list of grades as points coordinates in a

Adaptive clustering of codes for assessment in introductory programming courses – ITS 2018 – Alexandre A. Barbosa

Conclusions

● We have proposed the use of a clustering algorithm to minimize the effort expended in the evaluation of codes in introductory courses
● The results suggest that it is possible to minimize the evaluation effort (strong agreement between the specialists and the cluster approach)
● This research is ongoing work; much investigation is still necessary
  – comparison of different clustering techniques

References

[1] McCracken et al.: “A multi-national, multi-institutional study of assessment of programming skills of first-year CS students.” ITiCSE 2001
[2] Stegeman, M., Barendsen, E., Smetsers, S.: “Towards an empirically validated model for assessment of code quality.” International Conference on Computing Education Research 2014
[10] Yulianto, S.V., Liem, I.: “Automatic grader for programming assignment using source code analyzer.” ICODSE 2014
[11] Rego, M.G., Dantas, A., Dalton Serey Guerrero: “Can Computers Compare Student Code Solutions As Well As Teachers?” Symposium on Computer Science Education 2014
[12] Biggers, L.R., Kraft, N.A.: “Quantifying the similarities between source code lexicons.” ACM-SE 2011
[13] Li, S., Xiao, X., Bassett, B., Xie, T., Tillmann, N.: “Measuring code behavioral similarity for programming and software engineering education.” ICSE 2016
[14] Srikant, S., Aggarwal, V.: “A system to grade computer programming skills using machine learning.” ACM SIGKDD 2014
[15] Choudhury, R.R., Yin, H., Moghadam, J., Chen, A., Fox, A.: “Autostyle: Scale-driven hint generation for coding style.” ITS 2016
[16] Yin, H., Moghadam, J., Fox, A.: “Clustering student programming
