Writing a Plagiarism Detector Using Python

This project put our Python skills to the test. It involved developing five different algorithms that would compute the similarity score between two Jupyter Notebooks.

Overview

The data consists of 30 Jupyter Notebooks. These were homework sets from a previous semester. Five different algorithms were developed by our group:

- Rabin-Karp algorithm for comparing strings

- SequenceMatcher from the difflib library to compute the similarity ratio between two cells

- SequenceMatcher comparing lines instead of characters

- RapidFuzz, applying fuzz.ratio to calculate a similarity score

- Itertools combinations
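
To make the list above concrete, here is a minimal sketch of three of these building blocks using only the standard library. It is an illustration under assumptions, not our project code: the function names (char_ratio, line_ratio, rabin_karp_find), the hash parameters, and the toy notebook strings are all hypothetical. RapidFuzz is left out to keep the sketch standard-library only. With 30 notebooks, itertools.combinations yields 435 unordered pairs to compare.

```python
from difflib import SequenceMatcher
from itertools import combinations


def char_ratio(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] via difflib."""
    return SequenceMatcher(None, a, b).ratio()


def line_ratio(a: str, b: str) -> float:
    """Line-level similarity: compare lists of lines instead of characters."""
    return SequenceMatcher(None, a.splitlines(), b.splitlines()).ratio()


def rabin_karp_find(pattern: str, text: str,
                    base: int = 256, mod: int = 1_000_003) -> int:
    """Return the first index of pattern in text, or -1, using a rolling hash.

    base and mod are arbitrary illustrative choices.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    high = pow(base, m - 1, mod)  # weight of the leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    for i in range(n - m + 1):
        # On a hash match, compare the actual text to rule out collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            return i
        if i < n - m:
            # Slide the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return -1


# Hypothetical stand-ins for the source code extracted from each notebook.
notebooks = {
    "nb_a": "x = 1\ny = 2\nprint(x + y)",
    "nb_b": "x = 1\ny = 2\nprint(x + y)",
    "nb_c": "import os\nprint(os.getcwd())",
}

# Score every unordered pair of notebooks exactly once.
for (name_a, src_a), (name_b, src_b) in combinations(notebooks.items(), 2):
    print(name_a, name_b,
          round(char_ratio(src_a, src_b), 2),
          round(line_ratio(src_a, src_b), 2))
```

Comparing lines rather than characters makes the ratio less sensitive to small edits inside a line, but any change to a line makes that whole line count as a mismatch.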

To incorporate all of these algorithms into a single plagiarism detector, we created a meta-heuristic algorithm that combines the five individual similarity scores.
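
One simple way to combine several similarity scores is a weighted average followed by a threshold. The sketch below shows that idea only. The weighting scheme, the 0.8 threshold, the score names, and the function names combined_score and is_flagged are assumptions for illustration and may differ from the method described in the presentation.

```python
from typing import Dict, Optional


def combined_score(scores: Dict[str, float],
                   weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of per-algorithm similarity scores in [0, 1].

    With no weights given, every algorithm counts equally (an assumption).
    """
    if not scores:
        raise ValueError("at least one score is required")
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total


def is_flagged(scores: Dict[str, float], threshold: float = 0.8) -> bool:
    """Flag a notebook pair when the combined score reaches the threshold.

    The default threshold is a hypothetical value, not a tuned one.
    """
    return combined_score(scores) >= threshold


# Hypothetical scores for one pair of notebooks.
pair_scores = {
    "rabin_karp": 0.9,
    "sequence_chars": 0.85,
    "sequence_lines": 0.8,
    "rapidfuzz": 0.9,
}
print(combined_score(pair_scores), is_flagged(pair_scores))
```

Averaging keeps any single algorithm from deciding the outcome on its own, at the cost of diluting a strong signal from one method when the others disagree.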

Take a look at the project below.

Writing a Plagiarism Detector Using Python Final Presentation.pdf