Writing a Plagiarism Detector Using Python
This project put our Python skills to the test. It involved developing five different algorithms, each computing a similarity score between two Jupyter Notebooks.
Overview
The data consists of 30 Jupyter Notebooks. These were homework sets from a previous semester. Five different algorithms were developed by our group:
- Rabin-Karp algorithm for comparing strings
- SequenceMatcher from the difflib library to compute the similarity ratio between two cells
- SequenceMatcher comparing lines instead of characters
- RapidFuzz, applying fuzz.ratio to calculate a similarity score
- Itertools combinations
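To give a feel for two of the approaches listed above, here is a minimal sketch using only the standard library. It is not the project's actual code: the function names, the `cells` dictionary, and the sample notebook contents are made up for illustration. It compares text character by character and line by line with difflib's SequenceMatcher, and uses itertools.combinations to visit every unordered pair of notebooks exactly once.

```python
from difflib import SequenceMatcher
from itertools import combinations


def char_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1], comparing the two strings character by character."""
    return SequenceMatcher(None, a, b).ratio()


def line_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1], comparing the two strings line by line."""
    return SequenceMatcher(None, a.splitlines(), b.splitlines()).ratio()


# Hypothetical stand-ins for the source code extracted from three notebooks.
cells = {
    "nb_a": "x = 1\ny = 2\nprint(x + y)",
    "nb_b": "x = 1\ny = 2\nprint(x + y)",
    "nb_c": "import math\nprint(math.pi)",
}

# combinations(..., 2) yields each unordered pair once, so no pair is scored twice.
pair_scores = {
    (left, right): char_similarity(cells[left], cells[right])
    for left, right in combinations(sorted(cells), 2)
}

for pair, score in pair_scores.items():
    print(pair, round(score, 2))
```

With 30 notebooks, the same pairing step would produce 435 pairs (30 choose 2), which is why pairing without repetition matters.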
To incorporate all of these algorithms into a single plagiarism detector, we created a meta-heuristic algorithm that combines all five.
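How the individual scores are combined is not spelled out here, so the following is only one plausible sketch, assuming a weighted average. The function name `combined_score`, the score labels, and the weights are hypothetical, and every score is assumed to be scaled to the range 0 to 1 first (for example, a 0 to 100 score divided by 100).

```python
from typing import Dict, Optional


def combined_score(
    scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted average of per-algorithm similarity scores, each assumed to be in [0, 1]."""
    if not scores:
        raise ValueError("at least one score is required")
    if weights is None:
        # Assumption: with no weights given, every algorithm counts equally.
        weights = {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical scores for one pair of notebooks.
example = {"rabin_karp": 0.5, "difflib_chars": 1.0}
print(combined_score(example))
print(combined_score(example, {"rabin_karp": 3.0, "difflib_chars": 1.0}))
```

A weighted average keeps the result on the same 0 to 1 scale as its inputs, which makes it easy to apply a single threshold afterwards.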
Take a look at the project below.