30 Mar 2020
How I Kinda "Cheated" on my Schoolwork
As the education system steps into the modern era, more and more textbooks and exercise books are bringing technology into their contents. Well, not like the sci-fi movies where a 3D hologram of the heart pops up from the book; we are still far from that (but hopefully we are inching closer to it!)
In my time, there were QR codes for students to access additional content such as explanation videos, very simple 3D models of alkanes and answers to the exercises in the books.
This reflects the growing amount of this "IT advancement" in educational books nowadays, and the introduction of KSSM has put even more of it into their contents.
To be honest, I looked through some of the videos, and quite a lot of them are just simple slideshow videos repeating what the textbook already says, some with fewer than 1,000 views… (except for a few, including one from TED-Ed!)
But credit where credit is due: it's a good step towards integrating technology into the education system, and it can give some students more passion for their respective subjects.
1BestariNet (one of Malaysia's past steps towards e-learning), failure or not, brought a lot of benefits to students and teachers alike. Goodbye, froggie…
But putting IT into exercise books can backfire to some degree… In one case, my Biology reference book (basically a fill-in-the-blanks children's book) included pdfs of the answers to all the exercises, chapter by chapter. You can probably guess what that led to…
Everyone started copying the answers, and some of us (including me) even printed out all of the chapters so we could copy everything down during the boring Biology class. Thank you, teacher! Alas, the next year our teacher switched to a different publisher, so no more copying! But in the end we still managed to copy it all from the teacher's book😏😏 (I still do revision, teacher!)
This leads me to another case, where I was able to copy all of the answers to my schoolwork, and only I knew about it…
The heist
In the new year, my Malay teacher introduced a workbook for us to use. To be honest, the content is mediocre, with quite a few errors in it, which justified my heist to crack the workbook! You might see where I'm going with this…
For every exercise, there is a QR code below that students can scan for the answers. It leads to a Dropbox folder of pdfs containing the answers to the workbook pages. At that time I thought I had discovered a gold mine, but nope: each pdf is password-locked, and only the teacher's book has the password. That means teachers can either give out the answers from their book or give the password to the students so they can open the answers themselves and "mark" their own work.
Fortunately, the answers were in a Dropbox folder, so I could go to the parent directory and simply download every file (one per page). When I recently tried the old link still saved in my QR scanner (only to realize later that I already had the files), the URL shortener now leads to Google Drive. Damn it!
List of the pdfs: 'kr' means karangan (essay) and 'ku'… I actually forgot what it is.
There were 165 pdfs in total, split into 6 sections, each containing one page of answers. After collecting all the pages, I compressed them online (they were quite large, if I remember correctly) and used Python to rename each file by its page number and category.
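I no longer have the renaming script, so here is only a rough sketch of the idea. The naming scheme (category prefix such as 'kr', then a zero-padded page number) and the {old_name: (category, page)} mapping are hypothetical; how you build that mapping depends on what the downloaded files were originally called.

```python
import os

def make_name(category: str, page: int) -> str:
    """Build a file name such as 'kr_p007.pdf' (hypothetical naming scheme)."""
    # Zero-padding the page number keeps the files in page order when sorted by name.
    return f"{category}_p{page:03d}.pdf"

def rename_all(folder: str, mapping: dict[str, tuple[str, int]]) -> None:
    """Rename every file in folder using a hypothetical {old_name: (category, page)} mapping."""
    for old_name, (category, page) in mapping.items():
        os.rename(os.path.join(folder, old_name),
                  os.path.join(folder, make_name(category, page)))
```

Zero-padding matters because plain string sorting would otherwise put page 10 before page 9.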
Here is the pattern for the password: a seemingly random Malay word followed by ‘123’.
What a secure password! But of course I wouldn't expect the publisher to use a random hash as the password. Imagine the teacher painstakingly spelling out a hash character by character…
Teacher: Dear students, please write down the following password to access the answer: ‘sixteen c eight f eight a c seven b…’
Students: Ermmm what?
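To put a number on why "word + 123" is so weak, here is a rough comparison of worst-case guessing times at the roughly 43 guesses per second my script ended up averaging. The 30,000-word dictionary size is a hypothetical round number, not the size of the list I actually used.

```python
def worst_case_seconds(candidates: int, guesses_per_second: float) -> float:
    """Seconds needed to try every candidate at a fixed guessing rate."""
    return candidates / guesses_per_second

RATE = 43                  # guesses per second, roughly what my script averaged
WORDS = 30_000             # hypothetical size of the Malay word list
word_plus_123 = WORDS      # a fixed '123' suffix adds no extra guesses
random_hex_8 = 16 ** 8     # every possible 8-character hex string

print(f"{worst_case_seconds(word_plus_123, RATE) / 60:.1f} minutes")              # 11.6 minutes
print(f"{worst_case_seconds(random_hex_8, RATE) / (3600 * 24 * 365):.1f} years")  # 3.2 years
```

Appending a known suffix does not grow the search space at all; only the unpredictable part counts.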
But actually, for almost every page I looked at, the password word was taken directly from one of the words on that page.
One of the sentences: it's 'daripada pelbagai peringkat', not 'dari'…
Why is there such a common mistake in a book? *facepalm*
So the first, intuitive option is to take every word from the page, append '123' to each one, and try opening the document with every candidate.
But the obvious downside is that we would first need every word of every page typed out, so what's even the point? Unless we have the spare time to open the book and type out all of those candidate passwords manually, there's no way anyone wants to do that! (Some might do it during this quarantine period, though…)
The second approach is to brute-force every word with '123' appended, specifically a dictionary attack. So I searched the Internet for a Malay word list and then used Notepad++ to append the number to every line.
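A minimal sketch of that first option, assuming the page text has already been typed out into a string. Lowercasing every word is an assumption about how the passwords were written, and the simple letters-only pattern would split hyphenated Malay words such as 'kanak-kanak' into separate parts.

```python
import re

def candidates_from_page(page_text: str, suffix: str = "123") -> list[str]:
    """Turn every distinct word on a page into a password guess: word + suffix."""
    words = re.findall(r"[A-Za-z]+", page_text)  # runs of letters only
    # dict.fromkeys drops duplicates while keeping the order of first appearance.
    return list(dict.fromkeys(word.lower() + suffix for word in words))

print(candidates_from_page("Daripada pelbagai peringkat, daripada"))
# ['daripada123', 'pelbagai123', 'peringkat123']
```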
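The Notepad++ step can also be done in a few lines of Python. This is a sketch, not what I actually ran; the raw word-list file name passed to write_wordlist is hypothetical, while malay-dict.txt is the file the script reads later.

```python
def add_suffix(words: list[str], suffix: str = "123") -> list[str]:
    """Append the suffix to every non-empty dictionary entry."""
    return [w.strip() + suffix for w in words if w.strip()]

def write_wordlist(src: str, dst: str, suffix: str = "123") -> None:
    """Read a plain word list from src and write word + suffix lines to dst."""
    with open(src, encoding="utf-8") as f:
        words = f.read().splitlines()
    with open(dst, "w", encoding="utf-8") as f:
        f.write("\n".join(add_suffix(words, suffix)) + "\n")

# Example (hypothetical input file name):
# write_wordlist("malay-words.txt", "malay-dict.txt")
```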
Now we need to write a script for the brute-forcing process. The basic workflow looks like this:
- Open the pdf using the first dictionary word as the password.
- If failed, try the next word.
- If successful, print out the password and/or decrypt the pdf.
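The three steps above boil down to a single loop. In this sketch the actual "open the pdf" part is left as a hypothetical try_password callable that returns True on success, so the loop itself does not depend on any particular pdf tool.

```python
from typing import Callable, Iterable, Optional

def find_password(words: Iterable[str],
                  try_password: Callable[[str], bool]) -> Optional[str]:
    """Try each word in order; return the first one that opens the pdf, or None."""
    for word in words:
        if try_password(word):  # success: stop and report the password
            return word
        # failure: fall through and try the next word
    return None  # the word list ran out without a match

print(find_password(["ayam123", "buku123"], lambda p: p == "buku123"))  # buku123
```

Returning None instead of raising keeps the "no password found" case (which did happen twice) easy to handle.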
Before we proceed, we need a pdf library. I tried using PyPDF2, but it was a bit of a hassle. (I forgot why, tbh)
So I decided to just use a binary program and run it with subprocess, because I found a command-line program called QPDF that can decrypt a pdf with a single command. Yeah, I was a bit lazy at that time.
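Here is a sketch of calling QPDF from Python, assuming the binary sits at bin/qpdf as in my script. Passing the arguments as a list (rather than one long string) avoids shell-quoting trouble and works on both Windows and Linux. Per the QPDF manual, exit status 0 means success, 2 means errors (such as a wrong password), and 3 means warnings only, in which case the output file is still written.

```python
import subprocess

def qpdf_decrypt_command(qpdf: str, password: str, src: str, dst: str) -> list[str]:
    """Build the QPDF command that writes an unencrypted copy of src to dst."""
    return [qpdf, f"--password={password}", "--decrypt", src, dst]

def try_qpdf_password(qpdf: str, password: str, src: str, dst: str) -> bool:
    """Return True if QPDF managed to decrypt src with this password."""
    result = subprocess.run(qpdf_decrypt_command(qpdf, password, src, dst),
                            capture_output=True)
    # 0 = success, 3 = success with warnings, 2 = error (e.g. wrong password)
    return result.returncode in (0, 3)
```

Treating exit status 3 as success matters: subprocess.check_output would raise CalledProcessError for it, even though the file was decrypted.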
After we gather our encrypted pdfs and the pdf program, we can start cracking! (or coding)
The interesting part?
First of all, we put all the encrypted pdfs into one folder and the QPDF binary into another. Using os.listdir(), we can put all of the file names into a list in Python, which lets us loop over every file.
```python
import os
import subprocess

input_path = "encrypted"  # folder holding the locked pdfs (example name)
files = os.listdir(input_path)
for file_count, file_name in enumerate(files):
    file_path = os.path.join(input_path, file_name)
```
Next, inside that loop, we go through the dictionary list and try to decrypt the file with QPDF using each word in turn. We must catch the error whenever decryption fails (which it will for almost every word).
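```python
    with open("malay-dict.txt", encoding="utf-8") as f:
        dictionary = f.read().splitlines()  # each line is already "<word>123"
    output_path = os.path.join("decrypted", file_name)  # example output folder
    for dict_count, word in enumerate(dictionary, start=1):
        command = ["bin/qpdf", "--password=" + word, "--decrypt", file_path, output_path]
        try:
            subprocess.check_output(command, stderr=subprocess.STDOUT)
            print(file_name + " - password found: " + word)
            break  # stop trying words once the file is unlocked
```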
Run the program once to see which exception a wrong password raises: because QPDF exits with a non-zero status, check_output raises subprocess.CalledProcessError. Catch it and keep looping until either the password is found or the word list runs out.
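```python
        except subprocess.CalledProcessError:
            # A wrong password makes QPDF exit non-zero, so move on to the next word.
            if dict_count == len(dictionary):  # that was the last word in the list
                print(file_name + " - no password found!")
```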
This bare-bones version already works, but adding some logging and printing makes it easier to keep track of the decryption progress. Here's the log line I wrote at the time:
```python
import datetime

# Reading the format string: jt = file name, i = password, count = words tried, elp = seconds elapsed
print(datetime.datetime.now().strftime("%H:%M:%S") + " | " + jt
      + " - password found: " + i + " - " + str(count) + " words | " + str(round(elp, 1)) + "s | "
      + str(round(count / elp, 2)) + "w/s")
```
Yeah, I have no idea what the code is about. But it work(ed)!
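For anyone puzzled by jt, i, count and elp, here is roughly the same log line rewritten with descriptive names and an f-string. The parameter names are my own reading of the original, and the elapsed time is assumed to be measured by the caller (for example with time.perf_counter()).

```python
import datetime

def progress_line(file_name: str, password: str, words_tried: int, elapsed: float) -> str:
    """Format: time | file - password - words tried | seconds | words per second."""
    now = datetime.datetime.now().strftime("%H:%M:%S")
    rate = words_tried / elapsed if elapsed > 0 else 0.0  # guard against dividing by zero
    return (f"{now} | {file_name} - password found: {password} - "
            f"{words_tried} words | {elapsed:.1f}s | {rate:.2f}w/s")
```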
I tried to run the same code again, but as is the custom in programming, all code must stop working after a random number of years.
I also implemented a feature that lets you close the program at any time and resume where it left off. Sounds cool, right?! (I just looked into the output directory and skipped all of the files that were already decrypted, lol)
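That "resume" trick fits in a few lines. This sketch assumes each decrypted copy keeps the same file name as its encrypted original. One caveat: a half-written file left behind by an interrupted QPDF run would also be treated as done.

```python
import os

def filter_pending(all_names: list[str], done_names: list[str]) -> list[str]:
    """Names in all_names that are not yet in done_names, in sorted order."""
    done = set(done_names)
    return sorted(name for name in all_names if name not in done)

def pending_files(input_dir: str, output_dir: str) -> list[str]:
    """Encrypted files that have no decrypted copy of the same name in output_dir yet."""
    done = os.listdir(output_dir) if os.path.isdir(output_dir) else []
    return filter_pending(os.listdir(input_dir), done)
```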
The results
Running the script for the first time, I thought, "Eh, will this even work?" But as more and more pages were decrypted, I slowly realized that this isn't just a simple script…
It took around 13 hours to work through all 165 files, with 2,069,250 password attempts in total. That works out to an average of about 43 attempts per second, or 156,010 per hour, and that's a lot of brute-forcing!
Don't expect me to release all the passwords here. Ask your teacher instead!
Unfortunately, though, two passwords were never found: page 75 and page 184. To this day I still wonder which words were missing from the list…
I also combined all the files in each category and compressed them. And there I was, with 4.84MB of answers covering 163 pages of my workbook sitting on my hard drive. I'm having second thoughts about the time I spent writing that script.
A few months later, when I decided to share this with my classmates, my teacher had already moved on to other exercises… Oh well, in the end we didn't finish the whole workbook anyway.
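A quick check that these figures agree with each other, using only the numbers above:

```python
total_ops = 2_069_250   # password attempts across all files
ops_per_hour = 156_010  # average hourly rate
files = 165

print(f"{total_ops / ops_per_hour:.2f} hours")                    # 13.26 hours
print(f"{ops_per_hour / 3600:.1f} ops/s")                         # 43.3 ops/s
print(f"{total_ops / files:,.0f} attempts per file on average")   # 12,541 attempts per file on average
```

So "around 13 hours" is really closer to 13 and a quarter, and each file took roughly twelve and a half thousand guesses on average.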
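The combining step can also be scripted. This sketch assumes the hypothetical 'kr_p007.pdf'-style names (category before the first underscore) and uses QPDF's page-selection syntax, qpdf --empty --pages a.pdf b.pdf -- out.pdf, which concatenates the listed files in order into a new pdf.

```python
from collections import defaultdict

def group_by_category(names: list[str]) -> dict[str, list[str]]:
    """Group names like 'kr_p007.pdf' by the prefix before the first '_'."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in sorted(names):  # zero-padded page numbers sort in page order
        groups[name.split("_", 1)[0]].append(name)
    return dict(groups)

def qpdf_merge_command(qpdf: str, inputs: list[str], output: str) -> list[str]:
    """QPDF command that concatenates the input pdfs, in order, into one file."""
    return [qpdf, "--empty", "--pages", *inputs, "--", output]
```

Each group's command list can then be passed to subprocess.run, just like the decryption command.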
The takeaway
This is just a fun little project I did (not to encourage you to do what I did, though). It shows that password strength and variability matter in practice and are an important part of keeping your accounts safe from intruders and hackers.
Let's say you have a sophisticated password on your account; maybe you added numbers, multiple symbols and your crush's date of birth. Can you guarantee that nobody can crack into your account?
I have personally signed up for a lot of websites in the past, and I unashamedly used the same password on almost all of them (except the more important ones) and just got on with it. And now I have been pwned in more than ten leaks!
Once hackers know one of your passwords, they can quickly guess the pattern you use for your other accounts (or simply reuse it, if you use the same password everywhere!). For example, if you add the website's name after your password, attackers can just apply that pattern to get into your other accounts.
You can think of the 165 files as accounts with passwords. Knowing just the pattern (a word plus '123') was enough for me to unlock 163 of the 165 files, about 99%, with nothing more than a dictionary attack. The same could happen to your accounts, assuming your passwords are really weak…
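To make the "site name appended" example concrete, here is a toy sketch of how one leaked password can turn into guesses for other sites. The function, the password and the site names are all made up for illustration.

```python
def guess_other_sites(leaked_password: str, leaked_site: str,
                      other_sites: list[str]) -> list[str]:
    """If a leaked password ends with the site's name, swap in other site names."""
    if not leaked_password.lower().endswith(leaked_site.lower()):
        return [leaked_password]  # no visible pattern: just try plain reuse
    base = leaked_password[: len(leaked_password) - len(leaked_site)]
    return [base + site for site in other_sites]

print(guess_other_sites("kucing99facebook", "facebook", ["twitter", "gmail"]))
# ['kucing99twitter', 'kucing99gmail']
```

A predictable rule adds almost nothing once one password is out; a password manager generating unrelated random passwords per site avoids this entirely.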
So what are you waiting for? Change your passwords now!! (if they are insecure)