When my wife is asked what I do, her generic response, “he's in data security,” is typically sufficient to change the topic. When it's not, it's usually because the questioner knows a bit about technology and prods further out of curiosity. And that's when she'll artfully try to change the topic.
But that all changed this past weekend when my wife (a professor) came to me with a problem: she suspected that one of her graduate students had plagiarized parts of her dissertation, but she didn't have a good way to prove it. Evidently her school has a database that contains a ton of published dissertations, but there's no real way to compare the student's dissertation to any in the database, since the database only offers a keyword search of the abstracts, not the full dissertations (let alone any deep content analysis).
We talked generally for a while about the problem and then it hit me… what if I downloaded a bunch of relevant dissertations, fingerprinted them with a DLP (data loss prevention) solution, and then sent the student's dissertation through the system's analysis engine for comparison? Would the DLP solution be able to detect plagiarism? It almost seemed too simple.
I explained that, using DLP, I might be able to compare her student's paper to any dissertations she identified and downloaded from the database, so even if only a paragraph had been lifted from a 250-page paper, we'd know it. She seemed intrigued.
So when we got home from lunch I started up my laptop and RDP'd into my DLP system. I had my wife download a bunch of relevant dissertations from her school's database, and within minutes I had fingerprinted roughly 50 dissertation files, many of which were a couple hundred pages long, and built a policy to block transmission of any of the data in those files. I then took her student's dissertation and emailed it from a client station to my personal email. Now, because the system was monitoring SMTP traffic, it sent the email (with the student's paper as an attachment) to the content analysis engine. I waited a second… another… and then I impatiently hit Send/Receive… and there it was: an automated notification telling me that my email had violated my new policy and had been blocked.
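For the curious, here is a rough sketch of the general idea behind this kind of fingerprint matching. It is not how the DLP product works internally (commercial engines are proprietary and far more sophisticated); it is a simplified illustration that assumes fingerprints are hashes of overlapping word windows ("shingles"). The function names `shingles`, `build_index`, and `match_report`, the window size `k`, and the toy texts are all made up for the example.

```python
import hashlib
import re


def shingles(text, k=8):
    """Return the set of hashed k-word windows (shingles) found in text."""
    # Normalize: lowercase and keep only alphanumeric word tokens.
    words = re.findall(r"[a-z0-9]+", text.lower())
    hashes = set()
    for i in range(len(words) - k + 1):
        window = " ".join(words[i:i + k])
        hashes.add(hashlib.sha1(window.encode("utf-8")).hexdigest())
    return hashes


def build_index(corpus, k=8):
    """Map each shingle hash to the names of the documents that contain it."""
    index = {}
    for name, text in corpus.items():
        for h in shingles(text, k):
            index.setdefault(h, set()).add(name)
    return index


def match_report(suspect_text, index, k=8):
    """Return {doc_name: fraction of the suspect's shingles found in that doc}."""
    suspect = shingles(suspect_text, k)
    hits = {}
    for h in suspect:
        for name in index.get(h, ()):
            hits[name] = hits.get(name, 0) + 1
    total = len(suspect) or 1  # avoid dividing by zero for very short texts
    return {name: count / total for name, count in hits.items()}


# Toy stand-ins for the downloaded dissertations (hypothetical text).
corpus = {
    "thesis_a": "the quick brown fox jumps over the lazy dog near the quiet river bank at dawn",
    "thesis_b": "completely unrelated text about network packet inspection and mail gateway policy rules",
}
suspect = ("my own introduction words then the quick brown fox jumps over "
           "the lazy dog near the quiet river and my own ending")

index = build_index(corpus, k=5)
for name, score in match_report(suspect, index, k=5).items():
    print(f"{name}: {score:.0%} of the suspect's shingles match")
```

Because every window of `k` consecutive words is hashed, a single copied paragraph shows up as a run of matching shingles even when the rest of the paper is original, which is why the size of the source document doesn't matter much. A smaller `k` catches shorter borrowings but produces more coincidental matches on common phrases.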
My wife was grateful but equally upset with her student. She thought what we'd done with DLP was pretty cool and, I have to admit, so did I. I had never thought of using DLP for this kind of application. If you've found other uses, please share them.
So, there were three things that I took away from this experience:
1) Don't plagiarize. Some poor sap just spent 5 or 6 years and a ton of money to get a graduate degree, and I suspect she's now screwed.
2) Don't plagiarize if your professor's husband is in the DLP business. I mean, we all know cheating is stupid; but cheating when the other side has the upper hand is REALLY STUPID.
3) My wife finally knows what I do. I'm still not sure she could entirely explain it, but she's certainly got a good story she can tell.