I saw a post on LinkedIn from the Director of a Consulting Firm describing how he assigned an essay about model drift in machine learning systems to screen potential candidates.
Then, based on criteria rooted in his intuition ("you can smell it"), he ran the submissions through four different "AI detectors" to "confirm" that the applicants had used ChatGPT to write their essay responses.
The post in question: Adam Sroka on LinkedIn, "I auto-rejected a candidate using LLMs in the hiring process."
The criteria for "suspected" bot-generated essays were:
- weird sentence structure
- wonky analogies
- repetition
- switching from one English dialect to another (in separate writing samples from the same application)
One criterion notably missing: accuracy.
The rationale is that using AI tools amounts to trying to subvert the candidate selection process. Needless to say, the comments are wild (and very LinkedIn-core).
I can appreciate that argument, even though I find his methodology less than rigorous. It seems like he wanted to avoid candidates who would copy and paste a response directly from ChatGPT without scrutiny.
However, I think this post raises an interesting question that we as a society need to explore – is using an LLM to help you write cheating during the hiring process?
I would say it is not. Here is the argument for why using an LLM to help you write is just fine and why it should not exclude you as a candidate.
As a bonus for the Director, I’ll include a better methodology for filtering candidates based on how they use LLMs and AI tools.
Humans Can Do Bad Writing Without AI
Let me be clear: there are many occasions when using text generated by an LLM is not ethical. For example, when writing an article for Towards Data Science, when it is expressly forbidden, for artistic purposes, or when you are creating training data for other LLMs. There are plenty of other cases as well.
But I’ve been thinking a lot about AI content detection lately, and about what features might be used to detect it. How do these tools capture the je ne sais quoi of robot writing? What features tell them that a block of text is a little too uncanny to pass as human? A teacher who has grown accustomed to her fifth-graders’ writing can often tell right away when a submission is machine-generated, because the writing is (technically) better than usual.
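Commercial detectors do not publish their internals, but one common intuition is that machine text is suspiciously predictable. Below is a toy sketch of that intuition, scoring text by GPT-2 perplexity with the Hugging Face transformers library. To be clear, this is my illustration of the general idea, not a reconstruction of how any real detector works.

```python
# Toy illustration only: score text by GPT-2 perplexity.
# Assumes `torch` and `transformers` are installed; no commercial detector
# documents its method, this just sketches the "machine text is smoother" intuition.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Lower perplexity = more predictable, often read (crudely) as 'more AI-like'."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

print(perplexity("The cat sat on the mat. The cat sat on the mat."))
print(perplexity("My uncle's birthday cake exploded, which surprised exactly no one."))
```

The catch is already visible in this toy version: clean, well-edited human prose is also highly predictable, which is part of why careful writers keep getting flagged.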
When it comes to the Director and his candidate exclusion methodology, the criteria he outlined simply describe "bad" writing – which humans are quite capable of doing without assistance from technology. He never said their answers were wrong.
So why not just exclude them for bad writing, rather than for using AI tools? I would venture to guess that many of them didn’t use AI at all, because if they had, their wonky answers would probably have come out better.
Let’s use an example from the world of cinema. Please apply the "willing suspension of disbelief" framework and digest this example without examining the historical accuracy of the movie.
In the movie Amadeus, a semi-biographical period drama from 1984 based on the life of Mozart, the composer Salieri is depicted as tortured by jealousy of Mozart. The root cause of his jealousy is that Salieri is also an expert composer, and he understands, on a technical level, what makes Mozart’s music so superior. Salieri is skilled and experienced, but Mozart is gifted. Even though Salieri is one of the great composers of his time, it took him a lifetime to accomplish what Mozart accomplished as a young man.
This sentiment is summed up perfectly by Alicia of Blue Pencil Beading in her 2015 blog post, "Queen of the Night":
You’ve got to have talent – but you’ve also got to have the discipline to use that talent as best you can. Imagine what Salieri could have done with Mozart’s gift for easy composing. Imagine what Mozart could have done with Salieri’s drive and ability to focus (and climb the social ladder in the imperial court). Mozart squanders his potential literally farting around Vienna, and dies with one of his greatest works unfinished. Salieri labors too much over the form of his pieces: they sound difficult and forced and even semi-idiot emperor Franz Joseph can tell something’s missing.
If we apply the same framework to the ethics of using AI to help you write, consider what ChatGPT (Mozart) can accomplish in less than a minute. Then consider how much more the technically talented engineer (Salieri) could accomplish if expressing themselves were less laborious.

If the motive behind excluding candidates for using AI tools is that it calls their credibility into question, I would challenge that. If a candidate produced a well-written and factual response, with no dialect-switching or weird analogies, would the Director have run it through a detector in the first place?
And if he had, would he still exclude a candidate who used AI to produce a superior response? If a candidate can recognize and curate a quality response, that suggests competence. A good candidate could spot a hallucination and revise the answer.
If you want someone who is a great writer, without any assistance at all, make it clear that is what you are after. If you want a machine learning engineer who can prioritize their time, don’t give them a secret writing test. Asking for an essay as part of applicant screening is a little suspect as it is – but the added AI detection "gotcha" at the end signals that you do not have much respect for candidates.
AI Text Detection Is the Blood Spatter Analysis of Machine Learning
Have you ever watched the show Forensic Files? It was a true-crime show that aired on Court TV in the 1990s and 2000s. Each episode told the story of a crime and how investigators solved it using forensic science. Blood spatter, bite marks, footprints in the snow – you name it. As a kid, I thought the science was rock-solid.
Re-watching some episodes as an adult gave me pause, for good reason. Their slogan these days seems to be "No witnesses, no leads, no problem."
Seems like a problem to me.

Despite mounting evidence that bite marks and blood spatter are not reliable methods for determining the nature of a crime or the identity of the criminal, non-expert judges still admit this evidence. Then juries of laypeople convict innocent people based on bad science.
A 2019 NBC article quotes Alicia Carriquiry, director of the Center for Statistics and Applications in Forensic Evidence, a government-funded project to measure the limits of forensic methods.
According to Carriquiry:
"If we don’t have technologies that are objective, repeatable and reliable, then we have no idea how many times we’re making the wrong decision…We don’t even have a way to estimate how many times we’re making the wrong decisions."
Sure, publishing a blog online is somewhat less consequential than a violent crime trial. But it is a serious matter that people’s lives and livelihoods are being affected, and AI detection tools are neither repeatable nor reliable.
In a blog article, Linda Carrol discusses how she ran her organic writing through an AI text detection tool and discovered that much of her writing was getting flagged as AI.
There’s a real irony to using AI detection tools in the first place. Because AI was trained on us. Literally. They fed it giant swaths of the internet to teach it how to write. Which basically means it learned from us. And now if we write like ourselves, we fail.
In a similar article that Carrol published on Medium, numerous creators lament how their writing was flagged as AI-generated content. The problem of false positives in AI text detection, and the frenzy to detect AI text, is harming the earnings of freelance writers who have been honing their craft for years, and whose writing was used without permission to train these LLMs in the first place.
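A bit of back-of-the-envelope arithmetic (with hypothetical numbers, since no detector publishes audited error rates) shows why the false positives pile up: when most of the writing a detector sees is human, even a mostly-accurate tool ends up accusing a lot of real writers.

```python
# Hypothetical numbers for illustration; no real detector publishes audited rates.
human_essays = 90          # applicants who wrote their essays themselves
ai_essays = 10             # applicants who pasted raw ChatGPT output
false_positive_rate = 0.05 # detector flags 5% of human writing as "AI"
true_positive_rate = 0.90  # detector catches 90% of AI writing

flagged_humans = human_essays * false_positive_rate  # 4.5 people wrongly accused
flagged_ai = ai_essays * true_positive_rate          # 9 people correctly flagged

precision = flagged_ai / (flagged_ai + flagged_humans)
print(f"Flagged essays that were actually AI-written: {precision:.0%}")  # ~67%
```

Under these made-up but fairly generous assumptions, roughly one in three flagged essays was written by a human.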
What is interesting about the Director is that he does not appreciate the irony of using AI detectors to catch AI content: he distrusts the output of one model while trusting the output of another. Why trust the detector’s output, but not the AI itself? Is it possible that it’s people you don’t trust?
It almost seems like AI text detection tools hype up a problem so they can offer you a solution. And the best part, like shoddy forensics, is that we have no way of measuring how wrong we are.
You Are Already a Cyborg
Even before ChatGPT, algorithms were running our lives. They were also (indirectly) generating content. Recommendation algorithms privilege certain content over others, putting more eyeballs on a topic. Then we call that topic trending, and more content gets generated about it.
On Twitter/X, TikTok, YouTube, Instagram, or any other social media platform, creators make content based on what performs well. They get feedback, and they adjust.

Influencers shape what we buy, the food we eat, and how we speak to one another. This is not a new phenomenon. Languages evolve and change as each generation of young people freshly discovers language and makes it their own. These habits are part of our individual and collective experience. Culture has always relied on the wisdom of crowds, much like the ensemble algorithm that tells the latest predict-a-text toy which word to spit out next.
When I was in middle school, the rise of spell-check struck fear in the hearts of educators because they thought we would all forget how to spell. I’m happy to report that we are all fine. I even got a little better at spelling, because the words I commonly misspelled were pointed out in real time, giving me the feedback I needed to internalize the correct spelling. That hit differently than getting the paper back, marked in red, three weeks later. Instant feedback is meaningful.
That’s because humans learn how to communicate and write based on observation, interaction, and feedback. While I think more research is needed on the role of AI tools when children are learning to write for the first time, for adolescent and adult writers, interactions with ChatGPT might make them better writers. One consequence is that there may be more uniformity between the stylistic choices of ChatGPT and those of human writers.
In a newsletter published by North Carolina State University in March 2023, English Professor Chris Anson writes:
In such cases, using AI-based natural language processors (NLP) to fulfill the task does little to challenge the cognitive processes of the human writer. The reason is that the writer is ordinarily not significantly changed by the writing task, especially when it uses boilerplate-like language – language that is often repeated with a few unique details inserted. Instead, these softwares improve efficiency and make time for the person to do higher-level, more cognitively sophisticated kinds of writing.
If you provide ChatGPT with a good prompt and a draft that has misspellings, lacks structure, and could use some more variety in phrasing, ChatGPT will likely give you back a better paper.
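As a concrete sketch of that workflow (hypothetical, not anyone's prescribed method), here is roughly what the revision loop looks like with the openai Python package; the model name, draft, and prompts below are placeholders I made up for illustration.

```python
# Minimal sketch of a "revise my draft" loop. Assumes the `openai` package
# (v1+) is installed and OPENAI_API_KEY is set; the model name is illustrative.
from openai import OpenAI

client = OpenAI()

draft = """Model drift is when the modle stops working good over time
because the data changes and nobody notices untill the metrics drop."""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any chat-capable model works
    messages=[
        {"role": "system", "content": (
            "You are a careful writing coach. Fix spelling, grammar, and "
            "structure, but keep the author's voice and technical claims."
        )},
        {"role": "user", "content": f"Please revise this draft:\n\n{draft}"},
    ],
)

print(response.choices[0].message.content)
```

The point is not the snippet itself but the loop around it: you still have to read the revision, check the claims, and decide what to keep, which is exactly the judgment a hiring manager should be screening for.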

If you do that enough times, you begin to pick up on patterns that can help you write better. Then, you develop your own little algorithm for expressing yourself. Human writers are lucky if they can get one person to look at their paper and help them do a rewrite. ChatGPT can help you do multiple revisions in a single session (and you don’t have to buy them coffee).
However, Anson also warns that students using ChatGPT to generate responses on topics they do not understand is a threat to the education system, and it can harm students by providing unchecked false information. The accuracy of these models will likely improve over time, but students still need to know how to evaluate LLM responses for factual accuracy.
Consider the following, from the same article by Anson:
Because NLP systems increasingly will become part of our daily lives, educators need to find principled ways to integrate them into instruction. For example, when I asked ChatGPT to explain the cognitive consequences of machine-generated writing, it gave me an additional idea that I had not considered, but only after I had written my statement. The software’s response? "Machine-generated writing may lead to a homogenization of writing style…"
There are two good points here. One is that our efforts should be spent trying to figure out how to incorporate AI into learning experiences in a way that benefits the learner and society.
The second is that we are training the models, and then the models in turn train us.
Our writing style will change through our interactions with ChatGPT, and we have no way of knowing how quickly that will happen. As we continue to produce content, and models are updated, they will change with us. As our relationship with the tools becomes more intertwined, how will we suss out the robots masquerading as students?
Instead of trying to "catch" students cheating with ChatGPT, we need to show them how to use it in an ethical manner that enhances their learning experience. Why are we devoting so many resources to preventing "cheating" instead of researching how these tools could improve the education system?
A Better Way to Screen Candidates Based on How They Use AI Tools
While the Director wanted an easy way to find out if his candidates were taking the easy way out, the method I suggest requires a little bit more critical thinking. The results will speak for themselves.
- First, as an employer, consider what constitutes the ethical use of generative AI in your industry.
- Then, present your candidates with a concrete situation and ask them to explain why using AI in it would be unethical.
- Finally, since AI text detection tools cannot flag poor ethics (yet), you will have to evaluate the answers yourself.
Once you do, you will have a more meaningful way of measuring if the person you are hiring values the same things you do.

Humanize the Conversation about AI Ethics
There is no consensus on AI ethics.
We are all constantly bombarded by AI tools designed to magically make our lives better through the power of generative text. So people use them.
The feeding frenzy and unrelenting ads make it seem like everyone is using AI to get more done, faster and cheaper. Candidates are penalized for using AI, yet employers use the same technology to screen them out. Since the rules are so unclear, no one can know when it is OK and when it is not unless it is explicitly stated.
So where do we draw the line?
- At building resumes or creating cover letters with Teal?
- At generating cover letters on Swooped?
- At using ChatGPT to improve the structure of your essay?
- At running Grammarly to catch your mistakes or rephrase "wonky analogies" and "weird sentence structure"?
- At using LinkedIn AI tools to improve your "About Me"?
I respect an engineer who knows how to manage her time and prioritizes something other than writing an essay about model drift from scratch for an application with a company that is likely to ghost her (and possibly call her out with great specificity on LinkedIn).
Would you hire an engineer who refuses to use a calculator because they want to keep demonstrating that they can solve the same equation by hand?
Technology and humanity are evolving together. Human learning and expression have always been augmented by technology. I would implore the Director not to accuse candidates of trying to circumvent the system by using AI tools. Instead, reject them for their poor writing (per his own criteria) and stop giving the AI detectors free data.
Also, I wrote this all by myself. Let me know what you think about my human writing skills in the Responses 💬 !
No chatbots were harmed in the creation of this article. 🤖
If you enjoyed this article, check out this blog on Social Network Analysis with Python and NetworkX.
👩🏼‍💻 Christine Egan | medium | github | linkedin