How does an AI detector work? Poking around the Technicalities
While a single article cannot cover all the fine points of how AI detectors work, this one will give you enough information to sound smart whenever the topic comes up. In case the conversation moves into deeper waters, you can use the classic “I have an urgent phone call” line and wiggle your way out.
So, the basic question, ‘What is an AI detector?’ is a good starting point. In simple words, it is any tool that analyzes a body of text and estimates whether it was generated by AI or written by a human. Most AI detectors are built to recognize content from popular models like ChatGPT and Google Gemini. Good AI detection tools also keep track of other models.
A good AI detector is built on a robust machine learning model trained on large volumes of data comprising content created by both AI and humans. The greater the variety of data the model is trained on, the better the tool. When you submit text, the tool analyzes it and compares it against the patterns it learned from that training data. This means that a good AI detector needs to be continuously updated to keep pace with the variety of AI tools in the market.
Typically, when you feed the content to the tool, the text is tokenized. In simple words, it is broken down into smaller units, which may be words, parts of words, or punctuation marks. This is often followed by normalization, where capital letters are converted to lower case and punctuation and special characters are removed.
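To make the tokenization and normalization steps concrete, here is a minimal sketch in Python. It assumes a simple regex-based word tokenizer; real detectors typically use more sophisticated tokenizers, and the function names `tokenize` and `normalize` are made up for this illustration.

```python
import re

def tokenize(text: str) -> list[str]:
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

def normalize(tokens: list[str]) -> list[str]:
    """Lowercase every token and drop tokens with no letters or digits."""
    return [t.lower() for t in tokens if re.search(r"\w", t)]

tokens = tokenize("Hello, World! AI-detectors work.")
print(tokens)             # raw tokens, punctuation included
print(normalize(tokens))  # lowercased words only
```

Whether a given tool removes punctuation at all is a design choice: punctuation habits can themselves be a useful signal, so some pipelines keep it.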
From here, the tool conducts a linguistic analysis where it measures the perplexity and burstiness of the provided text. Perplexity measures how unpredictable the text is to a language model: the lower the perplexity, the more predictable the text. AI-generated content tends to have low perplexity, because AI models favor likely, predictable word choices, which aids readability. Human-written content is usually less predictable and so scores higher. Burstiness is the variation in the length and complexity of sentences, and it tends to be low in AI-generated content. Human-written content usually contains a mix of short and long sentences, which gives it higher burstiness.
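Both measures can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular detector computes them: perplexity here comes from a tiny add-one-smoothed unigram model built from a reference text (real tools use large language models), and burstiness is taken to be the coefficient of variation of sentence lengths. Lower perplexity means more predictable text. The function names are hypothetical.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev

def words(text: str) -> list[str]:
    """Lowercase the text and extract word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def unigram_perplexity(text: str, reference: str) -> float:
    """Perplexity of `text` under an add-one-smoothed unigram model of `reference`."""
    counts = Counter(words(reference))
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot shared by all unseen words
    tokens = words(text)
    if not tokens:
        raise ValueError("text contains no words")
    log_prob = sum(math.log((counts[t] + 1) / (total + vocab)) for t in tokens)
    return math.exp(-log_prob / len(tokens))

def burstiness(text: str) -> float:
    """Standard deviation of sentence lengths divided by their mean."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(words(s)) for s in sentences]
    if not lengths or mean(lengths) == 0:
        return 0.0
    return pstdev(lengths) / mean(lengths)

reference = "the cat sat on the mat. the dog sat on the rug."
print(unigram_perplexity("the cat sat on the mat", reference))   # familiar words
print(unigram_perplexity("quantum zebra juggles", reference))    # unseen words
print(burstiness("I like tea. I like cake. I like jam."))        # uniform sentences
print(burstiness("Stop. I went to the market early this morning to buy bread."))
```

The familiar sentence scores a lower perplexity than the one made of unseen words, and the text with identical sentence lengths has zero burstiness.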
The next step is creating embedding representations. While you could write a few pages on this topic alone, for simplicity, you can say that the detector uses embedding models to capture the relationships between words and the semantics of sentences. This is done because, unlike humans, machines work with numbers, so the text needs to be mapped into a numerical format, usually lists of numbers called vectors, that the model can process.
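As a rough illustration of mapping text to numbers, the sketch below uses simple word-count vectors and cosine similarity. This is a deliberate stand-in: real detectors use learned embeddings from neural models, which capture meaning far better than word counts. The vocabulary and the function names `embed` and `cosine_similarity` are invented for this example.

```python
import math
from collections import Counter

def embed(text: str, vocabulary: list[str]) -> list[float]:
    """Map text to a vector of word counts, one position per vocabulary word."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

vocabulary = ["cat", "dog", "sat", "mat", "stock", "market"]
a = embed("the cat sat on the mat", vocabulary)
b = embed("the dog sat on the mat", vocabulary)
c = embed("the stock market fell", vocabulary)

print(cosine_similarity(a, b))  # sentences sharing words score higher
print(cosine_similarity(a, c))  # sentences with no shared vocabulary words score 0.0
```

Once text is in vector form, similar sentences end up close together, which is what lets later steps compare and classify them.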
From here, a machine learning classifier starts sorting the text. The classifier assigns a probability score to the text, indicating how likely it is to be AI-generated rather than written by a human. The system then runs an additional check, which may involve analyzing the context and consistency of the text, to reduce false positives (human text flagged as AI) and false negatives (AI text that slips through as human).
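The scoring step can be sketched as a tiny logistic model that turns two features into a probability. The weights, bias, and threshold below are hypothetical values chosen purely for illustration; a real classifier learns its parameters from labeled training data and uses many more features.

```python
import math

# Hypothetical parameters, invented for this example (not from any real tool).
# Negative weights: lower perplexity and lower burstiness raise the AI score.
WEIGHTS = {"perplexity": -0.08, "burstiness": -3.0}
BIAS = 4.0

def ai_probability(perplexity: float, burstiness: float) -> float:
    """Logistic score between 0 and 1: higher means more likely AI-generated."""
    z = BIAS + WEIGHTS["perplexity"] * perplexity + WEIGHTS["burstiness"] * burstiness
    return 1.0 / (1.0 + math.exp(-z))

def label(probability: float, threshold: float = 0.5) -> str:
    """Turn a probability into a verdict using a decision threshold."""
    return "likely AI-generated" if probability >= threshold else "likely human-written"

predictable = ai_probability(perplexity=20.0, burstiness=0.2)
varied = ai_probability(perplexity=80.0, burstiness=0.9)
print(predictable, label(predictable))
print(varied, label(varied))
```

The threshold is where the trade-off between the two kinds of error lives: raising it means fewer human texts are wrongly flagged, at the cost of letting more AI text through.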
Finally, the results are presented as the percentage of the text that appears to be AI-generated and the percentage that appears to be human-written. Along with this classification, a confidence score is reported to show how certain the detection is. Some tools also collect feedback from users on these results, which helps calibrate the accuracy of the tool over time.
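One way such a report could be assembled is sketched below, assuming the tool has already scored each sentence with a probability of being AI-generated. The confidence formula (average distance of each score from 0.5, scaled to the range 0 to 1) is an assumption made for this example; actual tools define confidence in their own ways. The function name `summarize` is hypothetical.

```python
def summarize(sentence_probs: list[float], threshold: float = 0.5) -> dict[str, float]:
    """Aggregate per-sentence AI probabilities into a simple report."""
    if not sentence_probs:
        raise ValueError("need at least one sentence score")
    flagged = sum(1 for p in sentence_probs if p >= threshold)
    ai_percent = 100.0 * flagged / len(sentence_probs)
    # Assumed confidence measure: scores near 0 or 1 are confident, near 0.5 are not.
    confidence = sum(abs(p - 0.5) * 2 for p in sentence_probs) / len(sentence_probs)
    return {
        "ai_percent": ai_percent,
        "human_percent": 100.0 - ai_percent,
        "confidence": confidence,
    }

report = summarize([0.9, 0.8, 0.2, 0.1])
print(report)
```

In this example two of the four sentences cross the threshold, so the report splits evenly between AI and human, with a fairly high confidence because no score sits close to 0.5.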
This is the general structure of how an AI detector works. Depending on the creators of the tool, there may be variations aimed at improving accuracy. Some tools also accept input in various ways, like submitting a link to the content to be tested, uploading a document, and so on. AI detectors strive to be as accurate as possible, but no tool is 100 percent accurate. This is mainly because new AI models keep being released and existing ones keep improving, so it takes time for a detector to be trained and tested on data from each new model.
Nonetheless, adding AI detectors to the workflow is close to non-negotiable in this day and age, where AI is not always used responsibly. In academia, or in any field where a person’s caliber needs to be tested, AI detectors should be part of the process. And this need not be a time-consuming or expensive affair, as free AI detection tools like HireQuotient’s AI detector are available on the market. It can take in large volumes of text, and it is constantly calibrated to achieve high accuracy. Test it out and see how you can optimize your workflow to accommodate this valuable tool.