andyfilms1 18 hours ago [-]
I genuinely think any ML approach to detecting ML output will always be unreliable. Models can be intentionally poisoned or tricked, and AI users have a strong incentive to do so. It will always be a losing battle against a moving target.
I think in the long run, deterministic algorithmic approaches with complex pipelines will be needed.
miguel_martin 15 hours ago [-]
I disagree.
1. Constructing "deterministic algorithmic approaches with complex pipelines" is itself an ML approach. You're simply changing how you optimize (human-constructed rules instead of gradient descent) and what you are optimizing (a set of rules similar to a decision tree instead of a statistical model).
2. "Models can be intentionally poisoned or tricked" this is adversarial examples. Your deterministic and complex pipeline will have attack vectors, but just of a different distribution compared to an LLM (or neural net in general). Adversarial examples are likely unavoidable, you will always have a set of inputs that will cause your model to mis-classify examples. You can aim to minimize the size of this distribution/set, but for language: the set of possible inputs is so large that you will never fully be able to train or test on them all, and thus you will always have a back & forth between finding new attack vectors vs. defending against them: "deterministic" or not.
To expand on 1:
How do you construct a complex pipeline? Hopefully, by roughly following standard ML principles.
That is, you take a training set, observe it, and find patterns/rules in it. You iteratively construct your complex pipeline until you've minimized the error on that training set. Hopefully, once this initial version exists, you evaluate it on an independent validation set and keep iterating until your validation numbers improve. In the end, since you've now optimized against the val set, you need a third, independent test set to confirm you haven't overfit to the train and val sets. This is standard ML practice.
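In code, that three-way protocol looks something like the sketch below. This is a generic illustration using sklearn and synthetic data, not anything from the article; the "pipeline" being tuned is just a decision tree's depth, standing in for whatever hand-built rules you'd actually iterate on.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, random_state=0)  # stand-in data

    # 60/20/20 split: train for fitting, val for iterating, test touched once.
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

    # Iterate on the "pipeline" (here, just tree depth) against the val set only.
    def val_score(depth):
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
        return tree.fit(X_train, y_train).score(X_val, y_val)

    best_depth = max(range(1, 8), key=val_score)

    # Only now, exactly once, evaluate on the held-out test set.
    final = DecisionTreeClassifier(max_depth=best_depth, random_state=0).fit(X_train, y_train)
    print(final.score(X_test, y_test))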
In other words, this process is what an "ML approach" is, just performed manually by a human, possibly with some data analysis. Again, you've merely replaced the optimization process (e.g. gradient descent) and the underlying model (e.g. an LLM with differentiable parameters) with a more "deterministic" one, similar to a decision tree.
Yes, you could automate this process of constructing and chaining the rules, in which case your process and your pipeline will likely end up resembling a decision tree (e.g. xgboost), but then you're even closer to the thing you think you are trying to avoid.
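To make point 1 concrete, here's a minimal sketch of a hand-built "deterministic pipeline" for flagging AI-ish text. The features and thresholds are invented purely for illustration; the point is that each hand-chosen rule is exactly one internal node of the decision tree a learner would otherwise fit for you.

    def handwritten_detector(text: str) -> bool:
        """Hand-constructed rules; structurally identical to a decision tree."""
        words = text.split() or [""]
        avg_word_len = sum(len(w) for w in words) / len(words)
        if "delve" in text.lower():          # node 1: a hand-picked feature
            return True
        if avg_word_len > 5.5:               # node 2: a hand-picked threshold
            return text.count(",") / len(words) > 0.1  # node 3
        return False

    # A learner like sklearn's DecisionTreeClassifier or xgboost would fit the
    # same if/else structure automatically; the only difference is who chose
    # the features and the thresholds.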
mentalgear 18 hours ago [-]
Agreed, and I might add: the name for this is neuro-symbolic, or hybrid, AI.
deminature 16 hours ago [-]
This research has been commercialized by a company called Pangram, which sells access to AI detection as a service via an API.
observationist 15 hours ago [-]
The low bar for passing as human-quality writing makes this a more or less nonsensical endeavour. Trivial edits like deliberate misspellings, common transpositions, and the occasional autocorrect-style substitution break the semantic patterns LLMs are tuned to produce. Throw in humanizing skills, a good, stylometrically comprehensive prompt framework, and a systematic approach to producing human-like text, and you can defeat the detectors completely.
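For a sense of how trivial those edits are, here's a sketch of a character-transposition pass. The function name and the 2% rate are arbitrary choices for illustration, not a claim about what actually fools any given detector.

    import random

    def humanize(text: str, rate: float = 0.02, seed: int = 0) -> str:
        """Randomly transpose adjacent letters, mimicking human typos."""
        rng = random.Random(seed)
        chars = list(text)
        i = 0
        while i < len(chars) - 1:
            if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
                i += 2  # skip past the pair we just swapped
            else:
                i += 1
        return "".join(chars)

    print(humanize("The quick brown fox jumps over the lazy dog."))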
The false positive rate in identifying human writing as AI nullifies whatever advantage systematic detection might offer.
At best, in the absolute ideal, perfect-case scenario, a system like this will be suitable for flagging a piece of writing for review, with additional evidence, context, and reasoning still required.
A majority of the time, this will be used in a lazy, cover-your-ass corporate fashion to arbitrarily "detect" and penalize users, students, or other targets.
The fundamental issue is that the false positive rate is high enough to make the statistical value of any individual detection nearly nil. It doesn't matter if it detects 99.99999% of AI writing if it also deems 15% or more of human writing to be AI.
I don't know that it's 15%; I suspect it could easily be that high. But even if it's 2%, that's unacceptable in any situation where a false positive carries significant consequences: derailing an academic career, automated rejection of resumes, etc.
The morality of peddling this sort of detection as a service sits somewhere deep on the wrong side of the line between neutral and evil.
People need to sue the ever-loving pants off of companies that sell this shit to schools, companies, and universities, because a handful of ignorant administrators have nowhere near the competence or understanding needed to mitigate the damage they will inevitably cause through gratuitous use of this sort of automation.
Company 1: Imagine you randomly drug test employees. The test is 100% accurate at detecting meth use, but it has a 15% false positive rate.
Company 2: You randomly drug test employees. The test is 95% accurate at detecting meth use, with a 0.000015% false positive rate.
See the issue? Let's say the bosses mandate a zero-tolerance policy: any indication of meth use means termination on the spot.
If the incidence rate of meth use is a typical 0.5%, the company has 1000 employees, and they randomly test 2 people per week for a year, how many people does company 1 fire (exposing itself to liability for wrongful termination)? What about company 2?
The base rate fallacy, or false positive paradox, is a huge problem for AI detectors. Company 1 would fire about 16 people a year, almost none of whom would be actual meth users. Company 2 would fire one person every other year, and could be almost entirely certain that the detection was legitimate.
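Here's the back-of-the-envelope arithmetic behind those numbers, using the rates from the hypothetical above:

    # Expected firings per year under the zero-tolerance testing policy.
    tests_per_year = 2 * 52   # 2 random tests per week
    incidence = 0.005         # 0.5% of employees actually use meth

    def expected_firings(sensitivity, false_positive_rate):
        true_users = tests_per_year * incidence
        non_users = tests_per_year * (1 - incidence)
        return true_users * sensitivity + non_users * false_positive_rate

    # Company 1: 100% sensitivity, 15% false positive rate.
    print(expected_firings(1.00, 0.15))        # ~16 per year, nearly all innocent
    # Company 2: 95% sensitivity, 0.000015% false positive rate.
    print(expected_firings(0.95, 0.00000015))  # ~0.5 per year, nearly all real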
Software like this might be good at detecting one-shot, lazy rewrites. If you're a big AI platform, you might have some clever steganographic tricks up your sleeve for watermarking text. But the second someone puts effort into it, they become completely indistinguishable from the majority of human writers, to the point that the false positive rate becomes unacceptable for any real-world use. Throw in the fact that kids are enthusiastically learning their vocabulary, writing styles, and textual mannerisms from ChatGPT, Claude, and Gemini, and the commercial use of detection software becomes an outright ignorant, twisted, and evil thing to do.