The Problem
There is a problem in the ‘infosphere’. In a nutshell: we now have text that looks high quality but has low, zero or even negative value for the reader. Authorship is no longer something you can trust implicitly; even science is not immune, and the problem seems set to become significantly larger in the near future. You could solve this by mandating that authors label their work product, but if there is money to be made by mis-labeling then you can expect people to do exactly that.
Wikipedia has been fighting this trend for many years. They position themselves as a tertiary source: they require references from secondary sources, and those secondary sources should report on primary sources. This serves as an input filter of sorts, because the assumption that secondary sources are going to do their homework was a good one. Until now.
Secondary sources are now incentivized to no longer do their homework. Just about every secondary source now uses automated ways to generate content, where quantity, rather than quality, seems to have become the driving force. This is the equivalent of replacing quality home cooked food with fast food: it is much easier to produce low quality food in quantity than it is to produce high quality food in quantity, if the latter can be done at all.
AI is a force multiplier here: it allows for the generation of immense quantities of very good looking text of low quality. The information diet equivalent of fast food. And unlike fast food, which usually is at least nourishing even if it isn’t the best of foods, the AI work product as a rule is poisonous: it contains ‘hallucinations’, which, when allowed to take root, can cause false information to spread widely. Now, it could be argued that fast food, taken in quantity, is also poisonous, so maybe the analogy is stronger than I intend for it to be. But the fact is that a single hamburger is not going to kill you. A single falsehood just might, especially when the subject is health, technology, engineering and so on.
Of course ‘unassisted humans’ also make mistakes. But we have ways to deal with that: reviews and editors. At the quantity of information humans produce, that is manageable. But with the volume of automatically generated text exploding and absolutely dwarfing our combined work product, it is only a matter of time before the ‘slop’ outnumbers the combined human product by a considerable margin. Humans won’t be able to review or edit any of that, simply because there wouldn’t be enough humans to do so, and besides, the whole point seems to be to not employ all of those humans in the first place.
A possible solution
There are a couple of solutions, but they are all going to rely to some degree on collaboration, ethics and honesty, both corporate and individual. If the advertising industry has taught us anything, it is that we should not get our hopes up for this to be successful. Bad actors will be able to put one over on good actors by gaining an unfair advantage. And since we only seem to want to punish people who gain an unfair advantage in life by clobbering old ladies over the head to make off with their handbags, I have little hope that this will succeed. But I still think it is worth trying, because giving up what we’ve achieved so far seems to me to be too high a price to pay.
AI - obviously - has its place. It can give us a personal teacher - one you can’t quite trust not to pull the wool over your eyes - with access to a vast amount of information. But as soon as you hit the ‘prompt / generate / cut / paste / profit’ cycle you are no longer operating from the same playbook as the rest of humanity: you are passing off the AI’s work product as your own, whereas all you did was create the prompt. And that’s fine, but then you should at least be transparent about it.
In order to help AI companies create better products, it would be good if we could create the digital equivalent of a sworn statement of authorship. I can see several ways of doing this, and one in particular has caught my fancy because (1) it scales and (2) it would be very easy to implement.
Classification
Let’s start off by grading the different kinds of works; a small code sketch of this grading follows the list:
Original works. These are 100% human created; there is absolutely no question about the provenance of the text. We know who the author is, the author has made a clear statement of provenance and there is no reason to doubt that statement.
Assisted works. These are hybrids: the author has put considerable work into the text but has used AI either to stylistically improve the work or because some of the text has been AI generated. Ideally that would be a minority; as soon as you cross the 50% mark you will be in the next class.
AI generated, human augmented. These are hybrids too, but from the other side. Now the balance has shifted to majority AI input and minority human input.
AI generated. 100% AI generated, no human input required.
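To make that grading concrete, here is a minimal sketch of how the four classes and the 50% cutoff could be expressed in code. The class names and the ai_fraction input are my own invention for illustration; only the scheme itself comes from the list above.

```python
from enum import Enum

class Provenance(Enum):
    ORIGINAL = "original works"                      # 100% human created
    ASSISTED = "assisted works"                      # majority human, minority AI
    AI_AUGMENTED = "AI generated, human augmented"   # majority AI, minority human
    AI_GENERATED = "AI generated"                    # 100% AI, no human input

def classify(ai_fraction: float) -> Provenance:
    """Map the fraction of AI generated text (0.0 to 1.0) onto the four classes."""
    if ai_fraction <= 0.0:
        return Provenance.ORIGINAL
    if ai_fraction <= 0.5:   # "as soon as you cross the 50% mark" -> next class
        return Provenance.ASSISTED
    if ai_fraction < 1.0:
        return Provenance.AI_AUGMENTED
    return Provenance.AI_GENERATED
```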
One way to distinguish what is human written from what is AI written would be to extend the Unicode character set with a complete duplicate and to legally mandate that all AI companies are only allowed to use this duplicate character set for their output. Transliteration to the human set would be strictly forbidden and should result in a civil penalty. Shipping an AI model that does not output in the duplicate set would result in a criminal penalty.
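To illustrate the mechanics, here is a minimal sketch, assuming - purely hypothetically - that the duplicate set lives at a fixed offset in Unicode’s Plane 15 Private Use Area. Unicode has no such duplicate set today; an actual extension of the standard would be needed, and the function names here are invented.

```python
# Hypothetical convention: the 'AI twin' of each printable ASCII character
# sits at a fixed offset in the Supplementary Private Use Area-A (Plane 15).
AI_OFFSET = 0xF0000

def to_ai_set(text: str) -> str:
    """Shift printable ASCII into the hypothetical AI character set."""
    return "".join(chr(AI_OFFSET + ord(c)) if 0x20 <= ord(c) < 0x7F else c
                   for c in text)

def is_ai_char(c: str) -> bool:
    """True if a character falls inside the hypothetical AI range."""
    return AI_OFFSET + 0x20 <= ord(c) < AI_OFFSET + 0x7F

def ai_fraction(text: str) -> float:
    """Fraction of AI-marked characters; the input to classify() above."""
    if not text:
        return 0.0
    return sum(1 for c in text if is_ai_char(c)) / len(text)
```

Note that stripping the marking is just a subtraction, which is exactly why the scheme has to lean on legal penalties rather than on technical enforcement.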
That’s harsh, yes. But I think that there is enough at stake here that this kind of measure is more than warranted. And AI companies have a huge incentive to play ball here: for one, their execs would likely not want to be found engaging in criminal acts; for another, they would be able to avoid the one thing that they’re all struggling with: eating their own (and each other’s) tail. Because AIs trained on slop will eventually degrade to the point where progress will stall or even reverse. Pre-AI text will have roughly the same value as low-background steel (https://en.wikipedia.org/wiki/Low-background_steel). By labeling their output in this way AI companies will be able to sift out their own and other AI companies’ output.
Finally, a human run attestation service could be set up: a sufficiently secure cryptographic hash of a piece of input text, combined with the measured ratio of AI versus human generated text in that input, would give a meta label that can easily be affixed to the content. Search engines and AI ingestion pipelines could then use these marks to differentiate between the various sources, hopefully as a measure of quality, giving higher quality results priority. So you don’t accidentally end up biting into fast food when you were expecting something much better.
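As a minimal sketch of what such a meta label could look like, the snippet below reuses ai_fraction() and classify() from the earlier sketches; the label fields, the HMAC signing and the service key are all placeholders of my own invention, not a real attestation protocol.

```python
import hashlib
import hmac
import json

SERVICE_KEY = b"attestation-service-secret"  # placeholder; real key management needed

def attest(text: str) -> dict:
    """Produce a signed meta label: content hash, AI ratio and provenance class."""
    frac = ai_fraction(text)                 # from the character-set sketch above
    label = {
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "ai_fraction": round(frac, 3),
        "class": classify(frac).value,       # from the grading sketch above
    }
    payload = json.dumps(label, sort_keys=True).encode("utf-8")
    label["signature"] = hmac.new(SERVICE_KEY, payload, hashlib.sha256).hexdigest()
    return label
```

A real service would of course need proper key management and an honest way to establish the human/AI split, but the shape of the label - hash, ratio, class, signature - is the point.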
Problems with the above
Of course, it is easy to dismiss all of this out of hand because it isn’t - and won’t be - perfect. I can already see many sub-problems that will need to be resolved, and the knee-jerk reaction will - of course - be that the race is already run and it is much too late to do something about it now, so we may as well lie down and be run over by the bulldozer.
But I’m an optimist by nature: I think we can recognize that even if this plan is flawed the spirit of it has merit, fix the flaws and then implement it to the best of our ability. It’s the spirit that counts here, not the letter, so let your imagination loose on taking the idea and improving on it rather than bashing it with the first middlebrow dismissal that comes to mind.
–
This text was written by hand, using a brain and a computer in the most passive way possible, to record keystrokes. No automatic processes contributed to the content. AI percentage: 0.