Artificial Intelligence or Artificial Idiocy?
As a pioneer of statistical and neural learning technologies for natural language processing and (embodied) conversational agents, I have found it great to see the advances that larger and larger language models, and clever use of embeddings, attention and filtering, have made in the last couple of years.
It is important to understand that large language models (LLMs) themselves are statistical models that predict what words and phrases are likely to come next. A model like GPT-4 is trained in a very expensive run through a very large fixed body of text (the corpus), so the model itself doesn't learn any more after that, and doesn't actually understand anything about the world or what it is saying. So no real intelligence there yet...
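To make "predicting what comes next" concrete, here is a toy sketch in Python. It is only a bigram counter over a made-up one-line corpus, nothing like the scale or architecture of a real LLM, and the function names are illustrative assumptions rather than anyone's actual code. The point is that the model just counts what followed what in a fixed corpus and has no idea what a cat is.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word in the corpus, which words follow it."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Invented toy corpus; a real model sees trillions of characters.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
print(predict_next(model, "dog"))  # never seen in the corpus, so None
```

Once the counts are built they are frozen: asking the model questions does not change them, which is the sense in which the trained model "doesn't learn any more".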
However, the same kinds of models can be trained on speech, where the units are phonemes rather than characters, and phrasing is conveyed by intonation rather than punctuation. Moreover, similar models are being trained with images and videos, and this does start to give us information about the world.
There are also risks that come from the social and legal pressures that are being brought to bear on the development of these systems.
When I started working with embeddings and LLMs in the 1970s (for speech and text), there were three big problems beyond the actual language and learning domain: the ability to find large enough amounts of text and/or speech, the size and cost of primary and secondary storage (memory and disk/tape), and the speed of the computers and their storage systems. I was working initially with (multiple) 8- and 16-bit computers where the memory sizes were just a few KB (64KB was the limit) and disk sizes were just a few MB (my first hard disk was 10MB), and a large corpus was thus of the order of a couple of million characters.
Now large language models are trained on millions of books, billions of webpages, trillions of characters of text (viz. TB). They also have access to movies/photos/images and programs/code. A major risk thus relates to availability, quality, privacy and copyright. Have these materials been used and copied illegally? And should copyright holders get redress (demands include not just financial recompense but deletion of any LLMs that include their work or are tainted by illegal use of it as training data)? Or should the copyright legislation and its fair use exceptions be modernized to allow and control such use (as happened with fair use copying for print, audio and video)?
I am an interested party on both sides of this: my books and other publications have been indexed and analysed by Google and others, but I see this as different from pirating. It seems to me to be fair use as it enables me to search the web and find the books and papers, and see enough context to have a fair idea whether it is worthwhile obtaining and reading them (my university has subscriptions to most things I want, and can get others on interlibrary loans or as fair use copies). There are provisions in law for this, including payment of usage fees to copyright agencies on behalf of university and school users. It is also possible for this to be paid by the government or the benefiting industry sectors (and covered by appropriate taxes and tariffs, as e.g. happened with audiovisual recording media). This is the path I hope and expect governments will follow, but as usual technology moves faster than government, the courts do the best they can, and the legislation that results is not always technologically sound and pragmatically useful (and indeed new lobocracy laws may contradict a user's fair use rights).
From the AI side of the question, the LLM does not actually include a copy of any particular copyright work. Rather, information from many works is "embedded" in statistical and/or neural frameworks that find the commonalities and interrelations. So what is generated is unlikely to be a full or unfair (more than 10%) use of a work, although it is quite likely to include phrases and statements (linguistic or programmatic) that appear in similar forms in multiple works. Most of the language (and code) we use is commonplace and idiomatic, and only the names (or variables) change. What is not commonplace but unusual or novel is what actually constitutes intellectual property or literary or technical invention. If I use an LLM-based system as a sounding board to flesh out my ideas and bring them together, that is very useful whether the result is coded in natural language, in a programming language or as an image.
Over the years before ChatGPT, Copilot, Gemini and the like made their appearance, I was using similar techniques to provide a hands-free interface (based on eye-tracking and EEG, for people with disabilities) to allow searching the web, collating the results into a report, and providing/suggesting appropriate quotations and citations. As a teacher, I teach students how to do this properly, quoting and attributing things properly and avoiding academic dishonesty, plagiarism and the like. This is much like what these LLM systems try to do, although at the moment they don't do it very well - much like my undergraduate students. But because of outcries about copyright and plagiarism, or getting the LLM to do students' assignments, or not reflecting today's politically correct prescriptions and proscriptions, the systems are being hamstrung, downgraded and restricted to the point where they are not as useful as they could be (and indeed they are thus getting worse rather than better in terms of utility).
The AI/LLM systems that you are playing with may combine a fixed LLM at the heart, other models trained to help with composing images or speech, prompts and filters that guide and censor them to produce answers of an acceptable form in an appropriate format, and so on. These additional layers can retain information within and between sessions, can look at images and can search the web. However, currently sessions tend to be limited, with no direct memory of previous sessions and no actual learning across sessions, and the results of searches tend not to be retained fully even within a session. The "robots.txt" limits on searches may also impact the ability to refine the answer to an ongoing question (so the system will want to start a new session on a new topic).
These conversations themselves may be used by a combination of human and automatic processing to improve the overall AI experience even though the underlying LLM hasn't changed. And of course, such experience can feed into future LLMs with greater quality control - although those Large Language Models take months of training on thousands of GPUs.
Writing reports and papers
One of the opportunities (from the point of view of employees and students) for these models is to research topics and write summaries and reports (and of course, they can also be used to try to identify and distinguish real human/student work from artificial/faked work).
The models are by their nature inclined to make up stories and facts, and are limited in their access to real facts (both those in the original corpus, due to the compilation into embeddings; and those in the searchable web). The report-writing wrappers around the LLMs may thus push them to write in a formal, dot-point way with references to the sources - although these sources need to be checked as they do not always contain the "fact" asserted. A good way to test out the models is on an area where you are expert (and for me that's me, my research areas and my writings).
From the perspective of a teacher we have several problems. One is that students who use them are not learning things themselves, and don't know the area well enough even to see what is right and wrong. Longer term however, it is appropriate for students to learn how to use AI tools to be more efficient and effective - but we have big problems ensuring the accuracy of the AI's results, which requires a separate fact-checking step, and ideally would involve grounding in the real world and actual understanding of what it is talking about.
From the point of view of a user, whether academic or personal, there is a real problem with us believing what the system tells us, even though it is often wrong and can be persuaded to change its mind and tell us something different. There are ethical issues as well with certain uses, including as a "friend" or "adviser" or "counsellor", that were already considered by Joseph Weizenbaum in his 1976 book, "Computer Power and Human Reason: From Judgment to Calculation", which was written after the "success" of his famous 1960s Eliza/Doctor program.
However, this is a track we are going down with automated systems providing help and advice, and we are currently addressing the dangers of bad advice or even just lack of empathy.
There are even fake research papers being written with the help of these LLMs.
But there are also some missed opportunities. Currently most of the citations (hyperlinked/footnote references) are spurious. They will mention some relevant words but will not in general contain the actual fact or argument that is attributed to them. I have never yet seen an LLM chatbot give me properly quoted and attributed references that link directly to the source (and give precise page numbers for work that appears in printed or print-ready form). Every single response I've ever received (of thousands, across many companies' models) would receive a fail in terms of scholarly presentation, academic integrity and critical acumen. And generally you can lead them to give you "facts" that agree with what you've proposed.
We don't need a committee of yes-AIs, but critical analysis, synthesis and appraisal. And yes, multiple AI models with different training and different purposes can be combined to help refine the question (prompts), collate and present the facts (search), and argue the pros and cons on any issue or proposal (SWOT analysis).
But as it stands, whether you select creative, balanced or precise (in models like Copilot that offer this), you are likely to get faction rather than facts.
Writing stories/books
So while these models may find it difficult to stick to the facts, surely that must mean that story-telling is a natural opportunity. Indeed there are now many AI-generated stories and books being published, and detecting these is a major headache for the publishing industry. Authors are being asked to disclose if "AI" has been used in the creation of the story, or the images, or the narration (technically we should also say yes if we used Word, since it uses AI for spelling and grammar corrections/suggestions - but I don't use those as they are generally wrong after my initial typos are eliminated).
Conversely, marketers are using "AI" to produce blurbs and teasers, and to select keywords and categories, for human-written books - so even with these attempted protections, the books may be authentic human stories, but what you see when you purchase may be computer generated.
Human authors may also use "AI" in brainstorming for ideas. But that may lead to inadvertent plagiarism, as the LLM can generate phrases, sentences and larger sequences from its training and search data. Also, a book needs a consistent world and history that gives rise to its own set of fictional "facts" - and a lot of the filtering on top of an LLM is to ensure that it remains consistent within a conversation.
So to write a longer story or hold a longer conversation, it is important to use the LLM itself to summarize facts in a way that can be included in later prompts. Indeed bigger AI systems may use multiple LLMs to produce, manage and optimize prompts, and to filter and check the results - currently more for internal consistency than external accuracy.
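A minimal Python sketch of that idea, under loud assumptions: `naive_summarize` is a stand-in for a real LLM summarization call (no vendor API is shown or implied), the helper names are hypothetical, and the example "facts" are invented for illustration. It only shows the bookkeeping: keep a running list of story facts and paste the most recent ones at the top of each new prompt.

```python
def update_facts(facts, new_text, summarize):
    """Add facts extracted from new_text to the running list, skipping repeats."""
    for fact in summarize(new_text):
        if fact not in facts:
            facts.append(fact)
    return facts


def build_prompt(facts, request, max_facts=5):
    """Prefix the request with the most recent facts so the model sees them again."""
    lines = ["Known facts so far:"]
    lines += [f"- {fact}" for fact in facts[-max_facts:]]
    lines += ["", f"Request: {request}"]
    return "\n".join(lines)


def naive_summarize(text):
    """Stand-in for an LLM call: treat each sentence as one 'fact'."""
    return [s.strip() for s in text.split(".") if s.strip()]


# Invented example text; in practice this would be the latest generated chapter.
facts = update_facts([], "The heroine is eleven. The lab is underground.", naive_summarize)
prompt = build_prompt(facts, "Write the next scene.")
print(prompt)
```

The `max_facts` cap is there because prompts have limited room; deciding which facts to keep is exactly the kind of job a second model might be given.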
We've also explored using LLMs to write a story in the style of a particular author, and/or target it to an appropriate audience. Generally, they can do pretty well at these stylistic things. But of course in a novel, you have to give each character their own personality, and would somehow have to capture and maintain that in a sequence of prompts.
So far, I'm not finding them much competition for me! or much help...
Narrating stories/books
Natural "AI" voices are getting pretty good, and for me the big opportunity is to do audiobooks with authentic character voices. So I've recently produced audio versions of some stories in five different ways (some stories/poems/extracts are airing on radio, and I have some audiobook versions of my novels in the works).
I've now produced my first audiobook (of Time for PsyQ) using Google Play AI technology, with the earlier chapters narrated in five different ways using three different toolchains (not all of which involve AI) as I experimented. For Apple Books, I've used their single female AI voice to autonarrate a second version. At this point the technology is new, and there are significant restrictions on which outlets will accept what.
In fact, I think this technology is going to change the whole nature of audiobooks, making them more like the radio plays our parents or grandparents listened to. Down the track, we can expect to see multicharacter autonarrated audiobooks and even autoacted videobooks that compete with telemovie adaptations.
Unfortunately, reading is a dying art: both reading out loud, with appropriate expression; and reading to oneself, with good comprehension. Reading books ourselves requires us to interpret the author's descriptions of scenes and characters and emotions constructively, imagining them. Thus books impose the most cognitive load, audiobooks less, and videos/movies/plays the least. This suggests why people now watch movies and teleseries, including adaptations from books, more than they actually read books, with audiobooks now starting to overtake ebooks for market share so that they look like occupying an intermediate position.
The rise of audiobooks is somewhat controversial in relation to their effect on literacy. Listening still requires interpretation of scene and character details, but a good narrator will convey emotions and distinguish the characters with slightly different pitch, accent and/or mannerism. A radioplay or dramatized audiobook goes further by adding sound effects (and I have experimented with character voices and sound effects for some of my stories/chapters voiced for radio) - but these are not permitted by the Google and Apple AI narration services, and Amazon says they cause problems for its AI-mediated Whispersync.
Educators, including teacher/librarians, face a variety of so-called scholarly evidence and other inputs and recommendations, involving a great deal of only partially accurate information about this subject.
It is partly true that the same brain areas are involved in "reading" audio books and electronic or print books, in the sense that our language areas are active, and the cognitive areas responsible for understanding and interpreting the text are active. To an extent even some speech/hearing areas are active, as nearby areas are involved in phonological, lexical and grammatical processing, and there are also "mirror" neurons that fire across multiple modalities (which Time for PsyQ's 11-year-old heroine mentions in the book, which has a lot of brain science and technology in it). Actually in my PhD (late 70s early 80s) I predicted and modeled mirror neurons as being necessary for language learning around the same time they were being discovered elsewhere (although unfortunately, the paper about their discovery was rejected by Science so publication was delayed till after my PhD thesis was complete).
The use of parallel hearing and reading of texts is also useful for comprehension - one of the reasons we use multimodal methods in teaching. I used this approach in learning Chinese (where the characters are more semantic than phonetic, and have different pronunciations in different languages/dialects). The phonic approach of teaching people to read has the disadvantage of focussing on letters rather than words and sentences. Hearing an audiobook as they read along in the text (or having it read to them by a parent or teacher as they read along) helps trigger those mirror neurons, helps them learn the pronunciation of less phonetic words and names, and models and encourages good fluent reading (if the reader is good - many cheap audiobooks of classics have rather poor readers who mispronounce the less common words).
In fact, the main reason I chose to adapt Time for PsyQ to an audiobook format was that a considerable number of parents and teachers had mentioned that they had enjoyed reading the book out loud with their children.
On the other hand, I still have reservations about audiobooks, and note that the western world is seeing increasing loss of literacy, with most Americans reading at primary school level or less. I expect to see the increasing prevalence of audiobooks, particularly in schools and libraries, to drive literacy to even lower levels.
The typical adult in an English-speaking country spends over 5 hours a day watching video of one form or another, and around 2 hours a day listening to audio of one form or another, with audiobooks approaching 1 hour a day on average, while reading physical or electronic books has fallen to 16 minutes a day on average in the US, with a similar amount of time spent reading traditional news sources.
There is also a corresponding transition from face-to-face and voice-telephony interaction to social media, and the improved speech interfaces of smartphones are reducing use of the reading/writing/typing modalities still further.
Nonetheless, I chose to go ahead and produce an audiobook of Time for PsyQ, which has just been published through Google Play and Findaway Voices, and is already available for Kobo (although it will not be available on Amazon or Apple in the near future due to their rules regarding AI-mediated narration).
Single Narrator
Signal Processing
Voice Changing/Voice to Voice (V2V)
Because the commercial systems were so inconvenient to use, this took a huge amount of time. I may write my own wrapper around one of the open-source voice-changers to make this a bit more automatic (I had to identify all the quotations, pull them out individually, change frequencies and regenerate each in the target voice, then paste it back into an appropriate character track).
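The first of those manual steps (finding the quotations and separating them from the narration) is the easiest to automate. The Python sketch below is one possible starting point, not the wrapper itself: it assumes dialogue is marked with plain straight double quotes, the function name is hypothetical, and it does not decide which character is speaking or touch any audio.

```python
import re

# Assumes straight double quotes; curly quotes or nested quotes need more care.
QUOTE = re.compile(r'"([^"]*)"')


def split_dialogue(text):
    """Split text into ordered ("narration" | "dialogue", content) segments."""
    segments = []
    pos = 0
    for match in QUOTE.finditer(text):
        narration = text[pos:match.start()].strip()
        if narration:
            segments.append(("narration", narration))
        segments.append(("dialogue", match.group(1)))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("narration", tail))
    return segments


for kind, content in split_dialogue('She said, "Hello." Then left.'):
    print(kind, "|", content)
```

Each dialogue segment could then be routed to a voice-changer and placed on a character track, with the narration segments staying on the narrator's track.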
This gives really good results because more parameters are controllable than I could manage manually in GarageBand (the V2V systems usually use a Python toolchain, although I often use Matlab). I also added sound effects at the various breaks in this version (which will be used on radio).