Artificial Intelligence or Artificial Idiocy?
As a pioneer of statistical and neural learning technologies for natural language processing and (embodied) conversational agents, it has been great to see the advances that larger and larger language models, and clever use of embeddings, attention and filtering, have made in the last couple of years.
It is important to understand that large language models (LLMs) are themselves statistical models that predict which words and phrases are likely to come next. A model like GPT-4 is trained in a very expensive run through a very large fixed body of text (the corpus) - so the model itself doesn't learn any more afterwards, and doesn't actually understand anything about the world or what it is saying. So no real intelligence there yet...
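To make the "predict what comes next" idea concrete, here is a toy word-level sketch of my own (purely illustrative - real LLMs use neural networks over subword tokens, not raw counts over a tiny corpus):

```python
# Toy illustration of the statistical idea behind LLMs: predict the next
# word from counts over a fixed training corpus (a bigram model).
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which in the training text.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most likely next word seen in training, or None."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" - it followed "the" most often
```

Everything such a model "knows" is frozen at training time: ask it about a word it never saw and it has nothing to say, which is the point being made above.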
However, the same kinds of models can be trained on speech, where the units are phonemes rather than characters, and phrasing is conveyed by intonation rather than punctuation. Moreover, similar models are being trained with images and videos, and this does start to give us information about the world.
The AI/LLM systems that you are playing with may have a fixed LLM at their heart, along with other models trained to help with composing images or speech, plus prompts and filters that guide and censor them to produce answers of an acceptable form in an appropriate format, and so on. These additional layers can retain information within and between sessions, can look at images, and can search the web. However, currently sessions tend to be limited, with no direct memory of previous sessions and no actual learning across sessions, and the results of searches tend not to be retained fully even within a session - and the "robots.txt" limits on searches may impact the ability to refine the answer to an ongoing question (so it will want to start a new session on a new topic).
These conversations themselves may be used by a combination of human and automatic processing to improve the overall AI experience even though the underlying LLM hasn't changed. And of course, such experience can feed into future LLMs with greater quality control - although those Large Language Models take months of training on thousands of GPUs.
Writing reports and papers
One of the opportunities these models offer (from the point of view of employees and students) is to research topics and write summaries and reports (and of course, they can also be used to try to identify and distinguish real human/student work from artificial/faked work).
The models are by their nature inclined to make up stories and facts, and are limited in their access to real facts (both those in the original corpus, due to the compilation into embeddings, and those on the searchable web). The report-writing wrappers around the LLMs may thus push them to write in a formal dot-point way with references to the sources - although these sources need to be checked, as they do not always contain the "fact" asserted. A good way to test out the models is in an area where you are an expert (and for me that's me, my research areas and my writings).
From the perspective of a teacher, we have several problems. One is that students who use them are not learning things themselves, and don't know the area well enough even to see what is right and wrong. Longer term, however, it is appropriate for students to learn how to use AI tools to be more efficient and effective - but we have big problems ensuring the accuracy of the AI's results, which requires a separate fact-checking step, and ideally would involve grounding in the real world and actual understanding of what it is talking about.
From the point of view of a user, whether academic or personal, there is a real problem with us believing what the system tells us, even though it is often wrong and can be persuaded to change its mind and tell you something different. There are ethical issues as well with certain uses, including as a "friend" or "adviser" or "counsellor", that were already considered by Joseph Weizenbaum in his 1976 book, "Computer Power and Human Reason: From Judgment to Calculation" — which was written after the "success" of his famous 1960s Eliza/Doctor program.
However, this is a track we are going down, with automated systems providing help and advice, and we are only now addressing the dangers of bad advice or even just lack of empathy.
There are even fake research papers being written with the help of these LLMs.
Writing stories/books
But while these models may find it difficult to stick to the facts, surely that must mean that story-telling is a natural opportunity. Indeed, there are now many AI-generated stories and books being published, and detecting these is a major headache for the publishing industry. Authors are being asked to disclose whether "AI" has been used in the creation of the story, or the images, or the narration (technically we should also say yes if we used Word, since it uses AI for spelling and grammar corrections/suggestions - but I don't use those suggestions, as they are generally wrong once my initial typos are eliminated).
Conversely, marketers are using "AI" to produce blurbs and teasers, and to select keywords and categories, for human-written books - so even with these attempted protections, a book may be an authentic human story while what you see when you purchase it is computer generated.
Human authors may also use "AI" in brainstorming for ideas. But that may lead to inadvertent plagiarism, as the LLM can generate phrases, sentences and larger sequences from its training and search data. Also, a book needs a consistent world and history that gives rise to its own set of fictional "facts" - and a lot of the filtering on top of an LLM is there to ensure that it remains consistent within a conversation.
So to write a longer story or hold a longer conversation, it is important to use the LLM itself to summarize facts in a way that can be included in later prompts. Indeed, bigger AI systems may use multiple LLMs to produce, manage and optimize prompts, and to filter and check the results - currently more for internal consistency than external accuracy.
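The rolling-summary idea can be sketched as follows (purely illustrative: the summarize() placeholder here just keeps the newest facts, where a real system would ask the LLM itself to condense them):

```python
# Rolling-summary sketch: carry a compact "story so far" forward in each
# prompt instead of the whole transcript, to stay within a context budget.
MAX_FACTS = 3  # assumed budget for how many facts fit in a prompt

def summarize(facts, limit=MAX_FACTS):
    """Stand-in for 'ask the LLM to condense these facts'."""
    return facts[-limit:]

def build_prompt(facts, instruction):
    """Prepend the condensed summary to the next instruction."""
    story_so_far = "; ".join(summarize(facts))
    return f"Story so far: {story_so_far}\n{instruction}"

facts = ["Mira finds a key", "The door is locked",
         "A storm begins", "Mira opens the door"]
print(build_prompt(facts, "Write the next scene."))
```

In a real pipeline the earliest facts would be merged into the summary rather than simply dropped, but the principle - compress, then re-inject - is the same.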
We've also explored using LLMs to write a story in the style of a particular author, and/or target it to an appropriate audience. Generally, they can do pretty well at these stylistic tasks. But of course in a novel you have to give each character their own personality, and would somehow have to capture and maintain that in a sequence of prompts.
So far, I'm not finding them much competition for me! or much help...
Narrating stories/books
Natural "AI" voices are getting pretty good, and for me the big opportunity is to do audiobooks with authentic character voices. So I've recently produced audio versions of some stories in five different ways (some stories/poems/extracts are airing on radio, and I have some audiobook versions of my novels in the works).
I've now produced my first audiobook (of Time for PsyQ) using Google Play AI technology, with the earlier chapters narrated in five different ways using three different toolchains (not all of which involve AI) as I experimented. For Apple Books, I've used their single female AI voice to autonarrate a second version. At this point the technology is new, with significant restrictions on which outlets will accept what.
In fact, I think this technology is going to change the whole nature of audiobooks, making them more like the radio plays our parents or grandparents listened to. Down the track, we can expect to see multicharacter autonarrated audiobooks and even autoacted videobooks that compete with telemovie adaptations.
Unfortunately, reading is a dying art: both reading out loud, with appropriate expression; and reading to oneself, with good comprehension. Reading books ourselves requires us to interpret the author's descriptions of scenes and characters and emotions constructively, imagining them. Thus books impose the most cognitive load, audiobooks less, and videos/movies/plays the least. This suggests why people now watch movies and teleseries, including adaptations from books, more than they actually read books, with audiobooks now starting to overtake ebooks in market share, so that they seem set to occupy an intermediate position.
The rise of audiobooks is somewhat controversial in relation to their effect on literacy. Listening still requires interpretation of scene and character details, but a good narrator will convey emotions and distinguish the characters with slightly different pitch, accent and/or mannerisms. A radio play or dramatized audiobook goes further by adding sound effects (and I have experimented with character voices and sound effects for some of my stories/chapters voiced for radio) - but these are not permitted by the Google and Apple AI narration tools, and Amazon says they cause problems for its AI-mediated Whispersync.
Educators, including teacher-librarians, are faced with a variety of so-called scholarly evidence and other inputs and recommendations, involving a great deal of only partially accurate information about this subject.
It is partly true that the same brain areas are involved in "reading" audiobooks and electronic or print books, in the sense that our language areas are active, as are the cognitive areas responsible for understanding and interpreting the text. To an extent even some speech/hearing areas are active, as nearby areas are involved in phonological, lexical and grammatical processing, and there are also "mirror" neurons that fire across multiple modalities (which Time for PsyQ's 11-year-old heroine mentions in the book, which has a lot of brain science and technology in it). In fact, in my PhD (late 70s, early 80s) I predicted and modeled mirror neurons as being necessary for language learning around the same time they were being discovered elsewhere (although unfortunately, the paper about their discovery was rejected by Science, so publication was delayed until after my PhD thesis was complete).
The use of parallel hearing and reading of texts is also useful for comprehension - one of the reasons we use multimodal methods in teaching. I used this approach in learning Chinese (where the characters are more semantic than phonetic, and have different pronunciations in different languages/dialects). The phonics approach to teaching people to read has the disadvantage of focussing on letters rather than words and sentences, and hearing an audiobook as they read along in the text — or having it read to them by a parent or teacher as they read along — helps trigger those mirror neurons, helps them learn the pronunciation of less phonetic words and names, and models and encourages good fluent reading (if the reader is good - many cheap audiobooks of classics have rather poor readers who mispronounce the less common words).
In fact, the main reason I chose to adapt Time for PsyQ to an audiobook format was that a considerable number of parents and teachers had mentioned that they had enjoyed reading the book out loud with their children.
On the other hand, I still have reservations about audiobooks, and note that the western world is seeing an increasing loss of literacy, with most Americans reading at primary school level or below. I expect the increasing prevalence of audiobooks, particularly in schools and libraries, to drive literacy to even lower levels.
The typical adult in an English-speaking country spends over 5 hours a day watching video of one form or another, and around 2 hours a day listening to audio of one form or another, with audiobooks approaching 1 hour a day on average, while reading physical or electronic books has fallen to 16 minutes a day on average in the US, with a similar amount of time spent reading traditional news sources.
There is also a corresponding transition from face-to-face and voice-telephony interaction to social media, and smartphones' improved speech interfaces are impacting use of the reading/writing/typing modalities still further.
Nonetheless, I chose to go ahead and produce an audiobook of Time for PsyQ, which has just been published through Google Play and Findaway Voices, and is already available for Kobo (although it will not be available on Amazon or Apple in the near future, due to their rules regarding AI-mediated narration).
Single Narrator
Signal Processing
Voice Changing/Voice to Voice (V2V)
Because the commercial systems were so inconvenient to use, this took a huge amount of time: I had to identify all quotations, pull them out individually, change frequencies and regenerate each in the target voice, then paste it back into an appropriate character track. I may write my own wrapper around one of the open-source voice-changers to make this a bit more automatic.
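The extraction step could be partly automated along these lines (a minimal sketch of my own, assuming straight double quotes; curly or nested quotation marks would need a richer pattern):

```python
import re

# Pull each quoted passage out of a chapter so it can be re-voiced
# separately, leaving numbered placeholders that mark where the
# regenerated audio should be spliced back in.
QUOTE = re.compile(r'"([^"]*)"')

def split_dialogue(text):
    """Return (narration with [Qn] placeholders, list of quoted passages)."""
    quotes = QUOTE.findall(text)
    counter = iter(range(len(quotes)))
    narration = QUOTE.sub(lambda m: f"[Q{next(counter)}]", text)
    return narration, quotes

narration, quotes = split_dialogue('She said, "Run!" and he ran. "Where?" he asked.')
print(quotes)     # ['Run!', 'Where?']
print(narration)  # She said, [Q0] and he ran. [Q1] he asked.
```

Attributing each extracted quote to the right character would still need manual (or LLM-assisted) tagging, which is where most of the time went.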
This approach gives really good results, because more parameters are controllable than I could manage manually in GarageBand (the V2V systems usually use a Python toolchain, although I often use Matlab). I also added sound effects at the various breaks in this version (which will be used on radio).