Language, Learning and Logic
Language, Learning and Logic are the three critical aspects of intelligence, and language is in focus in the Turing Test as a way of determining whether an AI can be regarded as ‘thinking’ or ‘intelligent’.
Understanding how to write, not just correctly but effectively, involves considering how your writing affects reading. This is more important than ever today – especially when you consider how widely you may be read. What you write today will in general be read not just by humans, but by a variety of AI programs – whether they are trying to index it or understand it, or just reading it out loud.
Spoken vs Written
One thing that is often missed in such discussions is that language ‘evolved’ in its spoken form as part of our culture, but the rules of grammar, spelling and punctuation were ‘invented’ in association with writing. When a Conversational Agent communicates with us in writing, that is already a long way from our natural modality – but it provided Turing with a natural way of isolating the contestants so that we couldn't use sight and sound to distinguish them.
Whether Embodied Conversational Agents, GPS guidance systems, or e-book readers, AI systems can
be made to look and sound arbitrarily realistic – although there is still a fair bit of work to do on emotion, expression and prosody.
Ironically, this is also exactly where we
have issues with writing.
Morphology, Grammar and Punctuation
In relation to writing, the origin of
morphology, grammar and punctuation was to capture the contextual rules that
underlie language in the larger sense that includes not just words and sentences, but emotion, expression and
prosody. When we are speaking, our tone of voice, rate of speaking, facial
expressions, etc. all contribute to conveying meaning that goes beyond the
actual words and grammatical forms used. Punctuation was invented to provide
the cues that allowed us to read back the intended emotion, expression and
prosody (whether read out loud or subvocally).
Unfortunately, the invention of printing
changed all of this. Printers were more concerned with aesthetics and practicality than the
correct logical use of punctuation. Some
rules specifically relate to the days of typesetting with lead, and the
ease of chipping off small punctuation marks at the end of a line of type.
But we will here consider the fundamental punctuation marks in relation to the fundamentals of prosody – where the first and most important consideration is how punctuation affects your breathing: that is, when you take a breath.
A Prosodic view of Punctuation
Comma (,) is, at heart, a breathing point. It
can thus serve a number of functions including separating a list of short
phrases or clauses that don't themselves have commas in them – which would be
confusing.
Semicolon (;) gives us a second level of more
major break/breath points that can also separate a list of phrases or clauses
which can now have commas in them – there is no possibility of comma confusion now.
One minor detail here (the Oxford comma) is
whether you should or shouldn't have a comma before the ‘and’ that marks the
final term in a list. You can have wars about this if you want, but a pragmatic
rule is simply “put one in if you want to tell readers they need to
break/breathe here”.
Colon (:) is used when you announce a list,
with the list itself separated by either commas or semicolons as appropriate – although the complexity threshold where you feel you need semicolons will be
lower. The longer the sentence the more
likely you are to want to use semicolons.
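The comma/semicolon choice for lists described above can be sketched in a few lines of Python. This is only an illustration of the rule of thumb, not a definitive implementation: the function name `join_list` and the `oxford_comma` flag are hypothetical, and the only test for "complexity" assumed here is whether an item already contains a comma.

```python
def join_list(items, oxford_comma=True):
    """Join phrases into a list, promoting commas to semicolons when needed.

    Assumption: an item that already contains a comma is the only signal
    used here for switching to semicolons.
    """
    items = [item.strip() for item in items]
    if len(items) <= 1:
        return "".join(items)
    # Semicolons give a second level of break when items have internal commas.
    sep = ";" if any("," in item for item in items) else ","
    head = f"{sep} ".join(items[:-1])
    # Before the final 'and': always keep a semicolon; keep a comma only if
    # the writer wants the reader to break/breathe there (and there are 3+ items).
    keep_final = sep == ";" or (oxford_comma and len(items) > 2)
    final = sep if keep_final else ""
    return f"{head}{final} and {items[-1]}"


print(join_list(["apples", "pears", "plums"]))
print(join_list(["apples", "pears", "plums"], oxford_comma=False))
print(join_list(["Paris, France", "Rome, Italy", "Bonn, Germany"]))
```

The three calls give a comma list with the final comma, a comma list without it, and a semicolon list whose items keep their own commas.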
Colon (:) can also be used to introduce
direct quotation (in “double quotes” usually), although comma is more often used for this these days. Colon is the correct way of introducing an explanation or elaboration.
Single quotes are often used for quotes within quotes, ‘scare quotes’ where you
acknowledge that this usage is not common or needs to be taken with a grain of
salt, or highlighting ‘interesting’ words that you are discussing while not
actually quoting anyone saying them.
Semicolon (;) can also be used to separate parallel ideas where one doesn't explain or follow from the other (when it does, colon is used). This is really what's happening when it is used in a list too.
Colon and semicolon often come at the end of
a clause (typically a sentence nucleus that has subject, verb and object) where
the sentence is not quite finished yet because something else needs to be said
to complete or fill out the idea. It is a good idea to read such sentences and see if you
can replace these clause-level forms of punctuation with sentence-terminating
punctuation (which means the part after the colon has to have its own subject and predicate).
Sentential punctuation
The sentence-final punctuation point or stop takes three forms: full stop (.), exclamation mark (!) and question mark (?). Note that all three include a point as a stop mark at the baseline, the stop or point indicating a brief break or pause.
If you have multiple clauses in a question or
an exclamation, it is good to review to see if they all have the same (interrogatory or exclamatory) character.
Often only the first clause does, and so that is where the relevant mark should go –
particularly where what follows goes on to suggest or discuss possible
responses or provide parenthetical information. Other times you might need to write a series of questions, or a mix of
questions, exclamations and statements.
The biggest issue relating to the use of
question marks is the complex multi-clause sentence. In this case, more clarity can
often (but not always) be achieved by breaking it up into a series of smaller sentences/questions.
The exclamation mark has come in for a lot of criticism – and, as often happens with language-related issues, well-meaning editors have thrown the baby out with the bathwater! If you overuse exclamation marks, then you are likely to feel you have to resort to multiple exclamation marks to make your point!! We will come back to this!!!
Parenthetical punctuation
The other main use for punctuation is for some sort of bracketing – this includes the use of matched single or double quotes (which should ‘curl’ like round brackets) as well as the round, square and curly brackets. Generally the bracket forms are not seen much in fiction, but there are particular conventions as to how they should be used in technical writing – which is largely beyond the scope of the present discussion, although the so-called Harvard convention is explained below.
When we put something in round brackets (like this) it indicates that nothing is lost (grammatically or contextually) when we leave it out, but that this is a helpful reminder or pointer, or an interesting aside – and it is up to the reader whether they take it or leave it.
When it is integral to the story, including an elaboration of a point
you are making, a pair of commas is used for ‘parenthesis’ - or potentially a pair of dashes. Commas and dashes used to mark parenthesis do not need to close if they are sentence final (so full stop and its variants close them).
The same rule applies in citations: you put the references to the literature in parentheses (Author, Date) when the citation is parenthetical in that it doesn't have a grammatical role to play in the sentence; Author (Date) introduces something that Author said or did, but only the date is parenthetical. In some conventions square brackets or even curly brackets are used, particularly when a numerical marker is used. Leaving out what is in brackets should not affect understanding of the sentence and who did the primary work or came up with the idea (even in the numerical/footnote/endnote usages). A footnote or endnote is simply a parenthetic explanation that is left out of the main text to avoid disrupting readability.
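As a small illustration of the two citation forms, and of the test that a sentence should survive having its brackets removed, here is a hedged Python sketch. The helper names `cite` and `drop_round_brackets`, and the author "Smith" with date 2020, are all hypothetical and used only for the example.

```python
import re


def cite(author, year, parenthetical=True):
    """Format a citation in either of the two forms described above."""
    if parenthetical:
        # The whole citation is an aside: (Author, Date)
        return f"({author}, {year})"
    # The author plays a grammatical role; only the date is an aside.
    return f"{author} ({year})"


def drop_round_brackets(sentence):
    """Remove round-bracketed spans (and the space before them)."""
    return re.sub(r"\s*\([^()]*\)", "", sentence)


aside = f"Punctuation encodes prosody {cite('Smith', 2020)}."
subject = f"{cite('Smith', 2020, parenthetical=False)} argues that punctuation encodes prosody."

print(aside)
print(drop_round_brackets(aside))
print(subject)
print(drop_round_brackets(subject))
```

In both cases the sentence with the brackets dropped is still grammatical, and in the second it still says who made the claim.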
Avoiding parentheses (when you don't want people to ignore what you've said in parenthesis) can be tricky, and commas can be confusing. Commas (or even semicolons) are promoted to dashes when the parenthetical comments are relatively long or complex, allowing them to incorporate their own commas without confusion.
Furthermore, the dash is often
interchangeable with the colon – indicating that the explanation bit is
somewhat optional/parenthetic. This explanatory usage thus looks like dash parenthesis closed by
clause or sentence level punctuation (usually a full stop). I have used quite a few of these auto-closed dashes –
they are very convenient and often more readable than the alternatives (with a lighter and less pedantic feel than colon). But
sometimes it would be possible to just make them a separate sentence.
Generally punctuation should go inside brackets or quotes if it belongs to the quoted or bracketed material, and outside if it belongs to the enclosing sentence. Where both make sense, generally use the inner position – in particular, US printers and publishers (including the much deprecated Chicago Manual of Style) tend to place comma and full stop inside for aesthetic reasons, even when it goes against the logic of the sentence. (The New York Times Manual of Style and Usage is quite logical and is recommended for the US market.)
That dashed hyphen
There is a huge difference between a
dash and a hyphen – and not just the differences in size.
A hyphen is morpheme-to-word or multiple-word punctuation. It is used when, without the hyphens, normal grammar rules don't suffice to allow us to make sense of the phrase, but the phrase is not settled enough to be joined directly into a single word. Often these hyphenated words will be ‘collocations’ that have actually risen to an idiomatic and close to word-like status in current usage. Generally, when using such a concept as multiple words in their original sense, with standard grammar, hyphens aren't used – but when the same idiomatic or technical phrase is used as an adjective, a hyphen will be required. (Note the use of ‘multiple word’ with and without hyphen in this paragraph.)
A dash is word-to-sentence level punctuation and is
used to intersperse or conclude with parenthetic clauses that are too important
to be enclosed in parentheses (round brackets), or which need more separation than is afforded by a colon or semicolon. Since it occupies a word-like level
(unlike hyphen) it is appropriate to use an en-dash (viz. the size of the
letter 'N') surrounded by spaces – the printer's usage of an em-dash without
spaces (based on the size of 'M') is to be deprecated as both ugly
and misleading—horrid! Furthermore, if the em-dash is used it must be separated from neighbouring words by thin or hair breaking spaces (el-space being the size of 'l' and corresponding to unicode 2009; best is often unicode 200A hair space). This is important so that word-processing and typesetting/formatting systems can break and justify appropriately (otherwise words in two different segments of the sentence get treated as a single word). Note that some (modern) fonts make the dashes and spaces too big — best is an Old Style (or Antiqua) font where em/en-space/dash are matched to M/N (or m/n for a bigger visual contrast).
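For readers who prepare text programmatically, the two options just described (a spaced en-dash, or an em-dash padded with thin or hair spaces) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the em-dash is written as the escape `\u2014`, and the code does not try to recognise special cases such as a dash that opens a line of dialogue.

```python
import re
import unicodedata

EN_DASH = "\u2013"
EM_DASH = "\u2014"
THIN_SPACE = "\u2009"
HAIR_SPACE = "\u200a"

# Horizontal spacing that may already surround a dash (assumed set).
_SPACES = " \t" + THIN_SPACE + HAIR_SPACE
_EM_WITH_SPACES = re.compile(rf"[{_SPACES}]*{EM_DASH}[{_SPACES}]*")


def em_to_spaced_en(text):
    """Replace each em-dash, spaced or not, with an en-dash between spaces."""
    return _EM_WITH_SPACES.sub(f" {EN_DASH} ", text)


def pad_em_dash(text, space=HAIR_SPACE):
    """Keep em-dashes, but separate them from neighbouring words."""
    return _EM_WITH_SPACES.sub(f"{space}{EM_DASH}{space}", text)


for ch in (EN_DASH, EM_DASH, THIN_SPACE, HAIR_SPACE):
    # Print each code point with its official Unicode name.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

print(em_to_spaced_en("ugly and misleading\u2014horrid!"))
```

Either function can be run repeatedly over the same text without piling up extra spaces, because existing spacing around the dash is absorbed before the new spacing is added.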
There is one final very important punctuation mark… This is one that is very useful: it can help resolve the awkwardness of trying to model real speech, can often be used in place of a dash or a colon, and sometimes can even replace an exclamation mark. This is the ellipsis mark '…' – which is actually a single character in Unicode (U+2026). Many people erroneously use an (em-)dash when an ellipsis is required.
Elision refers to the dropping out of some words or parts of a word. Note that apostrophe is used for just omitting part of a word, so that “She would have …” becomes “She'd've” (not “She'd of” – although that sounds similar, it is not grammatically correct). With ellipsis, the '…' has spaces around it, except where part of a word is missing, in which case it touches the residual word. Where it represents a pause, and particularly where the last sound is held during thought or for precise timing, there is again no space before the '…' ('exit expected in… 24 minutes').
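Since the ellipsis is a single Unicode character rather than three full stops, text that was typed with dots can be tidied mechanically. The snippet below is a minimal sketch; the function name `normalise_ellipsis` is hypothetical, and it deliberately leaves the spacing around the mark alone, since that depends on whether the ellipsis marks an omission or a held pause.

```python
import re
import unicodedata

ELLIPSIS = "\u2026"


def normalise_ellipsis(text):
    """Replace any run of three or more full stops with one ellipsis character."""
    return re.sub(r"\.{3,}", ELLIPSIS, text)


print(unicodedata.name(ELLIPSIS), len(ELLIPSIS))
print(normalise_ellipsis("exit expected in... 24 minutes"))
```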
Ellipsis is often used in quotes to indicate that something has been omitted that was in the original text, but it is also used sentence-finally for speech or thought that peters out prior to the end of the sentence. There are several possibilities here: one is that the speaker got into an awkward place and wants to restart the sentence; another is that the implications are blindingly obvious; another is that the implications are obviously important but complex and uncertain. In such cases, the person either can't finish the sentence or needn't finish the sentence.
Dash should not be used for any elliptic purpose, although this is a common error perpetrated and perpetuated by certain publishers and editors. It can be used for an interruption in direct speech, whether from an external source or the speaker's own thoughts. The normal rules of parenthetical dash apply in direct speech, except that quotes need to be closed and reopened if the parenthetical remark was not in the actual reported speech.
Note the difference between em-dash marking a break beyond the speaker's control and ellipsis allowing their thought to peter out as they reconsider or change their mind or leave out something that is already understood… [by both reader and writer]. Note that square brackets can be used for ellipsis where words are omitted from a formal citation, but replaced with other words to retain the grammar and meaning (e.g. a pronoun may be replaced by a name).
The question of exclamation mark
This leads us to a final comment on question mark and exclamation mark (or 'point' in US usage). These should be used immediately at the end of the actual question or exclamation, and followed by a new sentence (or follow-up question).
Exclamations include things like “Wow!” or “Hi!”
as well as commands like “Shut up!” or “Follow me!” and it is incorrect not to
use the exclamation mark after such exclamations and imperatives — it actually indicates a sharp raised
tone of voice. Vocatives (naming someone by name or kind) as a call for
attention also require the exclamation mark, e.g. “Fred! Where have you got to?”
or “Boy! Go get your mother!”
Exclamations that start with a question word
tend to deserve the mark too — particularly when they lack a verb and thus fail
to qualify as a sentence (“What a beauty!”; “How about that!”). But best is to stick
to the question mark for a rhetorical question: “Why didn't I think of that?”; “Why
not?”
The more optional uses of exclamation are to
mark surprise or unexpectedness, or something that would be an unfavourable
outcome. Where you mainly want a pause, as room for thought, the ‘…’ may be
appropriate. Where you want to race on,
then a simple ‘.’ may be best — or, if the idea is logically connected, a colon.
One of the reasons people have grown to dislike exclamation marks (particularly in e-mails and other e-comments) is that they can often be interpreted as deprecating or indicating surprise at someone's ignorant views (‘sneer mark’); they can also be seen as self-adulation (‘laughing-at-your-own-joke mark’).
Where you have a choice between ellipsis, terminal punctuation, dash or colon, think about the length of pause you want: the ellipsis '…' most explicitly indicates a pause or longer break, while colon may be more or less pronounced than a full stop. Interestingly, dash (like parentheses and quotes) may actually bracket a rapid aside – one that is spoken faster and with a higher tone – with the normal rate and tone resuming to mark the end of the bracketed information (actually indicating that the content is less germane but still of interest). In the case of quotes, when quoting orally, left and right finger-wiggles may be used while speaking the quoted word or phrase (along with the higher-pitch intonation).
Sometimes people like to give rules of thumb about the use of particular punctuation marks, e.g. treating as warning signs a high frequency of exclamation marks (instead of full stops), or a low frequency of full stops (because colon, semicolon, dash, etc. are being used to make longer sentences). Indeed, measures of readability tend to penalize you for both longer words and longer sentences – both are easy to fix!
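The two ingredients that such readability measures look at, sentence length and word length, are easy to estimate for your own text. The following Python sketch is illustrative only: it is not any particular published readability formula, the function name `sentence_stats` is hypothetical, and its sentence splitting (on '.', '!' or '?' followed by whitespace) is a deliberately crude assumption.

```python
import re


def sentence_stats(text):
    """Return rough counts: sentences, words per sentence, letters per word."""
    # Crude split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Words are runs of letters, allowing apostrophes as in "didn't".
    words = re.findall(r"[A-Za-z']+", text)
    letters = sum(len(w.replace("'", "")) for w in words)
    return {
        "sentences": len(sentences),
        "words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "letters_per_word": letters / len(words) if words else 0.0,
    }


print(sentence_stats("I came. I saw. I conquered!"))
```

Breaking a long sentence into two lowers the first average; swapping a long word for a short one lowers the second.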
But don't be worried if that leads you to sentences that start with words like ‘and’, ‘but’ or ‘so’. ‘However’ is, however, good to start a paragraph, being more major than ‘but’, but can also be used for a less confrontational contrast – particularly in a parenthetical second position in the sentence.
For exclamation marks, if you have more than
one a page (for a book with lots of dialogue and/or introspective thought or
character point-of-view), then that's probably too much. Another warning sign
might be having more exclamation marks than question marks. But saying you shouldn’t use any, or at most
one or two in a book, is overreacting – people are so used to reacting against
the poor usages that they throw out the appropriate ones too. But still, if it is not an exclamation to
attract attention, or an imperative used to urge a course of action, or an
ejaculation that is forced out of you by the circumstances… consider discarding
it.
But all of these measures depend on both your
genre and your audience and the style of writing you are using.
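The two warning signs for exclamation marks mentioned above (more than one a page, and more exclamation marks than question marks) can be checked automatically. This is a hedged sketch: the function name `exclamation_warnings` is hypothetical, and the figure of 300 words per page is purely an assumption that you should adjust for your own format, genre and audience.

```python
def exclamation_warnings(text, words_per_page=300):
    """Return a list of rule-of-thumb warnings about exclamation marks.

    Assumption: a 'page' is approximated as words_per_page words, and any
    text shorter than that counts as one page.
    """
    exclamations = text.count("!")
    questions = text.count("?")
    pages = max(len(text.split()) / words_per_page, 1.0)

    warnings = []
    if exclamations / pages > 1:
        warnings.append("more than one exclamation mark per page")
    if exclamations > questions:
        warnings.append("more exclamation marks than question marks")
    return warnings


print(exclamation_warnings("Wow! Stop! Why?"))
print(exclamation_warnings("Why not? Who knows? Fine!"))
```

An empty list simply means neither warning sign fired; it is not a verdict that every remaining exclamation mark is justified.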
Starting sentences with a conjunction
One reason people might be inclined to have long sentences, marked with punctuation like dash, colon or semicolon, is because they've been told it is wrong to start a sentence with 'And' or 'But'. This is hogwash — and indeed in some languages there are special versions of 'and' used for starting sentences (or in second position in the sentence). It is particularly likely that you'll need to start a sentence with 'And' or 'But' when reporting natural speech. But if a sentence naturally ends, don't force other punctuation in there! And feel free to retain the 'And' or 'But' if that is what feels and sounds natural, particularly if you are in the point of view of some character.
Ending sentences with a preposition
Another common furphy concerns words that you shouldn't end a sentence with. The words concerned are called prepositions when they introduce a noun phrase ('with a preposition' -> 'you shouldn't end a sentence with a preposition'). But the same set of words do double duty as particles associated with verbs, either introducing an infinitive or participle ('to travel' or 'for traveling') or as part of a separable verb ('put up' -> 'he put the picture up'; 'put up with' -> 'a habit I will not put up with'). Winston Churchill (jokingly) miscorrected something like this when talking about things 'up with which I will not put'.
And as far as prepositional phrases go, you actually have to 'front' a noun in many circumstances (e.g. when it is a subject or an object). For example: 'The boy I gave a book to, the girl I gave some money. I couldn't see the boy I gave the book to. The boy I gave the book to wasn't there.'
Subjects and Objects
One final point that relates to common grammatical errors concerns the use of nominative and accusative case, viz. the difference between 'I' and 'me', or 'he' and 'him'. Unfortunately, teachers have often been unhelpful in saying "Don't say 'me and John', say 'John and I'", in that they haven't been clear that this only applies for subjects. And many modern editors and authors thus don't understand this. Furthermore, there is another aspect to this. It is politeness not grammar that insists you put yourself last, because it is impolite to put yourself first.
The simplest way of deciding what is correct is to leave out the complicating elements (e.g. the conjunction or the subtending clause). For example in considering 'for [my brother and] me' we see that 'me' is correct, not 'I' - all prepositions take the accusative, that is the 'object' form (me/us, him/her/them). Similarly for objects of verbs: 'he saw [my brother and] me [running|run down the street]'.
This use of 'run' (as opposed to 'running') is an infinitive verb (versus a present participle in active form). A more common use of the infinitive involves the particle 'to'. E.g. 'he wanted [my brother and] me [to run to the shops]'. The accusative/object form is always used in front of an infinitive verb (which explains what is observed/desired of the object) — noting that the infinitival phrase can be omitted as illustrated in both the 'run' examples above.
One area that is complicated is the use of 'than' and 'as', which in modern usage can act as prepositions and take the accusative. E.g. 'My sister is bigger than me but I am stronger than her.' However, traditionally they are used with clauses: 'My sister is bigger than I [am], but I am stronger than she [is].' And then it is possible to drop the predicate when it is obvious or redundant. So either case could be 'correct' depending on the viewpoint.
This is what's happening in the well-known joke:
– 'Can you jump higher than a house?'
– 'No, of course not!'
– 'I can: houses can't jump.'
Which we could change to 'Can you jump higher than me?', which is generally used to question whether 'You can jump higher than I [can]?'
The latter now sounds stilted, while the former technically retains the ambiguity reflected in the joke.
Prescriptive grammarians trained in the classical tradition will impose Greek or Latin grammar on English and thus insist you should say 'It is I!' rather than 'It's me!'
But in modern English, 'is/am/are/be' tends to be treated like any other verb and follows the Subject Verb Object (SVO) order-based grammar of English. So 'It was me not him' is quite appropriate when the speaker is not a classical prescriptive grammarian, while 'It was I not he' should be reserved for the classical pedant.
In classical languages, where adjectives and nouns are inflected for case (not just pronouns as in modern English), rather than the case (subject/object) primarily being determined by word order, the copula 'to be' is treated as being commutative (reversible) like 'equals', with the words on both sides having to be in the same case (nominative in the normal finite case: 'I am good/I am he'; accusative in the infinitive case: 'He told me to be good/He told me just to be me [myself].') In a reflexive context (both bits referring to the same person), you can fix the problem using '-self'.
We can do this with prepositions too, in a broader set of cases involving repeated reference to a person, e.g. 'on behalf of my wife and myself' sounds better than 'on behalf of my wife and me' because it retains the pattern and balances the two halves of the conjunctive ('and') phrase in terms of structure and syllables.