Maths is a curious subject, which is reflected in its practitioners. Even by academic standards, mathematicians tend to come across as distinctly odd. For most of the world, maths means a school exam which they did not enjoy. A modest proportion can probably do some basic mental arithmetic and occasionally find that useful, although most prefer not to delve too closely into exactly where their last pay cheque went. If they really need to add something up, they will use a calculator, or maybe a spreadsheet. They might remember doing some simple algebra at school, but unless they are puzzle nuts who use it to solve occasional “brainteasers”, they are probably glad to remember almost nothing about it. If pressed, they will assume that engineers and boffins use maths to help create all the technology around us.
That indeed is correct. There are thousands or millions of people using maths in science and engineering for every person who is interested in creating it or studying it for its own sake (the proper definition of a mathematician). The relationship between maths and the real world is deeply mysterious, and I do not even have time to touch on it here. But in practice there is a drastic difference in outlook between the mathematician and the scientist or engineer. The latter simply regards maths as a tool; the former wants to understand why results are true and to find new results.
When I got to Trinity in 1968, there were two famous old mathematicians occasionally to be seen doddering around. One was Abram Besicovitch, who is best known for his comment that you judge a mathematician by the number of bad proofs he has produced. This sounds paradoxical, but Besicovitch was using “bad” in a rather technical sense. He did not mean bogus proofs; he meant proofs which are correct but far too long. The empirical fact is that the first proof of any significant result is almost always bad. Usually a significant result will have been around for quite a while before anyone finds a proof.
Of course, the snag is that until a proof is found, no one is quite sure whether the result is really true. After struggling with it for a year (or more usually much less) and failing to prove it, most people will begin to have the awful sinking feeling that maybe it is not true and give up. Often people will suspect it is half-true, in the sense that it needs a rather stronger hypothesis. Indeed that is usually the case. Someone may find a “pathological” counter-example, meaning a counter-example which is weird, not the kind of thing anyone was envisaging when they looked at the result. That usually leads to a stronger hypothesis on the basis of which someone will manage to prove the result.
But many results are not like that. Fermat’s last theorem (there are no positive integer solutions to xⁿ + yⁿ = zⁿ with n > 2) was proved for vast numbers of special cases, but no counter-examples were found and the original conjecture remained irritatingly unproved for 357 years after Pierre Fermat had stated it in 1637. Finally, Wiles and Taylor did manage to prove it, and a truly awful proof it is. No mathematician who is not a specialist in that area will find it at all easy to follow.
Another rather mysterious feature of maths is the interplay between geometry and more abstract branches of maths, such as algebra and analysis. The charm of geometry is that it is easier to bring visualisation to bear on it. Mental pictures are much harder to come by if you are just manipulating symbols, and that means that it is much harder to think in a concentrated way about the problem with your eyes closed – usually considered the only way to solve tough maths problems.
With that background, we turn to groups. In a sense the group is the simplest possible structure. Sets, of course, are simpler, but they do not have any inherent structure. Indeed “set theory” only really gets interesting when you allow infinite sets, and that rapidly becomes hard and counter-intuitive. So it is probably not a good place to start if you are trying to find your feet.
A group is a set of objects G with a binary operation. A binary operation is just something that takes two objects of G and gives an object of G. It is usual to represent the operation as a product, so that if the two objects are b and c, their product is written bc. Groups are partly an abstraction of ordinary integers with the operation +. But there is an important difference: the operation is not necessarily commutative. In other words, we do not require that bc = cb. The axioms are:
(1) the operation is associative, in other words for all g, h, k we have (gh)k = g(hk);
(2) there is an object 1 in G, such that:
(A) for any g in G we have 1g = g1 = g; and
(B) for any g in G we can find an object h in G with hg = gh = 1.
The object 1 is called the identity, and in (B) the object h is called the inverse of g, written g⁻¹.
It is almost obvious that 1 is unique and that each object has a unique inverse. For suppose e is an identity. Then since eg = g for all g, we have in particular that e1 = 1. But since 1 is an identity, we also have that e1 = e. Hence e = 1. That establishes the uniqueness of the identity.
Suppose g has another inverse h. Then since g⁻¹g = 1 and 1h = h, we have that h = (g⁻¹g)h. But (g⁻¹g)h = g⁻¹(gh) = g⁻¹1 = g⁻¹, so h = g⁻¹. That establishes the uniqueness of the inverse.
Where G is finite, we can conveniently represent the group by a table, for example:
      | 1  a  b  c
    --+-----------
    1 | 1  a  b  c
    a | a  1  c  b
    b | b  c  1  a
    c | c  b  a  1
We find the product of b and c by looking along the row labelled b and down the column labelled c. It is obvious that this group is commutative (the table is symmetrical about the main diagonal), 1 is the identity and every element is its own inverse. The only tricky part is showing that it is associative. There are better methods, but brute force works (there are only 4³ = 64 cases to try).
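As a rough illustration of the brute-force check, here is a minimal Python sketch. It assumes the four-element table in which 1 is the identity, every element is its own inverse and the product of any two of a, b, c is the third. The names ELEMS, ROWS and mul are purely illustrative.

```python
from itertools import product

# Assumed table: row label first, column label second.
ELEMS = ["1", "a", "b", "c"]
ROWS = {
    "1": ["1", "a", "b", "c"],
    "a": ["a", "1", "c", "b"],
    "b": ["b", "c", "1", "a"],
    "c": ["c", "b", "a", "1"],
}


def mul(x, y):
    """Look along the row labelled x and down the column labelled y."""
    return ROWS[x][ELEMS.index(y)]


# Symmetry about the main diagonal is the same thing as commutativity.
commutative = all(mul(x, y) == mul(y, x) for x, y in product(ELEMS, repeat=2))

# 1 is a two-sided identity, and every element squares to 1.
identity_ok = all(mul("1", g) == g == mul(g, "1") for g in ELEMS)
self_inverse = all(mul(g, g) == "1" for g in ELEMS)

# Brute force: every ordered triple (x, y, z) must satisfy (xy)z = x(yz).
triples = list(product(ELEMS, repeat=3))
associative = all(mul(mul(x, y), z) == mul(x, mul(y, z)) for x, y, z in triples)

print(len(triples), commutative, identity_ok, self_inverse, associative)
# 64 True True True True
```

The same loop works for any finite table, although the number of triples grows as the cube of the size of G.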
Note also that we can regard it as “generated” by two elements a and b subject to:
ba = ab; and
a² = b² = 1.
From this point of view, c is just shorthand for ab. A more complicated example is:

This is not commutative. For example, ab = d, but ba = f. We can regard it as generated by a and d subject to:
da = ad²; and
a² = d³ = 1.
In this case brute force is clearly an unappealing (but feasible) way to check associativity (6³ = 216 cases to try).
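A machine does not mind the 216 cases. The sketch below is one possible way to run the check. It does not use the table itself but models the generators as permutations of three points, which is an assumption made here for convenience; composition of permutations is automatically associative, so the brute-force loop is only a sanity check. The labels b = ad and f = d² are deduced from the stated products ab = d and ba = f together with the relations; the helper names are illustrative.

```python
from itertools import product


def mul(p, q):
    """Compose permutations stored as tuples: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


identity = (0, 1, 2)
a = (1, 0, 2)  # assumed model of a: swap the first two points
d = (1, 2, 0)  # assumed model of d: rotate the three points


def generate(gens):
    """Close a set of generators under the product."""
    elems = set(gens)
    while True:
        new = {mul(x, y) for x in elems for y in elems} - elems
        if not new:
            return elems
        elems |= new


G = generate([a, d])
d2 = mul(d, d)

# The defining relations: a² = d³ = 1 and da = ad².
relations_hold = (
    mul(a, a) == identity
    and mul(d, d2) == identity
    and mul(d, a) == mul(a, d2)
)

# Deduced labels: ab = d forces b = ad, and then ba = ada = d².
b = mul(a, d)
f = d2

# The brute-force check over all 6³ = 216 ordered triples.
triples = list(product(G, repeat=3))
associative = all(mul(mul(x, y), z) == mul(x, mul(y, z)) for x, y, z in triples)

print(len(G), len(triples), relations_hold, associative)
print(mul(a, b) == d, mul(b, a) == f, mul(a, b) != mul(b, a))
```

Under these assumptions the closure has six elements, the relations hold, and ab and ba come out different, which matches the description of the second example.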
Now a mathematician will immediately suspect that these axioms are somewhat redundant. In other words, we are probably assuming more than we need.
Suppose we replace (2) by:
(2′) there is an object e in G, such that:
(A) for any g in G we have eg = g; and
(B) for any g in G we can find an object h in G with hg = e.
So, speaking loosely, we are saying that G has a “left identity” and each object in G has a “left inverse”. Is that enough? In other words, can we prove that (2) follows? Well, that is your first puzzle. It is non-trivial: it is harder than anything you will get in today’s A-levels, and plenty of first-year maths undergraduates would fail to get a correct proof in a reasonable time (say 15 minutes). But it is far from hard. The proof is just a few lines of manipulation similar to what we used above to show uniqueness.
Now suppose we replace (2) by:
(2′′) We call e a “left identity” if for any g in G we have eg = g.
(A) G has (at least one) left identity;
(B) for any g in G we can find h in G such that hg is a left identity.
Note that this is weaker than (2′) because the left inverses of different elements could be associated with different left identities.
But is it enough? In other words does (2) follow? That is your second puzzle. You have to produce either a proof that it does, or a counter-example – a particular example of G and an operation satisfying (1) and (2′′) but not (2).
This is significantly harder. It soon becomes clear that a proof requires more than a few lines of doodling. On the other hand, although it is easy to produce a small set G with a binary operation satisfying (2′′) but not (2), it is less clear how we find one which is also associative.
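If you want to try candidate tables by machine, a checker along the following lines may help. It is only a sketch: the function names are made up for illustration, the operation is stored as a dictionary keyed by ordered pairs, and the demonstration uses addition modulo 3, which satisfies all three conditions and so gives nothing away.

```python
from itertools import product


def is_associative(op, elems):
    """Axiom (1): (xy)z = x(yz) for every ordered triple."""
    return all(
        op[op[x, y], z] == op[x, op[y, z]]
        for x, y, z in product(elems, repeat=3)
    )


def left_identities(op, elems):
    """All e with eg = g for every g."""
    return [e for e in elems if all(op[e, g] == g for g in elems)]


def satisfies_2(op, elems):
    """Axiom (2): a two-sided identity and two-sided inverses."""
    for e in elems:
        if all(op[e, g] == g == op[g, e] for g in elems):
            # A two-sided identity is unique, so testing this e is enough.
            return all(
                any(op[h, g] == e == op[g, h] for h in elems) for g in elems
            )
    return False


def satisfies_2pp(op, elems):
    """Condition (2''): some left identity exists, and every g has an h
    such that hg is a left identity (possibly a different one each time)."""
    lefts = set(left_identities(op, elems))
    return bool(lefts) and all(
        any(op[h, g] in lefts for h in elems) for g in elems
    )


# Demonstration on a structure known to be a group: integers modulo 3.
elems = range(3)
op = {(x, y): (x + y) % 3 for x, y in product(elems, repeat=2)}

print(is_associative(op, elems), satisfies_2pp(op, elems), satisfies_2(op, elems))
# True True True
```

A counter-example to the second puzzle would be a table for which the first two results are True and the third is False.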