You are currently browsing the monthly archive for March 2009.

Philip Tetlock

Philip Tetlock: scourge of the experts

Thirty years ago at Yale University, psychologist Philip Tetlock witnessed an experiment that was to inspire his life’s work: an epic 20-year investigation whose results are highly surprising and hold important lessons for us all.

The setup of the experiment was not particularly novel: it involved a rat, a maze, some food and a group of students to observe the results. The rat was released into the maze and given a choice to go left or right at a T-junction. Food was placed to the left on 60% of occasions, and to the right the rest of the time. It didn’t take the rat long to work out that the food was more likely to be found on the left, and it soon chose to go left every time, guaranteeing itself a snack on 60% of occasions.

But this wasn’t the real point of the experiment. While the rat was working for its dinner, the students were asked to guess on which side the food would be found (they weren’t told about the fixed 60/40 split between left and right). The students, in their search for a pattern, kept switching their predictions between left and right, and ended up with a success rate of 52%. The rat was thus declared to have greater predictive powers than a roomful of Yale students.
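The gap between the rat and the students is simple arithmetic. Below is a minimal sketch, assuming the students used "probability matching": guessing left about 60% of the time, independently of where the food actually was. That strategy is an assumption for illustration (the post only says they switched back and forth), though it happens to reproduce the 52% figure exactly. The function names are hypothetical.

```python
# Compare two guessing strategies for a T-maze where the food is on the
# left with probability p_left (0.6 in the experiment described above).


def always_left_accuracy(p_left: float) -> float:
    """The rat's strategy: always pick the more likely side."""
    return max(p_left, 1 - p_left)


def probability_matching_accuracy(p_left: float) -> float:
    """Assumed student strategy: guess left with probability p_left,
    independently of where the food actually is."""
    p_right = 1 - p_left
    # A guess is correct if guess and food are both left, or both right.
    return p_left * p_left + p_right * p_right


if __name__ == "__main__":
    p = 0.6
    print(f"Always left:          {always_left_accuracy(p):.0%}")
    print(f"Probability matching: {probability_matching_accuracy(p):.0%}")
```

Running it prints 60% for the rat's strategy and 52% for probability matching (0.6 × 0.6 + 0.4 × 0.4 = 0.52). Whenever one side is favoured but not certain, always choosing the more likely side does better than trying to mimic the pattern.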

Tetlock wondered whether these observations pointed to a wider problem in the human brain’s ability to predict the future, and whether this problem extended to predictions made by professional forecasters. He embarked on a 20-year odyssey to investigate how well these experts could anticipate future trends.

Tetlock studied the forecasts of 284 experts who made their living identifying political and economic trends, periodically asking them to make predictions relating to their fields of expertise. Rather than giving them yes/no questions, he asked them to assess probabilities, for example: ‘What are the chances of inflation in the UK rising, staying the same or falling in the next 12 months?’, or ‘Will casualties in the Israel-Palestine conflict fall, rise, or stay the same in the next year?’ He didn’t just ask for their opinions, but studied the basis of their decisions and how they reacted when they were proven wrong. After 20 years on the project, Tetlock ended up with an enormous database of 82,361 expert predictions.

The project’s findings were astounding. Just as in the rat experiment, simply assigning equal probabilities to the ‘worse’, ‘same’ and ‘better’ outcomes would have yielded more accurate predictions than those given by the ‘experts’. In Tetlock’s words, a dart-throwing chimp would have been as much help as the paid experts in predicting the future (although please don’t try this at home, kids).

Other studies into the reliability of economic forecasts have yielded similar results, indicating that simply assuming a repeat of the previous period’s result would be more accurate than professional forecasts. This is more than a little disconcerting. Even the most independent-minded of us relies on professional expertise to tell us what the future may bring.

What is it that prevents people who know a subject inside out from outperforming rats and chimps at predicting the future? Some clues can be found in Tetlock’s analysis of his results. He looked at which personal characteristics made his forecasters more or less likely to make good predictions. One trend he noticed was a significant correlation between forecasting accuracy and fame: the more famous the experts were, the worse their forecasts tended to be. He also found that, beyond a basic working knowledge, the more an expert knew about the subject they were making predictions about, the more likely their predictions were to be wrong.

What lies behind this highly counterintuitive finding? Tetlock suggests that the experts made similar mistakes to the Yale students. The more you think you know, the more you are likely to overanalyse a situation which is inherently unpredictable or subject to far simpler rules than those applied by the sophisticated forecaster. The more an expert has invested in their complicated theories and arcane knowledge, the more likely they are to unnecessarily apply it to their prediction.

Another contributing factor could be Black Swans. The concept comes originally from the work of philosopher John Stuart Mill, but was popularised by Karl Popper in his influential treatise “The Logic of Scientific Discovery”, in which he dealt with the problem of induction. Induction is a process of thinking whereby a universal conclusion is reached on the basis of a limited number of observations. For anyone living in Europe prior to the discovery of Australia and New Zealand, the statement “all swans are white” would have seemed reasonable. If every swan that you and everyone you know have ever encountered is white, it seems natural to assume that all swans are white – until, that is, someone ruins everything by finding black swans in the Antipodes.

In his excellent book, The Black Swan, Nassim Nicholas Taleb expands the idea of the Black Swan to any phenomenon which comes out of the blue, has a major impact, and which, in hindsight, is analysed to the point where its occurrence seems to have been almost inevitable. Examples cited by Taleb include 9/11, World War I, and the advent of the PC and the internet. To that list could be added the current financial crisis. Taleb argues that such events have a much larger effect on the world than is anticipated by traditional forecasting methods and by our hard-wired perception of the future.

Take, as an example, economic forecasts for Canada’s GDP growth in 2010. According to the Bank of Canada, a growth rate of 3.8% is expected next year, a figure so incongruous that it looks suspiciously politically motivated. In comparison, the IMF expects growth of 1.6%, which seems more plausible at first sight. But what do these figures actually mean? What assumptions are they based upon? No more bank failures, one more failure, or how many exactly? Is the collapse of GM factored into the figures or not? How would they be affected by another large terrorist attack on North American soil, or an earthquake in San Francisco? How about a tsunami hitting Tokyo, or a bird flu pandemic? They may not be very cheery thoughts, but these and a myriad of other inherently unpredictable events and knife-edge outcomes, in aggregate, make these predictions worth less than the paper they’re written on. Economists may argue that the law of averages cancels out ‘external factors’, but this is patently untrue in the case of events of this magnitude. The economists are either fooling themselves, or just trying to fool us to keep their jobs intact.

Tetlock also noted in his analysis that the approach of the more successful forecasters differed significantly from the least successful. To explain the difference, he used the analogy of foxes and hedgehogs. According to the ancient Greek poet Archilochus, “The fox knows many things, but the hedgehog knows one big thing”. In other words, for all the cunning a fox has, a hedgehog only needs one tactic (rolling into a ball) to frustrate it. The philosopher Isaiah Berlin used the distinction in his celebrated essay “The Hedgehog and the Fox” to categorise thinkers and writers according to whether their work was based on a unified, consistent view of the world, or whether they brought multiple perspectives into their work.

Tetlock, similarly, labeled as hedgehogs those experts whose predictions were based on the application of one big idea, and who did not waver in their beliefs when they made mistakes, but rather made excuses for their failures. Their in-depth specialization in their subject led them to try to force the square peg of their pet theories into the round holes of subjects where they didn’t fully apply, and thus negatively affected their forecasting capabilities.

Foxes, on the other hand, had no grand theory, but instead drew on a broad range of ideas to form their prognoses, and were always eager to learn from their mistakes. Tetlock found that foxes were, on average, far better forecasters than hedgehogs. They may still have had only around a 60% success rate (being a fox does not help one predict black swans), but this was significantly better than the hedgehogs managed.

It should be noted, however, that hedgehogs tend to be far more controversial than foxes, and thus get more column inches and airtime. Television, in particular, loves to pit two hedgehogs with opposing world views against each other. It makes a far more dramatic spectacle than two foxes trying to understand each other’s opinions. This helps to explain the inverse relationship between fame and forecasting prowess.

So where does that leave the layperson in need of expert advice? Firstly, it is clear that one should be skeptical about the worth of any opinions coming from so-called experts. If you have a basic understanding of the subject, you can probably do just as good a job yourself. Tetlock recommends following the predictions of those with a proven track record of correct forecasting. As a caveat to this, however, I would say that this only counts if these predictions demonstrate the expert’s ability to be flexible in his or her outlook. For example, the media has lauded those few brave souls who predicted the current financial crisis. At least some of these, however, will probably have been right simply because they have an extreme pessimistic bias. Relying on such individuals to accurately predict the bottom of the slump would almost certainly be very unwise.

The news that experts are little or no better at predicting the future than the rest of us suggests that we don’t need them. Do not expect them to be thrown out on the street any time soon, however. Even if I had never heard of chaos theory, my own experience of long-term weather forecasts would tell me that they are as good as useless beyond a window of about three days. Nevertheless, every day I check the seven-day forecast.

It seems that our brains are so uncomfortable dealing with an undefined future that we will gladly listen to anyone willing to predict it. For some it may be fortune tellers and tarot cards, for others it may be economists or political analysts. Either way, the reassurance of having some expected future to plan for seems far more important to us than the reliability of the forecast itself. Rather than mocking the experts, perhaps we should first look a little closer to home.

reCAPTCHA

Boxes like the one above will be familiar, and perhaps frustrating, to anyone who logs on to websites these days. The text is often difficult to read, and I doubt that I am alone in regularly misreading it and guessing wrong. The test is, of course, meant to be hard. The idea is to provide something barely legible, which humans can decipher but computers cannot. To achieve this, the text is deliberately distorted to the point where it often takes some effort to figure out.

Such a box is known as a CAPTCHA, which stands for “Completely Automated Public Turing test to tell Computers and Humans Apart”. In general, tests of this type are known as reverse Turing tests, after the mathematical genius and computing pioneer Alan Turing.

Alan Turing sculpture

Sculpture of Alan Turing with apple in Manchester, England

In the course of his tragically short life, Alan Turing laid the foundations for the new field of computer science, played a leading role in breaking the code of the German naval Enigma machine, thus saving countless lives in the Second World War, and came up with the test that bears his name. Even though computers were still in their infancy in his time, Turing was deeply interested in the potential of computing machines, and how far they could go toward thinking like humans. Rather than ask ‘can computers think?’, a question which he thought was mired in semantic problems, he came up with the idea that, if a computer could fool a human being into believing that it was communicating with another human, it could be said to be demonstrating ‘artificial’ human intelligence. He proposed that a test could be devised to determine whether a computer met this criterion, which became popularly known as the Turing Test.

In 1952 he was convicted of gross indecency for homosexual acts, which were illegal in the UK at the time. To avoid prison, he was compelled to undergo hormonal injections to suppress his libido. He also lost his security clearance, which ended his career. On 8 June 1954, he was found dead with traces of cyanide in his system and a half-eaten apple next to his bed (the bite out of Apple’s logo is rumoured to be a tribute to Turing). His death was ruled a suicide. Turing was 41 years old. Time magazine named him one of the 100 most influential people of the 20th century.

Designing a test to reliably tell computers from humans is precisely the problem facing internet security analysts. In the early days of the World Wide Web, webmail sites such as Hotmail and Yahoo! found that spammers could program computers to set up multiple accounts and use them to send spam. Similarly, blogging sites and internet forums were plagued by spammers plying their wares via comments and posts.

In 2000, Yahoo! started using CAPTCHAs designed by Luis von Ahn and Manuel Blum to counter the spam threat. Computers could not read the images, and thus were not able to complete the sign-up process. Von Ahn and Blum had designed a Turing test that stopped spammers in their tracks. The idea was soon taken up by many other popular websites.

Luis von Ahn

Luis von Ahn, virtually

Luis von Ahn has become a highly influential figure in web thinking. Discover magazine named him one of the fifty best brains in science, and he was awarded a prestigious MacArthur Fellowship in 2006 (popularly known as the ‘Genius Award’).

His principal area of study is the field of human computation. Turning traditional thinking on its head, von Ahn is interested in how humans can help computers to think. For example, humans can look at an image and immediately make out its constituent elements, whereas to a computer it is just a series of zeros and ones. Ideally a computer could tag photos uploaded to the internet so that humans could search for them by keyword, but computers currently have no reliable way of doing this. Von Ahn’s idea is to find ways to harness human brainpower to achieve tasks like this. For example, he came up with the concept for the Google Image Labeler, which provides tags for Google Images by means of an online game. Two players work as a team, independently describing a series of photos, and are awarded points if they both come up with the same description. Google uses the information gained from this game to tag its stock of images more accurately so they can be found by its search engine.

In the light of his breakthroughs in human computation, von Ahn reconsidered the enormous aggregate brainpower being used to decode millions of his CAPTCHAs each day. It occurred to him that while millions of people every day were having to decipher ambiguous text purely as part of a login process, projects to convert old texts to digital formats were foundering due to the shortcomings of their text recognition software. He decided to use his Genius Award grant to design a system that would put CAPTCHAs to work helping to digitise old texts.

Example of reCAPTCHA process

The limitations of optical character recognition (OCR)

The largest text digitisation project assisted by von Ahn’s reCAPTCHA process, as it is known, is run by the Open Content Alliance. The non-profit organisation has already converted over one million documents to digital format, all provided free of charge via its website. To digitise the documents, staff scan the pages of out-of-copyright texts, and the scanned images are then sent for optical character recognition (OCR) analysis. The analysis is about 90% accurate for newer books, but falls to 60% for older texts. The unidentifiable text is then broken down into individual words and sent to participating websites as reCAPTCHAs. The user is presented with two words: a control word, which has already been identified, and an unidentified word, both stretched and distorted in ways which perplex computers but not people. The person logging in then enters both words. Correct recognition of the control word provides the login authorisation, but the user’s input for the other word is also recorded. Each unidentified word is served up a number of times until a consensus is reached as to what the word is. The process has a success rate of around 99.1%. The answer is then returned and entered into the scanned text.
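The two-word scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not reCAPTCHA's actual code: the class and function names are hypothetical, answers are compared case-insensitively by assumption, and the rule that a word is accepted once three users agree (`AGREEMENT_THRESHOLD`) is an assumption too, since the process is only described as serving each word "a number of times until a consensus is reached".

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical consensus rule: accept a reading once this many users agree.
AGREEMENT_THRESHOLD = 3


@dataclass
class UnknownWord:
    """A scanned word the OCR software could not read (illustrative only)."""
    image_id: str
    votes: Counter = field(default_factory=Counter)
    resolved: Optional[str] = None


def check_response(control_answer: str, control_typed: str,
                   unknown: UnknownWord, unknown_typed: str) -> bool:
    """Grade one challenge made of a control word and an unknown word.

    The login succeeds only if the control word is typed correctly
    (compared case-insensitively, an assumption). Only then is the
    user's reading of the unknown word recorded as a vote.
    """
    if control_typed.strip().lower() != control_answer.lower():
        return False  # failed the test: possibly a bot, so discard its vote
    if unknown.resolved is None:
        guess = unknown_typed.strip().lower()
        if guess:
            unknown.votes[guess] += 1
            reading, count = unknown.votes.most_common(1)[0]
            if count >= AGREEMENT_THRESHOLD:
                unknown.resolved = reading  # consensus: feed back into the text
    return True


if __name__ == "__main__":
    word = UnknownWord("page12_word7")
    # Four users pass the control word "upon"; one misreads the unknown word.
    for typed in ["morning", "moming", "morning", "morning"]:
        check_response("upon", "upon", word, typed)
    print(word.resolved)
```

In the usage example the unknown word is resolved to "morning" on the fourth response, when three users have agreed; the single misreading ("moming") is simply outvoted. Gating the vote on the control word is what lets the same box serve as both a security check and a transcription task.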

The reCAPTCHA service is provided free of charge to participating websites, which now include Facebook, Twitter, Craigslist, StumbleUpon and many more. Around 160 books’ worth of words are digitised every day through the process. ReCAPTCHA is also being used to digitise the entire archive of the New York Times, all the way back to 1851, which will be posted online with free access. Von Ahn’s goal is to digitise every out-of-copyright book and provide them all free of charge on the internet. This will keep us all busy logging in for a while, though: at the current rate of progress, the job will take 400 years to complete.

Jon Favreau

“Dude, we won. Oh my God.”

These were the words 28-year-old Jon Favreau excitedly typed to his best friend on hearing that Barack Obama was to be the next President of the United States.

The language may have been representative of the millions of texts, emails and tweets which flew between excited twenty-somethings that night, but it is far from typical of the style of writing for which Favreau has become famous. Jon Favreau is Director of Speechwriting for the President of the United States, and the man Obama calls his ‘mind reader’.

How does one get to be the most influential speechwriter in the world? In Favreau’s case, luck had a lot to do with it. Having graduated from the College of the Holy Cross, Massachusetts in 2003 with a degree in Political Science, he found himself a year later in a junior position on John Kerry’s presidential campaign team, compiling talk radio news clips. In the early days, Kerry found himself a long way behind the early favourite, Howard Dean, and as his chances seemed to dwindle, so did the size of his campaign team. When a deputy speechwriter left, Favreau was selected to replace him, more or less by default.

As Dean’s star waned, however, John Kerry unexpectedly came to the fore, and Favreau ended up helping to write speeches for the Democratic presidential nominee. At the Democratic National Convention, members of the Kerry team listened to Barack Obama, then a candidate for the US Senate, practising his keynote speech. When Obama spoke the words “there are no red states or blue states, there are only the United States of America, all of us pledging allegiance to the red, white, and blue”, Jon Favreau received a nudge from his colleagues. There was a very similar line in John Kerry’s speech, and Favreau was given the task of telling Obama that he must change it. Obama must have wondered who the hell this young kid was, telling him to change his speech, but change it he did. Obama’s speech turned out to be a highlight of the Convention, watched by 9.1 million Americans, propelling Obama into the media spotlight.

Obama and Favreau next met the following year, after Robert Gibbs, press secretary on Kerry’s campaign before becoming an adviser to Obama, recommended Favreau for the position of speechwriter to the Senator. During the interview, Obama asked Favreau, “What is your theory of speechwriting?”

According to Favreau’s memory of the meeting, he responded, “I have no theory, but when I saw you at the convention, you basically told a story about your life from beginning to end, and it was a story that fit with the larger American narrative. People applauded not because you wrote an applause line but because you touched something in the party and the country that people had not touched before. Democrats haven’t had that in a long time.” Favreau was hired, and the two have been speechwriting partners ever since.

Writing speeches for Barack Obama must be a daunting task. Favreau likens his position to being Ted Williams’s batting coach (Ted Williams is, arguably, the most accomplished hitter in the history of baseball). Obama is probably the most literary president since Abraham Lincoln, and would be perfectly capable of writing his own speeches if he didn’t have a country to run. Favreau succeeds because he has learnt how to ‘channel’ Obama. He has memorised many of his boss’s speeches, reportedly carries a copy of ‘Dreams from My Father’ with him wherever he goes, and obsessively records Obama’s anecdotes, turns of phrase and speech patterns.

When preparing major speeches, Obama starts by explaining his concept of the speech to Favreau, who takes down the details. Favreau then goes away, prepares a draft, and emails it to his boss. Obama then returns it with his comments and rewrites, and Favreau works on it some more and passes it back. This continues until the speech is perfected.

The President’s inaugural address was probably the most eagerly anticipated speech since the Kennedy era. The pressure on both Obama and Favreau to deliver was immense. Favreau carefully studied previous inaugural addresses, particularly those delivered in troubled times, while his staff consulted historians to learn more about previous crises in American history. Favreau also sought the advice of his speechwriting heroes, including Peggy Noonan, his all-time favourite speechwriter, in spite of her conservative background. Her Pointe du Hoc speech, delivered by Reagan on the 40th anniversary of the Normandy landings, is Favreau’s favourite, and is also one of the most emotive speeches I have ever read (it is available online). In the end, Obama and Favreau resisted the temptation to wow the world with memorable soundbites, opting instead for a perfectly understated call to arms for a nation with its back to the wall.

With the inaugural address out of the way, the real work has now begun. Favreau is used to working back-to-back sixteen-hour days and never getting more than six hours’ sleep a night, but there is a different type of pressure in the White House. There is no question that he can produce the goods for the big events: Obama’s address to the joint session of Congress hit all the right buttons, for example. But working as Director of Speechwriting brings many other challenges. Favreau admits that he is not a natural organiser, yet he now has to manage a team of speechwriters in a professional environment.

During the presidential campaign, Favreau’s immaturity proved to be a liability. Pictures surfaced on the internet of him groping and dancing with a cardboard cutout of Hillary Clinton. Favreau apologised immediately to both Obama and Clinton, and the Clinton team had the good grace to comment: “Senator Clinton is pleased to learn of Jon’s obvious interest in the State Department, and is currently reviewing his application.” One can imagine, however, that Clinton was far less amused behind the scenes by his juvenile antics. In the words of the biblical quotation he borrowed for the inaugural address, he now needs to “set aside childish things”. If he can rise to this challenge and learn from his new environment then, to borrow a soundbite from another politician’s campaign, “things can only get better”.

The pedants are revolting! News of Birmingham’s banning of apostrophes from street signs has been widely reported in the international media, with many commentators clearly shocked that a British city is taking such liberties with the Queen’s English.

Councillors in Birmingham have defended the decision, stating that too much council time is spent debating the correct placing of apostrophes in the city’s street signs: should it be, for example, King’s Heath, Kings’ Heath or Kings Heath? The council has also argued that search engines and GPS navigation devices will be confused by misspellings. So Birmingham has decided to give up and remove all apostrophes, seeing this as a way to eliminate confusion.

With retailers such as Barclays, Boots and even Harrods already ditching their apostrophes, while companies such as Tesco insist on offering us “chart CD’s”, it’s no wonder people are confused about the correct grammatical rules.

Tesco, j'accuse

No thank's. I'll go to Currys.

Where I take most issue with Birmingham City Council’s decision is its apparent belief that omitting apostrophes counts as an abstention in the apostrophe debate. To an already confused general public, the removal of appropriate apostrophes from street signs just muddies the waters further, and will lead to more offerings like this:

The future

I may sound like a grammar Nazi, but I’m actually quite forgiving of aberrant apostrophe use in personal communication. I’m sure that I make my own fair share of howlers as well. There is no excuse, however, for such mistakes in the signs erected by retailers and public bodies. Just on my daily commute to work today I drove past a beauty salon unfortunately named “Victoria Nail’s and Spa”. How such an error can make it all the way to a storefront without anyone pointing it out I honestly cannot understand. Perhaps they could do us pedants a favour and donate their apostrophe to a nearby Shoppers Drug Mart.

As for the argument that modern search engines will be foxed by apostrophe usage, how many people using GPS devices search by road name? Postal code is a far more convenient and unambiguous way to search. Even if you don’t have the postal code and try to enter the street name, it will autocomplete after the first few letters. As for search engines, in my experience Google is intelligent enough to take account of such variations.

Birmingham example

In most cases there should surely be no ambiguity anyway. In the above example, given that the square is unlikely to have been named after more than one St Paul, there is little room for confusion (unless it is as to whether the sign is actually a proclamation regarding St Paul’s lack of trendiness, but that’s another story entirely). In any case, if modern technology really did struggle with differences in punctuation, I imagine that I, along with many others, would fall victim to the Council’s new directive by expecting an apostrophe where none is to be found. Remember that only Birmingham (and now Wakefield, I understand) have so far taken this step.

Birmingham City Council are hardly innovators in the abolition of apostrophes, however. The US Board on Geographic Names removed all apostrophes from American place names in the 1890s, with only a handful of exceptions, such as Martha’s Vineyard. Canada took a similar step in the same decade.

There are some lessons to be learnt from the Canadian experience, however. When Hudson’s Bay had its apostrophe removed, for example, it became Hudson Bay, not Hudsons Bay. This makes more sense: the Bay is named after Henry Hudson, and it was deemed illogical to retain the final ‘s’ once the apostrophe was removed. Similarly, the street sign St Pauls Square makes no sense without an apostrophe: either remove the ‘s’ or put back the apostrophe. The story does not end there for the Canadian apostrophe, however. In the 1970s, the ‘ban’ on apostrophes was lifted for places where the apostrophe was still in regular usage in spite of over 70 years of prohibition. As a result, places such as St John’s in Newfoundland and Lion’s Head in Ontario officially reverted to a spelling which most people had never abandoned.

In the US, however, there has been no such rethink, much to the chagrin of those who are fed up with falling standards of grammar and spelling in the country. Last year, Jeff Deck and Benjamin Herson, calling themselves TEAL (the Typo Eradication Advancement League), set off with a bag of marker pens and correction fluid on a grammatical odyssey across America, correcting erroneous punctuation and spelling as they went. They would normally alert sign owners to the error of their ways and offer to correct the signs for them, but sometimes they just made the amendments anyway.

Jeff Deck at work

They ran into trouble in Grand Canyon National Park, where they couldn’t resist correcting a 70-year-old sign inside the Desert View Watchtower on the South Rim which, unknown to them, was considered “a unique historical object of irreplaceable value”. They had posted records of all their corrections on their blog, giving the authorities all the evidence they needed to prosecute them; the result was a hefty fine and a one-year ban from US National Parks. Their blog is now no more than an apology for their act of defacement. Let this be a warning to anyone heading to Birmingham with a permanent marker at the ready.

Many may consider this issue a big fuss about nothing. As has been pointed out, the Council had already removed many of the apostrophes from the signs in Birmingham before this furore, and only made this declaration to silence the protesting pedants once and for all. Meanwhile, the man in the apostrophe-stripped street was oblivious to the changes. But this ignorance is no justification for the apostrophe cull. Where would it all end? Given the most common forms of written communication in our modern age, how long would it be, as one commentator has quipped, before Birmingham’s Great Charles Street becomes GR8 Chas St? First they came for the apostrophes…

Pedants revolt

Who says pedants don't have a sense of humour?