Cathy O’Neil

The math-powered applications powering the data economy were based on choices made by fallible human beings. Some of these choices were no doubt made with the best intentions. Nevertheless, many of these models encoded human prejudice, misunderstanding, and bias into the software systems that increasingly managed our lives. Like gods, these mathematical models were opaque, their workings invisible to all but the highest priests in their domain: mathematicians and computer scientists. Their verdicts, even when wrong or harmful, were beyond dispute or appeal. And they tended to punish the poor and the oppressed in our society while making the rich richer.

3 thoughts on “Cathy O’Neil”

  1. shinichi Post author

    The Math Whizzes Who Nearly Brought Down Wall Street
    After 2008, it was all too clear that the housing crisis, the collapse of major financial institutions, and the rise of unemployment had been aided and abetted by mathematicians wielding magic formulas.

    by Cathy O’Neil

    When I was a little girl, I used to gaze at the traffic out the car window and study the numbers on license plates. I would reduce each one to its basic elements — the prime numbers that made it up. 45 = 3 x 3 x 5. That’s called factoring, and it was my favorite investigative pastime. As a budding math nerd, I was especially intrigued by the primes.
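
    The factoring she describes, reducing a number like 45 to its prime elements, can be sketched in a few lines of Python (my own trial-division illustration, not anything from the article):

```python
def prime_factors(n: int) -> list[int]:
    """Factor n into primes by trial division, smallest factor first."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:     # divide out each prime completely
            factors.append(d)
            n //= d
        d += 1
    if n > 1:                 # whatever remains is itself prime
        factors.append(n)
    return factors

print(prime_factors(45))  # → [3, 3, 5]
```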

    My love for math eventually became a passion. I went to math camp when I was 14 and came home clutching a Rubik’s Cube to my chest. Math provided a neat refuge from the messiness of the real world. It marched forward, its field of knowledge expanding relentlessly, proof by proof. And I could add to it. I majored in math in college and went on to get my Ph.D. My thesis was on algebraic number theory, a field with roots in all that factoring I did as a child. Eventually, I became a tenure-track professor at Barnard, which had a combined math department with Columbia University.

    And then I made a big change. I quit my job and went to work as a quant [quantitative analyst] for D.E. Shaw, a leading hedge fund. In leaving academia for finance, I carried mathematics from abstract theory into practice. The operations we performed on numbers translated into trillions of dollars sloshing from one account to another. At first I was excited and amazed by working in this new laboratory, the global economy. But in the autumn of 2008, after I’d been there for a bit more than a year, it came crashing down. The crash made it all too clear that mathematics, once my refuge, was not only deeply entangled in the world’s problems but also fueling many of them. The housing crisis, the collapse of major financial institutions, the rise of unemployment — all had been aided and abetted by mathematicians wielding magic formulas. What’s more, thanks to the extraordinary powers that I loved so much, math was able to combine with technology to multiply the chaos and misfortune, adding efficiency and scale to systems that I now recognized as flawed.

    If we had been clear-headed, we all would have taken a step back at this point to figure out how math had been misused and how we could prevent a similar catastrophe in the future. But instead, in the wake of the crisis, new mathematical techniques were hotter than ever and expanding into still more domains. They churned 24/7 through petabytes of information, much of it scraped from social media or e-commerce websites. And increasingly they focused not on the movements of global financial markets but on human beings, on us. Mathematicians and statisticians were studying our desires, movements, and spending power. They were predicting our trustworthiness and calculating our potential as students, workers, lovers, criminals.

    This was the Big Data economy, and it promised spectacular gains. A computer program could speed through thousands of résumés or loan applications in a second or two and sort them into neat lists, with the most promising candidates on top. This not only saved time but also was marketed as fair and objective. After all, it didn’t involve prejudiced humans digging through reams of paper, just machines processing cold numbers. By 2010 or so, mathematics was asserting itself as never before in human affairs, and the public largely welcomed it.

    Yet I saw trouble. The math-powered applications powering the data economy were based on choices made by fallible human beings. Some of these choices were no doubt made with the best intentions. Nevertheless, many of these models encoded human prejudice, misunderstanding, and bias into the software systems that increasingly managed our lives. Like gods, these mathematical models were opaque, their workings invisible to all but the highest priests in their domain: mathematicians and computer scientists. Their verdicts, even when wrong or harmful, were beyond dispute or appeal. And they tended to punish the poor and the oppressed in our society while making the rich richer.

    I came up with a name for these harmful kinds of models: Weapons of Math Destruction, or WMDs for short. I’ll walk you through an example, pointing out its destructive characteristics along the way.

    As often happens, this case started with a laudable goal. In 2007, Washington, D.C.’s new mayor, Adrian Fenty, was determined to turn around the city’s underperforming schools. He had his work cut out for him: At the time, barely one out of every two high school students was persisting to graduation after ninth grade, and only 8 percent of eighth graders were performing at grade level in math. Fenty hired an education reformer named Michelle Rhee to fill a powerful new post, chancellor of Washington’s schools.

    The going theory was that the students weren’t learning enough because their teachers weren’t doing a good job. So in 2009, Rhee implemented a plan to weed out the low-performing teachers.

    This is the trend in troubled school districts around the country, and from a systems engineering perspective the thinking makes perfect sense: Evaluate the teachers. Get rid of the worst ones, and place the best ones where they can do the most good. In the language of data scientists, this “optimizes” the school system, presumably ensuring better results for the kids. Except for “bad” teachers, who could argue with that? Rhee developed a teacher assessment tool called IMPACT, and at the end of the 2009–10 school year, the district fired all the teachers whose scores put them in the bottom 2 percent. At the end of the following year, another 5 percent, or 206 teachers, were booted out.

    Sarah Wysocki, a fifth-grade teacher, didn’t seem to have any reason to worry. She had been at MacFarland Middle School for only two years but was already getting excellent reviews from her principal and her students’ parents. One evaluation praised her attentiveness to the children; another called her “one of the best teachers I’ve ever come into contact with.”

    Yet at the end of the 2010–11 school year, Wysocki received a miserable score on her IMPACT evaluation. Her problem was a new scoring system known as value-added modeling, which purported to measure her effectiveness in teaching math and language skills. That score, generated by an algorithm, represented half of her overall evaluation, and it outweighed the positive reviews from school administrators and the community. This left the district with no choice but to fire her, along with 205 other teachers who had IMPACT scores below the minimal threshold.

    There was a logic to the school district’s approach. Administrators, after all, could be friends with terrible teachers. They could admire their style or their apparent dedication. Bad teachers can seem good. So Washington, like many other school systems, would minimize this human bias and pay more attention to scores based on hard results: achievement scores in math and reading. The numbers would speak clearly, district officials promised. They would be more fair.

    Wysocki, of course, felt the numbers were horribly unfair, and she wanted to know where they came from. “I don’t think anyone understood them,” she later told me. How could a good teacher get such dismal scores?

    Well, she learned, it was complicated. The district had hired a consultancy, Princeton-based Mathematica Policy Research, to come up with the evaluation system. Mathematica’s challenge was to measure the educational progress of the students in the district and then to calculate how much of their advance or decline could be attributed to their teachers. This wasn’t easy, of course.

    The researchers knew that many variables, from students’ socioeconomic backgrounds to the effects of learning disabilities, could affect student outcomes. The algorithms had to make allowances for such differences, which was one reason they were so complex.
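
    The basic idea of a value-added model, predicting each student's score from background factors and then crediting or blaming the teacher for the residual, can be sketched as follows (a deliberately simplified toy; Mathematica's actual model is proprietary and far more complex):

```python
def value_added(students: list[dict]) -> float:
    """Teacher's 'value added': the average gap between what students
    actually scored and what was predicted for them.
    (Toy prediction: this year's score ≈ last year's; real value-added
    models adjust for many more covariates.)"""
    residuals = [s["actual"] - s["prior"] for s in students]
    return sum(residuals) / len(residuals)

classroom = [
    {"prior": 60, "actual": 65},   # beat the prediction by 5
    {"prior": 80, "actual": 78},   # fell short by 2
    {"prior": 70, "actual": 73},   # beat it by 3
]
print(value_added(classroom))  # → 2.0
```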

    Attempting to reduce human behavior, performance, and potential to algorithms is no easy job. “There are so many factors that go into learning and teaching that it would be very difficult to measure them all,” Wysocki says. What’s more, attempting to score a teacher’s effectiveness by analyzing the test results of only 25 or 30 students is statistically unsound, even laughable.

    The numbers are far too small given all the things that could go wrong. Indeed, if we were to analyze teachers with the statistical rigor of a search engine, we’d have to test them on thousands or even millions of randomly selected students. Statisticians count on large numbers to balance out exceptions and anomalies. (And WMDs often punish individuals who happen to be the exception.)
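
    The sample-size point can be made concrete with a quick simulation (my own sketch, not from the article): suppose every teacher is identical and each student's test-score gain is pure noise. With only 25 students per class, the spread of average "teacher scores" is wide enough to brand identical teachers as stars or failures; with thousands of students it nearly vanishes.

```python
import random

def class_average(n_students: int, rng: random.Random) -> float:
    """Average test-score gain for one class, where every gain is pure
    noise (mean 0, sd 10): any nonzero average is sampling error."""
    return sum(rng.gauss(0, 10) for _ in range(n_students)) / n_students

rng = random.Random(42)
for n in (25, 2500):
    scores = [class_average(n, rng) for _ in range(1000)]
    spread = (sum(s * s for s in scores) / len(scores)) ** 0.5
    print(f"{n:5d} students per class: spread of 'teacher scores' = {spread:.2f}")
# The spread shrinks as 1/sqrt(n): roughly 2.0 points at n=25 but only
# about 0.2 at n=2500, so identical teachers look very different when
# judged on a single small class.
```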

    Equally important, statistical systems require feedback — something to tell them when they’re off track. Statisticians use errors to train their models and make them smarter. If an e-commerce site, through a faulty correlation, started recommending lawn care books to teenage girls, the clicks would plummet, and the algorithm would be tweaked until it got it right. Without feedback, however, a statistical engine can continue spinning out faulty and damaging analysis while never learning from its mistakes.
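
    The value of feedback can be illustrated with a toy online-learning loop (a hypothetical sketch, not any system named in the text): a model that updates on observed outcomes converges toward the truth, while one that never sees its errors stays wrong indefinitely.

```python
import random

def estimate_rate(true_rate: float, steps: int, use_feedback: bool) -> float:
    """Estimate a click-through rate, starting from a badly wrong guess.
    With feedback, each observed outcome nudges the estimate (a running
    average); without feedback, the wrong guess is never revised."""
    rng = random.Random(7)
    estimate = 0.9                                # wrong initial belief
    for t in range(1, steps + 1):
        clicked = rng.random() < true_rate        # what users actually did
        if use_feedback:
            estimate += (clicked - estimate) / t  # learn from the error
    return estimate

print(estimate_rate(0.1, 10_000, use_feedback=True))   # converges near 0.1
print(estimate_rate(0.1, 10_000, use_feedback=False))  # stuck at 0.9
```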

    Many WMDs behave like that. They define their own reality and use it to justify their results. This type of model is self-perpetuating, highly destructive — and very common.

    When Mathematica’s scoring system tags Sarah Wysocki and 205 other teachers as failures, the district fires them. But how does it ever learn if it was right? It doesn’t. The system itself has determined that they were failures, and that is how they are viewed. Two hundred and six “bad” teachers are gone. That fact alone appears to demonstrate how effective the value-added model is. It is cleansing the district of underperforming teachers. Instead of searching for the truth, the score comes to embody it.

    This is just one example of a WMD feedback loop. Others include employers who are increasingly using credit scores to evaluate potential hires. Those who pay their bills promptly, the thinking goes, are more likely to show up to work on time and follow the rules. In fact, there are plenty of responsible people and good workers who suffer misfortune and see their credit scores fall.

    But the belief that bad credit correlates with bad job performance leaves those with low scores less likely to find work. Joblessness pushes them toward poverty, which further worsens their scores, making it even harder for them to land a job. It’s a downward spiral. And employers never learn how many good employees they’ve missed out on by focusing on credit scores. In WMDs, many poisonous assumptions are camouflaged by math and go largely untested and unquestioned.

    For years, Washington teachers complained about the arbitrary scores and clamored for details on what went into them. It’s an algorithm, they were told. It’s very complex. That’s the nature of WMDs. The analysis is outsourced to coders and statisticians. And as a rule, they let the machines do the talking.

    You cannot appeal to a WMD. That’s part of their fearsome power. They do not listen. Nor do they bend. They’re deaf not only to charm, threats, and cajoling but also to logic — even when there is good reason to question the data that feed their conclusions. Yes, if it becomes clear that automated systems are screwing up on an embarrassing and systematic basis, programmers will go back in and tweak the algorithms. But for the most part, the programs deliver unflinching verdicts, and the human beings employing them can only shrug, as if to say, “Hey, what can you do?” The human victims of WMDs are held to a far higher standard of evidence than the algorithms themselves.

    After the shock of her firing, Sarah Wysocki was out of a job for only a few days. She had plenty of people, including her principal, to vouch for her as a teacher, and she promptly landed a position at a school in an affluent district in northern Virginia. So thanks to a highly questionable model, a poor school lost a good teacher, and a rich school, which didn’t fire people on the basis of their students’ scores, gained one.

    Ill-conceived mathematical models now micromanage the economy, from advertising to prisons. These WMDs have many of the same characteristics as the model that derailed Sarah Wysocki’s career in Washington’s public schools. They’re opaque, unquestioned, and unaccountable, and they operate at a scale to sort, target, or “optimize” millions of people. By confusing their findings with on-the-ground reality, most of them create pernicious WMD feedback loops.

    But there’s one important distinction between a school district’s model and, say, a WMD that scouts out prospects for extortionate payday loans. They have different payoffs. For the school district, the payoff is a kind of political currency, a sense that problems are being fixed. But for businesses, it’s just the standard currency: money. For many of the businesses running these rogue algorithms, the money pouring in seems to prove that their models are working. Look at it through their eyes and it makes sense. When they’re building statistical systems to find customers or manipulate desperate borrowers, growing revenue appears to show that they’re on the right track. The software is doing its job. The trouble is that profits end up serving as a stand-in or proxy for truth. This dangerous confusion crops up again and again.

    This happens because data scientists all too often lose sight of the folks on the receiving end of the transaction. They certainly understand that a data-crunching program is bound to misinterpret people a certain percentage of the time, putting them in the wrong groups and denying them a job or a chance at their dream house. But as a rule, the people running the WMDs don’t dwell on those errors. Their feedback is money, which is also their incentive. Their systems are engineered to gobble up more data and fine-tune their analytics so that more money will pour in. Investors, of course, feast on these returns and shower WMD companies with more money.

    But the poor are hardly the only victims of WMDs. Far from it. Malevolent models can blacklist qualified job applicants and dock the pay of workers who don’t fit a corporation’s picture of ideal health. These WMDs hit the middle class as hard as anyone. Even the rich find themselves micro-targeted by political models. And they scurry about as frantically as the rest of us to satisfy the remorseless WMD that rules college admissions and pollutes higher education.

    It’s also important to note that these are the early days. Naturally, payday lenders and their ilk start off by targeting the poor and the immigrants. Those are the easiest targets, the low-hanging fruit. They have less access to information, and more of them are desperate. But WMDs generating fabulous profit margins are not likely to remain cloistered for long in the lower ranks. That’s not the way markets work. They’ll evolve and spread, looking for new opportunities. WMDs are targeting us all. And they’ll continue to multiply, sowing injustice, until we take steps to stop them.

    How do we start to regulate the mathematical models that run more and more of our lives? I would suggest that the process begin with the modelers themselves. Like doctors, data scientists should pledge a “First do no harm” Hippocratic Oath, one that focuses on the possible misuses and misinterpretations of their models. Following the market crash of 2008, two financial engineers, Emanuel Derman and Paul Wilmott, drew up such an oath. It reads in part:

    ~ I will remember that I didn’t make the world, and it doesn’t satisfy my equations.

    ~ Though I will use models boldly to estimate value, I will not be overly impressed by mathematics.

    That’s a good philosophical grounding. But solid values and self-regulation rein in only the scrupulous. To eliminate WMDs, our laws need to change, too.

    Data is not going away. Nor are computers — much less mathematics. Predictive models are, increasingly, the tools we will be relying on to run our institutions, deploy our resources, and manage our lives. But these models are constructed not just from data but from the choices we make about which data to pay attention to — and which to leave out. Those choices are not just about logistics, profits, and efficiency. They are fundamentally moral.

    We must come together to police these WMDs, to tame and disarm them. My hope is that they’ll be remembered, like the deadly coal mines of a century ago, as relics of the early days of this new revolution, before we learned how to bring fairness and accountability to the age of data. Math deserves much better than WMDs, and democracy does too.

  2. shinichi Post author


    by citta_invisibli, a review of the Japanese edition, 『あなたを支配し、社会を破壊する、AI・ビッグデータの罠』 (“The Trap of AI and Big Data That Rules You and Destroys Society”), by キャシー・オニール (Cathy O’Neil) (/dp/4772695605/)

    [The following is a repost of a review of the original English edition, Weapons of Math Destruction. I have not read the translation.]

    Some of these mathematical models do great harm to society. These are the models that the book’s author, Cathy O’Neil, calls “Weapons of Math Destruction” (WMDs). In her account, a WMD is a statistical or mathematical model built to evaluate, rank, or classify people, or to predict their behavior, and it is characterized by being deployed at scale, wrong, unfair, harmful, opaque, and self-fulfilling.





    Though it goes somewhat beyond the book’s contents, we might also pay attention to the bias inherent in the very act of building a mathematical model to evaluate people or organizations. The decision to build a mathematical model already embodies a serious value judgment: values that cannot be quantified, that cannot be reduced to a data format a computer can process, will be ignored. The decision to build a model that evaluates something is itself a biased one (on this point see, for example, Robert J. Whelchel, “Is Technology Neutral?” IEEE Technology and Society Magazine, Vol. 5(4), Dec. 1986). In a society built around science, technology, and the market economy, the pressure to quantify and evaluate is strong, and it has been encroaching on domains where such evaluation is fundamentally ill-suited: education, government, medicine, art, scholarship, and so on.


    The author points out, however, that this system is flawed. Among the LSI-R questions is, for example, “When was your first involvement with the police?”, and according to the author this question is unjust. Police patrol poor neighborhoods more heavily than wealthy ones, and they stop, question, and search Black and Hispanic people far more often than white people, so the odds of being caught for a minor offense such as underage drinking, fighting, or marijuana possession are not equal. A question like “When was your first involvement with the police?” therefore has the effect of implicitly factoring in the respondent’s birth and upbringing, family and neighborhood environment, and personal relationships. In court, such factors are not allowed to influence a verdict; as the author puts it, in a courtroom “we are judged by what we do, not by who we are” (original, p. 26). Yet the LSI-R effectively builds those very factors in, systematically, in the name of objectivity.




    In today’s data economy, those who can find subtle patterns in the vast quantities of data filling the information space, and put them to use, stand to gain enormous wealth. Companies therefore pour their energy into acquiring ever more, and ever more varied, data, and into finding ways to turn that data into profit. Not all of this is villainous, of course. But some of it is unquestionably unethical, or contrary to the basic principles of democracy. (For this reason the legal scholar 山本龍彦 (Tatsuhiko Yamamoto), in his book 『恐ろしいビッグデーターー超類型化AI社会のリスク』 (朝日新書, 2017), proposes wielding constitutionally guaranteed rights as a weapon against the rapacious acquisition and use of data.) For now, though, much of this activity, even when it is unjust, even when it survives by preying on the weak and on minorities, lies beyond the reach of existing regulation and law. The author likens the present situation to the era after the Industrial Revolution, when companies profited by abusing workers and the environment. Our societies gradually improved on that state of affairs, imposing regulations on corporate activity so that people would not suffer unjust harm. The author accordingly proposes that WMDs, too, be subject to audits and regulation: that we “police, tame, and disarm” them (original, p. 218). Mathematical models built on data will only be used more and more; that tide cannot be stopped. But they can be used not to sacrifice the many for the benefit of the few, but to make society safer and to help those who are suffering. For that to happen, they cannot be left to run unchecked as they are now.

    “Mathematical models should be our tools, not our masters” (original, p. 207), the author says. Originally a mathematician herself, she laments the abuse of mathematical models and argues for putting them to better use, closing the book with the words: “Math deserves much better than WMDs, and democracy does too” (original, p. 218).

