The Gray Lady against Big Tech

Carlos Castilho
5 min read · Jan 19, 2024

At the end of December, The New York Times (NYT), nicknamed the Gray Lady, filed a lawsuit against OpenAI and Microsoft, accusing both of violating US copyright law. In doing so, the newspaper also set the stage for a complex economic war, involving billions of dollars, between the legacy press and the new big tech corporations.

Photo Wikimedia / CC

It’s a complicated confrontation because two different corporate strategies are at play, each concerned with the profitability of its own business model (1). The battle between the NYT and OpenAI, which is linked to Microsoft, marks a new episode in the race by high-tech companies for information archived in digitized databases, most of which are under the control of legacy enterprises. Artificial intelligence depends on voluminous databases because it works by searching for information through pre-programmed algorithms (electronic robots).

It was precisely the artificial intelligence algorithms used by OpenAI/Microsoft that combed through the NYT database without paying anything, in an operation classified as training. The newspaper had already reached licensing agreements for its database with Meta (Facebook), Google and Apple, but the lack of dialogue with OpenAI ended up taking the matter to the US courts.

See below a comparison between texts originally published by the NYT and later reproduced by GPT:

Screenshot published by Jason Kint on X (formerly Twitter), comparing texts from the NYT and GPT

The dispute over the ‘morgue’

The NYT’s database of digitized information, also known as the Morgue, contains 13 million texts and eight million photos, graphics and drawings, all produced since September 18, 1851, the newspaper’s founding date. Most of the database is hosted in the cloud of Alphabet, Google’s parent company. There is no official information on how much the paper paid to digitize its entire archive of printed editions, but estimates suggest the total could have reached 50 million dollars.

The sheer volume of statistics, figures, facts, illustrations and news accumulated over almost two centuries meant that only a few news corporations were able to set up commercially viable databases. Those that digitized their archives expected to recover their multimillion-dollar investment by selling information, but ended up discovering a gold mine in artificial intelligence. News companies’ databases are at the centre of the battle with Big Tech because they contain contextualized data, which AI algorithms process more easily than the raw figures in most financial and demographic archives.

For their part, the large digital platforms that control social networks handle vastly more data every day than a daily edition of a major newspaper contains. But the platforms are less than 20 years old, so their archives are very recent compared with those of newspapers that are more than a century old. The result is a paradoxical situation of mutual dependence, despite the financial dispute: artificial intelligence loses profitability without journalistic databases, and the press needs AI to survive in the race for “news datafication”. (2)

Artificial intelligence is currently an unregulated space, a situation common in the first years of any technological innovation, when the lack of rules and social control creates conditions for abuse and unscrupulous behaviour. Digital platforms have abandoned the initial idealism with which they promised a better world and now put profit above all else, like any old-fashioned multinational corporation. Just look at how social networks turn a blind eye to fake news and misinformation to keep their revenue growing.

Data colonialism

This is not the first time that large companies have resisted technological innovation in communication. It happened 90 years ago with radio, as shown in the article America’s Press-Radio War of the 1930s, by Gwenyth Jackaway of Fordham University, published in 1994 (3). The major newspapers of the time tried for 10 years to prevent news from being broadcast on the radio, fearing the loss of print advertising revenue. The same blocking of innovation happened in the 1940s, when RCA delayed the introduction of FM (frequency modulation) in its radio broadcasts for almost a decade to preserve the profitability of AM (amplitude modulation) programming, which had emerged at the beginning of the 20th century.

Everything suggests that, as in the past, the battle between big tech and the press will end in a legal draw, because the parties involved will eventually discover that it is preferable to ‘lose the rings to save the fingers’. The NYT lawsuit is a move to gain a position of strength for the moment when a cohabitation agreement with digital platforms and AI companies becomes inevitable. The main weapons of the press in this legal and political battle will be the conservatism and slowness of the courts and legislatures. Big tech companies will take advantage of their opponents’ lack of knowledge of, and familiarity with, technological tools to push ahead with new applications based on artificial intelligence. It won’t be a surprise if, once some sort of agreement is reached, they launch a new application that bypasses what has been settled.

However, there is a serious problem that neither the large communication conglomerates nor five of the largest digital technology companies (Meta, Alphabet, Apple, Microsoft and Twitter) address. The data at the centre of the fight over artificial intelligence does not belong to any of the parties involved; if rights were truly respected, it would be returned to its original owners. That data was extracted from our conversations, searches, commercial transactions, texts, images and sounds, without us receiving any payment for the same copyrights that the press and big tech are now disputing. This appropriation has already been called “data colonialism”. (4)

(1) More details about the NYT lawsuit at https://ankurraina.medium.com/new-york-times-vs-microsoft-openai-quick-d-ac7bd579bb50

(2) Datafication is the production of news based on the interpretation and processing of digitized data. More details in Datafication of Journalism: Strategies for data Driven Storytelling and Industry.

(3) See Jackaway, Gwenyth. (1994). America’s press-radio war of the 1930s: a case study in battles between old and new media. Historical Journal of Film, Radio and Television, 14(3), 299–314. doi:10.1080/01439689400260211

(4) Data colonialism is an expression created by British sociologist Nick Couldry. See https://www.sup.org/books/title/?id=28816

Written by Carlos Castilho

Brazilian journalist, community journalism researcher, postdoctoral researcher, teacher and media critic
