AI Ethics Crisis: Data Theft, Bias and Surveillance

“Dude. Just move on. You did your hard work reporting the case, but your article is not your IP.”

This was a recent post on X, responding to a journalist who spoke about how his carefully researched and reported article had been used without permission by a video content provider. The content provider said they had acknowledged the journalist’s work in the video and that should be enough. But they had not asked for consent or even informed the journalist that they would be using his work.

As expected, people have taken sides, with some saying the journalist’s work is his and just being in the public domain does not give anybody the right to use it for free. Others echo the post cited above and say all is fair in love and the Internet.

In this case, there is a content company versus a journalist. But in an increasing number of cases, it is artificial intelligence models versus original content creators. It’s theft, cry the artists and creators whose works have been scraped to train massive AI tools. It’s training, say the AI companies.

Decades before AI was a thing, computer scientists, legal scholars, labour researchers, writers, and civil rights activists called for ethical guard rails to keep machine intelligence in check. Back in 1942, Isaac Asimov introduced the three laws of robotics in a short story. These were intended to be a moral code that would prevent robots or machines from becoming dangerous.

More than 80 years later, Pope Leo XIV has issued an encyclical on morality in the age of AI. “If technological development advances without a corresponding ethical and social progress,” his encyclical warns, “the result may be an increase in means without a growth in humanity: ‘having more’ without ‘being more’.”

That AI needs an ethical or moral core is now a mainstream topic. The tech industry has not ignored this. It has simply absorbed it into corporate language. And so, we have “responsible AI”, “trustworthy AI”, and “ethical frameworks” dropped into any AI conversation even as AI systems grow more powerful, more deeply embedded in everyday life, and more insulated from public accountability.

The problem with AI ethics is not a lack of principles. It is that the systems are built around forms of extraction—of data, of labour, of creative work, and of trust—while the mechanisms needed to regulate that extraction remain weak, uneven, or absent.

The myth of neutrality

AI does not need ethical guard rails, say its strongest votaries: the product is neutral, and any bias lies with its human users. Tools do not have intentions, the argument goes; a hammer is a hammer whether it builds a house or breaks a window, so how can it be biased? Extend that logic to AI: these are machines crunching data and predicting outcomes based on that data. It is a tool that learns. Where, exactly, is the bias? The argument is comforting because, if the machine is neutral, those who build it bear no responsibility for what it produces.

But neutrality is a myth. A hammer, unlike AI, was not built by studying millions of past hammer swings and learning from them which kinds of swings to repeat. AI is the product of human choices: what data to feed it, what outcomes to optimise for, what trade-offs to accept, and whose voices to include in its design.

A recent post on X shows this bias in action. A subscriber whose first name is Jen showed how an AI tool behaved when it was fed the same information with just one name changed. She said she fed her résumé to Gemini, Google’s AI tool, along with a lot of other information, and asked for a new résumé. Gemini provided. Only, it marked one of her key projects as “community service”. “The project I’m proudest of, flattened into a footnote. It turned me into a helper who pitched in a little. Not the architect,” Jen posted. Then she fed the same information with the same request—and changed her name to Jeff. “The difference wasn’t in the facts,” she posted. “It was in the framing. Jennifer’s volunteer work was ‘community service’. Jeff’s was ‘leadership’. Jennifer ‘assisted with’ and ‘collaborated on’. Jeff ‘engineered’ and ‘architected’.”
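The test Jen describes, the same input with only the name changed, can be written down as a small audit harness. What follows is a minimal sketch under stated assumptions, not a description of how Gemini or any real tool works: `stub_rewriter` is a hypothetical stand-in that is hard-coded to misbehave, the résumé sentence is invented, and the two word lists simply reuse the terms quoted in her post.

```python
# Terms taken from the framing quoted in the post; a real audit would need a much richer lexicon.
AGENTIC = ("engineered", "architected", "leadership")
SUPPORTIVE = ("assisted with", "collaborated on", "community service")


def framing_counts(text):
    """Count how often agentic and supportive terms appear in a rewritten résumé."""
    lowered = text.lower()
    return {
        "agentic": sum(lowered.count(term) for term in AGENTIC),
        "supportive": sum(lowered.count(term) for term in SUPPORTIVE),
    }


def name_swap_audit(rewrite, resume_template, names):
    """Run the same résumé through `rewrite` once per name and tally the framing."""
    return {
        name: framing_counts(rewrite(resume_template.format(name=name)))
        for name in names
    }


def stub_rewriter(resume):
    """Hypothetical stand-in for an AI tool, deliberately biased for demonstration only."""
    if resume.startswith("Jennifer"):
        return resume.replace("built", "assisted with")
    return resume.replace("built", "engineered")


# Invented example sentence; only the name differs between the two runs.
template = "{name} built a volunteer scheduling platform."
report = name_swap_audit(stub_rewriter, template, ["Jennifer", "Jeff"])

for name, counts in report.items():
    print(name, counts)
```

Because the facts are held constant, any difference in the tallies can only come from the name. In practice the stub would be replaced by a call to the system under audit, and the test repeated across many résumés and names before drawing conclusions.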

The myth of neutrality is not just wrong; it is actively dangerous, because it leaves no one accountable. Nobody is going to blame Gemini or the people who train it. And yet, they have created a machine that can make or break careers and lives.

The bias is in how an AI system learns. It ingests enormous quantities of data: text, images, behavioural patterns, and historical records. From these it extracts a model of how the world works. Ask anyone who writes AI prompts and they will tell you that more data means more truth. But here is what nobody says out loud: the data record a world that is biased, one marked by social hierarchies, exclusions, and historical injustices. Feed a machine the past, and it will replicate the past, with superhuman confidence and at superhuman scale.

This is not something new. In the early days of the AI era, one person defined the debate around bias and ethics in AI: Timnit Gebru, then a Google computer scientist working on algorithmic bias.

In 2018, Gebru and her colleague Joy Buolamwini published the “Gender Shades” study, which audited commercial facial recognition software and found that while these systems identified white men with accuracy rates consistently above 99 per cent, accuracy for darker-skinned women dropped to as low as 65.3 per cent. In some cases, the misclassification rate for dark-skinned women was more than 35 times higher than for light-skinned men.

These systems are used by governments and law enforcement to identify suspects, screen passport holders, track crowds, and determine who gets flagged as a person of interest. An error rate of 35 per cent for one demographic group is not a technical glitch. It is a structural injustice.
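The arithmetic behind that disparity, and the way a single headline accuracy figure can hide it, can be sketched in a few lines of Python. This is an illustration rather than the study's own method: the sample sizes are hypothetical, chosen so that the per-group accuracies match the 99 per cent and 65.3 per cent figures cited above, and the function names are invented for the example.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return accuracy per group from (group, predicted, actual) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        correct[group] += int(predicted == actual)
    return {group: correct[group] / total[group] for group in total}


def overall_accuracy(records):
    """Return the single aggregate accuracy across all records."""
    return sum(predicted == actual for _, predicted, actual in records) / len(records)


def make_records(group, n_samples, n_errors):
    """Build hypothetical records in which the first n_errors are misclassified."""
    return [
        (group, "wrong" if i < n_errors else "right", "right")
        for i in range(n_samples)
    ]


# Hypothetical benchmark dominated by one group, as many real datasets are.
records = (
    make_records("lighter-skinned men", 9000, 90)
    + make_records("darker-skinned women", 1000, 347)
)

per_group = accuracy_by_group(records)
overall = overall_accuracy(records)
error_ratio = (1 - per_group["darker-skinned women"]) / (1 - per_group["lighter-skinned men"])

print(per_group)
print(f"overall accuracy: {overall:.1%}")
print(f"error-rate ratio: {error_ratio:.1f}x")
```

On this toy data the aggregate accuracy comes out at about 95.6 per cent, a figure that looks respectable while one group is misclassified roughly 35 times as often as the other. That is why audits of this kind report results group by group rather than as a single number.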

The cost of data hunger

Large language models like GPT-4 and Claude are trained on quantities of text so vast they are difficult to comprehend: essentially the indexable Internet, millions of books, academic papers, social media posts, forum discussions, and digitised archives going back decades. Image generation models have been trained on hundreds of millions of photographs and artworks. The scale is staggering, and it raises a question that is now entering courtrooms: whose content is this, and who gave permission for it to be used?

In December 2023, The New York Times sued OpenAI and Microsoft, alleging that its journalism had been used without consent or compensation to train AI systems that now directly compete with the publication by generating news summaries. Dow Jones and The New York Post followed with their own lawsuits against Perplexity AI. Writers, visual artists, musicians, and software developers have filed class-action claims arguing that their creative work was scraped from the Internet and fed into machines without their knowledge or approval.

In May 2025, the US Copyright Office released a document that concluded that certain uses of copyrighted material in AI training cannot be defended as fair use. That was a significant analysis, and a major blow to the industry’s preferred legal position. The report found that where AI outputs are substantially similar to the training data, there is “a strong argument” that the models themselves infringe on the reproduction rights of the original works.

In their paper “The Great Scrape: The Clash Between AI Scraping and Privacy”, the legal scholars Daniel Solove and Woodrow Hartzog say that scraping, “the automated extraction of large amounts of data from the internet”, has worked so far because scrapers act as if all publicly available data are free for the taking.

The snooping state

The question of data is not only about ownership. It is about surveillance, and India is where this manifests most starkly. The country has embraced AI-driven monitoring with enthusiasm, and with almost no legislative framework to govern it.

At the Maha Kumbh in early 2025, one of the largest human gatherings on earth, police deployed facial recognition technology alongside 2,700 AI-enhanced CCTV cameras to monitor the crowd.

In cities like Hyderabad, extensive camera networks enable real-time tracking across public spaces. In Kashmir, 300 AI-enabled cameras scan crowds for “persons of interest”. Along the Jammu-Srinagar National Highway, a facial recognition system was installed to “pre-empt attacks”.

India does not have a dedicated law authorising the use of this technology. The 2023 Digital Personal Data Protection Act permits the government to process personal data without consent in certain specified cases, and that seems to have given authorities free rein. The Supreme Court has ruled that any state action infringing on the right to privacy must satisfy tests of legality, necessity, and proportionality. Whether mass biometric surveillance at a religious festival meets those tests remains contested.

India’s situation is not unique. It is, in many ways, a preview of where the world is heading. From the Metropolitan Police’s use of live facial recognition on London’s streets to China’s integration of AI surveillance into its social credit architecture, governments everywhere are discovering that AI gives them capabilities of monitoring and control that would have been unthinkable a decade ago.

The technology moves faster than the law; the law moves faster than democratic accountability; and by the time accountability arrives, the infrastructure is already embedded. Against this backdrop, a number of AI companies have made increasingly public commitments to what they variously call “responsible AI”, “trustworthy AI”, or what Anthropic calls “constitutional AI”. These are not entirely empty phrases, but they are also not sufficient substitutes for external accountability.

What price ethics?

Constitutional AI, as Anthropic describes it, attempts to embed ethical principles directly into the training process. Rather than relying solely on human feedback to shape model behaviour, it trains the model on a set of principles, or a constitution, and uses AI feedback to evaluate whether outputs conform to those principles. The values embedded include helpfulness, harmlessness, and honesty.

The results have been revealing in their imperfections. Studies examining AI systems built on these principles found persistent bias even after mitigation efforts: tendencies to favour certain demographic groups, to disadvantage older users, to reflect in subtle ways the values of the predominantly Western, educated dataset from which these systems learned. Mitigation strategies, including anti-discrimination prompts, showed mixed results. The inescapable conclusion is that you cannot simply write a constitution for a machine and expect the machine to be just. Justice requires not just principles but power structures that enforce them and accountability mechanisms that catch failures.

Anthropic, OpenAI, Google, Meta. These are American companies, shaped by American legal traditions, American cultural assumptions, and American investor expectations. Their AI systems are deployed globally, including across India, Africa, South-East Asia, and Latin America, where the communities affected have had no seat at the table when the constitutions were written.

The human cost

The process of training AI systems requires not only vast quantities of data but enormous amounts of human judgment. Someone must label the data, classify images, correct transcriptions, and flag harmful content. Someone must perform “reinforcement learning from human feedback” and review AI outputs. Much of this work is done by contractors in Kenya, the Philippines, India, and other countries in the Global South, who are paid low wages and exposed to disturbing content, including depictions of child abuse and graphic violence, that causes documented psychological harm.

Photo caption: Facial recognition cameras can identify people in crowds and retrieve their personal data. The Indian government has used such technology not just to prevent crimes but also to stifle protests. (Photo credit: Getty Images)

The AI researcher Kate Crawford calls this the “atlas of AI”, which is also the title of her book: a map of the data extraction, labour, and natural resources on which Silicon Valley depends.

The AI industry’s carbon footprint, driven by the energy demands of training and running large models, is substantial and growing. The environmental costs fall disproportionately on communities that have contributed least to AI development and will benefit least from its applications.

Healthcare AI systems trained predominantly on data from Western patients perform worse on patients from other populations, with potentially lethal consequences for misdiagnosis. Credit-scoring algorithms trained on historical lending data replicate the racial and economic exclusions of the past. Predictive policing tools, deployed in cities from New Delhi to Detroit, direct heightened surveillance towards already over-policed communities, creating a feedback loop: more policing produces more arrests, which produces more data confirming that these communities are high-risk, which produces more policing.
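The predictive policing loop can be made concrete with a toy simulation. It is a deliberately simplified sketch, not a model of any deployed system: the two districts, the patrol numbers, and the rule of sending most patrols to whichever district has the most recorded arrests are all assumptions made for illustration. Both districts are given the same underlying rate of one recorded arrest per ten patrols, so any divergence comes from where the patrols are sent.

```python
def simulate_patrols(initial_records, rounds, hot_patrols=70,
                     other_patrols=30, patrols_per_arrest=10):
    """Allocate patrols by past arrest records and return the updated records.

    Every district has the same true rate (one recorded arrest per
    `patrols_per_arrest` patrols); only the allocation differs.
    """
    records = list(initial_records)
    for _ in range(rounds):
        # The district with the most recorded arrests is flagged as the hot spot.
        hot = records.index(max(records))
        for district in range(len(records)):
            patrols = hot_patrols if district == hot else other_patrols
            records[district] += patrols // patrols_per_arrest
    return records


# Hypothetical starting point: district 0 has one more recorded arrest than district 1.
start = [11, 10]
end = simulate_patrols(start, rounds=10)

share_before = start[0] / sum(start)
share_after = end[0] / sum(end)

print(f"recorded arrests after 10 rounds: {end}")
print(f"district 0 share of records: {share_before:.0%} -> {share_after:.0%}")
```

A gap of a single recorded arrest is enough to lock in the allocation: the flagged district accumulates records faster in every round, and the data appear to confirm a difference in risk that the simulation never contained.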

What will it take to get ‘ethical’ AI?

Ethics, in the context of AI, is not a checklist or a corporate values statement. It is a set of structural demands.

It requires transparency: the right to know when an AI system is making a decision about you, what data it used, and how it reached its conclusion. It requires accountability: the ability to challenge those decisions and to hold the organisations that deploy AI systems legally responsible for the harms they cause. It requires diversity: in the teams that build AI systems, in the data that train them, and in the communities that have genuine input into how they are governed. And it requires redistribution: a serious reckoning with the fact that the benefits of AI are flowing overwhelmingly to the already wealthy, while the costs, in terms of labour exploitation, environmental damage, algorithmic discrimination, and surveillance, are paid by the rest.

Pope Leo XIV’s concerns resonate here. In his first interview as pontiff, published in September 2025, he warned that “extremely rich people” investing in AI were “totally ignoring the value of human beings and of humanity”. Two months later, he posted on social media that the Church “calls all builders of AI to cultivate moral discernment as a fundamental part of their work”. That did not go down well with Silicon Valley: Marc Andreessen mocked the post online, and Peter Thiel dismissed Leo as a “woke American pope”. None of it stopped Pope Leo from issuing his encyclical the following May, warning that the world faces “new forms of dehumanisation” when humanity is not at the heart of technology.

What next?

AI is relatively new, but addressing its harms calls for societal changes, not technical ones. We are at a defining moment in the history of technology: a moment when the systems being built will shape how future societies make decisions, how institutions exercise power, how individuals understand themselves and each other. The choices made now, about data, about accountability, about whose values are encoded and whose are excluded, may not be easily reversed.

Some of these stakes are already visible. Algorithmic nostalgia is reshaping how people relate to their own memories as platforms use machine learning to curate and amplify throwback content, turning personal history into an engagement metric. AI is generating synthetic voices, faces, and creative works that blur the line between human expression and machine production.

If the ethics crisis in AI is to be more than just a chorus of concern, the next step must be structural. That means moving beyond corporate pledges and towards binding regulations, where governments set enforceable limits on surveillance, bias, and data extraction.

It means creating independent oversight bodies with the power to audit algorithms and sanction misuse. It means investing in global governance frameworks so that communities in Nairobi or Noida have as much say in shaping AI as those in Silicon Valley. And it means recognising the labour, environmental, and social costs of AI as part of its true price.

The question is not whether AI can be ethical but whether societies are willing to redistribute power so that ethics is enforced and not merely proclaimed.
