
I'm a contractor for one of these companies. It pays okay ($45+/hour) if you can pass the qualifications for your area of expertise, but the work isn't steady and communication is non-existent. The coding qualifications I did were difficult FAANG algorithm analysis questions. The work has definitely gotten harder over the last year, and the tasks often say we need to come up with master's/PhD-level work or problems that someone with 5+ years of experience in a field would have difficulty solving. I wish I had a regular job, but I live in rural North Carolina and remote work is hard to come by.

iandanforth · 1h ago

"Google said in a statement: “Quality raters are employed by our suppliers and are temporarily assigned to provide external feedback on our products. Their ratings are one of many aggregated data points that help us measure how well our systems are working, but do not directly impact our algorithms or models.” GlobalLogic declined to comment for this story." (emphasis mine)

How is this not a straight up lie? For this to be true they would have to throw away labeled training data.

creddit · 1h ago

Because they are doing it to compute quality metrics not to implement RLHF. It’s not training data.
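To make this distinction concrete, here is a minimal Python sketch, assuming ratings on a hypothetical 1 to 5 scale: the scores are only summarized into a quality metric for comparing two model versions and are never used as training targets. The function names and the 1.96 normal-approximation interval are illustrative choices, not Google's method.

```python
from math import sqrt
from statistics import mean, stdev


def quality_metric(ratings: list[float]) -> tuple[float, float]:
    """Mean rating plus the half-width of a rough 95% confidence interval."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return mean(ratings), half_width


def candidate_is_better(baseline: list[float], candidate: list[float]) -> bool:
    """Conservative check: the gap in mean rating must exceed both intervals combined."""
    base_mean, base_hw = quality_metric(baseline)
    cand_mean, cand_hw = quality_metric(candidate)
    return cand_mean - base_mean > base_hw + cand_hw


# Hypothetical 1-5 rater scores for two model versions.
# They are only summarized here; nothing is used as a training label.
old_scores = [3, 4, 3, 3, 4, 3]
new_scores = [4, 5, 4, 4, 5, 4]
print(quality_metric(new_scores))
print(candidate_is_better(old_scores, new_scores))
```

The ratings only answer "did version B get better?"; a training pipeline would instead turn them into per-example targets or preference pairs.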

Gracana · 1h ago

They probably don’t do it at a scale large enough to do RLHF with it, but it’s still useful feedback for the people working on the projects / products.

zozbot234 · 1h ago

More recent models actually use "reinforcement learning from AI feedback", where the task of assigning a reward is essentially fed back into the model itself. Human feedback is then only used to ground the training, on selected examples (potentially even entirely artificial ones) where the AI is most highly uncertain about what feedback should be given.
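A minimal sketch of that routing, under loud assumptions: an AI judge outputs a probability that response A is preferred over response B, and only the examples where that probability is closest to 0.5 (the judge is least certain) go to human raters. This is plain uncertainty sampling from active learning; the names and the fixed human budget are hypothetical, and no lab's actual RLAIF pipeline is implied.

```python
def route_for_labeling(
    judge_probs: dict[str, float], human_budget: int
) -> tuple[list[str], dict[str, int]]:
    """Send the examples the AI judge is least sure about to human raters.

    judge_probs maps a (hypothetical) example id to the AI judge's probability
    that response A beats response B. Probabilities near 0.5 mean the judge is
    uncertain, so those examples go to humans, up to human_budget of them.
    Every other example keeps the judge's own label (1 = A preferred, 0 = B).
    """
    by_uncertainty = sorted(judge_probs, key=lambda ex: abs(judge_probs[ex] - 0.5))
    for_humans = by_uncertainty[:human_budget]
    ai_labels = {ex: int(judge_probs[ex] >= 0.5) for ex in by_uncertainty[human_budget:]}
    return for_humans, ai_labels


probs = {"ex1": 0.95, "ex2": 0.52, "ex3": 0.10, "ex4": 0.45}
humans, auto = route_for_labeling(probs, human_budget=2)
print("human raters:", humans)
print("AI-labeled:", auto)
```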

teiferer · 1h ago

Key word: "directly"

It does so indirectly, so it's a true albeit misleading statement.

simonw · 1h ago

Something I'd be interested to understand is how widespread this practice is. Are all of the LLMs trained using human labor that is sometimes exposed to extreme content?

There are a whole lot of organizations training competent LLMs these days in addition to the big three (OpenAI, Google, Anthropic).

What about Mistral and Moonshot and Qwen and DeepSeek and Meta and Microsoft (Phi) and Hugging Face and Ai2 and MBZUAI? Do they all have their own (potentially outsourced) teams of human labelers?

I always look out for notes about this in model cards and papers but it's pretty rare to see any transparency about how this is done.

ics · 15m ago

I have been a generalist annotator for some of the others you mentioned; due to an NDA, I won't specify which. I would venture to guess that basically all major models use some degree of human feedback if there is money coming in from somewhere.

yvdriess · 59m ago

One of the key innovations behind the DNN/CNN models was Mechanical Turk. OpenAI used a similar system extensively to improve the early GPT models. I would not be surprised if the practice continues today; NN models need a lot of quality ground-truth training data.
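For readers unfamiliar with how crowd labels become "ground truth", a common approach is to collect redundant votes per item and keep the majority label only when agreement is high enough. The sketch below assumes hypothetical item ids, labels, and a 0.6 agreement threshold; it is not how OpenAI or any specific lab aggregated its labels.

```python
from collections import Counter


def majority_label(votes: list[str]) -> tuple[str, float]:
    """Most common label among redundant worker votes, plus the share who chose it."""
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


def build_ground_truth(
    item_votes: dict[str, list[str]], min_agreement: float = 0.6
) -> dict[str, str]:
    """Keep an item only when enough workers agree; low-agreement items are left out."""
    truth = {}
    for item, votes in item_votes.items():
        label, agreement = majority_label(votes)
        if agreement >= min_agreement:
            truth[item] = label
    return truth


# Hypothetical votes from three and two workers respectively.
votes = {"img1": ["cat", "cat", "dog"], "img2": ["cat", "dog"]}
print(build_ground_truth(votes))
```

In practice the dropped items would typically be sent back for more votes, which is one reason the volume of human work grows so quickly.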

simonw · 51m ago

Right, but where are the details?

Given the number of labs that are competing these days on "open weights" and "transparency" I'd be very interested to read details of how some of them are handling the human side of their model training.

I'm puzzled at how little information I've been able to find.

whilenot-dev · 52m ago

So why do you think asking this question here would yield a satisfying answer, especially given how the HN community likes to dispute any vague conclusions about anything as hyped as AI training?

To counter your question, what makes you think that's not the case? Do you think Mistral/Moonshot/Qwen/etc. are all employing their own data labelers? Why would you expect this kind of transparency from for-profit bodies that are valued in the billions?

simonw · 18m ago

If you don't ask the question you'll definitely not get an answer. Given how many AI labs follow Hacker News it's not a bad place to pose this.

"what makes you think that's not the case?"

I genuinely do not have enough information to form an opinion one way or the other.

whilenot-dev · 7m ago

> If you don't ask the question you'll definitely not get an answer.

Sure, but the way you're formulating the question is already casting an opinion. Besides, no one could even attempt to answer your questions without falling into the trap of true diligence... one question just asks how all (with emphasis!) LLMs are trained:

> Are all of the LLMs trained using human labor that is sometimes exposed to extreme content?

Who in the world would even be in such a position?

happy_dog1 · 41m ago

I've shared this once on HN before, but it's very relevant to this question and just a really great article so I'll reshare it here:

https://www.theverge.com/features/23764584/ai-artificial-int...

It explores the world of outsourced labeling work. Unfortunately, hard numbers on how many people are involved are difficult to come by because, as the article notes:

"This tangled supply chain is deliberately hard to map. According to people in the industry, the companies buying the data demand strict confidentiality. (This is the reason Scale cited to explain why Remotasks has a different name.) Annotation reveals too much about the systems being developed, and the huge number of workers required makes leaks difficult to prevent. Annotators are warned repeatedly not to tell anyone about their jobs, not even their friends and co-workers, but corporate aliases, project code names, and, crucially, the extreme division of labor ensure they don’t have enough information about them to talk even if they wanted to. (Most workers requested pseudonyms for fear of being booted from the platforms.) Consequently, there are no granular estimates of the number of people who work in annotation, but it is a lot, and it is growing. A recent Google Research paper gave an order-of-magnitude figure of “millions” with the potential to become “billions.” "

I too would love to know how much human effort is going into labeling and feedback for each of these models.

simonw · 9m ago

That was indeed a great article, but it is a couple of years old now. A lot of the labeling work described there relates to older forms of machine learning: moderation models, spam labelers, image segmentation, etc.

Is it possible in 2025 to train a useful LLM without hiring thousands of labelers? Maybe through application of open datasets (themselves based on human labor) that did not exist two years ago?

yanis_t · 45m ago

From my shallow understanding, it seems that human training is involved heavily in the post-training/fine-tuning stage, after the base model has been solidified already.

In that case, how is the notion of truthiness (what the model accepts as right or wrong) affected during this stage, where it is shaped by human beings, vs. being sealed into the base model itself, where truthiness is deduced by the method as part of its world model?

cs702 · 2h ago

The title is biased, blaming Google for mistreating people and implying that Google's AI isn't smart, but the OP is worth reading, because it gives readers a sense of the labor and cost involved in providing AI models with human feedback, the HF in RLHF, to ensure they behave in ways acceptable to human beings, more aligned with human expectations, values, and preferences.

zozbot234 · 1h ago

RLHF (and its evolution, RLAIF) is actually used for more than setting "values and preferences". It's what makes AI models engage in recognizable behavior, as opposed to simply continuing a given text. It's how the "Chat" part of "ChatGPT" can be made to work in the first place.
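A toy sketch of how pairwise human judgments become a training signal: RLHF reward models are commonly fit with a Bradley-Terry style loss on "response A preferred over response B" comparisons. The linear model, the two features, and the data below are hypothetical; real reward models are neural networks scoring full text, and this says nothing about Google's actual pipeline.

```python
import math


def preference_loss(w: list[float], chosen: list[float], rejected: list[float]) -> float:
    """Bradley-Terry loss: -log sigmoid(reward(chosen) - reward(rejected))."""
    margin = sum(wi * (c - r) for wi, c, r in zip(w, chosen, rejected))
    return math.log1p(math.exp(-margin))


def train_linear_reward(pairs, dim: int, lr: float = 0.1, epochs: int = 200) -> list[float]:
    """Fit a toy linear reward model from (chosen, rejected) feature pairs by SGD."""
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            diff = [c - r for c, r in zip(chosen, rejected)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # d(loss)/d(margin) = -sigmoid(-margin); step against the gradient.
            step = lr / (1.0 + math.exp(margin))
            w = [wi + step * di for wi, di in zip(w, diff)]
    return w


# Hypothetical features per response: [follows_instructions, is_rude].
# Each pair records that a human rater preferred the first response.
pairs = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([1.0, 1.0], [0.0, 1.0]),
    ([0.0, 0.0], [0.0, 1.0]),
]
w = train_linear_reward(pairs, dim=2)
print("learned weights:", w)
```

The learned reward then scores new outputs during RL, which is how rater preferences end up shaping the "Chat" behavior rather than plain text continuation.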

cs702 · 12m ago

Yes. I updated my comment to reflect as much. Thank you.

giveita · 1h ago

> Sawyer is one among the thousands of AI workers contracted for Google through Japanese conglomerate Hitachi’s GlobalLogic to rate and moderate the output of Google’s AI products...

Depends how you look at it. I think a brand like Google should vet its supply chain at least one level down.

FirmwareBurner · 1h ago

I had no idea Hitachi was also running software sweatshops.

throwaway106382 · 1h ago

What is a "human value" and whose preferences?

rs186 · 1h ago

> to ensure the AI models are more aligned with human values and preferences.

to ensure the AI models are more aligned with Google's values and preferences.

FTFY

falcor84 · 1h ago

I'm a big fan of cyberpunk dystopian fiction, but I still can't quite understand what you're alluding to here. Can you give an example of a value Google aligns the AI with that you think isn't a positive human value?

ToucanLoucan · 1h ago

Their entire business model? Making search results worse to juice page impressions? Every dark pattern they use to juice subscriptions like every other SaaS company? Brand lock-in for Android? Paying Apple for prominent placement of their search engine in iOS? Anti-competitive practices in the Play store? Taking a massive cut of Play Store revenue from people actually making software?

simonw · 1h ago

How does all of that affect the desired outputs for their LLMs?

scotty79 · 2m ago

You'll see once they figure it out.

Ygg2 · 1h ago

"Adtech is good. Adblockers are unnatural"

smokel · 1h ago

Google Gemini 2.5 Pro actually has a quite nuanced reply when asked to consider this statement, including the following:

> "Massive privacy invasion: The core of modern adtech runs on tracking your behavior across different websites and apps. It collects vast amounts of personal data to build a detailed profile about your interests, habits, location, and more, often without your full understanding or consent."

watwut · 34m ago

Google likes it when it can show you more ads; that is not a positive human value.

It does not have to have anything to do with cyberpunk. Corporations are not people, but if they were people, they would be powerful sociopaths. Their interests and anybody else's interests are not the same.

add-sub-mul-div · 1h ago

Yes, and one more tweak: the values of Google or anyone paying Google to deliver their marketing or political messaging.

lm28469 · 2h ago

> to ensure the AI models are more aligned with human values and preferences.

And what are these universal human values and preferences? Or are we talking about Silicon Valley executives' values?

alehlopeh · 17m ago

Well, it doesn’t say universal so it’s clearly going to be a specific set of human values and preferences. It’s obviously referring to the preferences of the humans who are footing the bill and who stand to profit from it. The extent to which those values happen to align with those of the eventual consumer of this product could potentially determine whether the aforementioned profits ever materialize.

zerodaysbroker · 1h ago

The title seems kinda misleading. This is from the article (GlobalLogic is the company contracted by Google):

"AI raters at GlobalLogic are paid more than their data-labeling counterparts in Africa and South America, with wages starting at $16 an hour for generalist raters and $21 an hour for super raters, according to workers. Some are simply thankful to have a gig as the US job market sours, but others say that trying to make Google’s AI products better has come at a personal cost."

imperio59 · 1h ago

It's employment at will. They are free to go work somewhere else if they don't like it...

teiferer · 1h ago

That argument is as old as any mistreated worker complaining about their situation and as old as any argument against workers rights in general. Anybody not liking their job could just leave right? Simple! No, the world just isn't that simple and it didn't become simpler just because it happens in an AI context that produces a tool you like.

There are lots of jobs out there that suck and people do them anyway. Because the freedom that they supposedly have is not as free as you imagine.

bitshiftfaced · 21m ago

What explains someone not changing jobs when they find the work distressing, while also claiming they're being paid below what they're worth? It seems like if that were true, then you'd be motivated to find a job that pays market rate. And if you couldn't, then you could at least find another job that pays less than market rate, like your current job, but isn't so distressing.

refactor_master · 6m ago

Maybe these people are trying to keep their skills and degrees honed somehow in a bad market, rather than going straight for a less-distressing-but-also-lower-paying job that does nothing to their skillset.

simianwords · 11m ago

Their work doesn’t seem that bad. This article tries really hard to portray that a simple freelance desk job is somehow literally exploitation or something.

Lots of people would do anything to get such work.

ants_everywhere · 1h ago

When they switch to aligning with algorithms instead of humans we'll get another story about how terrible it was that they removed the jobs that were terrible when they existed.

This doesn't sound as bad to me as the Facebook moderator job or even a call center job, but it does sound pretty tedious.

oefrha · 45m ago

> [job] … has come at a personal cost.

Congratulations, you just described most jobs. And many backbreaking laborers make about the same or less, even in the U.S., not to mention the rest of the world.

mentalgear · 26m ago

In many things, "AI" is just another form of exploiting the poor to make the rich even wealthier. A form of digital colonialism.

kerblang · 2h ago

Are other AI companies doing the same thing? Would like to see more articles about this...

thepryz · 1h ago

Scale AI’s entire business model was using people in developing countries to label data for training models. Once you look into it, it comes across as rather predatory.

This was one of the first links I found re: Scale’s labor practices https://techcrunch.com/2025/01/22/scale-ai-is-facing-a-third...

Here’s another: https://relationaldemocracy.medium.com/an-authoritarian-work...

benreesman · 1h ago

There's nontrivial historical precedent for this exact playbook: when a new paradigm (Lisp machines and GOFAI search, GPU backprop, softmax self-attention) is scaling fast, a lot of promises get made, a lot of national security money gets involved, and AI Summer is just balmy.

But the next paradigm breakthrough is hard to forecast, and the current paradigm's asymptote is just as hard to predict, so it's +EV to say "tomorrow" and "forever".

When the second becomes clear before the first, you turk and expert label like it's 1988 and pray that the next paradigm breakthrough is soon, you bridge the gap with expert labeling and compute until it works or you run out of money and the DoD guy stops taking your calls. AI Winter is cold.

And just like Game of Thrones, no I mean no one, not Altman, not Amodei, not Allah Most Blessed knows when the seasons in A Song of Math and Grift will change.

lawgimenez · 1h ago

A couple of months ago I received a job invite for Kotlin AI trainers from the team at Upwork. I asked what the job was about and she said something like "for the opportunity to review & evaluate content for generative AI." And I'm from a developed country too.

jhbadger · 1h ago

Karen Hao's recent book "Empire of AI" about the rise of OpenAI goes into detail how people in Africa and South America were hired (and arguably exploited) for their training efforts.

jkkola · 1h ago

There's a YouTube video titled "AI is a hype-fueled dumpster fire" [0] that mentions OpenAI's shenanigans. I haven't fact checked that but I've heard enough stories to believe it.

[0] https://youtu.be/0bF_AQvHs1M?si=rpMG2CY3TxnG3EYQ

wslh · 1h ago

It feels like déjà vu of previous Amazon Mechanical Turk[1] discussions[2], but with AI.

[1] https://www.mturk.com/

[2] https://tinyurl.com/4r2p39v3

dolphinscorpion · 1h ago

"Google" posted a job opening. They applied for and took the job, agreeing to posted pay and conditions. End of the story. It's not up to the Guardian to decide

xkbarkar · 1h ago

I agree, article is pretty low quality ragebait. Not good journalism at all.

lysace · 47m ago

It is amazing how much their quality levels have fallen during the past two decades.

I used to point to their reporting as something that my nation’s newspapers should seek to emulate.

lysace · 1h ago

> with wages starting at $16 an hour for generalist raters and $21 an hour for super raters, according to workers

That’s sort of what I expect the Guardian’s UK online non-sub readers to make.

Perhaps GlobalLogic should open a subsidiary in the UK?

mallowdram · 1h ago

Gemini is faked.

How this industry managed to not grasp that meaning exists entirely separate from words is altogether bizarre.

a3w · 1h ago

AI means actual Indians. Did we not learn that from the initial OpenAI GPT 3.0 training? It made it to HN.

philipallstar · 1h ago

If they're underpaid and overworked, by definition words that are relative to other options, they should go to one of the better options.

bflesch · 53m ago

The way you defend against an article citing "thousands of workers" by using a nitpicky criticism about grammar style makes me suspect that it raises a cognitive dissonance in your head that you are not ready to address yet.

sjiabq · 36m ago

This line of reasoning that goes “I don’t like your comment, you should go to therapy” is very feminine.

bflesch · 19m ago

Let's hope you are as good at real gymnastics as you are at mental gymnastics.

sjiabq · 18m ago

Like when you said that quoting the title of the article verbatim was nitpicky criticism?

CPLX · 56m ago

Glad to learn from your post that the labor market has recently become perfectly competitive and efficient.

blactuary · 48m ago

Yeah, they should simply buy widgets from the abundance of other widget sellers, since this is a perfectly competitive market with no transaction costs and perfectly symmetric information.


How 'overworked, underpaid' humans train Google's AI to seem smart

Comments (62)