(Note: The word "local" in the headline means "in Australia")
crowcroft · 35m ago
A serious company would not consider this research. Zero evidence of anything is presented. No one in the organisation or on the advisory board has any expertise in building modern LLMs, and most are self-fashioned AI execs who moved into the space all of about 2 years ago...
Looking for generous donors with this headline I'm sure.
Maxious · 1h ago
> Our AI future is being built overseas. We can’t afford that
> Unless we develop our own sovereign AI capability from the ground up, organisations will forever be looking over their shoulder, dogged by fear of ending up on the front page of the papers for all the wrong reasons.
They're starting out allocating $10MM AUD for copyright payments, when Anthropic has just paid $2.3BB AUD to settle their lawsuit. While I give them credit for realizing that $10MM is just the starting point, I don't understand how they can possibly build a competitive model while spending less than $100MM when others are spending 20x that amount just on copyright.
gpm · 3h ago
Anthropic paid $3000USD per work because they pirated works and US copyright law comes with statutory damages completely unrelated to the amount it would have cost to acquire the same thing legally.
The same thing, legally, according to the judge in that lawsuit would have been a purchased (potentially used) copy of the book scanned - i.e. what Anthropic also did after pirating works. It'd be surprising if that would cost even $30USD/work, two orders of magnitude less.
AUD $10 million doesn't seem sufficient for a competitive set (and as you say they aren't saying it is), but if you told me AUD $50 million was enough to build a legal (according to Judge Alsup's interpretation of US law) repository of training data I would not be surprised.
apparent · 2h ago
If they spend half of their budget on copyright, does that leave enough for hardware, energy, salary, etc.?
myhf · 36m ago
LLM training is not fair use. It would cost trillions to genuinely secure the rights to use any data set that could include any excerpts of copyrighted work.
The millions and billions you hear about in copyright "settlements" are just the amount it takes to bribe a local court, so $10MM is reasonable for Australia.
danielbln · 11m ago
I don't think it has been settled yet whether training is fair use or not. Also, how is a settlement a bribed court? The other party has to accept a settlement, not the judge.
wtbdbrrr · 2h ago
Now ... of course.
And do they mention anything about how much of the work is going to be outsourced and where to? Or are they gonna import workers to do the job and send them back home when their local AI can replace most of the easy and tedious stuff? Or are they gonna use local models to do all that right away?
The site is loading ...
yahoozoo · 3h ago
Yes it is much cheaper when you just train off ChatGPT and Claude responses.
zerotolerance · 3h ago
Seems like a feature. This was always going to be the case. Just like how it was cheaper to train those models on billions of prior works than to have generated or paid to generate all those works in-house.
throwawayoldie · 2h ago
> Seems like a feature
If by "feature" you mean "pathway to model collapse" meaning "disappearing up one's own asshole" then yes. And the sooner the better.
daveguy · 3h ago
> "One test for potential investors will be their willingness to support Sovereign Australia AI’s decision to earmark $10 million of its future funding to compensate copyright owners for the data used to train its model. This includes working with news services under a paid model and buying books and music where needed."
> “We don’t want the adversarial relationship of most other AI builders around the world who chose not to take that proactive approach to copyright.”
> "Sovereign Australia AI said it would not scrape the pages of publishers who have added “robot.txt” files to their web pages. This is a line of code that tells bots not to scrape the information, but it is frequently ignored. The company will add a meta tag to every piece of data it acquires, recording where it came from and how it was sourced."
> To build its model, Sovereign Australia AI says it has placed Australia’s largest-ever order for sovereign AI capacity: 256 of the latest Nvidia Blackwell B200 GPUs which will be hosted inside one of NextDC’s Melbourne data centres
So... almost the exact opposite. Please read the article before commenting next time.
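For what it's worth, here is a minimal sketch of what "respect the opt-out and record where each item came from" could look like. It is not anything the company has described: the crawler name, the `Provenance` fields, and the `acquired_via` labels are all hypothetical. Also note the file is conventionally named `robots.txt`, and it is a small set of rules rather than a single line of code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from urllib.robotparser import RobotFileParser

# Hypothetical crawler name, used only for illustration.
USER_AGENT = "example-crawler"


def may_fetch(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


@dataclass(frozen=True)
class Provenance:
    """Hypothetical per-item record of where data came from and how it was sourced."""
    url: str
    acquired_via: str  # e.g. "licensed", "purchased", "crawled" (assumed labels)
    retrieved_at: str  # ISO 8601 timestamp in UTC


def tag(url: str, acquired_via: str) -> Provenance:
    """Attach a provenance record to an acquired item."""
    return Provenance(url, acquired_via, datetime.now(timezone.utc).isoformat())


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /news/\n"
    for candidate in ("https://example.com/news/story", "https://example.com/about"):
        if may_fetch(rules, candidate):
            print(tag(candidate, "crawled"))
        else:
            print("skipped (disallowed):", candidate)
```

A real crawler would fetch each site's own robots.txt and decide how to treat sites that do not publish one; this sketch only shows the check itself.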
kadushka · 1h ago
> 256 of the latest Nvidia Blackwell B200 GPUs
Did they forget to add "k" to that number? OpenAI plans to have one million GPUs by the EOY.
https://sovereign-au.ai/preserving-australias-digital-voice-...
> Unless we develop our own sovereign AI capability from the ground up, organisations will forever be looking over their shoulder, dogged by fear of ending up on the front page of the papers for all the wrong reasons.
> Michelle Ananda-Rajah; Senator for Victoria
https://www.afr.com/technology/our-ai-future-is-being-built-...
The grift that keeps giving