What a nice project. What inspired this initially?
FYI there's a broken link in your readme:
https://rumca-js.github.io/internet full internet search
didip · 35m ago
This is amazing. Thanks for sharing!
hobs · 55m ago
Can't you just request ICANN's zone files and have the canonical list of the day?
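For anyone curious what working with zone files would look like, here is a rough sketch. It assumes you already have a TLD zone file on disk (access goes through ICANN's Centralized Zone Data Service, needs approval per TLD, and generally covers gTLDs). The sample text and the function name `domains_from_zone` are made up for illustration; real files are far larger and can vary in layout.

```python
def domains_from_zone(zone_text: str, tld: str) -> set[str]:
    """Collect second-level domains that have NS records in a zone file."""
    suffix = "." + tld.lower().strip(".") + "."
    domains = set()
    for line in zone_text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blanks and comments
        fields = line.split()
        # Assumed record shape: <owner> <ttl> <class> <type> <rdata>
        if len(fields) < 5 or fields[3].upper() != "NS":
            continue
        owner = fields[0].lower()
        if owner.endswith(suffix):
            # Keep only the label sitting directly under the TLD
            label = owner[: -len(suffix)].split(".")[-1]
            domains.add(label + suffix.rstrip("."))
    return domains


# Hypothetical excerpt, not real registry data
SAMPLE = """\
; hypothetical excerpt
com. 172800 in ns a.gtld-servers.net.
example.com. 172800 in ns ns1.example.net.
example.com. 172800 in ns ns2.example.net.
another-site.com. 172800 in ns ns1.another-site.com.
ns1.another-site.com. 172800 in a 192.0.2.1
"""

print(sorted(domains_from_zone(SAMPLE, "com")))
```

This only yields registered names, not which of them actually serve a website, so it would be a seed list for a crawler rather than an index.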
OJFord · 18m ago
'Google rival' is quite a stretch, surely 'search engine' is not just more accurate, but clearer too with all that Google does today, as if that's new.
luizfelberti · 2h ago
I was trying to do this in 2023! The hardest part about building a search engine is not the actual searching though, it is (like others here have pointed out), building your index and crawling the (extremely adversarial) internet, especially when you're running the thing from a single server in your own home without fancy rotating IPs.
I hope this guy succeeds and becomes another reference in the community like the marginalia dude. This makes me want to give my project another go...
moduspol · 1h ago
Is the common crawl usable for something like this?
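A sketch of how one might start with it: Common Crawl exposes a URL index per crawl that returns one JSON record per line, each pointing at a byte range inside a WARC file. The crawl ID below is a placeholder to replace with one from the current list of crawls, the sample response is invented, and the actual HTTP fetch is left out so the snippet runs offline.

```python
import json
from urllib.parse import urlencode

INDEX_HOST = "https://index.commoncrawl.org"


def index_query_url(crawl_id: str, url_pattern: str) -> str:
    """Build an index lookup URL for one crawl (crawl_id is a placeholder)."""
    params = urlencode({"url": url_pattern, "output": "json"})
    return f"{INDEX_HOST}/{crawl_id}-index?{params}"


def parse_index_lines(body: str) -> list[dict]:
    """The index answers with newline-delimited JSON, one capture per line."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def successful_captures(records: list[dict]) -> list[tuple[str, str, int, int]]:
    """Keep HTTP 200 captures as (url, warc_file, offset, length)."""
    return [
        (r["url"], r["filename"], int(r["offset"]), int(r["length"]))
        for r in records
        if r.get("status") == "200"
    ]


# Invented response body, shaped like the index output
SAMPLE_RESPONSE = (
    '{"url": "https://example.com/", "status": "200", '
    '"filename": "crawl-data/hypothetical/segment.warc.gz", '
    '"offset": "1234", "length": "567"}\n'
    '{"url": "https://example.com/gone", "status": "404", '
    '"filename": "crawl-data/hypothetical/segment.warc.gz", '
    '"offset": "2000", "length": "300"}\n'
)

print(index_query_url("CC-MAIN-2024-33", "example.com/*"))
print(successful_captures(parse_index_lines(SAMPLE_RESPONSE)))
```

The offset and length let you pull a single page out of a large archive with a ranged request instead of downloading the whole file, which is what makes the data usable from a single home server.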
Too bad it doesn't support android. It is much more energy efficient than anything else I can spare (for 100% uptime contribution)
ge96 · 2h ago
The IP thing is interesting. I was trying to make a CSGO bot one time to scrape Steam's prices, and there are proxy services out there that you can rent. I tried at least one and it was blocked by Steam. So I wonder if people buy real IPs.
kccqzy · 2h ago
Yeah people buy residential IPs on the black market. They are essentially infected home PCs and botnets.
The crawl seems hard but the difference between having something and not having it is very obvious. Ordering the results is not. What should go on page 200, and do those results still count as having them?
ofrzeta · 2h ago
"The beefy CPU running this setup, a 32-core AMD EPYC 7532, underlines just how fast technology moves. At the time of its release in 2020, the processor alone would have cost more than $3,000. It can now be had on eBay for less than $200"
why do I never get deals like that when I am shopping for the homelab on eBay?
progval · 2h ago
You need to spend a lot of time looking through badly labeled offers, and be willing to buy from sellers with no reputation.
robrtsql · 1h ago
I searched "AMD EPYC 7532" and there are a ton of listings for $150-$200. Are you just regretful that it wasn't like this when you were shopping parts for your homelab?
throwawayffffas · 10m ago
I got a 7551p plus motherboard and ram for about 600 bucks from China this January. I may have overpaid but it works great, and gets the job done.
_fat_santa · 1h ago
Not for a CPU but earlier this year I bought a Thinkpad workstation off eBay for $500. It's a machine from 2020 and when it was new cost $5,700.
I see this for pretty much all hardware out on eBay, just go back 5 years and watch the price fall 10x.
saalweachter · 1h ago
Has eBay fixed their "and then they ship you a box of rocks" problem?
I feel like there was a five year span where everyone I talked to said buying or selling electronics on eBay was a nightmare, so I'm a little curious if I need to re-evaluate my priors.
throwawayffffas · 13m ago
You don't get that with used old stuff, you get it with unrealistic low prices for new stuff.
A 7532 CPU is now e-waste for all the datacenters out there, so 1/10 of the original price is reasonable, but the latest Nvidia GPU for 200 bucks is obviously a scam.
apetresc · 1h ago
My understanding is that eBay sides with the buyer on all disputes, to the point of ridiculousness. So you should be fine.
The real issue is being a seller and solving the "and then the customer claims I shipped them a box of rocks" problem.
buildbot · 1h ago
Yep, selling is way more risky. eBay might be the safest (refund-wise) marketplace for buyers… I have more trouble with Amazon.
buildbot · 1h ago
Yes, it’s extremely rare to be stuck with a broken/wrong/missing item as a buyer on eBay. Selling is quite risky in some ways because eBay will nearly always side with a buyer. Every missing or broken thing I have purchased has been refunded or replaced. On the other hand, 3 things I have sold were claimed to not arrive. The only case where eBay decided in my favor was when the buyer had signed for the package in a literal USPS office :)
accrual · 34m ago
> Has eBay fixed their "and then they ship you a box of rocks" problem?
I've personally never had that problem after over a decade and hundreds of purchases on eBay. I've had some defective parts, but never outright fraud. IME eBay favors buyers.
ThatMedicIsASpy · 1h ago
Epyc7000+MB+256GB-512GB RAM (from china) usually starts at 800 euros + import tax
cheema33 · 3h ago
I tried the search site at https://searcha.page/ by searching for something random and got the following message:
"An error has occurred building the search results."
authnopuz · 3h ago
hug of death? I fear the temperature will get very high in his laundry room
DannyBee · 3h ago
I'm sure it depends on how much laundry he is doing - his dryer is probably heated entirely by servers.
He can then exhaust the remaining server heat through the dryer vent stack.
debo_ · 2h ago
Keep going. I love dry humor.
ArekDymalski · 1h ago
Until the exhaust starts "Feeling leaky" I guess.
It claims I reached the article limit. The last time I saw a fastcompany link must have been a decade ago! I was nostalgically looking forward to reading another article of theirs. Alas...
> The secret to making it all happen? Large language models. “What I’m doing is actually very traditional search,” Pearce says. “It’s what Google did probably 20 years ago, except the only tweak is that I do use AI to do keyword expansion and assist with the context understanding
> Fellow ambitious hobbyist Wilson Lin, who on his personal blog <https://blog.wilsonl.in/search-engine/> recently described his efforts to create a search engine of his own, took the opposite approach from Pearce.
> And then there’s the concept of doing a small-site search, along the lines of the noncommercial search engine Marginalia <https://marginalia-search.com>, which favors small sites over Big Tech
And the obvious answer to the title: "Why the laundry room? Two reasons: Heat and noise." It runs on a 32-core AMD EPYC 7532, half a terabyte of RAM, and "all in, cost $5,000, with about $3,000 of that going toward storage"
phendrenad2 · 35m ago
This is a cool project, and I hope he has fun with it.
I've daydreamed about how I'd create my own search engine so, so many times. But I always run into an impassable wall: The internet now isn't at all the same as the internet in 1999.
Discovery isn't really that useful. If you find someone's self-hosted blog about dinosaurs, it probably hasn't been updated since 2004, all the links and images are broken, and it's just thoroughly upstaged by Wikipedia and the Smithsonian. Sure, it's fun to find these quirky sites, but they aren't as valuable as they once were.
We've basically come full circle to the AOL model, where there are "hubs" of content that cater to specific categories. YouTube has ALL the long-form essays. TikTok has ALL the humorous videos. Medium has ALL the opinion pieces. Reddit has ALL the flame wars. Mayo Clinic has ALL the drug side-effects. Amazon has ALL the shopping. eBay has ALL the collectables.
None of these big companies want nasty little web crawlers poking and prodding their site. But they accept Google crawlers, because Google brings them users. Are they going to be that friendly to your crawler?
Of course, I still dream. Maybe a hub-based internet needs a hub-aware search engine?
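To make the "Google is welcome, you are not" point concrete, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `MyHobbyCrawler` user agent are invented for illustration; real sites use more elaborate rules, and many enforce blocks at the network level rather than through robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Invented policy: the big crawler is welcome, everyone else is not
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("Googlebot", "MyHobbyCrawler"):
    print(agent, allowed(ROBOTS_TXT, agent, "https://example.com/page"))
```

A polite hobby crawler that honors rules like these simply never sees the content, which is the wall described above.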
Were you trying them via Chrome, by any chance? ;)
jslakro · 3h ago
firefox here and it's not working
tolerance · 1h ago
The great thing about this is that with the decentralization/recentralization of the Web, it may become easier for certain people to roll their own search engines for their respective communities and crawl/index pages only according to their shared tastes.
Nothing new as it has been done before, the concept is simple enough:
Step 1: indexer, e.g. Solr/Lucene.
Step 2: crawler. There are several FOSS ones, or build one yourself?
Or you just run YaCy, which is a combo of the above. Combine it with an old-school searx instance and you will be granted the title of seeker by the spirit of Fravia+, who was elder of the searchlores!!! Not only will you filter crap made by machine learning models, but thou shall find what thou seek! I refuse to call a 16-line for loop, triggering on tokenized data loaded in memory (where the data can be anything from a scientific paper hallucinated by a chatbot to a message between two lovers), anything intelligent. It is not intelligence but a blob of tokenized fcking data in memory getting triggered for an output by a derp with a 16-line for loop!!!
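For the indexer half of the two steps above, this is roughly the core idea that Lucene-style engines build on: an inverted index mapping each term to the documents containing it. A toy sketch only; the class name `TinyIndex` and the sample pages are made up, and there is no ranking, stemming, or persistence here.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy inverted index: term -> set of URLs containing that term."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, url: str, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(url)

    def search(self, query: str) -> list[str]:
        """Return URLs containing every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


index = TinyIndex()
# Made-up pages standing in for whatever the crawler fetched
index.add("https://example.org/dinos", "A blog about dinosaurs and fossils")
index.add("https://example.org/laundry", "Running servers in the laundry room")
index.add("https://example.org/fossil-fuel", "Fossils and servers: a rant")

print(index.search("servers"))
print(index.search("fossils servers"))
```

The crawler's job is then just to keep feeding `add()` with fetched pages; everything hard (ranking, freshness, spam) sits on top of this.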
vlucas · 2h ago
> “I think it’s definitely lowered the barrier,” Lin says of the LLM’s role in enabling DIY search engines. “To me, it seems like the only barrier to actually competing with Google, creating an alternate search engine, is not so much the technology, it’s mostly the market forces.”
Oh sweet summer child
iam_saurabh · 2h ago
I love stories like this—tech history is full of scrappy beginnings. Even if this project doesn’t succeed, it reminds us that giant companies aren’t unshakable.
When I started using it (~2 years ago), it was necessary. Google was simply not solving any of my actual issues (software related).
Now, it seems that Google might have improved a bit. I check from time to time and the gap isn't as huge as when Kagi started.
shayway · 2h ago
How does your experience with Searcha compare? It seems to be down at the moment.
the_third_wave · 2h ago
Do Kagi users get paid for shilling the company? Nearly all threads relating to the subject of search have a few mentions of the glory of Kagi, often including links to the site. I suspect this is not as effective as the Kagi crew thinks, since there is likely to be a large overlap between their potential customers and those who are really turned off by such shilling.
dawnerd · 2h ago
Flip side how much does Google pay you to defend their monopoly? Kagi is a solid product with a team that clearly cares about what they’re building. They’re transparent and post change logs when things update. I simply trust them infinitely more than Google.
datadrivenangel · 2h ago
Kagi customer here. Not getting paid to shill. I think it's worth occasionally mentioning alternatives that are good enough to pay for so that other people know there are other people using other options.
But full disclosure, sometimes I'm using DuckDuckGo and it's also good enough most of the time that I occasionally forget until I go down some rabbit hole and realize that I'm using the wrong search engine.
hamdingers · 2h ago
Have you considered it's a good product that causes its users to become advocates?
TIL about effort justification! I think signing up for Kagi is not particularly effort-intensive however.
tolerance · 1h ago
> The effect is most likely to occur when there are no obvious reasons for performing the task. Because expending effort to perform a useless or unenjoyable task, or experiencing unpleasant consequences in doing so, is cognitively inconsistent (see cognitive dissonance), people are assumed to shift their evaluations of the task in a positive direction to restore consistency.
I just don’t understand people who get so upset that someone might like something enough to talk about liking it. So upset that they won’t ever try the thing. Like … ok I guess? You do you. It’s just a strange way to make decisions.
At least this is just a consumer product. Worse is when people here say they make technical decisions using the same process. They’d black list certain tech because they’ve heard people talking about how it solved their problems. Also ok, but now I know I should avoid them professionally.
mdaniel · 2h ago
I get the impression it's the volume of the folks who sing its praises. There was a web3 crowd for a while, Bitwarden champions would show up to any mention of a password manager, and (ahem) some AI champions can be over the top
In all of these cases, a reasonable counterpoint is that if it were that applicable for all audiences, one wouldn't need to sing its praises, it would sing its own praises
ufmace · 1h ago
It sings its own praises... how exactly? Maybe by a bunch of happy users talking about how they like it and it's a better solution to the problem that the thread or article is about without being explicitly paid? Which is exactly what's happening here and some people are complaining about it?
testdelacc1 · 1h ago
How does a password manager sing its own praises?
koakuma-chan · 2h ago
I tried it, it's slow and bad and free tier is only 100 requests, and it's too expensive, and price is unjustified. I use gemini with google search grounding.
alexjplant · 2h ago
I understand skepticism in the age of LLM-generated content and CAPTCHA-solving bots. What I don't understand is why people choose such weird hills to die on and think that posting about it will accomplish anything. Do you think people will read your comment and go "gee, I was going to use Kagi but now I won't because this random person has a bad feeling about a series of comments they remember seeing"?
I signed up for a specialist forum not too long ago and posted an honest review of a product because I hadn't been able to find one anywhere on the internet. Immediately a bunch of people accused me of being a "shill" for a direct-to-consumer business that's been powered by a Yahoo storefront for the last 20 years, as though a business that's run by a guy with an AOL e-mail address is sophisticated enough to figure out Fiverr and astroturf their reputation on a phpBB forum.
Think about it for just a moment - do you really think that the Hacker News audience is large enough or full of enough tastemakers to sway an alternative search engine's market share? It isn't. If Kagi wanted to do that they'd hire TikTok influencers.
throwaway290 · 1h ago
no one else would pay for search. people on HN are probably 90% of their total possible market.
lelandbatey · 2h ago
Nope, it's just a nice thing I like. It is nearly the platonic ideal of a search engine for me. It causes me no problems and doesn't try to sell me garbage.
It's like discovering that there's a better pair of shoes that're more comfortable. Everybody can use a slightly improved, more comfortable pair of shoes, so it comes up frequently.
tmdetect · 2h ago
Kagi is a polished product. This is drying someone's laundry.
Google was invented many years ago by two guys in a dorm room. Since then there have been so many white papers and public advancements, and the underlying problem has changed so little, that it seems like it could be done by a small group or an independent person.
dec0dedab0de · 2h ago
Crawling is much more difficult than it used to be. Significantly more content is behind a login, JavaScript is required for way more than it should be, and almost the entire web is behind Cloudflare or another type of captcha.
non_aligned · 2h ago
I think there are two factors that helped Google. First, the search engine landscape back then was absolutely abysmal. I'm sure someone will chime in saying that it's abysmal today as well, but the reality is that 99%+ of consumer searches get good results today. And that's simply because the nature of search has changed: we have billions of people using the internet, and they overwhelmingly just search for products to buy, local restaurants that offer takeout, or for familiar pop content to watch or listen to. And there's some SEO spam there, but also pretty fierce quality assurance by search engines.
Second, the internet was different: when all nerds declared that Google is good, that was CNN-grade newsworthy (and CNN used to matter a lot more back then), simply because the internet seemed kinda important, but there was no other authority on the topic. Today, that's not the case. If you need someone to opine on the internet on air, you invite some political pundit or a business analyst.
So no, I don't think you can repeat the success of Google the same way. It was a product of its time.
snek_case · 32m ago
Google maps is probably a big moat that's very hard to replicate. You can't as easily just crawl all of that data. It's not easy to generate directions. The average user doesn't want to use your search engine for one thing and Google for everything else, they just want a one stop shop for search.
jrm4 · 2h ago
More to the point, it's a shame that we can't collectively grok (dammit, they took that from us too) concepts like "personal" and/or "curated" directories, e.g. individual and group wikis and so forth on perhaps more directed topics with lists of good links.
cosmicgadget · 54m ago
Other than the obvious (but surmountable) technical challenges with crawling and indexing, trying to establish "goodness" for a given user is tough. For a blogger it will be "hey, you are reading this so you probably like what I like". That's often true but as soon as you try to have a centralized service with arbitrary users, it is hard to do anything better than filtering purely commercial content.
sdf4j · 2h ago
what you mean we can't? there are a lot of curated content directories out there.
jrm4 · 1h ago
Right, I suppose I mean "getting more people to think about why a few of these bookmarked for your favorite topics, especially tied to a trustworthy person, is a million times better than just hitting up Google."
Or, perhaps, a "a better Google should just take you to these."
Something like that.
ambicapter · 2h ago
Google basically invented the modern cloud in order to efficiently use the hardware necessary to actually build those search engine indices. It's not really a question of implementing a good algorithm and away we go.
That's what I was expecting this submission to be about, although to be honest I'm not certain that Marginalia would want the influx of a fastcompany sized tire kicking
CalRobert · 2h ago
Among other things, I think crawling is a lot harder now.
lif · 2h ago
Provided they have the kind of massive government support Google has had from the get-go, sure!
OutOfHere · 3h ago
The actual underlying problem has changed altogether. PageRank is easily gamed by SEO.
Search candidates and rankings now require assessment by LLM. Moreover, as a default, users want the results intelligently synthesized into a text response with references rather than as raw results.
Crawling too requires innovative approaches to bypass server filters.
I doubt any independent person can afford to run a vector database or LLMs at immense scale.
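To illustrate the "easily gamed" claim, here is a minimal textbook-style PageRank by power iteration, run on a tiny made-up link graph with and without a link farm pointing at one page. This is a sketch of the classic published algorithm, not whatever any search engine runs today; the graph, page names, and the farm size of 20 are all assumptions.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Classic PageRank via power iteration on a dict of page -> outlinks."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages
                share = damping * rank[page] / n
                for p in pages:
                    new[p] += share
        rank = new
    return rank


# Made-up graph: three honest pages in a cycle, plus a page nobody links to
honest = {"a": ["b"], "b": ["c"], "c": ["a"], "spam": ["a"]}

# Same graph plus 20 throwaway pages that all link to "spam"
farmed = dict(honest)
farmed.update({f"farm{i}": ["spam"] for i in range(20)})

before = pagerank(honest)["spam"]
after = pagerank(farmed)["spam"]
print(f"spam rank without farm: {before:.4f}")
print(f"spam rank with farm:    {after:.4f}")
```

Even pages with no inbound links carry the baseline random-jump share, so a pile of cheap pages can push rank toward a target. That is why link counts alone stopped being a trustworthy signal.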
freeopinion · 14m ago
> users want the results intelligently synthesized into a text response with references rather than as raw results
This leads directly to another big change.
People used to submit their sites to search engines and now they might actively block search engines. So a search engine author might have to spend a lot of effort in adversarial games.
kcbanner · 2h ago
> users want the results intelligently synthesized into a text response with references rather than as raw results.
The reason I pay for Kagi is that I specifically don't want this to occur.
OutOfHere · 2h ago
If you pay for a service (web search) that 99.9% use for free, you're an extreme outlier, and not necessarily a justifiable one either. After all, DDG, Google and various others still have raw results for free.
Workaccount2 · 2h ago
How much do you technologically relate to the average person on the street though?
Every person I have seen (outside the tiny tech bubble) google something has just read the AI overview without skipping a beat.
yepitwas · 2h ago
That's worrisome since I've seen those be for-sure wrong a pretty high percentage of the time.
[EDIT] Incidentally, are there any sites that do actual web search any more, better than Yandex? I'd rather avoid a Russian site if I can, but there are whole topics where it's impossible to find anything useful on heavily "massaged" allegedly-Web-search-but-not-really sites like Google and DDG (Bing), but I can find what I want on page 1 or 2 of a Yandex search. Is Kagi as good as that, or is their index simply ignoring a whole bunch of the Web like so many others? I don't mind paying.
degamad · 1h ago
Google "Web" results (not the default results you get when you search) still seem okay for me. You can force them with the udm=14 url trick, or select the "Web" tab in the results. No AI, no images or shopping results, and slightly better text results.
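If you want to script the udm=14 trick (for a custom browser search shortcut, say), building the URL is trivial. Note that `udm` is not an officially documented parameter as far as I know, so this could stop working at any time; the function name is made up.

```python
from urllib.parse import urlencode


def google_web_only_url(query: str) -> str:
    """Build a Google search URL that asks for the plain "Web" results tab."""
    return "https://www.google.com/search?" + urlencode({"q": query, "udm": "14"})


print(google_web_only_url("AMD EPYC 7532"))
# prints https://www.google.com/search?q=AMD+EPYC+7532&udm=14
```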
franktankbank · 2h ago
Yep, same here. Ask it "should I wash venison tenderloin" and you get an initial "No, because" followed by a general "yes, it's important to clean, including with water" in the longer description. Wow, a self-contradictory answer! Good job!
jkestner · 2h ago
We’re being force fed them. I’m an AI hater and I catch myself reading those sometimes.
Yes, people want the answer directly. Google wants you to stay on their site to read some mishmash. I think the ideal would be to immediately go to the source’s site.
throwmeaway222 · 2h ago
At this point the web is also so centralized you only need 3 bookmarks these days (your news, youtube and Amazon)
A search is just learning what you don't know and AI does a better job than search has ever done for me - and I'm in tech.
ricardo81 · 2h ago
>Pagerank
Also a lot of site owners are reluctant to link out. So much so that 'nofollow' has been reduced to a hint rather than a directive.
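For a crawler author, the practical question is what to do with those links. A small sketch with the standard library's `html.parser` that separates plain links from `rel="nofollow"` ones; whether to discard the latter or merely down-weight them is a policy choice, and the sample HTML and class name are invented.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Split <a href> links into followed and nofollow buckets."""

    def __init__(self) -> None:
        super().__init__()
        self.followed: list[str] = []
        self.nofollow: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel is a space-separated token list, e.g. "nofollow noopener"
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


# Invented page fragment
SAMPLE_HTML = """
<p><a href="https://example.org/a">endorsed</a>
<a rel="nofollow noopener" href="https://example.org/b">not endorsed</a>
<a name="anchor-only">no href</a></p>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
print("followed:", collector.followed)
print("nofollow:", collector.nofollow)
```

Treating nofollow as a hint would mean keeping the second bucket as a weaker signal instead of dropping it.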
iamacyborg · 2h ago
> Moreover, as a default, users want the results intelligently synthesized into a text response with references rather than as raw results.
Citation needed
OutOfHere · 2h ago
You mean all the users of chat services aren't evidence? Chat services increasingly incorporate web links for references in their responses, and this is as the users seek. The tide continues to shift from traditional search to LLM synthesis.
iamacyborg · 2h ago
I suspect there are more users of traditional search than there are of llm chat apps.
HardCodedBias · 1h ago
I know that Google engineers have a cushy life, but I actually find it unlikely that a guy who isn't attempting some radical new type of search (like PageRank back in the day) can hope to compete with the orgs in Google that support search.
Again, those orgs are likely too comfortable and less productive than people would like, but we're talking about many, many thousands of people and, depending upon how you define "the work" of search, upwards of 10k.
I didn't see any new secret sauce in the article, and Google has said that since 2015 (?) Google Brain has been involved in search.
This is not to say that Google couldn't be dislodged by search via LLM or similar, that is "new" research.
p3rls · 1h ago
i've been thinking that google could use its own AI to evaluate URLs instead of relying on pagerank and backlinks which are almost completely valueless as a signal in 2025. in my niche there's more slop than ever being produced daily and it's all hitting rank 1. it's tragic what google is doing to the internet.
Oarch · 2h ago
I'm sure there's a money laundering joke in here somewhere
I have 1542766 domains. It might not be much, but it is honest work.
It is available as a github repo, so anybody that wants to start crawling has some initial data to kick off.
Links
https://github.com/rumca-js/Internet-Places-Database
https://commoncrawl.org
While the index is currently not open source, it should be at some point. Maybe when they get out of the beta stage? Details are still unclear.
https://www.proxyrack.com/residential-proxies/
https://archive.is/HA7y4
Some bits and pieces:
> his new search engine, the robust Search-a-Page <https://searcha.page>, which has a privacy-focused variant called Seek Ninja <https://seek.ninja>
- SearchaPage - Web Search Engine https://searcha.page/
- Seek Ninja - Stealthy Search Engine https://seek.ninja/
Both of them are erroring out right now?
The bad thing about this is...read above.
[1] https://en.wikipedia.org/wiki/Effort_justification
I’m not following you.
https://dictionary.apa.org/effort-justification