Not the 10000, but I admin'd a 4500 back in 1999 at Bristol-Myers Squibb at the ripe old age of 21. It was running Sun's mail server, which required constant care and feeding to even remotely reliably serve our 30,000+ users.
One time it just stopped responding, and my boss said "now, pay attention" and body-checked the machine as hard as he could.
It immediately started pinging again, and he refused to say anything else about it.
defaultcompany · 3h ago
This reminds me of the “drop fix” for the SPARCstation, where people would pick up the box and drop it to reseat the PROMs.
badc0ffee · 3m ago
Apparently you also had to do this with the Apple ///.
linsomniac · 2h ago
Amiga had a similar issue. One of the chips (Fat Agnus, IIRC?) didn't quite fit in the socket correctly, and a common fix was to pull out the drive mechanisms and drop the chassis something like a foot onto a carpeted floor.
Somewhat related, one morning I was in the office early and an accounting person came in and asked me for help; her computer wouldn't turn on and I was the only other one in the office. I went over, poked the power button, and nothing happened. This was on a PC clone. She had a picture of her daughter on top of the computer, so I picked it up, gave the computer a good solid whack on the side, set the picture down, poked the power button, and it came to life.
We call this: Percussive Engineering
bionsystem · 3h ago
I can't wait for the mandatory "Brendan Gregg screams at disks" YouTube link.
https://www.youtube.com/watch?v=tDacjrSCeq4
(btw it's titled "Shouting in the Datacenter")
Ah, percussive maintenance! Also good for reseating disks that just don’t quite reliably get enumerated: slam the thing back in. I had to do something similar on a power supply for a V440; thankfully it was a month or so away from retirement, so I didn’t feel too bad giving it some encouragement like that. Great machines.
eugenekay · 2h ago
Throughout the late 90s, “Mail.com” provided white-label SMTP services for a lot of businesses, and was one of the early major “free email” providers. Each free user had a storage limit of something like 10MB, which was plenty in an era before HTML email and attachments were commonplace. There were racks upon racks of SCSI disks from various vendors for the backend - but the front end was all standard Sendmail, running on Solaris servers.
I worked at a competing white-label email provider in the 90s, and even then it seemed obvious that running SMTP on a Sun Enterprise was a mistake. You're not gaining anything from its multiuser single-system scalability. I guess it stands as an early example of the pets/cattle debate. My company was firmly on the cattle side.
eugenekay · 1h ago
I was just the teenage intern responsible for doing the PDU cabling every time a new rack was added, since nobody on the Network or Software Engineering teams could fit into the crawl spaces without disassembling the entire raised floor.
Anyway, here’s the front end SMTP servers in 1999, then in-service at 25 Broadway, NYC. I am not sure exactly which model these were, but they were BIG Iron! https://kashpureff.org/album/1999/1999-08-07/M0000002.jpg
I do know that scale-out and scale-up were used for different parts of the stack. The web services were all handled by standard x86 machines running Linux - and were all netbooted in some early orchestration magic, until the day the netboot server died. I think the rationale for the large Sun systems was the amount of memory they could hold - so the username and spammer databases could be held in memory on each front end, allowing for a quick ACCEPT or DENY on each incoming message before saving it out to a mailbox via NFS.
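To make that front-end gate concrete, here is a minimal sketch of the pattern as described (not Mail.com's actual code): keep the username and spammer lists in RAM, answer ACCEPT or DENY for each message, and only touch the NFS-mounted spool on accept. The file names, the mailbox layout, and the /nfs/mail path are all hypothetical.

```python
# Minimal sketch of the in-memory gate described above; file names, mailbox
# layout, and the NFS path are hypothetical.
import os

def load_set(path):
    """Load one entry per line into a set for O(1) membership checks."""
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}

USERS = load_set("users.txt")        # valid local mailbox names (hypothetical file)
SPAMMERS = load_set("spammers.txt")  # known-bad sender addresses (hypothetical file)
MAIL_ROOT = "/nfs/mail"              # NFS-mounted spool (hypothetical path)

def handle_message(sender: str, recipient: str, body: str) -> str:
    """Cheap RAM lookups first; write to the mailbox over NFS only on ACCEPT."""
    user = recipient.split("@", 1)[0].lower()
    if sender.lower() in SPAMMERS or user not in USERS:
        return "DENY"
    with open(os.path.join(MAIL_ROOT, user, "mbox"), "a") as mbox:
        mbox.write(body + "\n")
    return "ACCEPT"
```

The shape is the point: the expensive NFS write happens only after the cheap in-memory checks pass, which matches the rationale given for holding those databases in memory on every front end.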
neilv · 1h ago
> They were also joined with several engineers in Beaverton, Oregon through these mergers.
They might mean from Floating Point Systems (FPS):
https://en.wikipedia.org/wiki/Cray#Cray_Research_Inc._and_Cr...
> In December 1991, Cray purchased some of the assets of Floating Point Systems, another minisuper vendor that had moved into the file server market with its SPARC-based Model 500 line.[15] These symmetric multiprocessing machines scaled up to 64 processors and ran a modified version of the Solaris operating system from Sun Microsystems. Cray set up Cray Research Superservers, Inc. (later the Cray Business Systems Division) to sell this system as the Cray S-MP, later replacing it with the Cray CS6400. In spite of these machines being some of the most powerful available when applied to appropriate workloads, Cray was never very successful in this market, possibly due to it being so foreign to its existing market niche.
Some other candidates for server and HPC expertise there (just outside of Portland proper):
https://en.wikipedia.org/wiki/Sequent_Computer_Systems
https://en.wikipedia.org/wiki/Intel#Supercomputers
(I was very lucky to have mentors and teachers from those places and others in the Silicon Forest, and also got to use the S-MP.)
trollied · 3h ago
I used to love working with E10k/E15k boxes. I was a performance engineer for a telco software provider, and it was so much fun squeezing every single thing out of the big iron.
It’s a bit sad that nobody gives a shit about performance any more. They just provision more cloud hardware. I saved telcos millions upon millions in my early career. I’d jump straight into it again if a job came up, so much fun.
amiga386 · 46m ago
I used to work for a telco equipment provider around the time everyone was replacing PDH with SONET. Telcos were gagging to buy our stuff, the main reason being basic hardware advances.
Telephone Exchanges / Central Offices have to be in the centre of the lines they serve, meaning some very expensive real estate, and datacenter-level HVAC in the middle of cities is very, very expensive.
They loved nothing more than to replace old 1980s switches with ones that took up a quarter to a tenth of the floorspace, used less than half the electricity, and had fabrics that could switch fibre optics directly.
kstrauser · 2h ago
My experience was a bit different. I first saw a Starfire when we were deploying a bunch of Linux servers in the DC. The Sun machine was brilliant, fast, enormous, and far more expensive per unit of work than these little x86 boxes we were carting in.
The Starfire started at around $800K. Our Linux servers started at around $1K. The Sun box was not 800x faster at anything than a single x86 box.
It was an impressive example of what I considered the wrong road. I think history backs me on this one.
> It’s a bit sad that nobody gives a shit about performance any more.
Everyone gives a shit about performance at some point, but the answer is horizontal scaling. You can’t vertically scale a single machine to run a FAANG. At a certain vertical scale, it starts to look a helluva lot like horizontal scaling (“how many CPUs for this container? How many drives?”), except in a single box with finite and small limits.
axiolite · 13m ago
> The Sun box was not 800x faster at anything than a single x86 box.
You don't buy enterprise gear because it's economical for bulk number-crunching... You buy enterprise gear when you have a critical SPOF application (typically the database) that has to be super-reliable, or that requires greater resources than you can get in commodity boxes.
RAS (reliability, availability, and serviceability) is an expensive proposition. Commodity servers often don't have it, or have much less of it than enterprise gear. Proprietary Unix systems offered RAS as a major selling point. IBM mainframes still have a strong market today.
It wasn't until the mid-2000s that x86 went 64-bit, so if your application wanted to gobble more than 2GB/4GB of RAM, you had to go with something proprietary.
It was even more recently that the world collectively put in a huge amount of effort and figured out how to parallelize many of the number-crunching problems that were previously limited to a single thread.
There have been many situations like these through the history of computing... Going commodity is always cheaper, but if you have needs commodity systems don't meet, you pay the premium for proprietary systems that do.
trollied · 1h ago
I don’t disagree. But most also don’t give a shit, and then they endlessly scale horizontally and spend too much money to deal with their crappy code.
As a dev it isn’t your problem if the company you work for just happily provisions and sucks it up.
kstrauser · 1h ago
That’s a thing, to be sure. The calculus gets a little complicated when that developer’s pay is far more than the EC2 bill. There’s a spectrum between a small shop wasting $1000 a year hosting inefficient code and Google scale, where SRE teams would love to put “saved 0.3% on our cloud bill!” on their annual review.
rjsw · 1h ago
> ... to deal with their crappy code
written in an interpreted language.
znpy · 34m ago
> Everyone gives a shit about performance at some point, but the answer is horizontal scaling. You can’t vertically scale a single machine to run a FAANG.
You might be surprised about how many companies think they're FAANG (but aren't) though.
mlyle · 2h ago
> It’s a bit sad that nobody gives a shit about performance any more. They just provision more cloud hardware.
It's hard to get as excited about performance when the typical family sedan has >250HP. Or when a Raspberry Pi 5 can outrun a maxed-out E10k on almost everything.
...(yah, less RAM, but you need fewer client connections when you can get rid of them quickly enough).
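That "fewer connections when you're fast" parenthetical is basically Little's law (concurrency ≈ arrival rate × time each request stays in the system). A quick sketch with made-up numbers:

```python
# Little's law sketch with made-up numbers: the connections a box must hold
# open at once is roughly arrival_rate * time_each_request_sticks_around.
arrival_rate = 2000            # requests per second (hypothetical load)
slow_service_time = 0.50       # seconds per request on a slow box
fast_service_time = 0.02       # seconds per request on a fast box

print(arrival_rate * slow_service_time)  # 1000.0 concurrent connections
print(arrival_rate * fast_service_time)  # 40.0 concurrent connections
```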
lokar · 1h ago
In the end that approach to very high scale and reliability was a dead end. It’s much better and cheaper to solve these problems in software using cheap computers and fast networks.
chasil · 40s ago
If you have applications that run on (and rely on) z/OS, this kind of machine makes sense.
The e10k didn't have applications like that. Just about everything you could do on it could be made to work on commodity x86 with Linux (after some years, for 64-bit).
trollied · 1h ago
Using fewer of those cheap computers is still a thing. Entirely missing the point.
lokar · 1h ago
A lot of the examples here are things like running a large email service. Doing that with this kind of hardware makes no sense.
Henchman21 · 1h ago
It might make no sense today, but it made loads of sense back then. One cannot apply modern circumstances backwards in time.
nocoiner · 3h ago
To this day, “Sun E10000 Starfire” is basically synonymous in my head with “top-of-the-line, bad-ass computer system.” What a damn cool name. It made a big impression on an impressionable youth, I guess!
beng-nl · 2h ago
I agree on all counts, but the installation at my job at the time regularly needed repairs! Hopefully this was an exceptional case, but it gave me the impression that the redundancy added too much complexity to make the whole reliable.
ETA: particularly because the redundancy was supposed to make it super reliable
somat · 1h ago
I worry about this sometimes. There is this long tail of "reliability" you can chase: redundant systems, processes, voting, failover, "shoot the other node in the head" scripts, etc. But everything adds complexity; now it has more moving parts, more things that can go wrong in weird ways. I wonder if the system would be more reliable if it were a lot simpler and stupider: a single box that can be rebooted if needed.
It reminds me of the lesson of the Apollo computers. The AGC was the more famous computer, probably rightfully so, but there were actually two computers. The other was the LVDC, made by IBM for controlling the Saturn V during launch. Now that was a proper aerospace computer: redundant everything, a cannot-fail architecture, etc. In contrast, the AGC was a toy. However, this let the AGC be much faster and smaller: instead of reliability they made it reboot well, and instead of automatic redundancy they just put two of them.
https://en.wikipedia.org/wiki/Launch_Vehicle_Digital_Compute...
There is something to be learned here; I am not exactly sure what it is. Worse is better?
jeffbee · 2h ago
No, I think that was typical. Nostalgia tends to gloss over the reality of how dodgy the old unix systems were. The Sun guy had to show up at my site with system boards for the SPARCcenter pretty regularly.
JSR_FDED · 2h ago
This was one of the all-time biggest strategic mistakes SGI made - for a mere $50 million they enabled their largest competitor to rack up huge wins against them almost overnight. A friend at Sun at the time was telling me how much glee they took in sticking it to SGI with its own machines.
cf100clunk · 44m ago
> one of the all time biggest strategic mistakes SGI made
SGI in the Ewald years tripped itself up, then in the Rick Belluzzo years made a cavalcade of avoidable mistakes.
jasongill · 3h ago
This is one of my dream machines to own. The Sun E10k was like the Gibson; it was so mythically powerful. It was a Cray inside of your own server closet, and being able to be the admin of an E10k and have root on a machine with so much power was a real status symbol at the time.
tverbeure · 2h ago
I worked for a company that bought one of these. It was delivered, lifted through the window of the server room with a crane and worked fine.
A few days later, our admin noticed over the weekend that he couldn’t remote log in. He checked it out and… the machine was gone. Stolen.
Somebody within Sun must have tipped off where these things were delivered and rented a crane to undeliver them.
cf100clunk · 30m ago
Hmmm... I wondered why the official E10K demo machine in the lobby of Sun's HQ back then had been enclosed in glass. It also might very well have just been a mockup, I suppose.
pavlov · 1h ago
Isn’t it more likely it was someone within the company you worked for?
They would have access to site-specific info like how easy it is to get access to that server room to open the windows.
The old saying is “opportunity makes the thief.” Somebody at Sun has much less visibility into the opportunity.
bobmcnamara · 3h ago
Cray-cyber.org used to have free shell accounts on one in Germany.
hpcjoe · 59m ago
I recall that from my time at SGI. Many of us within SGI were strongly against the move to sell this off to Sun. We blamed Bo Ewald for the disaster this was for SGI, and for the lack of strategic vision on his part. We also blamed the idiots in SGI management for thinking that MIPS and Irix would be the only things we would be delivering.
Years later, Ewald and others had a hand in destroying the Beast and Alien CPUs in favor of the good ship Itanic (for reasons).
IMO, Ewald went from company to company, leaving strategic ruin or failure behind: Cray to SGI to Linux Networx to ...