All of Japan's Toyota Assembly Plants Shut Down for a Day Because Their Server Ran Out of Disk Space

Flying Squid@lemmy.world · 2 years ago

All of Japan's Toyota Assembly Plants Shut Down for a Day Because Their Server Ran Out of Disk Space

grabyourmotherskeys@lemmy.world · 2 years ago

I haven’t read the article because documentation is overhead but I’m guessing the real reason is because the guy who kept saying they needed to add more storage was repeatedly told to calm down and stop overreacting.

krellor@kbin.social · 2 years ago

I used to do some freelance work years ago and I had a number of customers who operated assembly lines. I specialized in emergency database restoration, and the assembly line folks were my favorite customers. They know how much it costs them for every hour of downtime, and never balked at my rates and minimums.

The majority of the time the outages were due to failure to follow basic maintenance, and log files eating up storage space was a common culprit.

So yes, I wouldn’t be surprised at all if the problem was something called out by the local IT, but were overruled for one reason or another.

Pat12@lemmy.world · 2 years ago

this is software speciifcally for assembly line management?

Anomalous_Llama@lemmy.world · 2 years ago

There is specific software for everything

afraid_of_zombies@lemmy.world · 2 years ago

Yeah a few levels.

Level 1: complex stand alone devices, mostly firmware.

Level 1a. Stuff slightly more complicated than a list of settings, usually for something like a VFD or a stepper motor controllers. Not as common.

Level 2 PLCs, HMIs, and the black magic robotic stuff. Stand alone equipment. Like imagine a machine that can take something, heat it up, and give it to the next machine.

Level 3: DCS and SCADA. Data control center and whatever SCADA stands for, I always forget. This is typically for integrating or at least data collection of multiple stand alone equipment for level 2.

Level 4: the integration layer between Level 3 and whatever means the company has for entering in sales.

Like everything in software this is all general. Some places will mix layers, subtract layers, add them. I would complain about the inconsistent nature of it all but without it I would be unemployed.

Pat12@lemmy.world · 2 years ago

Level 1a. Stuff slightly more complicated than a list of settings, usually for something like a VFD or a stepper motor controllers. Not as common.

Level 2 PLCs, HMIs, and the black magic robotic stuff. Stand alone equipment. Like imagine a machine that can take something, heat it up, and give it to the next machine.

Level 3: DCS and SCADA. Data control center and whatever SCADA stands for, I always forget. This is typically for integrating or at least data collection of multiple stand alone equipment for level 2.

Level 4: the integration layer between Level 3 and whatever means the company has for entering in sales.

Like everything in software this is all general. Some places will mix layers, subtract layers, add them. I would complain about the inconsistent nature of it all but without it I would be unemployed

Is this specific software engineering languages? or is this electrical engineering or what kind of work is this?

afraid_of_zombies@lemmy.world · 2 years ago

I am having problems understanding your questions. I generally operate on level 2 and we typically use graphics based languages when we implement scripting languages to do graphical languages. The two most common graphic languages are FBDs and Ladder-Logic. Both have a general form and vendor specific quirks.

For scripting I tend towards Perl or Python, but I have seen other guys use different methods.

Level 3 use pretty much the same tools. Level 4 I have in the passed used a modbus/tcp method but this isn’t something I can really say is typical. One guy I know used the python API to do it.

Pat12@lemmy.world · 2 years ago

oh, thank you

my background is not in engineering which explains my confusing questions

DontMakeMoreBabies@kbin.social · 2 years ago

I’m this person in my organization. I sent an email up the chain warning folks we were going to eventually run out of space about 2 years ago.

Guess what just recently happened?

ShockedPikachuFace.gif

This is fine🔥🐶☕🔥@lemmy.world · 2 years ago

You got approval for new SSDs because the manglement recognised threat identified by you as critical?

Right?

IMongoose@lemmy.world · 2 years ago

Sometimes that person is very silly though. We had a vendor call us saying we needed to clear our logs ASAP!!! due to their size. The log file was no joke, 20 years old. At the current rate, our disk would be full in another 20 years. We cleared it but like, calm down dude.

Dojan@lemmy.world · 2 years ago

Ballast!

Just plonk a large file in the storage, make it relative to however much is normally used in the span of a work week or so. Then when shit hits the fan, delete the ballast and you’ll suddenly have bought a week to “find” and implement a solution. You’ll be hailed as a hero, rather than be the annoying doomer that just bothers people about technical stuff that’s irrelevant to the here and now.

IWantToFuckSpez@kbin.social · edit-2 2 years ago

And was fired for not doing his job which management prevented him from doing in the first place

Semi-Hemi-Demigod@kbin.social · 2 years ago

Sysadmin pro tip: Keep a 1-10GB file of random data named DELETEME on your data drives. Then if this happens you can get some quick breathing room to fix things.

Also, set up alerts for disk space.

Dkarma@lemmy.world · 2 years ago

The answer here is not storage it is better alerting.

nfh@lemmy.world · 2 years ago

Why not both? Alerting to find issues quickly, a bit of extra storage so you have more options available in case of an outage, and maybe some redundancy for good measure.

RupeThereItIs@lemmy.world · 2 years ago

A system this critical is on a SAN, if you’re properly alerting adding a bit more storage space is a 5 minute task.

It should also have a DR solution, yes.

Agent641@lemmy.world · 2 years ago

Yes, alert me when disk space is about to run out so I can ask for a massive raise and quit my job when they dont give it to me.

Then when TSHTF they pay me to come back.

dx1@lemmy.world · 2 years ago

The real pro tip is to segregate the core system and anything on your system that eats up disk space into separate partitions, along with alerting, log rotation, etc. And also to not have a single point of failure in general. Hard to say exact what went wrong w/ Toyota but they probably could have planned better for it in a general way.

Maximilious@kbin.social · edit-2 2 years ago

10GB is nothing in an enterprise datastore housing PBs of data. 10GB is nothing for my 80TB homelab!

Semi-Hemi-Demigod@kbin.social · 2 years ago

It not going to bring the service online, but it will prevent a full disk from letting you do other things. In some cases SSH won’t work with a full disk.

GhostlyPixel@lemmy.world · 2 years ago

It’s all fun and games until tab autocomplete stops working because of disk space

model_tar_gz@lemmy.world · 2 years ago

Tab complete in vim go lolllllooolol NO

idunnololz@lemmy.world · 2 years ago

It’s nothing for my homework folder.

mohammed_alibi@lemmy.world · 2 years ago

That’s an incredible collection of homework!

z00s@lemmy.world · 2 years ago

Or make the file a little larger and wait until you’re up for a promotion…

mkhopper@lemmy.world · 2 years ago

500Gb maybe.

Swiggles@lemmy.blahaj.zone · 2 years ago

This happens. Recently we had a problem in production where our database grew by a factor of 10 in just a few minutes due to a replication glitch. Of course it took down the whole application as we ran out of space.

Some things just happen and all head room and monitoring cannot save you if things go seriously wrong. You cannot prepare for everything in life and IT I guess. It is part of the job.

RidcullyTheBrown@lemmy.world · 2 years ago

Bad things can happen but that’s why you build disaster recovery into the infrastructure. Especially with a compqny as big as Toyota, you can’t have a single point of failure like this. They produce over 13,000 cars per day. This failure cost them close to 300,000,000 dollars just in cars.

frododouchebaggins@lemmy.world · 2 years ago

The IT people that want to implement that disaster recovery plan do not make the purchasing decisions. It takes an event like this to get the retards in the C-suite listen to IT staff.

GloveNinja@lemmy.world · 2 years ago

In my experience, the C-Suite dicks will put the hammer down on someone and maybe fire a couple of folks. They’ll demand a summary of what happened and what will be done to stop it from happening again. IT will provide legit options to resolve this long term, but because that comes with a price tag they’ll be told to fix it with “process changes” and the cycle continues.

If they give IT money that’s less for themselves at EOY for bonuses so it’s a big concern /s

Swiggles@lemmy.blahaj.zone · 2 years ago

Yea, fair point regarding the single point of failure. I guess it was one of those scenarios that should just never happen.

I am sure it won’t happen again though.

As I said it can just happen even though you have redundant systems and everything. Sometimes you don’t think about that one unlikely scenario and boom.

MoogleMaestro@kbin.social · 2 years ago

There’s some irony to every tech company modeling their pipeline off Toyota’s Kanban system…

Only for Toyota to completely fuck up their tech by running out of disk space for their system to exist on. Looks like someone should have put “Buy more hard drives” to the board.

netburnr@lemmy.world · 2 years ago

It was forever ignore in backlog

blazera@kbin.social · 2 years ago

This is a fun read in the wake of learning about all the personal data car manufacturers have been collecting

R0cket_M00se@lemmy.world · 2 years ago

Was this that full shutdown everyone thought was going to be malware?

The worst malware of all, unsupervised junior sysadmins.

Takina's Old Pair™@lemmy.world · 2 years ago

Human error…lol, classic.

AnUnusualRelic@lemmy.world · 2 years ago

Idiots, they ought to have switched to tabs for indenting. Everybody knows that.

pastermil@sh.itjust.works · 2 years ago

Jokes aside, this got me thinking, is there actual difference in term of file size?

JackbyDev@programming.dev · 2 years ago

Kanban

Blurrg@lemmy.world · 2 years ago

Free disk space is just inventory and therefor wasteful.

LEDZeppelin@lemmy.world · edit-2 6 months ago

deleted by creator

RFBurns@lemmy.world · 2 years ago

Storage has never been cheaper.

There’s going to be a seppuku session in somebody’s IT department.

massive_bereavement@kbin.social · 2 years ago

Someone messed up log rotation and the whole /var went ro.

Sygheil@lemmy.world · 2 years ago

Free space is a wasted disk space.

httpjames@sh.itjust.works · 2 years ago

Server less DBs ftw

RidcullyTheBrown@lemmy.world · 2 years ago

Serverless just means that the user doesn’t manage the capacity by themselves. This scenario can happen easily if the serverless provider is as incompetent as the Toyota admins.

All of Japan's Toyota Assembly Plants Shut Down for a Day Because Their Server Ran Out of Disk Space

All of Japan's Toyota Assembly Plants Shut Down for a Day Because Their Server Ran Out of Disk Space

Toyota says plant shutdown last week due to server malfunction