Petroski Notes
Random bits about the book "To Forgive Design" 2012 by Henry Petroski, and related ideas. I write notes like this on my wiki so I can mine the text for my own later writings (and people can correct me before I do).
Book Notes
No complaints - just notes to help me remember what I found in the book, or what occurred to me while I read it.
P15 - Washington Roebling: "An engineer who has not been educated as a spy or detective is no match for a rascal." A mirror in computer security - protecting fragile software systems from malware. Similar to the themes in Bruce Schneier's many books about computer security.
P17 - Chinese wallboard: Paul Midler's "Poorly Made in China: An Insider's Account of the Tactics Behind China's Production Game"
P17 - Chinese PVC pipe: polyvinyl chowmein. When replacing water main to front of house, be sure to specify American made. How to verify provenance??? We are at the bottom of a hillside water district, the storage tank is 500 feet higher elevation, the feed pressure is very high.
P27 - Challenger accident: Space shuttle problems stem from faulty system architecture, driven by unrealistic reusability goals. The heavy LOX tank must be "above" the main engine thrust line. Insulation is needed, but cannot be inside the tank or it might ignite. That puts large wings (needed for large crossrange landing requirements, specified by military but never needed) under the tank and its insulation. The solids were segmented to pass through rail tunnels from Utah - they could have been unitary construction if made east of the Rockies, but Jake Garn and Morton Thiokol helped politics triumph over safety and efficiency. The Atlas, with pressurized tanks and a vertical stack, is far more reliable and relatively cheap to launch. Atlas tanks were thinner than pop cans, the aluminum cost is tiny compared to the logistic costs, and an engine/turbopump capable of multiple launches must be heavier than a single-use engine, subtracting from payload fraction. Sensors and avionics should be plentiful for flight recording, and can survive reentry using the first stage tanks as a heat shield (if the heavy engine and turbopump are jettisoned). The key to low cost flight is not reusability of a few thousand pounds of metal, but logistics and voluminous automated data collection - SpaceX.
P37 - Marble columns breaking in the middle: tied to spaceflight again. Euler's column formula limits the slenderness of rockets - otherwise they could be very tall and have very small frontal cross sections. Proposals for liquid fuel rocket systems horizontally launched from giant airplanes show little understanding of bending stresses, ullage sloshing in horizontal tanks, and the need to climb above the dense lower atmosphere before pitching over and accelerating to orbital velocity. Not to mention the logistics of launching from a moving platform, and landing that big plane with a fueled rocket attached in case of a launch abort.
P38 - Kansas City walkway: tension members breaking. Acoustic waves in cables reflect and double at the points of attachment. Unless the forces are perfectly steady state, the cables near attachments must be strongest, and gently tapered, with the risetime of stresses considered. The KC collapse occurred during a party with people dancing, IIRC.
P42 - Pentium FDIV bug: The text is incorrect. Floating point numbers are represented by a mantissa and an exponent. Multiplication and division are done on the mantissas (typically normalized to a magnitude between 0.5 and 1.0); the magnitude of the result of multiplication will be between 0.25 and 1.0, and the result for division will be between 0.5 and 2.0, with a shift operation to normalize the result. Classical binary division does a trial subtraction of the divisor from the dividend; after the borrow ripples through many gates and reaches the most significant bit, the divider selects either the dividend or the remainder to multiply by two (a simple shift) to become the next dividend. This process must be repeated 53 times for an IEEE double precision number. The Pentium used a more aggressive algorithm, radix-4 SRT, which subtracts the divisor multiplied by -2, -1, 0, 1, or 2. The multiplier is guessed by using the most significant bits of divisor and dividend to select among the 5 divisor multiples using a table lookup, then shifting the result two bits. This process can "overshoot", but the 5 values provide enough overlap to correct the dividend in subsequent lookups and subtractions. The process can be more than twice as fast as the simple algorithm, because it does not need to do complete carry/borrow propagation for all 53 bits; the table lookup can compensate for carries, too. The problem: the table was implemented in hardware as a programmable logic array with roughly a thousand meaningful entries, but 5 of them were accidentally left at zero. This caused some small fraction of initial dividend/divisor pairs to not converge on the right values, in spite of the compensatory overlap. In some cases, the error approached 0.01%.
- This was not proper "belt and suspenders" design. The design failed - it was not checked logically/numerically. The testing failed - the test should not only look for manufacturing defects, but timing delays. Multiple test approaches provide a parallax view on a design, and tend to catch failures that tests designed for oversimplified system models do not.
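A minimal sketch of the classical one-bit-per-step division described above, written in Python with integers standing in for mantissas. The function name and the integer scaling are my own assumptions for illustration; this is not Intel's circuit and it is not the SRT algorithm, just the slow baseline that SRT improves on.

```python
def restoring_divide(dividend: int, divisor: int, bits: int = 53) -> int:
    """Return floor(dividend * 2**bits / divisor), one quotient bit per step.

    Assumes 0 <= dividend < divisor, as with mantissas that have been
    pre-shifted so the quotient is a pure fraction. Hypothetical helper.
    """
    if not 0 <= dividend < divisor:
        raise ValueError("need 0 <= dividend < divisor")
    remainder = dividend
    quotient = 0
    for _ in range(bits):
        remainder <<= 1               # multiply by two (a simple shift)
        trial = remainder - divisor   # trial subtraction
        quotient <<= 1
        if trial >= 0:                # no borrow: keep the difference, emit a 1 bit
            remainder = trial
            quotient |= 1
        # borrow: keep the shifted remainder, emit a 0 bit
    return quotient


if __name__ == "__main__":
    # 1/3 to 8 fractional bits is 0.01010101 in binary.
    print(bin(restoring_divide(1, 3, bits=8)))
```

Each pass needs the full-width subtraction to finish before the next bit can be chosen, which is the carry/borrow delay the table-driven approach tries to avoid.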
P43 - iPhone: Steve Jobs only listened to his industrial design department. Style interfered with function on many products. The Newton, Siri software, and other botches were much more numerous than the iPod and the iPhone. Jobs was about to start a titanic legal battle with Google over Android when cancer took him. Only the young die good ...
P72 - NY WTC truck bomb: Read James B. Stewart's "Heart of a Soldier", about Rick Rescorla, security chief for Morgan Stanley in the south tower, WTC2. Rescorla was the last man out after the truck bombing. His analysis showed that the truck bomb could have taken down the tower if the terrorists had placed it better, and he was sure another attempt would be made, so he unsuccessfully lobbied to break the lease and move out of the doomed building. During , his Army comrade Dan Hill had converted to Islam and had fought with the Mujahadeen in Afghanistan. After the Cole bombing and the alleged reward for bin Laden, Hill and Rescorla approached the Clinton State Department with a credible plan to work with the Mujahadeen to capture or kill bin Laden, long before 9/11. They were rebuffed. So, Rescorla practiced evacuation drills for Morgan Stanley employees. On 9/11/2001, he was supposed to be on vacation, but was substituting for a subordinate when the plane hit the north tower, WTC1. The building PA system told people in WTC2 to remain at their desks; Rescorla's security team disabled the PA and evacuated 3000 employees from the WTC2 south tower, singing Men of Harlech, praise, and encouragements into bullhorns. Rescorla and his team had gone back into the building to search for stragglers when it collapsed. Rescorla was the true hero of 9/11; he had worked hard to prevent it, he was overridden by management and government, he defied authority to save thousands of lives, and gave his life to leave nobody behind. His designs were organizational rather than mechanical, but in the true engineering spirit.
Natural disaster: The highest risk of natural disaster facing the US is the Cascadia subduction zone, currently estimated to have a 50% chance of a very large subduction zone quake in the next 50 years. Given the woeful seismic underdesign of Portland, Seattle, Vancouver BC, and cities in between (designed for magnitude 5, not 9+), large coastal resorts built on sandspits, and transportation systems easily severed by a big quake, it is possible that hundreds of thousands will die. The last big subduction quake in 1700 deposited sand on top of cliffs, and wiped out so many of the warlike coastal tribes that Lewis and Clark survived a winter without attack, a century later. The tsunami killed hundreds in Japan (which is how we know precisely when it happened).
P77 - Nuclear plant: The Fukushima Daiichi plant site was originally hilly. It was graded down to sea level for easier construction from barges. If the backup generators had been up the hill instead of in the basement, and the electrical switching had been above ground, the plants would have ridden out the tsunami just fine. Units 1 and 2 were scheduled for cold shutdown the very next month. It would have been sooner, but safer replacement plants elsewhere in the TEPCO system were delayed by the permitting process.
P93 - Simulation versus testing: Chip design suffers from this. We do a heck of a lot of simulation and modelling before building, but only nature can simulate everything. We are designing "all on one chip" systems combining radio receivers and processor clock drivers. Simulations based on simple models of interaction through the substrate (not unlike a waterbed with children bouncing on it) do not accurately model how the clock drivers can interfere with the receivers. So Intel wrote better simulations, and also designed the clocks so they do not emit noise in the same frequency bands as the receivers. Belt and suspenders. The Intel Sandy Bridge chipset simulations probably used tens of millions of dollars worth of computer time. The simulation did not detect that too much current was drawn in the circuitry for the 3 Gb/s SATA ports, leading to early failure. They only found this out after extended life testing, which was not properly analyzed until millions of chipsets had been built into motherboards. Learning from the FDIV fiasco, Intel spent almost a billion dollars replacing those motherboards (none of which had failed in the field so far).
P94, picture of JoDean Morrow: Includes a Tektronix graphics display, probably a 4010. There are 4 people lined up, it is not exactly clear who is Dr. Morrow doing the demonstrating, and who are the 3 named observers, but I assume he's the fellow on the right with the hand raised.
Blind testing for cracks and fatigue failure: can this be done with transducers and acoustic propagation? A small gap is acoustically quite different from continuous metal, and I imagine the corner of a crack refracts high frequency sound waves in an obvious way, and that metal at the point of failure propagates sound differently than less strained metal. A few temporary clamps with transducers could shake (very small shakes) and observe, especially with some pre-modeling of the structural element suggesting where to look.
P117 - plastic models: I sometimes built 20x larger models of integrated circuit structures, with plastic and copper tape and solder. Maxwell's equations scale nicely, and I can measure everything with 20x slower (and much cheaper) equipment.
P122 - picture of bridge: Could round aeroshells around girders reduce wind loading? Maybe. Or maybe it would just interfere with inspection. Perhaps the best inspection would involve hundreds of thousands of digital camera pictures, analyzed back in the office, much with automation, perhaps aided by cheap overseas engineers. The cameras, on long skinny poles, could reach inside shells, also controlling lighting of the photograph for better automated analysis.
P133 - inner corrosion of wires in suspension cables: Electrical test? An accurate measurement of end-to-end resistance versus temperature and ambient humidity probably could measure this. We can measure resistance with AC "4 point kelvin measurement" to 8 decimal places. I bet we could see individual wires break in real time (over minutes - to get 8 places you must do a LOT of signal averaging). Since current can find its way through many paths, you might have to instrument all the paths to subtract them from your measurements. But you probably want to know about their continuity, too. Temperature will change resistance, a lot, but temperature can be measured on the surface and estimated inside.
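A rough sketch of why the 8 decimal places matter, in Python. It assumes the cable is an idealized bundle of identical wires in parallel, insulated from each other, where a break opens one wire completely; real cables let current hop between touching wires, so this is only the simplest case. The wire count and the temperature coefficient are made-up round numbers, not data for any real bridge.

```python
import math


def fractional_rise_after_breaks(n_wires: int, broken: int = 1) -> float:
    """Relative rise in end-to-end resistance when `broken` of n parallel wires open."""
    if not 0 <= broken < n_wires:
        raise ValueError("need 0 <= broken < n_wires")
    # R goes from r/n to r/(n - broken), so the rise is broken / (n - broken).
    return broken / (n_wires - broken)


def digits_needed(fraction: float) -> int:
    """Decimal places required before a change of this size shows up at all."""
    return math.ceil(-math.log10(fraction))


def equivalent_temperature_shift(fraction: float, tempco_per_kelvin: float = 0.005) -> float:
    """Temperature change (K) that would mimic the same resistance rise."""
    return fraction / tempco_per_kelvin


if __name__ == "__main__":
    N_WIRES = 20_000  # assumed wire count, for illustration only
    step = fractional_rise_after_breaks(N_WIRES)
    print(f"one broken wire raises resistance by {step:.2e}")
    print(f"needs at least {digits_needed(step)} decimal places")
    print(f"same as a {equivalent_temperature_shift(step):.4f} K temperature drift")
```

Under these assumptions one break is a change in the fifth decimal place, which a hundredth of a degree of temperature drift would imitate, so the temperature correction is the hard part, not the resistance meter.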
P139 - spacing trucks over a bridge: Scales, timers, and a pullout lane approaching the bridge with signals directing trucks onto the lane if another truck is on the bridge. Perhaps warnings of congestion a few miles up the road. I imagine the problem is accentuated if truck wheel spacing matches structural spacing in the bridge deck. Extra resonances?
Demolition: Are there rich guys with more money than sense who will pay a million bucks to push the button that initiates the destruct sequence? Perhaps a lottery for the privilege. Perhaps additional filming rights for Hollywood action movies. Whatever helps pay for a safe job.
Failure Lottery: How about a betting pool on which bridge is going to fall next? The competition among civil engineers might identify some candidates in need of repair. If, as a result of their diligent efforts, a bridge is repaired before it fails, they get a compensatory prize, perhaps paid from additional funds added by the NTSB.
We should have webcams on these structures, both for trip planning and for watching failures.
P168 - de Havilland Comet: Read Nevil Shute's 1948 novel "No Highway", and watch the 1951 movie "No Highway in the Sky" starring Jimmy Stewart and Marlene Dietrich. Jimmy Stewart plays a materials engineer who discovers that the tail section of a new airplane is likely to fracture after a precise number of hours. The precision of the failure prediction is silly, but Stewart as a materials engineer is priceless. Nevil Shute's autobiography "Slide Rule" is a must-read for anyone interested in the history of aircraft engineering - and rapid technology growth - and failures. In his day job as engineer and entrepreneur, Nevil Shute Norway founded air transport manufacturer Airspeed in 1931. Airspeed grew through two major revolutions in aircraft to 1000 employees by 1940 - when it was too small to compete, and was sold to de Havilland. Perhaps if Norway had remained with Airspeed/de Havilland, the world would have missed some great engineering novels, but his knowledge of failure would have resulted in a safe version of the Comet. de Havilland would be Boeing's major competitor, not Airbus.
P173 - Titanic: My grandfather embarked for the US on the Canadian Pacific "Empress of Ireland" in late March 1911 ( ticket here ). In 1914, the Empress collided with a collier in the St. Lawrence and sank, killing 1012 people, less than the Titanic's approximately 1600 casualties and the Lusitania's 1200, but mostly ignored by history. Not a structural failure, but certainly a traffic design problem.
P180 - Iron ring: Never saw one. But then, despite a Summa Cum Laude EE degree from UC Berkeley, in the state of Oregon I am not an "engineer" unless I take the PE test. Not relevant to my field (though I have other certifications).
P199 - Tacoma Narrows Bridge: Gig Harbor is the upscale bedroom community for Tacoma, and is home to some very wealthy people. Friends live there. The bridges are pretty.
P274 - Good description of Gulf Spill: The energy industry employs fewer engineers per revenue dollar than most other technology fields. The fewer they are, the more blame they get.
- Physicist Sir Michael Berry refers to a similar asymptotic problem as "the worm in the apple" - if you bite into an apple and see a worm, you get upset. If you see HALF a worm, you get MORE upset, with your discomfort increasing the less worm you see remaining. But if you see no worm, you don't get upset at all. So if only one engineer was employed by the entire oil industry (with predictable bad results), the other millions of us would be blamed for the bad things "engineers" do.
- Still, they should have tested the preventers better before deploying them. And had tools handy to cap a well if they failed. We have armies ready to remediate diplomatic failures; why don't we have armies ready for other disasters? We have far more disasters (technological or otherwise) than we have wars.
- The biggest source of pollution in the Gulf is not oil spills - it is phosphate fertilizer runoff from the Mississippi, much resulting from corn grown for biofuel ( "Food'o'Fuel" - feed cars, not people). Oil extraction can be environmentally costly, but not as much as some "pseudo-green" alternatives.
P312 - picture of cranes: While I was reading this in a bus shelter, two 14 year old boys looked at the book over my shoulder and got really excited. One of them was an artist, had already sold some artwork. We talked about professional technical illustration, and I showed them some technical illustrations I did. Some kids still love learning about engineering - there's hope!
History of failure: In Japan, they say failure is a golden treasure. Back in the days of the first computer monitors, Japanese TV manufacturers made high resolution color CRTs on the same production lines as their less demanding TV CRTs. This complicated the lines, but the computer CRTs showed defects far more often than the TV tubes. Finding and solving problems on the difficult tubes made the ordinary tubes more reliable and yield better. The extra flexibility helped transitions to new designs, and adjustments to changing markets. Sometimes you do the hard stuff, at considerable cost, so the ordinary stuff can be cheaper and better.
Reliability and failure in chip design
Electronic product design accumulates failure experience more rapidly but more inconsistently than structural engineering. In a throwaway culture, a product is often discarded before it has a chance to fail. Exceptions include instrumentation (see Vintage Tektronix below), ships and planes, automotive, and communications central plant electronics, but even here the replacement cycles are rarely more than 20 years, and also obsolescence driven. Usually, the electronics is more reliable than the software running on it, so errors in software receive more attention ( properly so, though with less success ).
Electronics for implantable medical products (pacemakers, etc.) must have super high reliability, but medical volumes are small and the duration of use can be as short as hours (for catheterized imaging sensors, for example). The very high liability exposure often means that mainstream manufacturers actively avoid these markets - for example, Motorola sold piezo sensors for land mines, but refused to sell them for defibrillating pacemakers. Electronics for satellites require high reliability, but direct forensic analysis of failures is usually impossible; these failures are often inferred via telemetry.
Electronic systems are cheap and abundant enough to permit sampling and test to destruction - components and whole systems. Heat accelerates the chemical and thermal expansion processes that drive most failures, so electronics are tested with temperature cycling ( as much as -50C to 150C ) and month-duration high temperature testing ("life test") of hundreds of samples. Failure doubles for every increase of 10C, so a product baked at 150C for a month emulates a product operating at 50C for 80 years. These temperatures destroy or warp plastics, liquid crystal displays, etc., so forensics can be incomplete.
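The life-test arithmetic above can be checked with a few lines of Python. This is only the "doubles every 10C" rule of thumb from the paragraph, not a full Arrhenius model, and the function names are my own.

```python
def acceleration_factor(stress_c: float, use_c: float, doubling_step_c: float = 10.0) -> float:
    """How many times faster failures accumulate at stress_c than at use_c."""
    return 2.0 ** ((stress_c - use_c) / doubling_step_c)


def emulated_years(test_months: float, stress_c: float, use_c: float) -> float:
    """Field years at use_c represented by a bake of test_months at stress_c."""
    return test_months * acceleration_factor(stress_c, use_c) / 12.0


if __name__ == "__main__":
    factor = acceleration_factor(150.0, 50.0)   # ten doublings
    years = emulated_years(1.0, 150.0, 50.0)
    print(f"acceleration {factor:.0f}x, one month of bake ~ {years:.0f} years in the field")
```

A 100C difference is ten doublings, a factor of 1024, so one month at 150C stands in for about 85 years at 50C, in line with the round figure quoted above.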
Sadly, some manufacturers of consumer electronics perform no testing at all, beyond making a few prototypes out of one batch of prototype parts and not finding egregious failures. We sometimes ironically call this cruelty-free - there's no testing. Like the animal non-testing equivalent, the customers become the guinea pigs. OTOH, companies like Apple perform hundreds of tests during product assembly in their Chinese manufacturing plants. Finding defects early, and correcting processes quickly, increases product quality and reduces scrap rate.
Some products are designed with built-in telemetry, which makes factory test easier and is also very helpful for post-failure forensics. The IEEE 1149 family of test standards reduces the number of (unreliable) contact probes necessary to measure a product during manufacture. My company SiidTech designs and licenses identification circuits that are built into the chips, allowing failed chips to be compared to saved individual testing data on the wafer. Our clients use this to improve their tests (reducing future failures), detect otherwise anonymous chips whose parameters change during manufacturing (they will probably change past failure in the field), and even discard assembled chips that were neighbors to failed chips on the wafer (the neighbors tend to fail faster).
The greatest value of failure data is process improvement. As in all engineering disciplines, failed components teach us where manufacturing is inadequate. Non-failures teach us to be more aggressive, reducing costs and pushing the performance of future components. Cell phones and microprocessors are pushed to the limits - a cell phone can be more power efficient if smaller power amplifier transistors are used with higher voltages and temperatures. They degrade faster in the field, but most consumers lose, break, or replace their cell phones before this happens. Computer CPUs are stressed similarly; customers demand performance, and old computers grow uselessly obsolete, so computers are optimized for maximum performance over 5 year lifetimes. Hobbyist "overclockers" eke out a few percent more speed while reducing lifetimes to months. The heat and increased current push metal atoms in conductors and create cracks and voids - "electromigration".
Electronics have become so reliable that politicians push the other way. In order to prevent lead in landfills, RoHS ( Restriction of Hazardous Substances, "row-hoss") rules forbid lead in solder, moving to other "non-eutectic" tin alloys. These alloys do not flow like solder, and have high stress at grain boundaries in the metal. The material relieves stress by forming "whiskers" (long, spindly crystals) that reach out from solder joints, shorting to neighboring connections. These have caused failures in aircraft and satellites - crashes and derelict satellites don't add e-waste to landfills, either, perhaps an occasional pilot to a cemetery.
Perhaps the politicians are also increasing death tolls for structural engineering, with aesthetic and construction "convenience" demands. Airport noise reduction and fuel economy change how aircraft engines and airframes are designed; the Rolls Royce engines for Boeing's 787 Dreamliner are engineering marvels, but they are pushing into new territory. International politics demand that components be made in China, Japan, and elsewhere, often by inexperienced manufacturers. The Dreamliner's carbon fiber wing roots turned out much weaker than planned, delaying deployment of the plane for years (fortunately, during an order-delaying recession). Boeing may pull all this off without increased failures in the field; they are very good engineers. But Airbus must push performance even more than Boeing to re-capture market. As these two giants vie for market leadership, we may learn we have pushed too hard as these planes age, fail, and passengers die.
Abusing statistical distributions
Most phenomena don't fit Gaussian curves. Measuring a few samples, computing a mean and deviation, then extrapolating to large deviations drastically underestimates the probabilities at those extremes. Almost all real measured distributions from many samples have fat tails (excess kurtosis): much higher probabilities for large sigma. On some of my circuits, I've built over one million samples (thousands of circuits per integrated circuit die) to characterize the extremes, verifying that there were many larger deviations than a naive bell curve would suggest. The usual case is that there are common variations which sum statistically, as well as rare defects that add a large amount of variation to a small subset of samples.
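A small Python sketch of that "common variation plus rare defects" picture. It assumes, purely for illustration, that 99.9% of samples follow a unit Gaussian and 0.1% follow a Gaussian ten times wider; those numbers are invented, not measurements from my circuits. It then compares the true chance of a 6 sigma outlier with what a naive bell curve fitted to the same mean and deviation would predict.

```python
import math


def gaussian_tail(k_sigma: float) -> float:
    """Two-sided probability that a normal sample lies beyond k_sigma deviations."""
    return math.erfc(k_sigma / math.sqrt(2.0))


def mixture_sigma(defect_fraction: float, defect_sigma: float) -> float:
    """Overall standard deviation of the mix (common part has sigma = 1)."""
    return math.sqrt((1.0 - defect_fraction) + defect_fraction * defect_sigma ** 2)


def mixture_tail(threshold: float, defect_fraction: float, defect_sigma: float) -> float:
    """Two-sided probability that a sample from the mix exceeds +/- threshold."""
    common = (1.0 - defect_fraction) * gaussian_tail(threshold)
    rare = defect_fraction * gaussian_tail(threshold / defect_sigma)
    return common + rare


if __name__ == "__main__":
    FRACTION, WIDE = 0.001, 10.0          # assumed defect rate and defect spread
    sigma = mixture_sigma(FRACTION, WIDE)
    naive = gaussian_tail(6.0)            # what the fitted bell curve predicts
    actual = mixture_tail(6.0 * sigma, FRACTION, WIDE)
    print(f"naive 6-sigma tail {naive:.2e}, actual {actual:.2e}, ratio {actual / naive:.1e}")
```

With these made-up numbers the rare defects barely move the measured deviation, yet they make 6 sigma outliers several orders of magnitude more common than the bell curve suggests.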
Nassim Nicholas Taleb's book "The Black Swan" describes this as the occasional event that falls way outside normal variation, in finance and in life. Between 1920 and 1975, his home country of Lebanon was peaceful, prosperous, and ethnically diverse. In 1975, civil war broke out and some of his high school friends were now trying to kill him. This formed his later investment (and life) strategy - hide and read, don't assume business as usual, prepare for major catastrophes. The book is a charming romp through the inevitability of the unexpected - such as the black swans used by European philosophers as examples of logical absurdities - until they were discovered in Australia.
Bart Kosko's book "Noise" is an electrical engineering take on the same issues - this time signals on a wire, or variations in parameters when measuring batches of parts. The outliers are always there. Even more mundane behaviors like Poisson distributions show that if one event happens with small probability over a short time, there is a significant probability of many events happening during a similar short time sometime in the future. For example, if there is a 10% probability of a piece of debris running into your space elevator every orbit, it seems like you can dodge it. But after 20 years and 100,000 orbits, there will be about 15 to 18 orbits (depending on whether the 10% is the mean hit rate or the chance of at least one hit) where you have to dodge 3 or more debris pieces during the same orbit. Dodging 3 objects nearly simultaneously may be beyond the capacity of your space elevator. This Poisson "many things going wrong at once" behavior is characteristic of complicated systems running continuously for a long time. If your system can't be allowed to fail, you need a lot of spare capacity to deal with more than one problem at a time.
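The space elevator numbers can be reproduced with a short Python sketch. It assumes debris encounters in one orbit follow a Poisson distribution and takes the 10% as the mean number of encounters per orbit; both are my assumptions, and the function names are just for illustration.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events when the expected count is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def prob_at_least(k: int, mean: float) -> float:
    """Probability of k or more events in one interval."""
    return 1.0 - sum(poisson_pmf(i, mean) for i in range(k))


def expected_crowded_orbits(orbits: int, mean_per_orbit: float, threshold: int = 3) -> float:
    """Expected number of orbits with at least `threshold` encounters."""
    return orbits * prob_at_least(threshold, mean_per_orbit)


if __name__ == "__main__":
    crowded = expected_crowded_orbits(100_000, 0.1)
    print(f"orbits with 3 or more encounters: about {crowded:.1f}")
```

Under that reading the answer is about 15 crowded orbits in 100,000. If the 10% is instead read as the chance of at least one encounter per orbit, the mean rises slightly and the count comes out near 18, so the order of magnitude does not depend on the interpretation.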
Vintage Tektronix oscilloscopes
In the 50's through the 70's, Tektronix designed oscilloscopes to last a very long time. Some have, and are displayed at the Vintage Tek museum, between Beaverton and Portland. Normal museum hours are Friday and Saturday, but I imagine that special hours could be arranged for visiting dignitaries. During normal hours, many of the retired engineers who originally designed this equipment can be found in the back room, repairing units for display. A don't-miss opportunity for engineering historians.
The very durability of vintage Tektronix equipment means that there is a lot of it on the surplus market, and mouldering in instrument storage at hundreds of companies. One of my dreams is that universities could bring these old instruments and old engineers to campus for a two week "gross anatomy" course for seniors or grad students, taking these old instruments apart while explaining the engineering decisions that went into them.
I sometimes run a junior version of this, bringing a few hundred pounds of old electronics and a dozen sets of tools and safety goggles to a weekend academy setting, letting kids from 6 to 12 take stuff apart and see how it is put together (6 year olds love taking apart deskset telephones). I've had the dubious honor of watching a mom drag her daughter away kicking and screaming - the girl wanted to take stuff apart all night. Most kids don't care about how things are made. But the future engineers care a Whole Lot. Supplying them with scraps to take apart and rebuild (and teaching them to do it semi-safely) should be a priority for our profession. There is a lot of scrap wood that should be going to the future structural engineers out there - how can we foster their projects while keeping the kids out of the emergency room, and ourselves out of court?
Robert Courland's "Concrete Planet"
I would love to see a review by a concrete engineer - this book seemed a little "selective" about concrete failures. Book notes here
Cathodic protection of rebar
Concrete Planet makes much of the rusting of rebar in old concrete structures, and the trillions of dollars that may be necessary to replace thousands of them in the near future. I don't know how real the problem is. If it is indeed a serious problem, we ought to be looking for ways to slow or stop this decay.
Rebar iron won't oxidize nearly as fast if it is biased cathodically. Too much voltage, and the electrode will emit hydrogen, which can embrittle the iron. Without knowing the voltage drop through the concrete, it is difficult to bias the iron properly.
A proposed invention
I'm not interested in patenting the following - consider it public domain. If it makes sense, please use it to protect America's aging reinforced concrete, saving lives and tax dollars. Perhaps a joint project between the CE and ME departments at Duke could get some papers out of it. I can imagine this turning into a little circuit board with a solar cell, a battery, and a small integrated circuit with a simple bluetooth radio transceiver for status logging and bias calibration, mass produced and added by the millions to reinforced concrete structures.
If you live in a place like Oregon with lots of rain and humidity, and have cheap telephone jacks with inadequate gold plating, you've listened to anodic oxidation of copper - it makes a hissing noise. These may be the little electrochemical events of copper ions oxidizing, small voltage spikes adding up to electrical noise.
The current drawn from a cathodic bias circuit on embedded rebar may make a similar noise, if the voltage is inadequate to prevent oxidation events. If the current is detected with a "virtual ground" amplifier, the capacitance of a few hundred feet of rebar can be nulled, and the audio band noise measured. Of course, the circuit will need a lot of filtering to reject radio noise and other antenna pickup.
I also presume that hydrogen generation is a higher energy phenomenon, and has a different noise spectrum. So proper bias on a rebar electrode will be between the "oxidation hiss" (like the copper hiss in the telephone jacks) and the "hydrogen hiss". It would be instructive to digitize the electronic spectrum of a piece of rebar in concrete as a function of voltage, and see if these hisses can indeed be distinguished, and a bias point chosen for the optimum tradeoff between hydrogen and oxidation.
If the voltage oscillates, we might learn about the migration of evolved hydrogen. Will it react with the iron oxide, reducing it back to iron and water? Can we learn about the depletion of alkalinity in the concrete around the rebar? We may learn to coat the rebar with materials that work in conjunction with cathodic protection to improve survival still further. These processes could be emulated at small scale in the materials lab, perhaps combined with high temperature lifetest acceleration.
I do not know how well this would work for large expanses of interconnected rebar - chances are there will be DC voltage gradients across the structure, so some portions of the electrode will be oxidizing while others will be reducing. While this technique might still help prolong the lifetime of existing reinforced concrete, it will probably work best on rebar electrodes carefully designed to match areas of similar ambient electrical potential and long term water infiltration and pH changes. So new designs will eventually get finite element electric field analysis, as well as mechanical analysis.
It may also be possible to find patches of damaged rebar by using large electrode plates on the surface of the structure to change the electric fields, then look at spectrum changes. Perhaps external patch electrodes could be added to parts of an existing structure's surface to remediate or stop damage. Ugly, but not as ugly as a bridge collapsed into a river.