1. Introduction
We must adopt as a principle that unexpected failures are not the result of normal (common) causes: they always need to be investigated, because it is these “outliers” that bring opportunities to improve the system.
Unexpected failures are what usually lead to loss of control of the production process. Investigating “outliers” is therefore worthwhile for process control, whether the deviation shows up as an equipment failure or as waste generation.
We need to keep the production process under control, achieving high productivity through high equipment availability and low waste through control of the production and maintenance processes.
An investigation may reveal causes that are also present at other points of our system and can be blocked there too. Other machines of the same family may have their future failures anticipated and blocked. This brings several benefits: maintenance costs are eliminated, because the failure is prevented by removing its cause; productivity increases, because machine availability increases; and possible environmental and safety risks are reduced.
Any incident, whether related to personal safety, environmental safety, operational continuity or waste, needs to be investigated. Results improve day by day only when it becomes a habit not to accept incidents, to investigate their causes and to look for ways to block them.
The following examples of real events illustrate the concept.
2. Water overflow from the lime solution preparation tank in the WWTP
A few weeks after the start-up of the WWTP (wastewater treatment plant), the lime tank overflowed, which generated an Environmental Incident report. All overflow is collected in an underground reservoir inside a containment dike, which limited the consequences of the event.
The event was brought to a production meeting, where it was initially reported as a probable operational failure: the Chemical Technician had left the valve open to top up the tank but, busy receiving a truck of Internal Varnish, forgot to close it, and the tank overflowed, triggering the level alarm of the containment tank.
A guiding principle is that an operator should never be made responsible for monitoring something that can be done automatically. A team was therefore assembled, with a Chemical Technician and an Electronics Technician, to investigate what was really happening and bring Management a clear report of the facts.
Investigation team report:
The system is programmed to operate automatically, adding a fixed volume of water and of pure lime to prepare the solution used in water treatment.
The lime, stored on the FL2 Filter platform, is always added in a fixed amount for each water injection cycle, and the injected water volume is also fixed, at 2/3 of the tank volume, which keeps the concentration of the solution constant.
The system is designed to request the addition of lime when the level drops to 1/4 of the total tank height (low-level switch LSTK14L). When this occurs, the solenoid valve opens and starts injecting water, so the WWTP does not stop, and the system asks the Chemical Technician to prepare the solution (pour a 20 kg bag into the tank) through a low-level alarm on the panel with the message “PREPARE LIME SOLUTION”. The alarm is cleared only after the Technician acknowledges it. The solenoid valve closes when the level reaches 100% (high-level switch LSTK14H).
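The fill cycle just described can be sketched as a small scan-based state machine. This is a minimal Python model, not the plant's PLC program: the class and method names are hypothetical, and each switch input is assumed to read True when the liquid reaches it. Writing the logic out makes one weakness easy to see: only the high-level switch closes the valve.

```python
from dataclasses import dataclass


@dataclass
class LimeTankController:
    """Hypothetical model of the lime tank fill cycle.

    low_switch  stands for LSTK14L (level at 1/4 of the tank height).
    high_switch stands for LSTK14H (level at 100%).
    A switch input is True when the liquid reaches it.
    """
    solenoid_open: bool = False
    alarm_active: bool = False  # panel message "PREPARE LIME SOLUTION"

    def scan(self, low_switch: bool, high_switch: bool) -> None:
        """One control scan, evaluated periodically like a PLC cycle."""
        if not low_switch and not self.solenoid_open:
            # Level fell below 1/4: inject water and ask for a 20 kg bag of lime.
            self.solenoid_open = True
            self.alarm_active = True
        if high_switch:
            # Level reached 100%: this is the ONLY condition that closes the valve.
            self.solenoid_open = False

    def acknowledge(self) -> None:
        """Technician acknowledges the request; only then is the alarm cleared."""
        self.alarm_active = False


tank = LimeTankController()
tank.scan(low_switch=False, high_switch=False)  # level dropped below 1/4
print(tank.solenoid_open, tank.alarm_active)    # True True
```

Calling scan() repeatedly with high_switch=False, which is what a broken high-level float produces, leaves solenoid_open set indefinitely: that is exactly the overflow condition.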
The level switches are mercury floats mounted inside a stilling compartment that reduces agitation, which can damage them. Even with this isolation, the high-level float was damaged by agitation, so it did not actuate and the tank overflowed. The failure had been recorded in the shift report, with a request that the operator monitor the level manually until the float was replaced.
Investigation conclusion: the level control system is inadequate, because it is too fragile for the operating conditions. Recommendation: replace the mercury float switches with an ultrasonic level sensor, which has no contact with the liquid and measures remotely, avoiding the risk of breakage.
Immediate action: repair the system by replacing the float, purchase an adequate level sensor, and install a time interlock that cuts the water injection, with an alarm, to avoid a new overflow if the high-level control fails.
Manager’s conclusion: we should not expect a Technician, who has several tasks during the shift, to be responsible for controlling the level of a tank. When the float broke, an additional automatic blocking solution should have been provided so that the environmental risk remained blocked. The broken float should have been treated as a disabled interlock, with an immediate mitigating action. The Manager therefore requested that the refresher training on Interlock Bypass (“jumper”) Procedures include situations like this one, in which the interlock is not bypassed but is inoperative. The Manager also requested that the new level sensor be purchased and installed, with a safety timer started when the water-addition solenoid opens and an alarm for failure of the high-level control. This might not prevent an overflow, but it would limit its volume, keeping the environmental risk under control.
All the overflow volume must be transferred back to the WWTP, which creates additional work for the Chemist. Without control, the containment volume might not be sufficient and the entire WWTP could be flooded, causing extreme dilution, loss of pH control in the WWTP and a stop of the entire operation due to lack of WWTP treatment capacity.
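The safety timer recommended in the immediate action and in the Manager's conclusion can be sketched as a latching watchdog on the water solenoid. In this hypothetical Python sketch, max_fill_s is an assumed setting (the normal fill time plus a margin, small enough that any overflow stays within the containment volume), and the trip latches until someone resets it after checking the high-level sensor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FillWatchdog:
    """Hypothetical safety timer for the water-addition solenoid."""
    max_fill_s: float                 # assumed limit: normal fill time + margin
    opened_at: Optional[float] = None
    tripped: bool = False

    def on_solenoid_open(self, now_s: float) -> None:
        """Start timing when the solenoid opens."""
        self.opened_at = now_s

    def on_high_level(self) -> None:
        """Normal end of fill: the high-level control worked, stop timing."""
        self.opened_at = None

    def allow_water(self, now_s: float) -> bool:
        """Return False (cut the water and raise the alarm) once the timer expires."""
        if self.tripped:
            return False
        if self.opened_at is not None and now_s - self.opened_at > self.max_fill_s:
            self.tripped = True       # alarm: high-level control failure
            self.opened_at = None
        return not self.tripped

    def reset(self) -> None:
        """Manual reset, only after the high-level sensor has been checked."""
        self.tripped = False


dog = FillWatchdog(max_fill_s=600)   # 600 s is an illustrative value
dog.on_solenoid_open(now_s=0)
print(dog.allow_water(now_s=601))    # False: high level never confirmed
```

Latching the trip is a design choice: it forces a person to check the high-level sensor before filling resumes, instead of letting the valve open again on the next low-level request.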
3. Body Maker CMB 5000 with main bearing seizure
The problem occurred in more than one Body Maker (BM), and a specialized team was assembled to investigate why several machines had suffered seizure of the main crankshaft bearing. The failure caused a 48-hour maintenance stop, which should not happen given the protection system: pressure monitoring for each bearing and differential-pressure monitoring of the filter made lack of lubrication or filter failure unlikely causes.
Investigation conclusion: the team found that the seizure was caused by contamination reaching the bronze bushing. Investigating how contamination could get past the filter, they found that the filter had an internal relief valve to prevent it from breaking under excessive pressure. At each pump start a pressure pulse opened this valve, letting oil, and the contamination with it, bypass the filter element.
Proposed solution: block this internal relief valve of the filter on all machines (there were 9 BMs on the production line) and replace that protection with an overpressure safety switch with a time limit for actuation, so that the pressure pulse at pump start does not trip it. The internal valve protected the filter housing against rupture, to avoid oil leakage. If the housing ruptured, the pump would stop on low pressure, so this was considered a minor risk.
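The time-limited overpressure protection can be read as an on-delay trip: the switch acts only if pressure stays above the limit for longer than a set time, so the short pulse at pump start is tolerated while a sustained overpressure, such as a clogged element, still trips. The Python sketch below is one possible interpretation; limit_bar, delay_s and the trip action are assumptions, and inhibiting the switch for a fixed time after each pump start would be an alternative design.

```python
from typing import Iterable, Optional, Tuple


def overpressure_trip(samples: Iterable[Tuple[float, float]],
                      limit_bar: float, delay_s: float) -> Optional[float]:
    """Return the time at which the switch trips, or None if it never trips.

    samples: (time_s, pressure_bar) readings in time order.
    The trip fires only when pressure stays above limit_bar continuously
    for more than delay_s, so a short pulse at pump start is ignored.
    """
    above_since: Optional[float] = None
    for t, p in samples:
        if p > limit_bar:
            if above_since is None:
                above_since = t           # overpressure just started
            if t - above_since > delay_s:
                return t                  # sustained: trip (e.g. stop the pump)
        else:
            above_since = None            # pressure back to normal: reset timer
    return None


start_pulse = [(0.0, 2.0), (0.5, 9.0), (1.0, 9.5), (1.5, 3.0)]
print(overpressure_trip(start_pulse, limit_bar=8.0, delay_s=2.0))  # None
```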
The conclusions were sent to CMB, the equipment manufacturer, which subsequently made design improvements that eliminated the cause of this type of failure.
When an investigation finds a cause that can be removed by improving the equipment design, it is good practice to share the information with the equipment manufacturer and with the other units that have the same type of equipment, extending the blocking of future failures.
4. Loss of viscosity control of the Overvarnish
Below is an example of a special cause investigation report, which later became part of the Trouble Shooting for special causes and generated a One-Point Lesson (LPP).
Problem description: the varnish lost viscosity, causing the “orange peel” defect and varnish splashes on the wall, forcing the operator to stop the printer and replace the reservoir charge, discarding the previous varnish.

Description of the facts:
The viscosity control system was set to 2.1, giving a reading of 25 seconds in Ford Cup No. 4, but the system did not hold the viscosity under control, causing splashes that even broke the Pre-Spin belt.
After the system was corrected, the setting remained at 2.1, with a constant viscosity reading of 25 seconds.

INVESTIGATION:
The D&I water hose was disconnected from the water-addition solenoid valve of the automatic viscosity control, and the valve was checked for opening and closing in response to the setting. The valve switched correctly, but it leaked, letting a small amount of water pass when closed.

Control actions required by the LPP:
- Install a filter upstream of the solenoid valve, with pressure gauges before and after the filter to check for saturation.
- Revise the daily checklist to include checking the fixing screw of the limit switch actuating rod and reading the filter pressure gauges to detect saturation.
- In position training, reinforce with the operators the need to clean the Ford Cup No. 4 thoroughly with alcohol before measuring the varnish viscosity; the orifice must be clean for a correct reading.
- Hold refresher training every six months on the daily inspection procedures, and add the history of this problem to the Trouble Shooting list, describing the problems caused by viscosity below the range and the action plan for checking it.
- The Quality area must maintain a semi-annual plan to verify operator training.
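The revised daily checklist items can be encoded so that the filter saturation check becomes a calculation rather than a judgment. In this hypothetical Python sketch, the field names are illustrative and max_dp_bar is an assumed limit for the pressure drop across the filter, to be taken from the filter supplier's data or from plant experience.

```python
from dataclasses import dataclass


@dataclass
class DailyCheck:
    """One entry of the revised daily checklist (field names are hypothetical)."""
    limit_switch_screw_tight: bool   # fixing screw of the limit switch actuating rod
    p_before_filter_bar: float       # gauge upstream of the filter
    p_after_filter_bar: float        # gauge downstream of the filter


def findings(check: DailyCheck, max_dp_bar: float) -> list[str]:
    """Return the items that need action; an empty list means the check passed.

    max_dp_bar is an assumed saturation limit for the pressure drop
    across the filter.
    """
    issues = []
    if not check.limit_switch_screw_tight:
        issues.append("tighten limit switch actuating rod screw")
    dp = check.p_before_filter_bar - check.p_after_filter_bar
    if dp < 0:
        issues.append("gauge readings inconsistent: verify the gauges")
    elif dp > max_dp_bar:
        issues.append(f"filter saturated (dp = {dp:.2f} bar): replace element")
    return issues
```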
It is important to note that a special cause investigation must start with a “brainstorm”, in which the group discusses the possible causes, and from it build an action plan to verify the points raised. The corrective actions must always be broad: they cannot be restricted to the single problem point the team found, but must cover the entire plant. If other equipment or similar systems may suffer the same failure, extend the actions to all of them, blocking future special causes at those other points.


5. Breakage of the Cupper transmission shaft
This example shows how, in seeking to reduce costs without full control of the facts, we can get exactly the opposite result.
The Minster press used in body manufacturing has a transmission shaft, on which the clutch is mounted, that transfers the motion of the motor to the crankshaft. The clutch has a useful life of about 24 months; after that it starts to fail in controlling the automatic stop position, so it undergoes scheduled preventive replacement every 24 months of operation.
This specialized work is centralized in a corporate maintenance team that serves 15 units in South America. Each shaft cost about US$ 4,000, and shafts were replaced because of wear on the shaft journal (neck).
Since the cost was considered high and the replacement was due only to journal wear, the team consulted a company specialized in journal recovery by metallization, and a company specialized in reworking shafts and large parts was indicated to do the rework. It would rebuild the diameter with an alloy applied at 700 °C, giving a surface hardness of 60 HRC in the deposited layer, with a maximum rebuild of 1.5 mm, corresponding to the maximum tolerated wear.
One day, a plant outside Brazil reported that the shaft of a newly installed clutch had broken after 4 months of operation, stopping the Unit. A new clutch was sent immediately, production resumed after 76 hours, and the failed clutch was returned to Corporate Maintenance for investigation.
Problem: breakage of the main shaft of the hydraulic clutch system of the Minster press.
Corrective action: stop reusing reworked clutch shafts, identify in the Centralized Maintenance system the Units that received a reworked shaft, and replace those shafts in a planned way. Recommendation: review the procedures related to the rework of equipment parts.


The root cause of the problem was a failure to analyze the risks involved in the shaft recovery. The risk that the rework could alter the heat treatment of the shaft (tempering) was not considered, and neither were the consequences of a shaft failure.
Actions on the root cause:
- Change the procedures so that the Maintenance Engineering area is involved in every rework.
- Cancel rework processes for parts of unitary equipment (equipment with no redundant unit in the line) whose failure could stop the production line for more than 4 hours. In other words, if a part considered for rework to reduce maintenance costs could break, and replacing it and returning to production would mean a stop longer than 4 hours, rework should not be considered and only original parts from the manufacturer should be used.
- In non-unitary equipment, rework must also go through risk analysis and involve the Maintenance Engineering area.
- Critical parts, such as crankshafts and transmission shafts, must go through risk analysis, and rework must not be considered if their replacement could mean a stop longer than 8 hours.
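The rework rules above can be written as a single decision function, which makes the thresholds explicit and easy to audit. This is a hypothetical Python encoding of those rules, not an existing system: "unitary" is taken to mean equipment without a redundant machine in the line, the Cupper is assumed to be unitary, and the expected stop time is an estimate the team must supply.

```python
from dataclasses import dataclass


@dataclass
class ReworkCandidate:
    part: str
    unitary_equipment: bool     # assumed meaning: no redundant machine in the line
    critical_part: bool         # e.g. crankshaft, transmission shaft
    stop_if_fails_h: float      # estimated stop to replace the part and restart


def rework_decision(c: ReworkCandidate) -> str:
    """Apply the rework rules; every allowed rework still needs risk analysis."""
    if c.unitary_equipment and c.stop_if_fails_h > 4:
        return "reject: use original manufacturer part"
    if c.critical_part and c.stop_if_fails_h > 8:
        return "reject: use original manufacturer part"
    return "allowed after risk analysis with Maintenance Engineering"


# Figures from this case: critical shaft on the Cupper, 76-hour stop.
clutch_shaft = ReworkCandidate("Cupper clutch shaft", unitary_equipment=True,
                               critical_part=True, stop_if_fails_h=76)
print(rework_decision(clutch_shaft))  # reject: use original manufacturer part
```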
6. Final considerations
A very important tool is the L.P.P., which is simply an instruction sheet that lets everyone involved in production and maintenance learn about an event and absorb the knowledge already gained by another operating group.
The potential is especially high in companies with several plants running similar equipment: a lesson learned in one Unit, if transferred to the others, contributes to the improvement of all.
A tool that disseminates these lessons to all plants in an organized way, indexed by topic, equipment and subject, so that they can be consulted as a basis for Trouble Shooting, will be very effective.
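A minimal sketch of such a tool, assuming a simple in-memory store: each LPP is a record with a few searchable fields, indexed by equipment and by topic so that one plant can look up lessons recorded at another. The schema, class names and example code "LPP-001" are hypothetical; a real system would sit on a shared database.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Lesson:
    """One LPP record; this minimal schema is an assumption."""
    code: str
    title: str
    plant: str
    equipment: str
    topic: str


class LessonIndex:
    """In-memory index of lessons by equipment and by topic (case-insensitive)."""

    def __init__(self) -> None:
        self._all: List[Lesson] = []
        self._by_equipment: Dict[str, List[Lesson]] = defaultdict(list)
        self._by_topic: Dict[str, List[Lesson]] = defaultdict(list)

    def add(self, lesson: Lesson) -> None:
        self._all.append(lesson)
        self._by_equipment[lesson.equipment.lower()].append(lesson)
        self._by_topic[lesson.topic.lower()].append(lesson)

    def find(self, equipment: Optional[str] = None,
             topic: Optional[str] = None) -> List[Lesson]:
        """Return the lessons matching every filter given (all lessons if none)."""
        results: Optional[List[Lesson]] = None
        if equipment is not None:
            results = self._by_equipment.get(equipment.lower(), [])
        if topic is not None:
            matches = self._by_topic.get(topic.lower(), [])
            results = matches if results is None else [x for x in results if x in matches]
        return list(self._all if results is None else results)


index = LessonIndex()
index.add(Lesson("LPP-001", "Filter relief valve opens at pump start",
                 "Plant A", "Body Maker", "lubrication"))
print([lesson.code for lesson in index.find(equipment="body maker")])  # ['LPP-001']
```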
As the old saying goes, intelligence is learning from the mistakes of others rather than from one’s own.
Every unexpected failure offers potential for improving the system, but we may not be able to analyze them all, so a criterion for deciding what to investigate is necessary.
The criterion is the relevance of the failure: if it caused, or had the potential to cause, material losses, rejects, or harm to safety or the environment, it must be investigated.
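The criterion fits in a few lines. In this hypothetical Python sketch, each flag means the failure caused, or could have caused, that kind of consequence; the field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class FailureEvent:
    """Each flag: the failure caused, or could have caused, this consequence."""
    material_loss: bool = False
    rejects: bool = False
    safety_risk: bool = False
    environmental_risk: bool = False


def must_investigate(event: FailureEvent) -> bool:
    """Relevance criterion: any actual or potential consequence triggers an investigation."""
    return (event.material_loss or event.rejects
            or event.safety_risk or event.environmental_risk)


lime_tank_overflow = FailureEvent(environmental_risk=True)
print(must_investigate(lime_tank_overflow))  # True
```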
Irrelevant failures will always be the great majority; relevant ones will be few.
Many failures occur because of identification errors. Whenever possible, we should apply the Poka Yoke concept, which simply means making accidental errors impossible.
Example: with storage tanks each dedicated to a different raw material, the ideal is for each tank to have a different loading coupling, and even to involve the supplier so that the delivery truck has the matching coupling, preventing the operator from unloading a product into the wrong tank by mistake.
Simple identification, such as different colors, does not qualify as Poka Yoke, because it does not prevent accidental error.
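For the coupling example, a useful design check is that no coupling type is fitted to more than one tank; otherwise the error is still physically possible and the Poka Yoke is incomplete. This Python sketch, with hypothetical tank names and coupling labels, lists the couplings shared by more than one tank.

```python
from collections import defaultdict
from typing import Dict, List


def shared_couplings(tank_couplings: Dict[str, str]) -> Dict[str, List[str]]:
    """Return each coupling type fitted to more than one tank.

    tank_couplings maps a tank (one raw material per tank) to its loading
    coupling type. A non-empty result means a truck could still be connected
    to the wrong tank, so the Poka Yoke is incomplete.
    """
    tanks_by_coupling: Dict[str, List[str]] = defaultdict(list)
    for tank, coupling in tank_couplings.items():
        tanks_by_coupling[coupling].append(tank)
    return {c: tanks for c, tanks in tanks_by_coupling.items() if len(tanks) > 1}


layout = {"Tank 1": "type A", "Tank 2": "type A", "Tank 3": "type C"}
print(shared_couplings(layout))  # {'type A': ['Tank 1', 'Tank 2']}
```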
The second step is disseminating the L.P.P.: informing everyone about the problem and sharing the knowledge, so that no new failure occurs from a known cause.
This step must include revising the training and adding the case to the Trouble Shooting list.
In companies with multiple sites, managing these actions in a computerized system available for consultation broadens and improves the overall result.
Finally, have a standardized system. We don’t usually appreciate rules and regulations, but in Production, Maintenance and Projects they are important for performance.