PHM2021 Technical Language Processing Workshop
This Technical Language Processing (TLP) workshop will guide you through the analysis of text data and how maintenance decisions can be improved with this information. Each presenter will demo their methodology step-by-step to give an in-depth look into the world of TLP! For more information on TLP, please visit: https://www.nist.gov/el/tlp-coi
|Monday, November 29, 2021|
|10:15 AM – 10:30 AM: Opening Remarks|
|Speaker: Michael Brundage; National Institute of Standards and Technology|
|10:30 AM – 11:15 AM: A Technical Language Processing-Based Solution to Automatically Calculate Lubrication-Related Costs from Maintenance Work Orders|
|Presenters: Michael Stewart, Melinda Hodkiewicz; University of Western Australia|
Abstract: Lubrication plays a critical role in the reliable functioning of rotating assets such as pumps, motors, gearboxes, compressors, fans, wheels and so on. Manufacturing sites can use over 50 lubricants on hundreds of pieces of equipment. Lapses in quality control of lubricants and lubrication systems such as filters, breathers and pumps can lead to catastrophic failure of critical equipment. However, identifying how much is being spent on maintenance of the lubrication systems, and the direct costs of failure, requires significant manual input from subject matter experts. This is due, for example, to the myriad ways various lubricants (grease, oil, lube, etc.) are identified in historical work order data. In this presentation we demonstrate a technical language processing-based solution to this challenge. MWO2KG is an end-to-end pipeline for constructing knowledge graphs from maintenance work orders. The deep learning model behind the MWO2KG pipeline is trained on annotated data created through collaborative annotation using Redcoat, our open-source web-based annotation tool, and can be interacted with by domain experts using Echidna, our open-source knowledge graph visualization platform. We show how we have used MWO2KG to automate calculation of the direct cost of lubrication-related work orders. Direct cost is the sum of the cost of executing lubrication work identified in maintenance strategies and the cost of unplanned failures based on the time and materials involved. The MWO2KG pipeline is being used on real industry data, although anonymized examples and simulated cost data are used in this presentation.
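As a rough illustration of the direct-cost definition in the abstract (planned lubrication work cost plus the time-and-materials cost of unplanned failures), here is a minimal sketch in Python. All records, rates, and figures are invented, not MWO2KG outputs:

```python
# Hypothetical sketch of the direct-cost calculation: direct cost =
# cost of planned lubrication work + cost of unplanned failures,
# each priced from hours worked and materials used.

def direct_cost(planned_orders, unplanned_orders, labour_rate=100.0):
    """Sum planned lubrication work cost and unplanned failure cost."""
    cost = lambda orders: sum(o["hours"] * labour_rate + o["materials"] for o in orders)
    return cost(planned_orders) + cost(unplanned_orders)

planned = [
    {"hours": 2.0, "materials": 50.0},   # routine greasing
    {"hours": 1.5, "materials": 30.0},   # oil top-up
]
unplanned = [
    {"hours": 8.0, "materials": 400.0},  # lube-related bearing failure
]

print(direct_cost(planned, unplanned))  # 1630.0
```

In practice the split between planned and unplanned orders is exactly what the knowledge graph extracted from the work-order text would supply.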
|11:15 AM – 12:00 PM: From Data Collection to Decision Making: A Step by Step Tutorial Using Maintenance Work Order Data|
|Presenters: Anna Conte, Lynn Phan, Coline Bolland, Thurston Sexton; National Institute of Standards and Technology (NIST)|
Abstract: Historical data analysis provides a core foundation for optimized decision-making in maintenance management. To contextualize natural language text as a data source within this process, Technical Language Processing (TLP) provides a framework for gleaning additional knowledge from this often-overlooked type of historical data. In this workshop, we focus on how the TLP paradigm informs the analysis of maintenance work orders (MWOs), a widely available source of data that industrial organizations can use to better inform their decision making. This step-by-step tutorial illustrates several applications of TLP within different stages of the data analysis process. We present tools, schemas, and data cleaning strategies for the data collection and preprocessing stages, along with a selection of Exploratory Data Analysis (EDA) and feature-selection methods relevant to text-based MWO data. More generally, we cover how to identify and mitigate common pitfalls analysts face when using MWOs for decision support.
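To give a flavor of the preprocessing and EDA steps the tutorial covers, here is a minimal sketch of normalizing short-text work orders and counting term frequencies. The example records and the abbreviation map are illustrative assumptions, not the tutorial's actual tooling:

```python
import re
from collections import Counter

# Hypothetical shorthand expansions; real MWO data needs a map built
# with input from the site's own maintenance staff.
ABBREV = {"rplc": "replace", "lube": "lubricant", "brg": "bearing"}

def normalize(mwo_text):
    """Lowercase, strip punctuation, and expand known shorthand."""
    tokens = re.findall(r"[a-z0-9]+", mwo_text.lower())
    return [ABBREV.get(t, t) for t in tokens]

orders = [
    "RPLC brg on pump #3",
    "Lube level low - top up",
    "rplc seal, pump leaking",
]

# Simple EDA: which normalized terms dominate the corpus?
counts = Counter(t for mwo in orders for t in normalize(mwo))
print(counts.most_common(3))  # [('replace', 2), ('pump', 2), ('bearing', 1)]
```

Even this tiny example shows why normalization matters: without expanding "RPLC"/"rplc", the two replacement orders would not be counted together.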
|12:00 PM – 12:45 PM: Lunch Break|
|12:45 PM – 1:30 PM: RedShred: Extract, Enrich, and Reshape|
|Presenter: Jim Kukla; RedShred|
Abstract: The demise of paper and the rise of the paperless office have been predicted since before the invention of the fax machine. Paper's limitations are well known to those accustomed to the tools of today's digital world. Unfortunately, paper analogs such as PDF inherit most of paper's weaknesses due to a dizzying variety of internal representations for equivalent visual output. Even today, paper remains the universal "lowest common denominator" format for technical reference material that is critical in maintenance operations. In this demo we introduce RedShred, a platform that enables teams to liberate document-hosted knowledge more effectively by combining computer vision and natural language processing. This platform is built on three principles: extract, enrich, and reshape. Using RedShred, teams can collaborate on reshaping valuable content that was previously trapped in paper and paper analogs. In this demonstration we will show how users can load technical documentation and configure the platform to extract and enrich the content and reformat it for smaller-than-printed-page interfaces such as the ubiquitous mobile and tablet devices carried by field service personnel. We will also show how RedShred enrichments include useful artifacts for downstream usage, such as fine-tuning language models with specific kinds of content from the ingested documents. We also discuss the underlying principles and mental model we use to unify these capabilities into a coherent platform.
|1:30 PM – 2:15 PM: Topic Modeling in R|
|Presenter: Maria Seale; US Army Engineer Research and Development Center (ERDC)|
Abstract: Natural language processing techniques are often applied to labeled text data to produce numeric vectors that can inform classification models. However, a wealth of information can reside in text data that is not labeled. In these cases, statistical techniques can be used to determine groups of documents that are semantically similar, effectively “labeling” the documents and providing important information on composition and relevance. This presentation will provide a background on topic modeling and examine a use case implemented in the R programming language.
|2:15 PM – 3:00 PM: Utilize CMMS Data in Practical Ways Despite Data Quality Issues with Asset Answers|
|Presenter: Manjish Naik; GE Digital|
Abstract: Asset Answers is a cloud diagnostic application that addresses poor data quality using benchmarked standards and provides continuous data improvement recommendations. The included asset performance analytics, dashboards, and reporting tools deliver accurate metrics to qualify the asset strategy, drive better reliability, and make data-driven maintenance decisions. The Data Quality Module encourages accurate data entry and accountability by pinpointing the correlation between data improvement and metric impact. Asset Answers provides an accurate asset performance assessment by analyzing Computerized Maintenance Management System (CMMS) data for completeness, accuracy, and adherence to standards. Data quality analysis provides a list of data inconsistencies and prioritized actions to effectively resolve challenges, backed by GE Digital industry leadership, equipment expertise, and performance metrics.