In an Industry 4.0 architecture, a historian acts as the long-term memory of production. While the Unified Namespace (UNS) always shows the current state of your machines, sensors, and processes, the historian continuously collects and stores relevant data over extended periods.
But what’s the value of such data?
In my experience, historical data becomes increasingly valuable when it is accessible to the right people. The historian allows process experts and production managers to retrieve past machine and process data in an organized way. Instead of tediously combing through old spreadsheets or scattered records, reliable and complete information is readily available in visualization tools like Grafana.
Here are a few examples:
- An engineering team at a client company analyzed historical data from their machines and discovered that the motors were rarely used near their rated capacity. As a result, the next product generation featured smaller motors without sacrificing performance. This saved costs for the company and its customers while benefiting the environment.
- Before a major trade show, the marketing team of another client used historical data to create compelling sales arguments for their automation solution. The data demonstrated that their solution saved energy and improved quality.
- In a collaborative project, a partner used historical data to implement an ambitious machine learning model aimed at predicting several critical error types in steel production. An edge-based InfluxDB collected and stored all the relevant data points.
Depending on the use case, historians can be implemented in different ways.
- Edge Historian:
Located directly on the machine or in the factory, this records local process data, provides quick insights into current events, and supports short-term analysis to enable immediate reactions to changes.
- Enterprise Historian:
This consolidates and harmonizes data from multiple production lines or even sites in a central repository. It delivers a comprehensive overview, facilitates comparisons between machines, and simplifies identifying patterns that emerge only when data from the entire process chain is unified.
- Data Lake as "Historian":
A data lake is a large, often cloud-based repository that stores all time-series events (machine data, logs, quality information) without rigid structures. While additional tools are needed to prepare data for analysis or visualization, a data lake offers immense flexibility and scalability, especially for complex analyses or machine learning models.
Buy historian software or build with open-source databases?
The idea of giving your production a long-term memory isn’t new. Even before AI-readiness became a buzzword, production managers and process experts realized the potential they were missing without storing process data in databases. However, the historian use case brings specific challenges for data storage.
- Manufacturing processes generate massive volumes of data in short periods. For example, the 100 machines in Rittal's "Smart Factory" produce around 16 terabytes of raw data daily. To ensure database performance, questions about managing this data come into focus: Which data should be stored as raw data and which as aggregated values? Is it enough to rely on lossless data compression algorithms built into most databases, or are you willing to accept some data loss for stronger compression?
- Ideally, every data point (commonly referred to as a "tag") is stored with its context. For example, the temperature value from sensor PT344X2 should also indicate the specific system element it belongs to, whether it’s recorded in °C or °F, and the sensor’s range. This context should remain intact even if the production process evolves over time.
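To make these two challenges more tangible, here is a minimal Python sketch. It is an illustration under assumptions, not how any particular historian product works: the `TagContext` fields, the asset path, the sensor range, and the sample readings are all made up, and the deadband filter stands in for just one of many possible lossy compression strategies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagContext:
    """Context kept alongside every value of a tag (field names are hypothetical)."""
    tag_id: str        # sensor name, as in the example from the text
    asset_path: str    # position in the plant hierarchy (assumed layout)
    unit: str          # engineering unit, e.g. "°C" or "°F"
    range_min: float   # lower end of the sensor's measuring range
    range_max: float   # upper end of the sensor's measuring range


def deadband_filter(samples, deadband):
    """Lossy compression: keep a sample only if it differs from the last
    kept value by more than `deadband`. The first sample is always kept."""
    kept = []
    for timestamp, value in samples:
        if not kept or abs(value - kept[-1][1]) > deadband:
            kept.append((timestamp, value))
    return kept


context = TagContext(
    tag_id="PT344X2",
    asset_path="plant1/line3/furnace",  # hypothetical location
    unit="°C",
    range_min=0.0,     # assumed range, for illustration only
    range_max=400.0,
)

# (seconds since start, temperature): made-up readings
raw = [(0, 20.0), (1, 20.1), (2, 20.05), (3, 21.0), (4, 21.2), (5, 25.0)]
compressed = deadband_filter(raw, deadband=0.5)
```

With a deadband of 0.5, only three of the six readings are kept. The small fluctuations in between are gone for good, which is exactly the trade-off the question above asks you to decide on.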
Specialized historian software solutions (e.g., Canary Labs, AVEVA, or OSIsoft PI) are designed to address such requirements. However, you don’t necessarily need historian software to implement a historian. Increasingly, companies are opting to build their solutions on proven open-source technologies. Jeremy Theocharis from United Manufacturing Hub offers a detailed explanation of why this approach is worth considering in this video.
In practice, it could be realized like this...
A data logger collects relevant process data from your machine in near real-time and streams it via MQTT. An InfluxDB, optimized for time-series data, listens for updates and stores them directly at the machine (Edge Historian). Key data points are aggregated and stored alongside data from other production processes, orders, and quality data in an SQL database within the corporate network (Enterprise Historian). To keep databases manageable, older time-series data is compressed into Parquet files and stored in the company’s data lake, along with images and documents. From there, the data can be accessed anytime for visualization and analysis.
You might think, “Building such a data architecture must be resource-intensive.” And you wouldn't be entirely wrong. I regularly speak with SMEs that began with specific innovation ideas and ended up:
- following the blueprint of large enterprises and hiring external solution architects to design an oversized data architecture,
- spending hundreds of thousands or even millions of euros on development partners over three years,
- creating a solution that only one or two partners can further develop, effectively creating vendor lock-in,
- learning so many lessons along the way that, even before the first rollout, they question whether it’s better to scrap the project and start over with version 2.0.
We believe the path to a "Smart Factory" for SMEs must be leaner. This belief shapes our principles for developing PREKIT.
The first PREKIT module is delivered as a preconfigured industrial computer, complete with an integrated InfluxDB that locally stores process data and makes it available through apps for visualization and analysis. At any time, additional PREKIT modules can be flexibly added to consolidate data from the entire production line, whether in the company network or the cloud. This allows the entire reference architecture described above to be realized almost "out of the box"—as a flexible and fully open platform.