Phil de Silva
Design of a Real-Time Analytics and Visualisation Solution for IoT Data on Microsoft Azure
The concept of embedding chips and sensors into everyday objects is not a new one. Historically, however, the adoption of this technology has been limited by the cost of production and the lack of suitable internet and communications infrastructure; this has now changed. Gartner predicts that there will be 5.8 billion IoT devices in operation at the end of 2020, a 21% increase from 2019.
Any everyday object can be transformed into an IoT or smart device by adding sensors, actuators, and transmitters, allowing it to relay data and be controlled remotely. Examples include a smart electricity meter that transmits usage data to a central server, or a smart light bulb that can be switched on and off via WiFi or Bluetooth.
The ultimate value of IoT solutions lies not only in their ability to ingest and process data in real time, and trigger reactive actions on events, but also in the use of historical data and machine learning to produce models that predict events and allow preemptive action to be taken.
One everyday application of such a system is to indicate to a homeowner when their pot plants need to be watered. This and the next few blog posts cover the development of an IoT solution to indicate to a user when a pot plant requires watering based on measurements of soil humidity. Additional data - ambient temperature and ambient humidity - is also captured and used to generate alerts and to aid in future machine learning endeavours.
This blog post outlines the design of an IoT data engineering pipeline that performs five actions on the collected data: Ingest, Analyse, Record, Action, and Visualise. The solution is built on Microsoft Azure and is designed to meet the following specifications:
Capture telemetry data from sensors over time: ambient temperature, ambient humidity, and soil humidity.
Trigger email alerts when ambient temperature exceeds 30 °C and log these alerts.
Trigger email alerts if soil humidity increases by more than 5% between consecutive readings (indicative of manual soil watering) and log these events.
Visualise all telemetry, alerts and events in real time.
The Azure technology stack shown in Figure 1 is used to deliver the solution. The solution will be deployed in a single Azure region. High Availability, Disaster Recovery, and Security have not been factored into the solution design.
Figure 1 – Solution Overview
Azure IoT Hub is used to ingest streaming data from the IoT device. The built-in Event Hub compatible endpoint is used with the default consumer group to allow Stream Analytics to consume the data. In this example, an IoT simulator is used in place of a physical device.
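To make the shape of the incoming data concrete, here is a minimal sketch of what a simulated device message might look like. The field names (messageId, deviceId, temperature, ambientHumidity, soilHumidity) and the value ranges are assumptions made for illustration; the post does not specify the simulator's actual message schema.

```python
import json
import random
import uuid
from datetime import datetime, timezone


def make_telemetry(device_id: str, rng: random.Random) -> dict:
    """Build one simulated reading. All field names are hypothetical."""
    return {
        "messageId": str(uuid.uuid4()),  # unique per message
        "deviceId": device_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "temperature": round(rng.uniform(15.0, 35.0), 1),      # deg C, assumed range
        "ambientHumidity": round(rng.uniform(30.0, 80.0), 1),  # percent, assumed range
        "soilHumidity": round(rng.uniform(10.0, 60.0), 1),     # percent, assumed range
    }


def to_payload(message: dict) -> bytes:
    """Serialise a reading as UTF-8 JSON, ready to be sent as a message body."""
    return json.dumps(message).encode("utf-8")


if __name__ == "__main__":
    reading = make_telemetry("plant-01", random.Random())
    print(to_payload(reading).decode("utf-8"))
```

A real simulator or device would hand the resulting bytes to an IoT Hub client; that part is left out here so the sketch stays free of external dependencies.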
Data analysis is performed using Azure Stream Analytics to:
Validate telemetry data quality and plausibility.
Detect when ambient temperature exceeds 30 °C and trigger an email notification.
Detect when soil humidity increases by more than 5% from the previous reading and trigger an email notification.
Store all telemetry, alerts, and events in dedicated tables.
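The rules above can be sketched in plain Python to show the intended logic. In the solution itself this work is done by Azure Stream Analytics, so the code below only illustrates the rules and is not the actual query. It makes several assumptions: the field names are hypothetical, the plausibility bounds are invented for the example, implausible readings are simply dropped, and the 5% rise is read as five percentage points.

```python
from typing import List, Optional

TEMP_ALERT_C = 30.0        # alert threshold from the specification
SOIL_RISE_THRESHOLD = 5.0  # assumed to mean percentage points


def is_plausible(reading: dict) -> bool:
    """Simple range check. The bounds are assumptions, not from the post."""
    return (
        -40.0 <= reading["temperature"] <= 60.0
        and 0.0 <= reading["ambientHumidity"] <= 100.0
        and 0.0 <= reading["soilHumidity"] <= 100.0
    )


def classify(reading: dict, previous: Optional[dict]) -> List[str]:
    """Return the outputs a reading is routed to: telemetry, alert, event."""
    if not is_plausible(reading):
        return []  # assumption: implausible readings are discarded
    outputs = ["telemetry"]
    if reading["temperature"] > TEMP_ALERT_C:
        outputs.append("alert")
    if previous is not None:
        rise = reading["soilHumidity"] - previous["soilHumidity"]
        if rise > SOIL_RISE_THRESHOLD:
            outputs.append("event")  # likely manual watering
    return outputs


previous = {"temperature": 25.0, "ambientHumidity": 50.0, "soilHumidity": 20.0}
current = {"temperature": 31.5, "ambientHumidity": 48.0, "soilHumidity": 27.0}
print(classify(current, previous))  # ['telemetry', 'alert', 'event']
```

Comparing each reading with the one before it is the only stateful part of the logic, which is why the previous reading is passed in explicitly.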
Azure Storage Account tables are used to store all telemetry, alerts, and events. Tables are partitioned by device ID based on the assumption that queries are scoped to specific devices. Using the message ID as the row key prevents duplicate rows, because the combination of partition key and row key must be unique within a table.
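The keying scheme can be illustrated with a small in-memory stand-in for a table. This is a sketch only: InMemoryTable is a hypothetical class written for this example, not an Azure SDK type, and the reading's field names are assumptions. PartitionKey and RowKey are the key properties that Azure Table storage entities carry.

```python
def to_entity(reading: dict) -> dict:
    """Map a reading to a table entity keyed by device ID and message ID."""
    entity = dict(reading)
    entity["PartitionKey"] = reading["deviceId"]  # groups rows per device
    entity["RowKey"] = reading["messageId"]       # unique within the partition
    return entity


class InMemoryTable:
    """Hypothetical stand-in: one entity per (PartitionKey, RowKey) pair."""

    def __init__(self) -> None:
        self._rows = {}

    def upsert(self, entity: dict) -> None:
        # Writing the same key twice overwrites instead of duplicating.
        self._rows[(entity["PartitionKey"], entity["RowKey"])] = entity

    def query_partition(self, partition_key: str) -> list:
        return [e for (pk, _), e in self._rows.items() if pk == partition_key]


table = InMemoryTable()
reading = {"deviceId": "plant-01", "messageId": "m-001", "soilHumidity": 22.5}
table.upsert(to_entity(reading))
table.upsert(to_entity(reading))  # same message delivered twice
print(len(table.query_partition("plant-01")))  # 1
```

Because a redelivered message maps to the same key pair, it replaces the existing row, and a query scoped to one device only has to touch that device's partition.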
Azure Functions are used to trigger email alerts via SendGrid. In this instance, two separate functions are created for Alerts and Events, allowing for custom extensions in the future.
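As a rough illustration of what the alert function needs to produce, the sketch below builds an email request body in the JSON layout used by SendGrid's v3 Mail Send API. It only constructs the payload and does not send anything. The function name, the reading's field names, and the email addresses are all hypothetical, and the payload layout should be checked against the current SendGrid documentation before use.

```python
import json


def build_alert_email(reading: dict, sender: str, recipient: str) -> dict:
    """Build a request body in the shape of SendGrid's v3 Mail Send API."""
    subject = f"Temperature alert for {reading['deviceId']}"
    body = (
        f"Ambient temperature reached {reading['temperature']} deg C "
        f"on device {reading['deviceId']}."
    )
    return {
        "personalizations": [{"to": [{"email": recipient}]}],
        "from": {"email": sender},
        "subject": subject,
        "content": [{"type": "text/plain", "value": body}],
    }


if __name__ == "__main__":
    payload = build_alert_email(
        {"deviceId": "plant-01", "temperature": 31.5},  # hypothetical reading
        sender="alerts@example.com",
        recipient="owner@example.com",
    )
    print(json.dumps(payload, indent=2))
```

Keeping the payload construction in its own function makes it easy for the separate Alerts and Events functions to use different subjects and bodies while sharing the same sending code.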
Power BI is used to visualise the data in real time.
This solution is designed to demonstrate a simple end-to-end data pipeline. In a real-world scenario the following items would need to be considered:
IP filters should be used to restrict access to IoT Hub to known locations and so limit unauthorised connections.
Messages could be routed to blob containers for future reference/analysis in the event of a Stream Analytics failure.
Azure Security Center should be enabled for enhanced security monitoring.
Additional measures could be taken to verify plausibility of sensor data and flag when sensors are likely to have been damaged.
Telemetry lifecycle should be considered, and old data archived/deleted.
Azure Cosmos DB should be considered as an alternative to table storage, depending on data volume, performance, and replication requirements.
Where data is sensitive in nature, column level encryption should be considered.
Where multiple actions need to be carried out for a single event, messages should be placed in a queue/bus.
Dashboards could be built to allow the observation of trends over time.
This solution does not cover High Availability or Disaster Recovery. These will need to be factored in when designing a production system to avoid outages in case of an Azure region failure.
Stay tuned for the next post in this series, which is a step-by-step guide to how you can build a solution in Azure that meets the specifications outlined above.