Monday, August 25, 2014

Sensor data with MQTT

Messaging protocols like MQTT make it easy to design dynamic, scalable and modular architectures for sensor integration.

Message queues with a publish/subscribe schema make data producers and consumers independent of each other. The producer/publisher does not need to know who, if anyone, is interested in the data. A consumer can be changed on the fly without any interruption to the producer, and vice versa.

Pub/sub architecture
By nature, such an architecture is unreliable, as the producer gets no acknowledgement of whether anybody received the data. MQTT tries to tackle that by introducing QoS levels: "at most once", "at least once" and "exactly once" delivery. In addition, a publisher may define a last will and retained messages. The last will is delivered to subscribers if the connection to the publisher breaks.
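As a rough sketch of how these features look in practice, here is a minimal publisher using the paho-mqtt Python client; the broker address and topic names are made-up assumptions for illustration:

  # Minimal sketch of MQTT QoS and last-will usage with the paho-mqtt
  # Python client. Broker address and topics are illustrative assumptions.
  import paho.mqtt.client as mqtt

  client = mqtt.Client()

  # Last will: delivered by the broker to subscribers of this topic
  # if the connection to this publisher breaks unexpectedly.
  client.will_set("/sensor/status/22", payload="offline", qos=1, retain=True)

  client.connect("broker.example.com", 1883)
  client.loop_start()

  # QoS 0 = "at most once", QoS 1 = "at least once", QoS 2 = "exactly once".
  # retain=True makes the broker keep the last value for new subscribers.
  client.publish("/sensor/temperature/22", payload="22.9", qos=1, retain=True)

  client.loop_stop()
  client.disconnect()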

One common mistake with MQTT is to treat it as a pipe for delivering structured data between producer and consumer. Even though MQTT can do that, it goes against the original design principle. The topic field of each message should contain the relevant metadata about the meaning of the message. By burying that information in the payload itself, the benefit of the broker is lost. Let the broker do its job!

Let's look at a practical example.


Tellstick Duo
The Tellstick Duo from Telldus can receive data from various wireless sensors from different vendors. In the case of a WT450H transmitter, an example of a raw sensor event from the telldus-core driver consists of the following fields:
  • class: sensor
  • protocol: mandolyn
  • id: 22
  • model: temperaturehumidity
  • temp: 22.9
  • humidity: 56 

It feels quite natural to formulate the data as a JSON message and deliver it with a topic like /sensor/telldus. But that's not how MQTT is supposed to be used. A better way is to put the metadata into the topic and let the payload contain only the actual data. Something like:

  /sensor/temperature/<id> 22.9
  /sensor/humidity/<id> 56

Why this way? The broker can do its job by delivering messages to those, and only those, who are interested in that particular message. If the topic were just a plain /sensor/telldus, every consumer would receive every message and would then have to parse it at the application level to decide whether it is interested in it at all.
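To make this concrete, here is a hedged sketch of a consumer that lets the broker do the filtering with a wildcard subscription (again with paho-mqtt, broker address assumed):

  # Sketch of broker-side filtering: the consumer subscribes with a
  # wildcard and receives only temperature messages, never humidity.
  # Broker address is an assumption for illustration.
  import paho.mqtt.client as mqtt

  def on_message(client, userdata, msg):
      # Topic carries the metadata, payload carries only the value.
      sensor_id = msg.topic.split("/")[-1]
      print("temperature from sensor %s: %s" % (sensor_id, msg.payload.decode()))

  client = mqtt.Client()
  client.on_message = on_message
  client.connect("broker.example.com", 1883)

  # '+' matches exactly one topic level, here the sensor id.
  client.subscribe("/sensor/temperature/+")
  client.loop_forever()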

Thursday, August 21, 2014

Fault-tolerant IoT architecture

Distributed databases make it easy to set up a fault-tolerant architecture for IoT and beyond.

Let's assume a system with data sources such as wireless sensors, along with gateways and back-end servers. In order to ensure that there is no single point of failure, some level of redundancy is needed: duplicated gateways, connections and back-end systems.


Fault-tolerant IoT architecture.

Data exchange between gateways and back-ends can be realized with the help of a distributed database, without the need for a separate transfer mechanism. As described in my earlier posting, distributed databases can be characterized by whether they favor availability or consistency.

GaianDB is a dynamic distributed federated database provided by IBM. GaianDB advocates a flexible "store locally, query anywhere" (SLQA) paradigm: data is stored in one database, and queries are propagated across the whole cluster to find the requested data. This approach by itself does not guarantee high availability, but combined with redundancy it gives a nicely fault-tolerant system.

In the diagram above, each sensor is expected to be heard by two or more gateways under no-fault conditions. Each gateway has its own database storing the data received from the sensors it can hear, which means redundant data is recorded in the system. It is important to store or buffer data locally in the gateways: in case of a temporary connection failure the data is not lost, but can be retrieved from the gateway later.
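As a minimal sketch of such local buffering, assuming a gateway with a local SQLite store (the table layout and the upload hook are my own illustrative choices):

  # Sketch of store-locally buffering in a gateway: readings are written
  # to a local SQLite database first, and marked as synced only after a
  # successful upload. Table layout and upload() are illustrative assumptions.
  import sqlite3
  import time

  db = sqlite3.connect("gateway_buffer.db")
  db.execute("""CREATE TABLE IF NOT EXISTS readings
                (ts REAL, sensor_id INTEGER, temp REAL, humidity REAL,
                 synced INTEGER DEFAULT 0)""")

  def store_reading(sensor_id, temp, humidity):
      # Always persist locally first, so a connection failure loses nothing.
      db.execute("INSERT INTO readings (ts, sensor_id, temp, humidity) "
                 "VALUES (?, ?, ?, ?)", (time.time(), sensor_id, temp, humidity))
      db.commit()

  def sync_pending(upload):
      # Push unsynced rows to the back-end; on failure, rows stay buffered
      # and are retried on the next sync round.
      rows = db.execute("SELECT rowid, ts, sensor_id, temp, humidity "
                        "FROM readings WHERE synced = 0").fetchall()
      for rowid, ts, sensor_id, temp, humidity in rows:
          if upload(ts, sensor_id, temp, humidity):
              db.execute("UPDATE readings SET synced = 1 WHERE rowid = ?", (rowid,))
              db.commit()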

The cluster is dynamically self-organizing, which means it always looks for the optimal route between nodes, if one exists. If an individual link or node is lost, data is routed another way. With the help of redundancy, no single failure can block the whole system from working.

Databases favoring consistency do not make good fault-tolerant architectures. Typically, such databases have one DB instance defined as the master for any given data entity. The data is available via every secondary DB, but if the master DB goes down, all the secondary ones cease providing the data, as they cannot guarantee its consistency. RethinkDB is a popular example of such a database.

Monday, August 4, 2014

Gain your DevOps attitude

Playing around with a cluster is a good exercise for any developer. "It works on my desktop" is not enough anymore.

I have to admit I have a developer background and a developer attitude. Since I assembled the RPi cluster reported in an earlier post, I have had to start thinking the operations way: how do I deploy the configuration I'm now running on this single board to multiple instances? Four nodes is enough to make you understand that manual copying and configuration is not the way to go.


How do you perform testing, deployment, configuration and management in a traceable, reliable and effort-inexpensive way? This is one question that DevOps tries to answer by emphasizing communication between the development, operations and QA functions. In order to communicate efficiently, people must share common concepts and think more or less the same way their counterparts do. This is where DevOps people come in, mixing the roles.

DevOps is considered the third generation of software development methods, after waterfall and agile. There is some criticism of DevOps, as it is seen as consuming all of a developer's time with less challenging tasks like QA and operations. This is why automation is important: a DevOps engineer is not supposed to perform manual testing or manually configure several instances of cloud environments.

A DevOps engineer uses his or her developer skills to build an automated testing, deployment and management environment, then focuses on developing something new and lets computers run the less challenging and repetitive tasks. At least in theory.
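As a small, hedged example of that kind of automation, here is a sketch that pushes one configuration file to every node of the cluster over SSH; the host names, paths and service name are assumptions:

  # Sketch: push one configuration file to every cluster node and restart
  # the service, instead of configuring each node by hand. Host names,
  # file paths and the service name are illustrative assumptions.
  import subprocess

  NODES = ["rpi1.local", "rpi2.local", "rpi3.local", "rpi4.local"]
  CONFIG = "cluster.conf"

  for node in NODES:
      # Copy the config, then restart the service on the node.
      subprocess.check_call(["scp", CONFIG, "pi@%s:/etc/myservice/" % node])
      subprocess.check_call(["ssh", "pi@%s" % node,
                             "sudo service myservice restart"])
      print("deployed to %s" % node)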

Back to the RPi cluster. Compiling a decent database from source natively on a single RPi takes a day or more. A cloud build environment could boost the process significantly, combined with automated cluster deployment and management tools. When I decided to build a cluster, I thought it was about studying distributed databases and messaging for IoT, but in practice it's a cluster management exercise that is making a devops out of me.

Friday, August 1, 2014

Distributed databases

Understanding the fundamental differences between distributed databases is crucially important when selecting the proper database for an IoT application.

There are dozens of databases available today. Among the distributed ones, perhaps the best known are Cassandra, CouchDB, Riak and RethinkDB. But which one is most suitable for my specific purpose?

Wikipedia describes several characteristics of distributed databases. However, the article does not pay enough attention to perhaps the most important factor: does the database favor data consistency or high availability?

Let's assume we have a cluster of four interconnected databases. (Unsurprisingly, this setup resembles the one described in the previous posting.) Let's assume each DB has a number of sensors and actuators behind it, each connected to only one DB at a time, as illustrated below.

Distributed database.
What happens when the connection to one of the databases (A) is lost? There are two scenarios, sketched in code below:
a) High availability: the other databases B, C and D continue to provide the last known state of the nodes behind DB A.
b) Data consistency: the other databases B, C and D no longer provide information about the nodes behind A, as they cannot guarantee its consistency (integrity).
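The difference fits in a few lines of toy code; this replica model is my own illustration, not the behavior of any particular database:

  # Toy illustration (not any real database): how an availability-favoring
  # replica and a consistency-favoring replica answer a read for data whose
  # master (DB A) has become unreachable.
  class Replica:
      def __init__(self, favor_availability):
          self.favor_availability = favor_availability
          self.cache = {}            # last known values replicated from A
          self.master_reachable = True

      def read(self, key):
          if self.master_reachable:
              return self.cache[key]
          if self.favor_availability:
              # Scenario a): serve the last known state, possibly stale.
              return self.cache[key]
          # Scenario b): refuse, since consistency cannot be guaranteed.
          raise RuntimeError("master unreachable, cannot guarantee consistency")

  b = Replica(favor_availability=True)
  b.cache["sensor-22/temp"] = 22.9
  b.master_reachable = False
  print(b.read("sensor-22/temp"))   # 22.9, the last known value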

Which one is better? Well, it depends on your application. In a sensor-network type of application, where time-series data is typically stored, the high-availability approach works well. The historical data is not supposed to be altered in any way, so it remains valid, and the redundant databases can continue providing the last known history of the lost nodes. If the connection to DB A is recovered later, all the databases can synchronize the data missing from the duration of the outage.

In a real-time control type of application, historical data is perhaps not that important; the only thing that matters is the current state of the system. In such a case, if the connection is lost, it is better not to share the past state of the lost nodes. This is what favoring data consistency means: no stale data is provided. If a sub-system is not available, the data related to it is missing as well, which is quite natural.

The CAP theorem gives a more formal explanation of the difference described above. When selecting the most suitable DB for an IoT application, this is perhaps the most important factor to take into account; for example, how the databases synchronize with each other is not that relevant, as long as they are connected and can synchronize. In an IoT system, fault tolerance is one of the most important aspects, as network partitions (connection breaks) are more than likely to occur.