Big data, a new historian

While openSCADA and Eclipse SCADA already provide a simple value archive that can store values for years, including master/slave replication and composite servers, we know that this archive won’t scale beyond the boundaries of a single server.

That is why we started thinking about a “big data” style historian for Eclipse SCADA. We played around a little with HBase and came up with some pretty nice ideas. Hadoop and HBase as a backend allow us to scale the way we want to.

Since we started by building an extension to our current value archive system, we began development inside the Eclipse SCADA project, and are still tracking some design thoughts there. However, it became obvious very soon that this system would not only work for Eclipse SCADA but could also be used by other systems, as long as we keep the interfaces as open as possible. So hopefully this project will end up as a new Eclipse project alongside Eclipse SCADA.

So what is the current state of all this? It is at an early development stage! We have a specific use case at the moment that we want to build, and for this it seems to be ready. We also have a lot of ideas that could be implemented. For the moment our focus is on realizing our use case, and once we have achieved that, we want to make a first release of the source code.

Here is what already works:

  • Creating value stores and storing data in raw format
  • Compacting raw data into more efficient storage formats
  • Extracting data from the storage in CSV format
  • Querying values before and after the query region (illustrated in the sketch below)
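
To make the list above more concrete, here is a minimal sketch of how such a value store could be used from client code. All names (ValueStoreSketch, put, query) are hypothetical and only illustrate the flow; the sketch uses an in-memory map where Haystack uses HBase, and demonstrates the “query before and after the query region” behavior, which lets a consumer such as a chart interpolate right up to the edges of the requested range:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the historian API described above. None of these
// types exist in Haystack; they only illustrate the store/query flow,
// using an in-memory map where the real thing would use HBase.
public class ValueStoreSketch {

    private final TreeMap<Instant, Double> samples = new TreeMap<>();

    // Append a sample in raw format.
    public void put(Instant timestamp, double value) {
        samples.put(timestamp, value);
    }

    // Query a time range, additionally returning the last sample before
    // `start` and the first sample after `end`, so a consumer (e.g. a
    // chart) can interpolate across the boundaries of the query region.
    public List<Map.Entry<Instant, Double>> query(Instant start, Instant end) {
        List<Map.Entry<Instant, Double>> result = new ArrayList<>();

        Map.Entry<Instant, Double> before = samples.lowerEntry(start);
        if (before != null) {
            result.add(before);
        }
        result.addAll(samples.subMap(start, true, end, true).entrySet());

        Map.Entry<Instant, Double> after = samples.higherEntry(end);
        if (after != null) {
            result.add(after);
        }
        return result;
    }

    public static void main(String[] args) {
        ValueStoreSketch store = new ValueStoreSketch();
        Instant now = Instant.now();
        for (int i = 0; i < 10; i++) {
            store.put(now.plusSeconds(i), 20.0 + i);
        }
        // Query seconds 3..6; the result also contains seconds 2 and 7.
        store.query(now.plusSeconds(3), now.plusSeconds(6))
             .forEach(e -> System.out.println(e.getKey() + ";" + e.getValue()));
    }
}
```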

Things that we need to do first:

  • We want to compress doubles depending on their value. Since this will influence the storage format, we want to do this as soon as possible (a sketch of one possible scheme follows this list).
  • Create a build system. Building from the IDE is OK for a while, but not for long.
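
As an illustration only (the actual storage format is not decided yet, so this is purely an assumption), here is a minimal sketch of XOR-delta encoding, one common way to compress doubles depending on their value. Consecutive samples of a slowly changing signal differ in only a few bits, so the XOR of adjacent values has long runs of zero bits that can be bit-packed far more compactly than 8 raw bytes:

```java
// A minimal sketch of one possible value-dependent compression scheme for
// doubles: XOR each value with its predecessor. For slowly changing signals
// the XOR delta has long runs of leading/trailing zero bits, so only the
// "meaningful" middle bits (plus a few control bits) need to be stored.
// This is an illustration only, not Haystack's actual storage format.
public class XorDeltaSketch {

    public static void main(String[] args) {
        double[] values = { 21.0, 21.0, 21.5, 22.0, 21.5 };

        long previous = Double.doubleToLongBits(values[0]);
        System.out.printf("first : 64 bits stored raw%n");

        for (int i = 1; i < values.length; i++) {
            long current = Double.doubleToLongBits(values[i]);
            long xor = current ^ previous;

            if (xor == 0) {
                // Identical value: a single flag bit would suffice.
                System.out.printf("value %d: 1 bit (repeat)%n", i);
            } else {
                int leading = Long.numberOfLeadingZeros(xor);
                int trailing = Long.numberOfTrailingZeros(xor);
                int meaningful = 64 - leading - trailing;
                System.out.printf(
                        "value %d: ~%d meaningful bits (%d leading, %d trailing zeros)%n",
                        i, meaningful, leading, trailing);
            }
            previous = current;
        }
    }
}
```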

Next:

  • Create an HTTP based collector framework with buffering (see the sketch after this list)
  • Implement an Eclipse SCADA collector module
  • REST API
  • Extend the Native API
  • Lots more…
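
To sketch what the buffering HTTP collector could look like (purely an assumption; the endpoint URL, the "timestamp;value" CSV payload, and all class names are made up for illustration): samples are queued in memory and flushed to the historian in batches, so a short network outage does not lose data:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of a buffering HTTP collector: samples are queued in memory and
// periodically flushed to the historian as one CSV batch per request.
// The endpoint URL and the "timestamp;value" line format are assumptions.
public class BufferingCollector {

    private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>(10_000);
    private final HttpClient client = HttpClient.newHttpClient();
    private final URI endpoint;

    public BufferingCollector(URI endpoint) {
        this.endpoint = endpoint;
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::flush, 5, 5, TimeUnit.SECONDS);
    }

    // Non-blocking: drop the sample if the buffer is full rather than
    // stalling the data source.
    public void collect(long timestampMillis, double value) {
        buffer.offer(timestampMillis + ";" + value);
    }

    private void flush() {
        List<String> batch = new ArrayList<>();
        buffer.drainTo(batch, 1_000);
        if (batch.isEmpty()) {
            return;
        }
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "text/csv")
                .POST(HttpRequest.BodyPublishers.ofString(String.join("\n", batch)))
                .build();
        try {
            client.send(request, HttpResponse.BodyHandlers.discarding());
        } catch (Exception e) {
            // On failure, re-queue the batch so a short outage loses nothing.
            batch.forEach(buffer::offer);
        }
    }

    public static void main(String[] args) {
        // Hypothetical endpoint, for illustration only.
        BufferingCollector collector =
                new BufferingCollector(URI.create("http://historian.example/api/values"));
        collector.collect(System.currentTimeMillis(), 42.0);
    }
}
```

Whether a full buffer should drop samples or block the data source is a design decision the real framework will have to make; the sketch simply drops them.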

Peek:

Haystack data exported to Eclipse BIRT using the HTTP CSV servlet.

A simple Eclipse RAP based query UI.

The output of the CSV export servlet.


2 Responses to Big data, a new historian

  1. Buddy says:

    Hi,

    this is a very good idea, and we also thought about implementing a historian on top of Hadoop. But as implementing time series on top of Hadoop has been attempted quite a few times already, why fiddle with the basics and reinvent the wheel? Wouldn’t OpenTSDB be a better starting point? It already implements time series storage, does re-sampling, and also has some stats functions.

    Cheers,
    Matt

    • Jens Reimann says:

      Hi,

      we already evaluated it, and found some flaws [1]. There is also KairosDB, which re-implements OpenTSDB for the same reasons, but uses Cassandra as a backend to get sub-second time resolution. At the moment we achieve a higher throughput than KairosDB.

      [1] https://wiki.eclipse.org/EclipseSCADA/Plan/Haystack
