Architecture

DIAS, the Dynamic Intelligent Aggregation Service, is a fully decentralized multi-agent networked system for lightweight data analytics, such the the computation of aggregation functions. Its main components and novel features are illustrated below.

The Data Model of Possible States

Each DIAS node generates a finite number of possible states that are real values representing an application parameter, for example, the low, medium, high energy profiles of a household or the number of stars with which a movie is ranked. At each time, one of these possible states is the selected state. This is the one that is provided as input in the aggregation functions. Although having a finite number of possible states is a fit for movie ranking, other applications may operate with highly dynamic data streams that would make a fully decentralized aggregation infeasible. Such applications may involve Internet of Things sensors such as smart phones and pervasive devices. In this case, the possible states can be extracted via data mining and machine learning techniques, for instance, a k-means clustering with the centroids of the clusters representing the possible states.

Multi-agent Model

DIAS has two main software actors: (i) the disseminator and (ii) the aggregator. The disseminator keeps the possible states and the selected state. It is responsible for disseminating its selected state in aggregators of the network. During runtime, a disseminator may change its selected state. The aggregator collects the selected states of all disseminators in the network and computes a broad range of aggregation functions such as the summation, average, maximum, minimum, top-k and other. Each node in a DIAS network may be equipped with an disseminator, an aggregator or both, depending on the application domain and the available network resources, such as bandwidth.

Peer-to-peer Communication

DIAS performs peer-to-peer communication at two levels. At the the bottom level lies the peer sampling service, a gossip-based communication protocol. It is used as a fully decentralized lookup service by providing information about the available agents online in the system. Moreover, it creates a highly connected and dynamic overlay network that can be used to rapidly disseminate the aggregation information in the network. DIAS publishes aggregators in the peer sampling service and disseminators receive this information. Moreover, other information such as connectivity status of nodes, is disseminated in order to deal with join and leave scenarios.

At the top level, DIAS performs the actual aggregation. This involves period aggregation sessions that are two-message information exchange between a disseminator and an aggregator. A successful aggregation session results in an exploitation or an update. The exploitation is an aggregation session performed between an aggregator and a disseminator for a first time, therefore new values are counted as input in the aggregation functions. The update is a repeated aggregation session with the purpose to update the computed values of the aggregation functions with a new selected state, different that an earlier counted one.

Intelligent Distributed Memory System

The memory system of DIAS is a distributed collection of probabilistic data structures, the bloom filters. A bloom filter stores memberships of elements. A query of an element to a bloom filters returns true or false with a probability of false positives that depends on the size of the bloom filter, the number of elements hashed and the hash functions used. A simple bloom filter allows insertions of elements but not removals, in contrast to a counting bloom filters that can perform removals as well.

The goal of the DIAS memory system is to keep historic information in the disseminators and aggregators about (i) which interactions have been performed between them and (ii) what states have been exchanged between them.

Information about interactions are stored in two simple bloom filters:

Aggregator Membership in Disseminator (AMD)

This bloom filter of a disseminator stores the IDs of aggregators with which aggregation sessions have been performed.

Disseminator Membership in Aggregator (DMA)
This bloom filter of an aggregator stores the IDs of disseminators with which aggregation sessions have been performed.

Information about the exchanged states are stored in two counting bloom filters:

Aggregator Membership in Selected state (SMAs)

These bloom filters of a disseminator, one for each possible state, store the IDs of aggregators that have aggregated a certain possible state as the selected one.

Aggregator Membership in Selected state (SMAs)

This bloom filter of an aggregator stores the IDs of selected states that have been aggregated as input in the aggregation functions

The memory system of DIAS is used to perform two tasks:

1. ClassifIcation of aggregators sampled in the aggregation pool as (i) exploited, (ii) unexploited and (iii) outdated. Unexploited aggregators result in aggregation sessions with an exploitation outcome, whereas, outdated ones in updates. Classification is performed by the disseminators that initiate aggregation sessions.

2. Validation of the classification outcome at the side of the aggregator to detect false positives or decrease their probability as well as catch inconsistencies by network delays and failures.

More information about the classification and validation process can be found here.

Network and Aggregation Dynamics

DIAS can deal with a dynamic number of participating disseminators and aggregators. The accuracy of the computed aggregation functions is challenged when a disseminator leaves the network. Their selected states needs to be disaggregated from the computed aggregation functions.

When a DIAS nodes leaves the network, the disseminator migrates to another online node from where it undertakes self-corrective actions of the computed aggregated by establishing special aggregation sessions for this purpose. The migrated disseminator is capable of detecting when the original DIAS node joins back by using the peer sampling service. This process can be triggered either proactively, in case of failures, or reactively, in case of controlled leaves.