Aggregation functions are used in distributed environments to make system-wide information locally available in the nodes of a network. The computation of different aggregation functions, e.g., summation, average and maximum, in large-scale distributed systems is challenging and crucial for a wide range of applications. This is especially the case when the input values of these functions change dynamically during system runtime. Related approaches to decentralized aggregation are function-dependent or interaction-dependent, assume static values, or cannot always tolerate duplicates and continuously changing information.


This paper introduces DIAS, the Dynamic Intelligent Aggregation Service. DIAS is an agent-based middleware that addresses these issues with a holistic approach: it makes the distributed information efficiently available in every node of the network, which enables the simultaneous computation of almost any aggregation function. Such an abstraction initially incurs significant communication and storage costs and a rather large overhead. These issues are resolved by an implicit local representation and storage of the explicit distributed information: aggregation memberships are stored in Bloom filters.
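
The idea of storing aggregation memberships in Bloom filters can be illustrated with a minimal Python sketch. It assumes that a membership is identified by a hypothetical key combining a source node identifier and a value version (`node_id#version`); the class names, the filter size, the number of hash functions and the double-hashing scheme are illustrative choices, not the configuration used in DIAS. A positive lookup may be a false positive, so this sketch would occasionally skip a value that was never actually counted.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over string keys (sizes are illustrative)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: str) -> list:
        # Derive k bit positions by double hashing a single SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False: definitely never added. True: added, or a false positive.
        return all(self.bits[pos] for pos in self._positions(key))


class AggregationMemberships:
    """Local sum whose memberships live in a Bloom filter (hypothetical scheme)."""

    def __init__(self) -> None:
        self.filter = BloomFilter()
        self.total = 0.0

    def merge(self, node_id: str, version: int, value: float) -> bool:
        key = f"{node_id}#{version}"  # hypothetical membership key
        if self.filter.might_contain(key):
            return False  # treated as a duplicate (possibly a false positive)
        self.filter.add(key)
        self.total += value
        return True
```

Here `m.merge("n1", 0, 5.0)` on a fresh `AggregationMemberships` returns `True`, while a repeated call with the same node and version returns `False` and leaves `m.total` at 5.0. The filter only answers membership queries; it never stores the keys themselves, which is where the space savings come from.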


The performance impact of Bloom filters in DIAS is critical for its applicability, as they compensate for and reduce the initially high communication and storage costs of such an abstraction.


Experimental evaluation under various aggregation and resource-constrained settings shows that DIAS is an efficient and accurate decentralized aggregation service.


The pervasiveness of Internet of Things devices in techno-socio-economic domains such as Smart Cities and Smart Grids produces data about our society at a massive scale. Decision-making by system operators or policy-makers requires a sophisticated understanding of these data with real-time data analytics methods. However, common data analytics methods often serve exclusively corporate and commercial interests and result in privacy intrusion, surveillance, profiling and discriminatory actions. This paper illustrates an alternative data analytics approach that relies on participatory citizens who contribute Internet of Things data and crowdsourced computational resources to compute aggregation functions collectively. This democratization calls for a fully decentralized and privacy-preserving system design in which a local data management mechanism implemented in smart phones can guarantee highly accurate computations under highly dynamic data streams. Experimental evaluation with real-world Smart Grid data illustrates the performance trade-offs and shows how they can be managed in an automated and empirical way using decision trees.


The feasibility of large-scale decentralized networks for local computations, as an alternative to big data systems that are often privacy-intrusive, expensive and serve exclusively corporate interests, is usually called into question by network dynamics such as nodes leaving, failing and rejoining the network. This is especially the case when the decentralized computations performed in a network, such as the estimation of aggregation functions, e.g., summation, are linked to the nodes actually connected in the network, for instance, computing the sum over the input values of the connected nodes only. Reverse computations are required to maintain high aggregation accuracy when nodes leave or fail. This paper introduces an autonomic agent-based model for highly dynamic self-corrective networks using decentralized reverse computations. The model is generic and equips the nodes with the capability to disseminate connectivity status updates in the network. Agents that are highly resilient to network dynamics migrate to remote nodes and orchestrate reverse computations for each node leave or failure. In contrast to related work, no additional computational resources or redundancy are introduced. The self-corrective model is experimentally evaluated using real-world data from a smart grid pilot project under highly dynamic network adjustments that correspond to catastrophic events in which up to 50% of the nodes leave the network. The model is highly agile and modular and is applied to the large-scale decentralized aggregation network of DIAS, the Dynamic Intelligent Aggregation Service, without major structural changes in its design and operations. Results confirm a substantial improvement in aggregation accuracy when self-corrective actions are employed, with a minimal increase in communication overhead.
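
A reverse computation for a summation can be sketched as follows, assuming that each node keeps an explicit set of the source nodes whose values it has counted, and that a migrating agent carries the last known value of the departed node. `Node`, `ReverseAgent` and their methods are hypothetical names; agent migration is simulated by a loop over the visited nodes, and the actual DIAS model is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node holding a local sum and the sources counted in it (hypothetical)."""
    total: float = 0.0
    counted: set = field(default_factory=set)

    def aggregate(self, source_id: str, value: float) -> None:
        # Count each source at most once, so duplicates do not inflate the sum.
        if source_id in self.counted:
            return
        self.counted.add(source_id)
        self.total += value


@dataclass
class ReverseAgent:
    """Agent that carries a departed node's last value to the nodes it visits."""
    departed_id: str
    last_value: float

    def visit(self, node: Node) -> bool:
        # Reverse computation: undo the departed node's contribution, if present.
        if self.departed_id not in node.counted:
            return False
        node.counted.discard(self.departed_id)
        node.total -= self.last_value
        return True
```

For example, if node `a` has counted `x` (5.0) and `y` (2.0), node `b` has counted only `x`, and node `c` only `y`, then an agent `ReverseAgent("x", 5.0)` visiting `a`, `b` and `c` corrects the first two and leaves `c` untouched, so the totals become 2.0, 0.0 and 2.0. No redundant copies of the aggregate are needed; the correction relies only on the departed value that the agent carries.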


The Internet of Things empowers citizens to interconnect their devices, such as smart phones, into large-scale participatory decentralized networks, which they can use to make real-time collective measurements as a public good, for instance, crowdsourcing the monitoring of traffic in a city. This approach is an alternative to big data analytics systems that are often expensive to access, privacy-intrusive and allow discriminatory and profiling actions over citizens’ data. However, large-scale decentralized networks are complex to manage, and collective measurements, i.e., computations of aggregation functions, need to cope with several dynamics such as continuously changing input data streams and highly varying temporal demand for access to the collective measurements. This paper proposes a highly reactive self-adaptation model to tackle the challenge of dynamic computational demand in large-scale decentralized in-network aggregation. The self-adaptation process makes nodes self-aware of other nodes that join and leave the network and therefore makes them capable of self-orchestrating their communication to improve accuracy and minimize communication cost. The model is simple, yet agile. This is shown by applying it to DIAS, the Dynamic Intelligent Aggregation Service, without introducing architectural changes. Evaluation using data from a real-world smart grid pilot project, as well as extreme demand profiles that scale the demand up and down by 50% on average, confirms the cost-effectiveness of in-network aggregation empowered by self-adaptation. The findings are confirmed both in simulation and in a large-scale live deployment on a cluster infrastructure with 3000 independent Java virtual machines, each running a DIAS node. Overall, the results encourage new promising pathways towards the broader adoption of self-adaptive participatory data analytics in large-scale decentralized networks.



The design and management of networked systems that are large-scale and decentralized is challenging. These systems are usually organized in virtual networks: the overlay networks. An overlay network lies at the application level, on top of physical or other overlay networks. Overlay networks implement complex application and organizational functionality not supported by underlying network services. This integration and design approach results in low abstraction, modularity and reconfigurability of applications that are based on overlay networks. In contrast to this practice, this thesis introduces the conceptual architecture of ASMA, the Adaptive Self-organization in a Multi-level Architecture. ASMA is the main contribution of this thesis and is designed for building middleware systems of overlay networks that provide generic capabilities to different distributed applications: the overlay services. The abstraction, modularity and reconfigurability of ASMA are achieved by its multi-level design approach. Three conceptually defined levels of overlay networks and their interactions provide discovery, structuring and coordination of system entities without a centralized management authority. The interactions between the three levels of ASMA form feedback loops that incrementally improve the quality of an overlay service. This thesis shows that a few lines of algorithmic expressions defined by ASMA are adequate to realize the complex system functionality of two introduced overlay services: (i) AETOS, the Adaptive Epidemic Tree Overlay Service, and (ii) DIAS, the Dynamic Intelligent Aggregation Service. Both overlay services advance the state of the art by providing two generic application capabilities. AETOS builds and maintains overlay networks organized in tree topologies that meet different application criteria. DIAS computes different aggregation functions over a set of dynamically changing values distributed in an overlay network.
Both overlay services of ASMA provide a proof of concept of their higher abstraction, modularity and reconfigurability, at the cost of higher communication overhead compared to related work. AETOS provides self-organization of tree topologies with the graph properties of degree-bounding, ordering, balancing and completeness. AETOS performs a gossip-based discovery of nodes in a network. These nodes, ranked according to application criteria, are clustered based on their proximity, computed as their ranking distance. Clustering nodes into candidate parents and children provides a more cost-effective search space than random searching. Bidirectional links with these parents and children are negotiated and established through ‘request’, ‘acknowledgment’, ‘rejection’ and ‘removal’ interactions. Different tree topologies can be self-organized by adopting adaptation strategies that hide complex clustering and selection configurations. Experimental evaluation illustrates the performance trade-offs and reconfigurability of AETOS in various experimental settings. This evaluation concludes that AETOS is a generic and flexible overlay service for the self-organization of tree topologies. DIAS makes aggregates, such as average, summation and maximum, locally available in every node of an overlay network. In contrast to other related methodologies, aggregation in DIAS is function-independent, routing-independent and dynamic, as aggregates are adapted if distributed input values change during runtime. DIAS achieves this abstraction and flexibility by introducing the concept of aggregation memberships. An aggregation membership provides historic information about a computed aggregation value by indicating whether this value is new, outdated or a duplicate. This distinction guarantees accurate computation of aggregates. DIAS also provides two adaptation strategies that determine whether new or outdated aggregation values are preferred in the computation of aggregates.
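
The distinction between new, outdated and duplicate values, together with the two adaptation strategies, can be sketched with an explicit membership map. This is a hypothetical interpretation, not the DIAS protocol: a value is identified by its source node and a version number, `prefer_new=True` replaces an outdated counted value with the newer one, and `prefer_new=False` keeps the value already counted. Because the counted values are kept independently of any particular function, the same memberships can serve summation, average or maximum.

```python
from enum import Enum


class Membership(Enum):
    NEW = "new"              # source not yet counted in the aggregate
    OUTDATED = "outdated"    # an older version of this source's value is counted
    DUPLICATE = "duplicate"  # same or older version: nothing new to count


class MembershipAggregator:
    """Explicit aggregation memberships with a hypothetical strategy switch."""

    def __init__(self, prefer_new: bool = True) -> None:
        self.prefer_new = prefer_new
        self.counted = {}  # source_id -> (version, value) currently counted

    def classify(self, source_id: str, version: int) -> Membership:
        if source_id not in self.counted:
            return Membership.NEW
        counted_version, _ = self.counted[source_id]
        if version > counted_version:
            return Membership.OUTDATED
        return Membership.DUPLICATE

    def receive(self, source_id: str, version: int, value: float) -> Membership:
        status = self.classify(source_id, version)
        if status is Membership.NEW or (
            status is Membership.OUTDATED and self.prefer_new
        ):
            self.counted[source_id] = (version, value)
        return status

    def values(self) -> list:
        # Function-independent: any aggregation function can consume these.
        return [value for _, value in self.counted.values()]
```

With `prefer_new=True`, receiving `("a", 1, 4.0)` and `("b", 1, 2.0)` yields `NEW` twice, a repeated `("a", 1, 4.0)` yields `DUPLICATE`, and `("a", 2, 6.0)` yields `OUTDATED` and replaces the counted value, so `sum(agg.values())` is 8.0 and `max(agg.values())` is 6.0. Storing such a map explicitly in every node is exactly the cost that motivates a compact representation.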
An explicit storage of aggregation memberships is not a scalable approach to decentralized aggregation. Instead, DIAS stores aggregation memberships in probabilistic data structures: Bloom filters. A Bloom filter provides large space savings at the cost of false positives. A distributed consistency mechanism is introduced to detect false positives and, therefore, prevent inaccuracies in the computation of aggregates. Experimental evaluation confirms the high accuracy of DIAS under different experimental settings and performance trade-offs. The applicability of AETOS and DIAS is studied in the domain of the Smart Power Grid. More specifically, two decentralized demand-side energy management mechanisms are introduced based on these overlay services: (i) EPOS, the Energy Plan Overlay Self-stabilization, and (ii) ALMA, the Adaptive Load Adjustment by Aggregation. EPOS and ALMA are the contributions of this thesis in the application domain of the Smart Power Grid. EPOS coordinates the energy consumption of a large number of thermostatically controlled devices, such as water heaters and refrigerators, to achieve global system objectives. More specifically, EPOS performs self-stabilization by eliminating oscillations and power peaks in the total energy consumption if and when required. Thermostatic devices are controlled by communicating software agents that generate, select and execute operational plans for their devices without directly involving or affecting consumers. EPOS achieves energy self-stabilization by using AETOS to self-organize agents in a tree overlay network within which they perform a decentralized aggregation and coordinated decision-making of their local energy consumption. Experimental evaluation using synthetic data shows the high energy stabilization achieved in various experimental settings.
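
A highly simplified picture of plan selection for self-stabilization is a greedy pass in which each agent chooses, among its operational plans, the one that minimizes the variance of the total consumption accumulated so far. This is an assumption for illustration only: the tree traversal over AETOS is replaced by a plain list order, and `select_plans` is a hypothetical function, not the EPOS algorithm.

```python
from statistics import pvariance


def select_plans(agent_plans):
    """Greedily pick one plan per agent to flatten the aggregate load profile.

    agent_plans: list of agents, each a list of plans; every plan is a list
    of consumption values over the same time slots.
    """
    horizon = len(agent_plans[0][0])
    aggregate = [0.0] * horizon  # total consumption of agents decided so far
    choices = []
    for plans in agent_plans:
        # Variance of the aggregate if this agent executed plan i.
        def cost(i):
            return pvariance([a + p for a, p in zip(aggregate, plans[i])])

        best = min(range(len(plans)), key=cost)  # ties resolve to the lowest index
        choices.append(best)
        aggregate = [a + p for a, p in zip(aggregate, plans[best])]
    return choices, aggregate
```

For two agents that can each shift a load of 2 units to either of two time slots, `select_plans([[[2, 0], [0, 2]], [[2, 0], [0, 2]]])` returns `([0, 1], [2.0, 2.0])`: the second agent avoids the peak created by the first, and the aggregate profile becomes flat. Minimizing variance is one plausible way to express "eliminating oscillations and power peaks"; other global objectives could be substituted for `cost`.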
ALMA complements EPOS under extreme conditions in which the Smart Power Grid requires an actual decrease or increase in power demand due to failures or excessive micro-generation. ALMA adjusts the aggregate energy consumption using alternative demand options for local energy consumption; these options represent a wide range of comfort and economy levels and can be predefined and dynamically selected by incentivized consumers. Aggregate information about power demand can be made locally available to consumers by the DIAS overlay service. The feasibility of ALMA is evaluated analytically using data from an operational Smart Power Grid: the Olympic Peninsula Smart Grid Demonstration Project. In conclusion, this thesis indicates that introducing decentralized computing systems in an information era expanding to new critical application domains, such as the Smart Power Grid, is a promising endeavor towards more sustainable development and a resource-based economy in future societies.