A Spatiotemporal Data Summarization Paradigm for Real-time Operation of Smart Grid



Zubair Shah, Adnan Anwar, Abdun Naser Mahmood, Member, IEEE, Zahir Tari†, Member, IEEE, and Albert Y. Zomaya†, Fellow, IEEE

Abstract—In a smart grid distribution management system, operation, planning, forecasting and decision making rely on demand-side management functions, which require real-time smart grid data. This data has significant dollar value because it is extremely useful for efficient control and intelligent prediction of energy consumption, and for expert management of residential and commercial load. However, the huge amount of smart grid data generated at very high velocity poses a number of challenges for its accurate and efficient management. Due to this explosive volume of smart grid data, utility companies have a huge demand for efficient summarization techniques for mining interesting patterns and extracting useful, actionable intelligence. Research from various domains has shown that summarization can significantly improve the scalability and efficiency of various data analytic tasks, including transactional database and data stream mining, intrusion and anomaly detection, network monitoring, point-of-sale data mining and information retrieval. Therefore, in this paper we develop a summarization paradigm (a set of algorithms, data structures, and query mechanisms) that enables a utility company to accurately infer various energy consumption patterns in real-time by automatic monitoring of smart grid data using significantly fewer computational resources. The proposed summarization paradigm is suitable for processing spatiotemporal smart grid data streams and can provide answers in real-time to various smart grid applications such as demand-side management, direct load control, smart pricing and Volt-VAr control. Both theoretical bounds and experimental evaluations are presented, which show that the memory required for the proposed data structure grows linearly for the first 52 weeks but, interestingly, after the first year the memory growth is negligible. The experimental analysis also shows that the proposed method can process around 4 million smart meter readings every second, or 120 million readings every minute. A comparison of the proposed approach with a widely used commercial Database Management System (DBMS) shows that the proposed approach is about 200 times faster than the DBMS in update time and about 340 times faster in query time. This shows that the proposed method outperforms the DBMS in terms of both update cost and query cost.

Index Terms—Big Data, AMI, Smart Meters, Smart Grid, Data Streams Summarization

Z. Shah and A. Anwar are with the University of New South Wales, Canberra, Australia. A.N. Mahmood is with the Department of Computer Science, School of Engineering and Mathematical Sciences, La Trobe University, Melbourne, Australia. Z. Tari is with the School of Science, Royal Melbourne Institute of Technology, Melbourne, Australia. A.Y. Zomaya is with the Center for Distributed and High Performance Computing, School of Information Technologies, University of Sydney, Australia. †This work is partially supported by the ARC Linkage Project LP160100406.

1 INTRODUCTION

Unlike the traditional power grid, the smart grid offers new data-centric services to consumers by analysing a large amount of data collected through distributed measurement devices, including power measurement units, substations, generators, energy storage systems and smart meters [1], [2]. Power companies implement various functions in the smart meter, including Demand Side Management (DSM) and Smart Pricing (SP), to monitor the energy consumption patterns of consumers for more cost-effective and reliable energy management, including the shaping of residential or commercial loads [3], [4]. For instance, high forecasting accuracy for such energy consumption patterns enables utilities to estimate the electricity cost and properly set electricity prices, capturing the interdependency between energy demand and prices [5].

An example of this interdependency is load-synchronization, where a large portion of the load is shifted from hours of high prices to hours of low prices without significantly reducing the peak-to-average ratio [6]. Load-synchronization can be performed using a DSM function such as Direct Load Control (DLC), where utilities are able to access household smart meter readings and, based on customer consent, remotely turn household electrical appliances such as air conditioners, washing machines and pumps on and off. DLC gives utilities more power to shape the peak demand load (saving expensive peak-hour generation) and offers customers price incentives [7] (in the form of off-peak rates, real-time pricing, time-of-use pricing and critical peak pricing) through pre-programmed reduction of household consumption. However, improper monitoring and control of the power flow can increase the possibility of failure and destabilisation of a smart grid. It has been reported in [8], [9], [10] that, in order to perform data-dependent tasks such as DSM, DLC and SP, a huge amount of smart grid data needs to be collected in real-time and analyzed at multiple spatiotemporal levels or hierarchies. These tasks require intelligent real-time monitoring techniques capable of detecting abnormal events, finding their location and causes, and predicting and eliminating faults before they happen [11].

[Fig. 1: (Upper portion) A pictorial view of the data collection and management scheme in a smart grid: smart meter readings flow both into the proposed summarization scheme and into a DBMS archive; instantaneous and real-time data serve operations and maintenance, near real-time and recent past data serve DSM, DLC, SP and Volt-VAr, and historic raw data serves forecasting and long-term planning. (Lower portion) An enlarged view of the proposed data stream summarization model: a stream processing engine converts the flat smart grid data stream into hierarchical form (Algorithm 4) and maintains an in-memory hierarchical summary (Algorithm 3); a query engine (Fig. 5) answers real-time queries with approximate answers for decision making, e.g., DSM, DLC, SP and Volt-VAr.]

In order to deal with the high level of uncertainties in smart grid data, its extreme size, and the need for real-time learning and decision making, the smart grid urgently demands big data management, advanced data analytic techniques and powerful monitoring techniques [12]. Recently, nearly all major companies, including IT giants such as IBM and Oracle, grid giants such as Siemens, General Electric, eMeter, Schneider Electric and ABB, and startups such as Opower and AutoGrid, have started big data projects and are competing to bring a set of IT tools that are largely new to the utility industry [13]. Smart grid data are usually assumed to arrive from a variety of heterogeneous data sources, at high velocities, and in large volumes; given such high dynamicity, heterogeneity and volume of real-time data, one of the main challenges facing today's big data community is how to improve the scalability of analytic operations and reduce errors [14]. Big data offers tremendous potential for smart grid operations through the extraction of useful intelligence and actionable rules, which is often accomplished through summarization, such as sampling, synopses and sketches, dimensionality reduction, and histograms. Instead of working on large and complex raw data directly, data analytic tasks such as DSM, DLC and SP can make use of carefully constructed summaries to improve their scalability and efficiency; these summaries maintain the statistical properties of the data while allowing fast real-time updates and queries [15].

The real-time monitoring requirements of smart grid data impose a number of challenges on the operation of smart grid applications. First, the volume of smart grid data is huge and is growing annually at an explosive rate worldwide [16]. For instance, a 2009 report shows that a typical US utility needs to process data from around 2 million customers every day [17]; this figure has grown significantly in the last few years and is still only at the beginning of its growth. For example, by 2020 the number of installed smart meters in Europe will reach 240 million, while China is forecast to install about 400 million smart meters by that date [13]. The huge volume not only creates problems for storing the data in remote devices, but also makes scalable analysis of the data extremely challenging. Second, smart grid data arrives at high velocity, in a continuous, unbounded and unpredictable streaming fashion. The continuous nature of the data requires data structures and algorithms with efficient access and update times to keep up with the stream speed and to provide real-time responses to users [18]. Finally, although smart meter data is generally flat (e.g., smart meter id, location, consumption per minute), many decision-making functions in the smart grid need to analyse and aggregate data at multiple nodes of a hierarchy (e.g., the hourly consumption of all smart meters in a suburb). Analysis of such hierarchical data is challenging because a single data item must be represented, updated and queried at multiple nodes of the hierarchy. Existing techniques that deal with each of these issues in other domains cannot be directly applied in the smart grid context. For example, the stream processing and database communities [19] (for a detailed survey see also [20]) have extensively studied the volume and velocity issues of big data. However, existing data stream processing approaches, including IBM InfoSphere (https://www-01.ibm.com/software/data/infosphere/), Spark (http://spark.apache.org/), Flink (http://flink.apache.org) and Storm (http://storm.apache.org/), may not be suitable for smart grid data summarization for the following reasons. First, the majority of these approaches are based on frequent-item mining techniques [20], which keep certain important data elements (e.g., frequently occurring items) in the summary and discard the rest of the data. Such approaches may not be appropriate (e.g., discarding the data of some smart meters) for certain utility functions on smart grid data, such as billing and state estimation (analysing the power characteristics of the grid), where it is not feasible to discard data. Second, these approaches cannot capture the spatiotemporal properties of the data. For instance, smart grid data is often collected from individual smart meters located at customer premises; however, for effective and smooth operation of a smart grid, this flat power consumption data needs to be analyzed and aggregated over various geographical locations (street, suburb, city), over periods of time (minutes, hours, days, and months), or both. Such spatial and temporal hierarchical analysis of smart grid data cannot be supported by existing conventional database management systems, time series database management systems, or big data and data stream processing tools, including IBM InfoSphere, Spark and Storm, due to their lack of support for arbitrary hierarchical and temporal analysis of data.

1.1 The Sketch of the Proposed Approach

This section explains the proposed approach to summarizing smart grid data. Fig. 1 shows that the smart meters located at each customer's premises draw electricity from distribution lines and feed household consumption data, in the form of a continuous stream, to a central computer using smart grid communication technology (including power line communication, wireless communication or wired connections). The data can be categorised, based on the time of its generation and the location where it is used, into the following three categories. (i) Instantaneous data, which includes the current, voltage and power measurements collected on the distribution lines; this type of data is used for operation and maintenance tasks such as fault detection, power quality monitoring and condition monitoring. (ii) Real-time and recent past data, which includes Advanced Metering Infrastructure (AMI) data such as smart meter readings, timestamps and smart meter IDs, collected from smart meters located at customer premises using smart grid communication technologies. It is essential to make sense of this data and mine interesting user consumption patterns from it, preferably in real-time, to help organisations make better decisions. Operation and decision making in smart grids largely depend on timely and reliable grid measurement data and the patterns subsequently mined from such data. For instance, the patterns mined from such data can be used for smart grid applications such as DSM, DLC, SP and Volt-VAr control. (iii) Historic data, which is again the same AMI data collected from user premises using smart grid communication technologies, often archived in data warehouses. This data is stored for a long period of time and is used for both short- and long-term forecasting and for long-term strategic planning using commercially available forecasting and planning tools.

This paper addresses the challenges of analysing only the real-time and recent past data (i.e., type (ii) data). From an algorithmic point of view, handling such data requires extremely fast updating mechanisms, including data structures and algorithms capable of processing a large amount of data in real-time. To address these issues, this paper proposes a novel summarization paradigm suitable for processing spatiotemporal smart grid data streams, inspired by our previous work on network monitoring [19]. The proposed approach fits between the instantaneous data generation and the historic data archival of smart grid data (see Fig. 1). The upper portion of Fig. 1 shows that a stream processing engine (whose enlarged view is illustrated in the lower portion of Fig. 1) receives the high-speed smart grid data stream (which is also received by a DBMS for long-term archival) and summarizes it using the proposed approach. The lower portion of Fig. 1 illustrates the components of the proposed stream computational model: a stream processing engine (see Section 3.2 and Algorithm 4), an efficient lattice data structure (see Section 3.1 and Algorithm 3) and a query engine (Section 3.3). The stream processing engine receives the high-speed smart grid data stream and summarizes it using the Lattice Data Structure (LDS). A special query engine is developed that can use the in-memory summary held in the various nodes of the LDS to answer real-time queries. The query engine has its own query language, suitable for efficiently querying the LDS containing the summary of recent smart grid data. This is very useful for dealing with the challenges of smart grid streaming data and for providing real-time answers to various smart grid applications such as DSM, DLC, SP and Volt-VAr control. The proposed stream processing engine, special data structure, algorithms and query mechanisms have theoretical performance guarantees.

1.2 Our Contributions

The contributions of this paper are as follows. (1) An efficient data structure (i.e., the LDS), shown in Fig. 2, is designed, which offers theoretical guarantees of accuracy and efficiency and is capable of handling a large amount of continuously arriving real-time data. The LDS consists of nodes arranged in the form of a lattice, where each node contains a Cylindrical shape Data Structure (CDS); see Fig. 3. Each CDS is a first-in-first-out circular buffer, constructed from N (a user-supplied parameter) independent circular Time-queue Data Structures (TDSs) of fixed length; see Fig. 4. The intuition behind a CDS is to build a summary unit that can store the last C numerical values from each of N objects (e.g., smart meter readings) that generate a continuous stream of numerical values. Values older than the last C readings are deleted from the current CDS and entered in aggregated form into its ancestor CDS. Hence, the central idea of the LDS is to progressively maintain aggregated information in the CDSs at higher levels of the lattice: it stores recent data in granular form in the CDSs at lower-level lattice nodes and historic data in aggregated (or abstracted) form in the CDSs at higher-level lattice nodes. This helps to reduce space requirements and allows the right level of granularity to be picked for the desired summary. (2) A summarization algorithm (Section 3.2) is developed that can accurately compute and maintain a spatiotemporal summary using the proposed LDS. Theoretical analysis (Section 4) shows that the algorithm is very efficient in terms of data storage and response to queries: it updates a single item in O(1) time and responds to a point query in O(1) time. Importantly, the memory requirement of the proposed technique is independent of the size of the data (see Lemma 1) and depends only on the user-supplied parameter vectors Ψ and Φ. (3) A query engine (with its own simple query language) is developed and implemented that supports a wide range of spatiotemporal queries on the summarized data (Section 3.3); and, apart from the theoretical analysis of the algorithm, experimental tests (Section 5) and a comparison with a state-of-the-art data management tool highlight the efficiency of the proposed technique.

1.3 Road Map

The remainder of the paper is organised as follows. Section 2 provides insight into existing work in the smart grid, big data and data stream processing literature. Section 3 describes the problem, its solution, and the various types of data analytic tasks required by a utility company. Section 4 provides a concise theoretical analysis of the proposed approach. Section 5 presents the experimental analysis and comparisons with a state-of-the-art data management tool. Concluding remarks are given in Section 6.

2 EXISTING WORK

A great deal of work has been done on the smart grid, focusing on different aspects of the transformation of the traditional power system into a modern smart grid. For example, different approaches have been proposed for reliable and effective smart grid operations, including privacy [21], security [22], communication models and protocol design [23], long-term forecasting [24], SCADA simulation frameworks [25], fault-tolerant fine-grained data access [26] and demand management [7]. However, practical data handling for smart grid applications such as DSM, DLC, Volt-VAr control and smart and real-time pricing has not been studied extensively. Information management of a smart grid usually consists of information gathering, information processing, and information storing. Information gathering is challenging because it involves data collection from heterogeneous devices at different locations; therefore, a number of communication architectures have been proposed, which can be found in recent surveys [27]. The heterogeneity of smart grid devices also poses challenges for information processing, because data can be received from a number of different devices. Fortunately, this issue has recently been addressed by a proposed standardisation of the data structures used in smart grid applications to solve the interoperability issue [28]. However, processing a huge amount of smart grid data is still a big challenge. For information storing, the most common standard is the database management system (DBMS) [29]. Well-known DBMS tools include Oracle, Microsoft SQL Server, IBM DB2 and Informix, SAP Sybase, and MySQL. In recent times, parallel and distributed file systems such as Apache Hadoop and Google MapReduce have also been gaining popularity [29], [30], but they have not yet been deployed for real-time smart grid applications. In practice, information is sampled and processed for real-time operations, and raw data is archived in the data warehouse for future use [31]. In the context of industrial sensing networks, in a big data environment the data management issue is extremely challenging in terms of memory and time. For example, it is not practical to store all sampled sensor information due to memory limitations. Moreover, due to the unprecedented granularity and volume of data from a huge number of sensors, another challenge is to process this data in real or near real-time. A number of smart grid applications require response times ranging from a few seconds to a few minutes, e.g., DSM, DLC, SP and coordinated Volt-VAr management. For such applications, commonly used data management tools, including DBMSs, are not suitable. Cloud computing may also appear to meet some of the smart grid information management demands; indeed, many successful models based on cloud computing have been proposed for the operation of smart grid applications [32], [33], [34]. Cloud computing based models for smart grid applications have many advantages, including energy saving, cost saving, agility, scalability, and flexibility, since computational resources are used on demand [35]. However, researchers have also pointed out significant problems with cloud computing based models for smart grid applications. First, the application of cloud computing to smart grid data processing faces big challenges of confidentiality and privacy [36], [37]. Second, from the electricity companies' perspective, security is still a challenging problem, since hackers of systems located in the cloud cannot easily be traced [38]. Third, a related major issue is the recovery of data in the case of a possible failure of the cloud service [39]. Finally, no cloud computing based models have yet been proposed that can support the real-time response needed for smart grid applications.

3 PROPOSED METHODOLOGY

Fig. 1 depicts the smart grid data collection mechanism. Consider that each smart meter sends a tuple τ containing three entries, <e, t, v>, where e is the unique identifier of the smart meter, t is the timestamp, and v is the smart meter reading at time instance t. Both e and t are hierarchical attributes, and their hierarchies are shown and explained in Fig. 2. We assume that the sensors in the smart meters feed power consumption readings every second into a buffer, while the proposed LDS stores the summarized information. At the end of each minute (i.e., 60 seconds), the Summarizer algorithm (see Section 3.2) collects the readings (i.e., the tuples τ[ ]), processes them and stores them in the LDS summary structure. We first describe the LDS and then explain the Summarizer algorithm.
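As a concrete sketch of this data model, the following minimal Java fragment shows the tuple <e, t, v> and the per-minute buffering described above (the class names and field types are illustrative assumptions, not part of the proposed system):

import java.util.ArrayList;
import java.util.List;

// Sketch of the tuple <e, t, v> and the per-minute buffer described in
// the text. Field types are illustrative assumptions.
class Tuple {
    final String e;   // unique smart meter identifier
    final long t;     // timestamp (epoch seconds)
    final double v;   // cumulative meter reading at time t
    Tuple(String e, long t, double v) { this.e = e; this.t = t; this.v = v; }
}

class MinuteBuffer {
    private final List<Tuple> buffer = new ArrayList<>();

    void onReading(Tuple tau) { buffer.add(tau); }

    // Called every 60 seconds: hand the collected tuples tau[] to the
    // Summarizer (Section 3.2) and start a fresh buffer.
    List<Tuple> drainForSummarizer() {
        List<Tuple> batch = new ArrayList<>(buffer);
        buffer.clear();
        return batch;
    }
}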

3.1 Lattice Data Structure

We have designed a mathematical lattice-like hierarchical data structure (the LDS), shown in Fig. 2, which offers theoretical guarantees of accuracy and efficiency for the Summarizer algorithm. The central idea of the LDS is to progressively maintain aggregated information at higher levels of the hierarchical structure. This hierarchical structure is derived from hierarchical attributes that are of interest to the user, such as time (e.g., second, minute, hour, day, week, year) and geographic location (e.g., smart meter, street, suburb, district, city). Since recent data from smart meters is critical for real-time smart grid operations, the LDS is designed to store recent data in granular form at the lower level(s) of the lattice and old data in aggregated (or abstracted) form at the higher level(s) of the lattice. The benefit of this gradual abstraction process is reduced space requirements and the ability to pick the right level of granularity for the desired summary (e.g., the consumption of smart meters in 1 hour, or the energy consumption of an entire suburb). The hierarchical structure of the LDS is shown in Fig. 2; it consists of nodes labelled according to the time and location dimensions. The nodes in the lattice contain pointers to their ancestor nodes, and each node contains a Cylindrical shape Data Structure (CDS) (three of them are shown in Fig. 3). A CDS is constructed from N (a user-supplied parameter) independent circular queues of fixed length (illustrated in Fig. 4), which we call Time-queue Data Structures (TDSs). A TDS is simply a FIFO circular buffer, where a new data item overwrites the data item in the next slot. Each time a data item is added to the slot at the head of the TDS, the data in the slot at the tail of the TDS is removed, assuming the queue is full. Algorithm 1 provides the pseudocode for creating the TDS data structure with its enqueue functionality. Insertion into a CDS is also performed using the enqueue operation (see Algorithm 2), which is an interface to the enqueue function of the TDS; the enqueue operation stores a value Δ in the TDS located at index j of the CDS. Algorithm 2 provides the pseudocode for creating a node of the LDS, which consists of a CDS and pointers to the CDSs in other nodes of the lattice (e.g., with an ancestor-descendant relationship). In each TDS, P is an integer pointer to the next buffer slot, updated after each enqueue operation according to P = (P + 1) mod C, where C is a user-supplied parameter setting the capacity of the TDS; therefore, whenever C insertions have happened in a TDS, all those C insertions are aggregated and inserted into a single slot of an upper-level TDS in the lattice. As a general rule, the parameter C should be set according to the time dimension. For example, for the node "Hour-Street", the capacity of each TDS should be set to 24 because there are 24 hours in a day. However, if the utility requires storing more hourly data in the "Hour-Street" node, for instance for the last 100 hours, the C value can simply be set to 100 for nodes with Hour labels. Since the ancestor node of "Hour-Street" is "Day-Street" (along the time dimension with solid lines), in this case when 24 insertions have happened in a TDS of the "Hour-Street" node, all those 24 (rather than 100) insertions are aggregated and inserted into a single slot of an upper-level TDS at the "Day-Street" node. During its enqueue operation, a TDS returns the sum when C insertions have happened and 0 otherwise. The return value is, in fact, a signal to the Summarizer algorithm (Algorithm 4) and is used to decide when to perform aggregation: when a non-zero sum is returned, aggregation is performed; otherwise, no aggregation happens (see lines 11 to 20 of Algorithm 4). The intuition behind the CDS is to build a summary unit that can store the last C numerical values from each of N objects (e.g., smart meter readings) that generate a continuous stream of numerical values. Values older than the last C readings are deleted from the current CDS and entered in aggregated form into its ancestor CDS. This process continues for as many levels as required by the hierarchy (e.g., time and location). Therefore, the hierarchical structure of the LDS is exploited to maintain summaries of various granularities in the form of a lattice. This method of summarization also ensures that no data is discarded or lost; it is kept in aggregated form. The number of nodes η in the lattice can be calculated as η = Π over i ∈ {Ψ, Φ} of (1 + hi), where hi is the height of the hierarchy of attribute i (see Fig. 2 for an example), and Ψ and Φ are the vectors representing the time and location hierarchies; for the hierarchies of Fig. 2 (both of height 4), η = (1 + 4)(1 + 4) = 25 nodes. The pseudocode to create the LDS is given in Algorithm 3, which needs the Ψ and Φ vectors to set the capacity C ∈ Ψ and length N ∈ Φ of each CDS in the LDS nodes.

A trade-off between memory and query cost: The hierarchical structure of the LDS offers a trade-off between the amount of information stored in the LDS and the time required to answer queries based on the stored information. We refer to a level of the lattice as the set of nodes that are spread along the time hierarchy but are all at the same location hierarchy. For instance, in Fig. 2 the nodes {(0,0),(1,0)···(4,0)} are spread along the time hierarchy, but all of them are at level 0 of the location hierarchy, and we refer to these nodes as level 0 of the LDS. Similarly, the nodes {(0,1),(1,1)···(4,1)} are referred to as level 1 of the LDS, and so on. It can be seen that the nodes at higher levels of the LDS hierarchy contain the data of their descendant levels in aggregated form. For example, level 1 of the LDS contains the data of level 0, level 2 contains the data of level 1, and so on. Hence, higher levels can be removed from the LDS (reducing the full LDS to a partial LDS), and the summaries of those levels can be computed on demand, e.g., at query time. As an example, consider the "Week-Street" node of the LDS in Fig. 2, which contains the aggregated data from the "Week-SM" node. If the "Week-Street" node is removed from the LDS, then any query pertinent to this node can be answered from its descendant node "Week-SM". In fact, in the entire lattice any ancestor node can be computed from either of its descendants; therefore, a query pertinent to an ancestor node can also be answered from its descendants, but at the cost of a delayed response due to the time required to compute the values equivalent to the ancestor node. For example, "Week-Street" can also be calculated from either of its descendants, "Week-SM" or "Day-Street". The advantage of the full LDS is that it provides quick access to summarized information at the expense of extra space. The advantage of the partial LDS is that it requires less space but is slower than the full LDS at answering queries. Hence, the user can make a trade-off between space and query response time according to the available resources and requirements.

Algorithm 1 Creating a TDS
1: Class TDS {
2:   int C, P = 0; double sum = 0; double TDS[C];
3:   function ENQUEUE(Δe)
4:     sum = sum + Δe − TDS[P];
5:     TDS[P] = Δe;
6:     P = (P + 1) mod C;
7:     if P == 0 then
8:       Return sum;
9:     else
10:      Return 0;
11:    end if
12:  end function }

Algorithm 2 Creating an LDS Node with a CDS
1: Class Node {
2:   int timeLvl, locLvl; TDS CDS[ ]; Node left, right;
3:   procedure INITCDS(C, N)
4:     Initialize the CDS with N TDSs;
5:     Initialize each TDS with capacity C;
6:   end procedure
7:   function ENQUEUE(Δe, index j)
8:     sum = CDS[j].Enqueue(Δe);
9:     Return sum;
10:  end function }
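For concreteness, the following is a minimal, runnable Java rendering of Algorithms 1 and 2; the class and method names mirror the pseudocode, while the double-valued readings are our assumption. Note that maintaining the running sum makes the wrap-around return value equal to the sum of the last C insertions, which is exactly the signal Algorithm 4 uses to trigger aggregation.

// Minimal Java sketch of Algorithms 1 and 2. Names mirror the pseudocode.
class TDS {
    private final double[] slots;  // circular buffer of capacity C
    private int p = 0;             // index of the next slot to overwrite
    private double sum = 0;        // running sum of the last C insertions

    TDS(int c) { this.slots = new double[c]; }

    // Inserts a value; returns the window sum when the buffer wraps
    // (i.e., after every C insertions) and 0 otherwise.
    double enqueue(double delta) {
        sum += delta - slots[p];   // drop the overwritten value from the sum
        slots[p] = delta;
        p = (p + 1) % slots.length;
        return (p == 0) ? sum : 0;
    }
}

class Node {
    int timeLvl, locLvl;
    Node left, right;              // ancestor pointers in the lattice
    private final TDS[] cds;       // the CDS: one TDS per entity

    Node(int c, int n) {           // INITCDS(C, N)
        cds = new TDS[n];
        for (int j = 0; j < n; j++) cds[j] = new TDS(c);
    }

    double enqueue(double delta, int j) { return cds[j].enqueue(delta); }
}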

3.2 Smart Grid Data Summarization using LDS

This section provides details on how the LDS is used to efficiently summarize a large amount of continuously generated smart grid data in real-time. The proposed summarization scheme can best be understood through Fig. 3 and Algorithm 4; a detailed real-world example is also provided at the end of this section. The main concept is that, at each node of the LDS, the algorithm associates a TDS with an entity ξ such as a smart meter, street, suburb, district, or city. The algorithm maps each ξ to a positive integer j using a simple one-to-one mapping, and then inserts the data of ξ into the slots of the TDS located at index j of the CDS. The algorithm receives input from a buffer B in the form of a set of tuples τ[ ] (i.e., τ = <e, t, v>). Smart meters continuously send their current readings (i.e., v); however, the LDS stores Δe = vt − vt−1, the consumption since the last reading.


Algorithm 3 Creating the LDS
1: Class Lattice {
2:   int[ ] Ψ, Φ; Node[ ][ ] n;
3:   procedure BUILDLATTICE( )
4:     for i = 0 to Φ.length do
5:       for j = 0 to Ψ.length do
6:         n[i][j].INITCDS(Ψ[j], Φ[i]);
7:         n[i][j].locLvl = i;
8:         n[i][j].timeLvl = j;
9:       end for
10:    end for
11:    for i = 0 to Φ.length − 1 do
12:      for j = 0 to Ψ.length − 1 do
13:        if i = 0 then
14:          n[i][j].right = n[i][j + 1];
15:          n[i][j].left = n[i + 1][j];
16:        else
17:          n[i][j].left = n[i + 1][j];
18:        end if
19:      end for
20:    end for
21:  end procedure }
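The following minimal Java sketch renders Algorithm 3, reusing the Node class sketched above. We normalize the index convention to n[timeLvl][locLvl] to match the node labels of Fig. 2 (e.g., Week-Suburb|[3,2]), and wire left to climb the time hierarchy and right to climb the location hierarchy along the Minute row, which is how Algorithm 4 traverses the lattice:

// Minimal Java sketch of Algorithm 3 (BUILDLATTICE).
// psi[t] = TDS capacity at time level t (e.g., {60, 24, 7, 52, 10});
// phi[l] = number of TDSs at location level l (e.g., {16, 8, 4, 2, 1}).
class Lattice {
    final Node[][] n;

    Lattice(int[] psi, int[] phi) {
        n = new Node[psi.length][phi.length];
        for (int t = 0; t < psi.length; t++)
            for (int l = 0; l < phi.length; l++) {
                n[t][l] = new Node(psi[t], phi[l]);
                n[t][l].timeLvl = t;
                n[t][l].locLvl = l;
            }
        for (int t = 0; t + 1 < psi.length; t++)
            for (int l = 0; l < phi.length; l++) {
                n[t][l].left = n[t + 1][l];          // time climb (solid lines)
                if (t == 0 && l + 1 < phi.length)
                    n[t][l].right = n[t][l + 1];     // location climb (Min row)
            }
    }
}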

Lines 4 and 5 of Algorithm 4 compute Δe as follows: let prvMsur[j] be the previous reading of smart meter e (where e is mapped to index j); then Δe is calculated as Δe = v − prvMsur[j]. After computing Δe, the algorithm updates prvMsur[j] = v for computing Δ in the future. In this way, the algorithm computes the set Γ of all the Δs, where the index of Δe in Γ is j. Instead of separately keeping a record of the time t in the TDS, we associate the slot number of the TDS with t, as follows: given a tuple τ = (e, t, Δe), the algorithm computes the slot for Δe using both e and t. The algorithm inserts each Δe into the lowest node of the LDS (i.e., the node labelled "Min-SM" in Fig. 2) using Enqueue(Δe, j) at line 10 of Algorithm 4. If Enqueue() returns a non-zero value, indicating that the TDS storing per-minute consumption has completed a full circle (e.g., 60 minutes), then the algorithm traverses up the left branch of the lattice (lines 12-19) and inserts the returned value (i.e., the sum of the last 60 minutes) into the "Hour-SM" node. If the insertion into the "Hour-SM" node also returns a non-zero value, the algorithm traverses up and inserts the sum into "Day-SM", and so on. Next, the algorithm aggregates all the Δs to street level using Gen(Γ, hierarchy(l)), where Gen() is a function that takes a set of values Γ and aggregates them to level l based on the user-supplied location hierarchy hierarchy(l). Let ⇒ represent a belongs-to relationship; e.g., ei ⇒ sj means that smart meter ei belongs to street sj. The algorithm takes as input a table (or a file) containing information about each smart meter's location in terms of street, suburb and city. Thus, the function Gen(Γ, hierarchy(l)) returns the set Γagg of all the aggregated Δs, where the index of Δe ∈ Γagg is j. The algorithm then inserts the aggregated data into the "Min-Street" node, in exactly the same way as the insertion into the "Min-SM" node explained earlier. It iteratively performs the aggregation for all levels of the location hierarchy by traversing the right branch (location hierarchy) of the lattice, inserting the aggregated data into its respective nodes.
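A minimal Java sketch of the Δ computation (lines 4 and 5 of Algorithm 4) and of the Gen() aggregation step follows; representing the belongs-to table as an array of parent indices is our assumption about the input format:

import java.util.HashMap;
import java.util.Map;

// Sketch of the per-tuple delta computation and the Gen() step.
class Aggregator {
    private final Map<String, Integer> map = new HashMap<>(); // e -> j
    private final double[] prvMsur;                           // previous reading per j

    Aggregator(int numMeters) { prvMsur = new double[numMeters]; }

    // Map(e) -> j, then delta = v - prvMsur[j]; remember v for next time.
    double delta(String e, double v) {
        int j = map.computeIfAbsent(e, k -> map.size());
        double d = v - prvMsur[j];
        prvMsur[j] = v;
        return d;
    }

    // Gen(): aggregate deltas to the next location level, where
    // parent[j] is the parent's index (e.g., the street of meter j),
    // loaded from the smart meter location table mentioned in the text.
    static double[] gen(double[] gamma, int[] parent, int numParents) {
        double[] agg = new double[numParents];
        for (int j = 0; j < gamma.length; j++) agg[parent[j]] += gamma[j];
        return agg;
    }
}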

[Fig. 2: A pictorial view of the proposed summarization scheme, showing the lattice formed by the two-dimensional Smart Meter (SM) data. Nodes are labelled Time-Location|[timeLvl, locLvl], from Min-SM|[0,0], Min-Street|[0,1], ..., Min-City|[0,4] at the bottom, through the Hour, Day and Week levels, up to Year-SM|[4,0], ..., Year-City|[4,4]; smart meters e1, ..., eN feed tuples <e, t, v> into buffer B. The Time hierarchy grows towards the left diagonal and the Location hierarchy towards the right diagonal. The bold arrows and dashed arrows are different and indicate the flow of data; data always flows along the solid lines during normal operation of the Summarizer algorithm (Algorithm 4). The dashed lines show that the query analyser can also use the other descendant (marked by dashed lines) to perform aggregation through these paths at query time.]

Example: To understand how the LDS is built from smart meter data, consider an LDS initialized using the Location and Time vectors {16, 8, 4, 2, 1} and {60, 24, 7, 52, 10}, respectively. The Location vector {16, 8, 4, 2, 1} decides the number of TDSs in the CDSs. This vector corresponds to {SM=16, Street=8, Suburb=4, Dist=2, City=1}; therefore, the CDSs at the various levels of the LDS are built as follows: a CDS with 16 TDSs at nodes labelled SM, a CDS with 8 TDSs at nodes labelled Street, a CDS with 4 TDSs at nodes labelled Suburb, a CDS with 2 TDSs at nodes labelled Dist, and a CDS with 1 TDS at nodes labelled City. This means that every CDS at a node labelled SM has 16 TDSs in it; similarly, every CDS at a node labelled Street has 8 TDSs in it. Next, at each node of the LDS, the algorithm associates a TDS with a location entity such as an SM, Street, Suburb, Dist or City. The algorithm maps each location entity to a positive integer j using a simple one-to-one mapping and then inserts the data of that location entity into the TDS located at index j of the CDS. For example, the data of SM 9 will be inserted into TDS number 9 of the CDS at nodes labelled SM.

The Time vector is used to decide the capacity (the number of slots) of the TDSs at the various levels. For instance, the Time vector {60, 24, 7, 52, 10} corresponds to {Min=60, Hour=24, Day=7, Week=52, Year=10}. Therefore, the number of slots for the TDSs (circular queues) at the various levels of the LDS is as follows: 60 slots in each TDS at nodes labelled Minute, 24 slots in each TDS at nodes labelled Hour, 7 slots in each TDS at nodes labelled Day, 52 slots in each TDS at nodes labelled Week, and 10 slots in each TDS at nodes labelled Year. Next, the algorithm inserts the data of a location entity e into a slot of a TDS at time t as follows. Given a tuple (e, t, Δe), the algorithm finds the slot for e using both e and t. Consider that the algorithm receives a tuple with e = 5, t = 13/05/2015 08:45 and Δe = 152. Here, e is used to locate a TDS and t is used to find the slot within that TDS. Since the tuple contains the 45th-minute consumption (152) of SM 5, the algorithm inserts the value 152 at slot number 45 (out of 60) of TDS number 5 at the Minute level (the lowest node of the LDS). After receiving 15 more tuples (to complete the hour), the algorithm aggregates all the minute values and inserts the result into slot 8 (the last values belong to the 8th hour) of TDS number 5 at the Hour level.
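A small sketch of the slot addressing used in this example; deriving the slot indices from the timestamp's minute-of-hour and hour-of-day is our reading of the example:

import java.time.LocalDateTime;

// e selects the TDS; t selects the slot within it.
class SlotAddressing {
    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2015, 5, 13, 8, 45);
        int meter = 5;                        // e = 5 -> TDS number 5
        int minuteSlot = t.getMinute() % 60;  // slot 45 at the Minute level
        int hourSlot = t.getHour() % 24;      // slot 8 at the Hour level
        System.out.printf("TDS %d, minute slot %d, hour slot %d%n",
                meter, minuteSlot, hourSlot);
    }
}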

[Fig. 3: An illustration of the structure of the CDSs within the lower three nodes of the lattice. A CDS consists of multiple TDSs organized in a cylindrical fashion. The example shows that the 60 minute slots of e1 at the bottom-level CDS are aggregated into a single slot of e1 at the left ancestor CDS (Minute to Hour Aggregation). Also, the right ancestor aggregates a single slot from several ei that belong to a certain street si (Smart Meter to Street Aggregation).]

Algorithm 4 Data Summarization using LDS
1: Class Summarizer {
2:   double prvMsur[ ]; Lattice lt;
3:   procedure PROCESS(τ[ ])
4:     Γ = {Δe | ∀e ∈ τ[ ] : (j ← Map(e)) ⇒ (Δe = v − prvMsur[j])}
5:     ∀e ∈ τ[ ] : if (j ← Map(e)) then Update prvMsur[j] = v;
6:     Node r = lt.n[0][0];
7:     while r is not Null do
8:       for each Δe ∈ Γ do
9:         j ← indexOf(Δe) in Γ;
10:        sum = r.Enqueue(Δe, j);
11:        if sum != 0 then
12:          Node l = r.left;
13:          while l is not Null do
14:            sum = l.Enqueue(sum, j);
15:            l = l.left;
16:            if sum == 0 then
17:              break;
18:            end if
19:          end while
20:        end if
21:      end for
22:      r = r.right;
23:      Γagg = Gen(Γ, r.LocLvl);
24:      Γ = Γagg;
25:    end while
26:  end procedure }

[Fig. 4: A pictorial view of a TDS with 16 slots.]

3.3 A Query Language for the LDS

We have designed a simple query language to retrieve information from the LDS, which has the following syntax:

f(Q) [{+, −, ∗, /} R]          (1)

where f is a function such as All, Max, Min, Sum, Avg, or Count, and Q is a parameter of f with the following simple structure:

Q = [Time, tr], [Loc, lr]          (2)

where Time is a temporal hierarchy, Loc is a location-based energy consumption hierarchy, and tr and lr determine the specificity of time and location, respectively. The last portion [{+, −, ∗, /} R] (where R is any valid number) of the query syntax given in Equation 1 is optional and operates on f(Q). The query language supports a wide range of high-level user queries. For example, the user query "what is the weekly consumption per smart meter" can be represented as All([Week, ∗], [SM, ∗]). Similarly, the user query "which suburb's demand was highest during the last 5 weeks" can be issued as Max([Week, 1−5], [Suburb, ∗]). Moreover, if a user is interested in the "average consumption per day of each smart meter during the last year", the query can be represented as All([Year, 1], [SM, ∗])/365. Note that Avg([Year, 1], [SM, ∗]) returns only one value, which is the average of the matrix returned by [Year, 1], [SM, ∗].

[Fig. 5: A snapshot of the query analyzer tool for the smart grid data summarization application.]

We have designed a GUI-based query analyser tool (see Fig. 5), which takes a user query (i.e., Equation 1) and displays the result. The query analyser tool can also plot the output of user queries using different visualisation graphs, such as time series graphs, bar charts and pie charts. We have developed a query engine, which first parses the user query to verify its syntax; in the case of syntax errors, the query engine displays useful error messages to help the user identify them. Second, the query engine uses Time and Loc in Q (from Equation 2) to locate the node that contains the data pertinent to the user query. Next, using the location range lr, the query engine identifies the index of the TDS in the CDS, and using the time range tr, it identifies the slots of the TDS. For instance, if Time = Week, Loc = Suburb, tr = 42 and lr = 7, then slot 42 of TDS 7 in the CDS at node (3,2) is located, which refers to the data specified by the query. The data identified by the query engine using Q is then read from the LDS in the form of an lr × tr matrix. Finally, the function f (i.e., Sum, Avg, etc.) operates on the data retrieved from the LDS within the query engine. Also, if the user query (Equation 1) contains the optional part (i.e., [{+, −, ∗, /} R]), it is applied to the result of f(Q) before the result is returned to the user.
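A minimal Java sketch of this query-resolution path follows; parsing of the textual syntax is omitted, and the SlotReader accessor is an assumption introduced for illustration:

// Locate the node from (Time, Loc), read an lr x tr matrix of slots,
// and reduce it with f (here, Sum).
interface SlotReader {
    double read(int timeLvl, int locLvl, int tdsIndex, int slot);
}

class QueryEngine {
    private final SlotReader lds;

    QueryEngine(SlotReader lds) { this.lds = lds; }

    // e.g., Sum([Week, 1-5], [Suburb, *]) -> timeLvl = 3, locLvl = 2,
    // tr = slots 1..5, lr = all suburb TDS indices.
    double sum(int timeLvl, int locLvl, int[] tr, int[] lr) {
        double total = 0;
        for (int j : lr)            // one TDS per location entity
            for (int slot : tr)     // one slot per time unit
                total += lds.read(timeLvl, locLvl, j, slot);
        return total;               // the lr x tr matrix reduced by Sum
    }
}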

3.4 Utilities of the Proposed Summarization Technique

There are many applications of the proposed summarization technique; due to space constraints, we mention only two of them.

3.4.1 Voltage Management of a Distribution Network using LDS-based Summarization

Here we first discuss the voltage management of a distribution network very briefly; next, we explain how the proposed summarization paradigm relates to the voltage management application; finally, we illustrate the application with a figure.

Voltage Management: A key challenge of today's utility distribution systems is to maintain a stable voltage profile throughout the network. Typically, power flows from the substation to the load centres, and the network is designed in such a way that the voltage drops gradually along the distribution lines. Under peak load conditions, the voltages at the end nodes of a distribution feeder may sometimes fall below the lower limit of the stability margin due to a sudden increase in demand, which is typically known as the voltage drop problem. On the other hand, under a light-load condition where the primary voltage is already high, additional distributed generation (DG) may cause over-voltage, known as the voltage rise problem. For both the voltage rise and voltage drop problems, some node voltages may violate the voltage stability margin set by ANSI, which may jeopardise the whole system operation. In these circumstances, voltage management using voltage regulators based on local measurements does not work properly [40]. Hence, a coordinated approach among voltage regulators using full network observability is essential.

Relation to the proposed summarization paradigm: One big challenge of coordinated voltage management is that the approach needs power consumption information for all electric nodes, which are physically located in different areas (that can be represented spatially using streets, suburbs or districts). In practice, installing sensors or monitoring devices at different nodes (typically known as buses in energy systems) is cost-prohibitive or infeasible. Hence, an alternative approach is essential which can provide an estimation of the actual physical sensor readings. Here, that task is performed using our proposed lattice-based summarization paradigm, which can estimate any node measurement/reading for any time instance by aggregating the smart meter data alone, without requiring the original sensor data. Using such an aggregation technique, system operators do not need to install sensors throughout the grid. Rather, smart meter data obtained from end-users is used, via our proposed approach, as an alternative source of sensor data, and can be used for coordinated voltage control.

Illustration with an example: Fig. 6 provides an overview of how the proposed technique can use smart meter data for intelligent Volt-VAr control without requiring the large-scale installation of sensors. In a smart grid setup, end-users utilise smart meters for the monitoring and control of residential load demands. Therefore, using our proposed data summarization model, the smart meter data can be aggregated at different granularities to quantify the aggregated power consumption at different upstream nodes of the network for optimal voltage management using a coordinated Volt-VAr approach. As shown in Fig. 6, the power flow of the primary feeder, which is normally obtained using a sensor, can be approximated in our data aggregation approach using the smart meter measurements, without the physical deployment of that sensor. Experimentally, we have observed that the values estimated from the end users' smart meter data using our proposed method are very close to the original measurement data obtained from the power system simulation.

3.4.2 Direct Load Control using LDS-based Summarization

DLC is a useful function implemented in smart meters as a part of DSM, which allows utilities (based on customer consent) to remotely control consumer consumption by turning household appliances such as dishwashers, washing machines and pumps on and off. DLC helps both consumers and utilities to synchronise the load by shifting a large portion of the load from hours of high prices to hours of low prices. Clearly, to implement DLC effectively, utilities need not only the real-time consumption of consumers but also aggregated consumption, such as the consumption of all the streets, suburbs or districts in the last few minutes or hours. Since the proposed LDS-based approach allows access to consumption at various temporal (minute, hour, day, week, year) and spatial (smart meter, street, suburb, district, city) granularities in near real-time, utilities can understand the overall demand on the smart grid at various times and locations, which is useful for load control and load synchronisation. For example, for districts that show a high demand pattern at a certain period of time, the LDS can be queried to find the suburbs and streets in that district that are responsible for the increase in demand. Once utilities find the suburbs and streets responsible for high demand, they can use DLC on these targeted heavy users to control their peak-time loads (e.g., residential loads including dishwashers and washing machines; industrial loads including auxiliary power plants and chemical factories that can intermittently turn off power without affecting operation). Furthermore, smart meters can also allow real-time pricing based on the dynamic load data monitored from consumers through the proposed LDS-based approach, which can be coupled with home automation systems to enable more fine-grained control of household usage.
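To make the drill-down concrete with the query language of Section 3.3, an operator could, for example, issue Max([Hour, 1], [Dist, ∗]) to find the district with the highest consumption in the last hour, narrow down with Max([Hour, 1], [Suburb, ∗]) and Max([Hour, 1], [Street, ∗]) over that district's suburbs and streets, and finally target DLC commands at the heavy users identified at street level. The specific queries and ranges here are illustrative, not prescribed by the method.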

4 THEORETICAL ANALYSIS

This section provides bounds on the space (Lemma 1), update cost (Lemma 2) and query cost (Lemma 3) of the proposed data structure (i.e., the LDS).

Lemma 1. Let the LDS be initialized by the Ψ and Φ vectors (Algorithm 3). Then the total space required by the LDS is independent of the size of the data and bounded by O(Ψ · Φ).

Proof. The memory required by a CDS is C × N, where C ∈ Ψ and N ∈ Φ. Let us denote the CDS at node ni,j of the LDS by CDSi,j, and its memory by Ci × Nj. Then the space required by the LDS is the space required by all the CDSs, i.e., the sum over i, j up to |Ψ|, |Φ| of Ci × Nj = Ψ · Φ = O(Ψ · Φ).

It is clear from the above lemma that the space (memory) required by the LDS does not depend on the size of the data; it depends only on the values of Ψ and Φ. For example, suppose the time and location vectors are {60, 24, 7, 52, 10} and {8239, 2000, 200, 50, 1}, respectively. Then the space needed by the full LDS would be {60, 24, 7, 52, 10} × {8239, 2000, 200, 50, 1} = 1,601,370, so the total number of counters used by the LDS would be 1,601,370. If each counter of the LDS requires 100 bytes, then the total space required by the LDS would be 160,137,000 bytes = 152.72 megabytes.

Lemma 2. The update cost of the LDS per insertion of a tuple τ is constant, i.e., O(1).

Proof. For each incoming tuple τ, the algorithm updates the corresponding tuple in buffer B, which is a hash-based data structure that takes 1 hash operation to update a tuple. After every 60 updates (i.e., v0 ··· v59) in buffer B, the algorithm computes Δ = v59 − v0 and inserts it into the minute-level nodes (0,0), (0,1) ··· (0,4) along the right diagonal of the lattice in Fig. 2. Computing Δ takes one operation and insertion into a node takes one more operation. Therefore, after receiving 60 tuples in the buffer, the algorithm performs around 70 = 60 + 5 × (1 (compute Δ) + 1 (insertion)) operations for the full LDS. Hence, the number of updates per τ is 70/60 = O(1). For the partial LDS, the number of operations is even smaller. As time goes on, the algorithm performs updates into the upper nodes of the LDS; for instance, after an hour the algorithm inserts the sum of the last 60 minutes into the next available slot of the hour-level TDS. However, the ratio of the number of updates to the number of tuples received in an hour is still O(1); hence, the number of updates per tuple τ is always O(1).

Lemma 3. The proposed summarization scheme answers point queries in constant time, i.e., O(1), and range queries in O(tr × lr) time (see Equation 2).

Proof. Point queries require one lookup to locate the CDS at a node of the LDS, and another lookup to locate the slots of the TDS at that node. Hence, point queries require a total of two lookups to access data from the LDS, i.e., O(1) time. The number of lookups for range queries depends on the time and location ranges, i.e., tr and lr. For each time range tr, the algorithm requires issuing tr point queries; similarly, for each location range lr, the algorithm requires issuing lr point queries. Thus, range queries require a total of tr × lr lookups, i.e., O(tr × lr) time.

[Fig. 6: Coordinated Volt-VAr control using the proposed technique. Data aggregation is performed at the control centre, where optimal tap decisions for the LTC controller are determined using the aggregated consumption; the sensor on the primary feeder can alternatively be approximated by the proposed summarization technique. The voltage profile along the feeder, from the first to the last customer, must stay between the upper and lower voltage stability limits as the distance from the substation increases.]

5 EXPERIMENTAL RESULTS

The proposed data structures and algorithms have been implemented in Java (v1.8). The system configuration for testing was an Intel Core i7 with a 3.4 GHz processor, 16 GB of RAM, and a 64-bit Windows operating system. We have tested our algorithms on the IEEE benchmark 8500-node distribution system [41], with sample data from 8239 sensors sending flat (non-hierarchical) measurement data every minute. We have tested our system with the following time and location hierarchies. The time vector Ψ is {60, 24, 7, 52, 10}, which allocates the number of slots for each circular queue (TDS) at each level of the LDS; for example, the first value 60 in Ψ indicates that there are 60 slots at the Minute level, and the last value 10 indicates that there are 10 slots at the Year level. The location vector Φ is {8239, 2000, 200, 50, 1}, which means that the LDS creates 8239 TDSs at the smart meter level, 2000 at the Street level, 200 at the Suburb level, 50 at the District level, and 1 TDS at the City level. The proposed summarization scheme is evaluated from different perspectives: various partial LDSs (a k-Lvls PLDS contains only the lowest k levels of the LDS) are designed and compared in terms of the memory required (Section 5.1) and the update and query times (Section 5.2), and the data insertion and data access times of the lattice-based summarization scheme are evaluated and compared (Section 5.3) with a state-of-the-art, commercially available relational Database Management System (DBMS), referred to as DBMS-X.
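A minimal sketch of this setup using the Lattice class sketched in Section 3 (the printout is illustrative):

// Build the full LDS with the time and location vectors used in the
// experiments; by the node-count formula of Section 3.1 this yields
// a 5 x 5 lattice of 25 nodes.
class ExperimentSetup {
    public static void main(String[] args) {
        int[] psi = {60, 24, 7, 52, 10};      // slots: Min, Hour, Day, Week, Year
        int[] phi = {8239, 2000, 200, 50, 1}; // TDSs: SM, Street, Suburb, Dist, City
        Lattice lds = new Lattice(psi, phi);
        System.out.println("Full LDS built: " + psi.length * phi.length + " nodes");
    }
}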

5.1 Memory Requirements

Fig. 7 plots the memory required by the various techniques. Notice that, in Fig. 7, the left y-axis represents the memory usage of the summarization techniques, and the right y-axis represents the average time spent per τ. It can be seen that the memory requirement increases with the number of levels in the LDS. A closer look reveals that the upper levels do not require as much memory as the lower levels, since the upper levels maintain the data in a more aggregated form.

[Fig. 7: Comparison of memory, update and query cost for various PLDS.]

[Fig. 8: The memory growth of various PLDS with respect to time.]

Fig. 8 shows how the various PLDSs grow with time. Notice that we have inserted the per-minute readings of each sensor into the various data structures and plotted the memory required against time. Fig. 8 shows that the memory required for each data structure grows linearly for the first 52 weeks. Note that after the first year, the memory grows sublinearly. This is due to the fact that the LDS stores data beyond the current year in less detail; hence, the memory requirement is negligible for subsequent years. For example, once the LDS has stored 1 year of data, any new data will trigger the following actions. First, the aggregated value from the 52 weeks is inserted into the first empty slot of the Year node. Next, the new data overwrites the corresponding old data (e.g., a new minute overwrites an old minute); therefore, new data insertion does not require any new memory. Furthermore, at the end of every year only 1 memory slot (around 100 bytes) per location entity ξ is created at the Year level. Hence, the memory growth after year 1 is negligible.

The LDS stores various time-based summaries in different nodes and provides efficient access to those summaries. It preserves the time-based summaries by updating them with newly arriving data, properly overwriting the older data in the summaries at lower nodes, and accumulating the deleted data into the summaries at higher nodes. Experiments demonstrate that the LDS efficiently stores the streaming data from the smart grid sensors without exhausting its memory; thus, it is scalable and particularly suitable for the time-based summarization of smart grid sensor data.

5.2 Update and Query Cost

Figs. 7 and 9 support a number of observations on the update and query cost of the various PLDS. First, it can be seen that the update time increases linearly with the number of levels in the LDS, as expected. Second, they show how the proposed technique scales with an increasing number of smart meters. For example, the full LDS takes on average 261 ns to process each reading τ from an individual smart meter. This means that the full LDS can process around 4 million smart meter readings every second; if the data is collected every minute, it can handle 120 million readings every minute. The number of updates required by the 1-level PLDS is much smaller than for the full LDS because it simply contains less data to update. Hence, the 1-level PLDS is much more scalable than the full LDS, as it can handle around 2 billion (34.5M × 60) readings per minute, which is more than adequate for most countries. Third, the query cost decreases with an increasing number of PLDS levels. Generally, data structures containing more data require higher access or query time; however, since the different levels of the LDS contain precomputed summaries (of different granularities), the query time decreases (see Figs. 7 and 9) as the number of levels increases. Conversely, if the LDS has fewer levels, the query time is higher than for the full LDS, since the summary must then be computed at query time rather than directly accessed. Fourth, there is a trade-off between update and query cost, which is evident from the declining query cost and increasing update cost in Fig. 7. We can see (from the intersection of the red and green lines) that the 2-level PLDS offers the best combined query and update cost. In general, the full LDS can efficiently process the streaming data from the smart grid sensors; however, we conclude that the 2-level PLDS is more scalable and suitable for the summarization of very high volume data, offering the best balance of update and query costs.
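The root of this trade-off can be made concrete in a few lines of Java, reusing the hypothetical Tds sketch from Section 5.1 (again an illustration, not the authors' code): each extra level adds work at update time, but lets a query over a full period read one precomputed slot instead of scanning finer-grained slots.

// Sketch of the update/query trade-off, assuming the Tds class above.
public class QueryTradeoff {
    // Full LDS: the Hour level already holds each hour's aggregate,
    // so an hour-range query is an O(1) slot lookup.
    static double hourTotalPrecomputed(Tds hourLevel, int slot) {
        return hourLevel.slots[slot];
    }

    // 1-level PLDS: only Minute slots exist, so the same query must
    // aggregate 60 slots at query time (cheaper updates, dearer queries).
    static double hourTotalOnTheFly(Tds minuteLevel) {
        double sum = 0;
        for (double v : minuteLevel.slots) sum += v;
        return sum;
    }
}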

5.3 Comparison with DBMS-X

This section discusses how smart grid data can be handled using a commercial DBMS-X, as is the current practice in most utilities. Although traditional databases are easy to implement and adopt, they are inefficient for large-scale data handling such as smart grid data. Commercial DBMSs (such as DBMS-X) are general-purpose data management tools, optimised for redundancy, consistency, sharing, and integrity constraints. The LDS, on the other hand, is a summarization scheme designed specifically to process large-volume streaming data and to provide efficient access to already built time-based summaries. The two techniques may not be directly comparable because of their architectural and design differences; however, for data-intensive applications such as the smart grid, we simply want to show how much value the LDS adds over DBMS-X, which is a widely used off-the-shelf data storage and processing tool.


For this experiment, a simple table T was created in DBMS-X with three columns (Time, ID, Readings) to store the information from the smart meters, i.e., τ = <t, e, v>. Then, one year of data from the test system was inserted into T, and the average insertion time per τ was calculated. The average query time was calculated from SQL queries over various time ranges. Other types of queries that were tested include INSERT and UPDATE. One example of such a query is shown below.

SELECT SUM(Readings) FROM T WHERE ID = e AND Time BETWEEN ta AND tb;
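For illustration, the JDBC sketch below shows how this baseline experiment could be reproduced. The connection URL, the driver, and the TIMESTAMP type of the Time column are our assumptions, since DBMS-X is anonymised; the table schema and the range-sum query follow the description above.

import java.sql.*;

public class DbmsBaseline {
    public static void main(String[] args) throws SQLException {
        // Placeholder URL: DBMS-X is anonymised in the paper.
        try (Connection c = DriverManager.getConnection("jdbc:dbmsx://host/test")) {
            // Insert readings tau = <t, e, v> into T (one shown; the
            // experiment inserts one year of per-minute readings).
            try (PreparedStatement ins = c.prepareStatement(
                    "INSERT INTO T (Time, ID, Readings) VALUES (?, ?, ?)")) {
                ins.setTimestamp(1, Timestamp.valueOf("2015-01-01 00:00:00"));
                ins.setInt(2, 42);        // example sensor ID e
                ins.setDouble(3, 1.234);  // example reading v
                ins.executeUpdate();
            }
            // Time the range-sum query used in the comparison.
            long start = System.nanoTime();
            try (PreparedStatement q = c.prepareStatement(
                    "SELECT SUM(Readings) FROM T WHERE ID = ? AND Time BETWEEN ? AND ?")) {
                q.setInt(1, 42);
                q.setTimestamp(2, Timestamp.valueOf("2015-01-01 00:00:00")); // ta
                q.setTimestamp(3, Timestamp.valueOf("2015-01-07 23:59:59")); // tb
                try (ResultSet rs = q.executeQuery()) {
                    if (rs.next()) System.out.println("range sum = " + rs.getDouble(1));
                }
            }
            System.out.printf("query time: %d ns%n", System.nanoTime() - start);
        }
    }
}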

Here, ta and tb denote the bounds of the queried time range. For consistency of comparison, the same location and time ranges were used to query both DBMS-X and the LDS. Fig. 10 compares the update and query time of the 1-level PLDS (minimum update cost, maximum query cost) and the full LDS (maximum update cost, minimum query cost) with DBMS-X. It can be seen that the LDS is very fast in terms of both update and query time compared to DBMS-X. In terms of update time, the full LDS is about 200 times faster and the 1-level PLDS is about 650 times faster than DBMS-X. In terms of query time, the full LDS takes 30 ns and the 1-level PLDS takes 150 ns, while DBMS-X takes 88 µs. Thus the proposed method outperforms DBMS-X in terms of both update cost and query cost.

5.4 Technical Summary of the Experimental Results

In summary, a detailed experimental analysis of the proposed data structure shows that overall its memory has sub-linear growth with respect to both data size and time, which is a significant improvement for streaming data. This shows that the LDS efficiently stores the streaming data from the smart grid sensors without exhausting its memory, and is particularly suitable for the real-time summarization of smart grid sensor data. In terms of runtime, the experimental results also show that the full LDS can process around 4 million smart meter readings every second, or 120 million readings every minute, while the 1-level PLDS can process around 2 billion readings per minute. The comparison of the proposed approach with a widely used commercial DBMS-X shows that in terms of update time the proposed approach is about 200 times faster than DBMS-X, and in terms of query time it is about 340 times faster. This shows that the proposed method outperforms DBMS-X in terms of both update cost and query cost.


6 CONCLUSION

Power companies are trying to leverage advanced metering data to increase the efficiency and utility of power distribution system operation. This paper considers practical data handling issues and addresses the challenges posed by the huge amount of measurement data. A novel summarization technique that deals with huge amounts of customer consumption data is proposed, with theoretical and experimental analysis. The proposed scheme is efficient both theoretically and practically, as demonstrated by the experimental results and the evaluation against a commercial DBMS application.

REFERENCES




[1] K. Manickavasagam, "Intelligent energy control center for distributed generators using multi-agent system," IEEE Transactions on Power Systems, vol. 30, no. 5, pp. 2442–2449, Sept. 2015.
[2] J. Valenzuela, J. Wang, and N. Bissinger, "Real-time intrusion detection in power system operations," IEEE Transactions on Power Systems, vol. 28, no. 2, pp. 1052–1062, May 2013.
[3] C. Vivekananthan, Y. Mishra, and F. Li, "Real-time price based home energy management scheduler," IEEE Transactions on Power Systems, vol. 30, no. 4, pp. 2149–2159, July 2015.
[4] T. Logenthiran, D. Srinivasan, and T. Z. Shun, "Demand side management in smart grid using heuristic optimization," IEEE Transactions on Smart Grid, vol. 3, no. 3, pp. 1244–1252, 2012.
[5] A. Motamedi, H. Zareipour, and W. D. Rosehart, "Electricity price and demand forecasting in smart grids," IEEE Transactions on Smart Grid, vol. 3, no. 2, pp. 664–674, June 2012.
[6] A. H. Mohsenian-Rad and A. Leon-Garcia, "Optimal residential load control with price prediction in real-time electricity pricing environments," IEEE Transactions on Smart Grid, vol. 1, no. 2, pp. 120–133, Sept. 2010.
[7] A.-H. Mohsenian-Rad, V. W. Wong, J. Jatskevich, R. Schober, and A. Leon-Garcia, "Autonomous demand-side management based on game-theoretic energy consumption scheduling for the future smart grid," IEEE Transactions on Smart Grid, vol. 1, no. 3, pp. 320–331, 2010.
[8] C. Buccella, C. Cecati, V. C. Gungor, D. Sahin, T. Kocak, S. Ergut, and G. P. Hancke, "Smart grid and smart homes: key players and pilot projects," IEEE Industrial Electronics Magazine, vol. 6, no. 4, pp. 18–34, 2012.
[9] V. C. Gungor, D. Sahin, T. Kocak, S. Ergut, C. Buccella, C. Cecati, and G. P. Hancke, "A survey on smart grid potential applications and communication requirements," IEEE Transactions on Industrial Informatics, vol. 9, no. 1, pp. 28–42, 2013.
[10] F. Benzi, N. Anglani, E. Bassi, and L. Frosini, "Electricity smart meters interfacing the households," IEEE Transactions on Industrial Electronics, vol. 58, no. 10, pp. 4487–4494, 2011.
[11] P. D. Diamantoulakis, V. M. Kapinas, and G. K. Karagiannidis, "Big data analytics for dynamic energy management in smart grids," Big Data Research, vol. 2, no. 3, pp. 94–101, 2015.
[12] Z. Aung, M. Toukhy, J. R. Williams, A. Sanchez, and S. Herrero, "Towards accurate electricity load forecasting in smart grids," in DBKDA 2012, The Fourth International Conference on Advances in Databases, Knowledge, and Data Applications, 2012, pp. 51–57.
[13] C. Lai and M. D. McCulloch, "Big data analytics for smart grid," Newsletter, 2015.
[14] G. De Francisci Morales, "SAMOA: A platform for mining big data streams," in Proc. of WWW Companion, 2013, pp. 777–778.
[15] C. Estan, S. Savage, and G. Varghese, "Automatically inferring patterns of resource consumption in network traffic," in Proc. of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. ACM, 2003, pp. 137–148.
[16] S. Rusitschka, K. Eger, and C. Gerdes, "Smart grid data cloud: A model for utilizing cloud computing in the smart grid domain," in Proc. First IEEE International Conference on Smart Grid Communications (SmartGridComm), Oct. 2010, pp. 483–488.
[17] M. Shargal and D. Houseman, "The big picture of your coming smart grid," Smart Grid News, vol. 5, 2009.
[18] Z. Hesabi, Z. Tari, A. Goscinski, A. Fahad, I. Khalil, and C. Queiroz, "Data summarization techniques for big data: a survey," in Handbook on Data Centers. Springer, 2015, pp. 1109–1152.
[19] Z. Shah, A. N. Mahmood, and M. Barlow, "Computing discounted multidimensional hierarchical aggregates using modified Misra Gries algorithm," Performance Evaluation, 2015.
[20] S. Muthukrishnan, Data Streams: Algorithms and Applications. Now Publishers Inc., 2005.
[21] A. Abuadbba and I. Khalil, "Wavelet based steganographic technique to protect household confidential information and seal the transmitted smart grid readings," Information Systems, vol. 53, pp. 224–236, 2015.
[22] H. Suleiman, I. Alqassem, A. Diabat, E. Arnautovic, and D. Svetinovic, "Integrated smart grid systems security threat model," Information Systems, vol. 53, pp. 147–160, 2015.


[23] X. Deng, L. He, C. Zhu, M. Dong, K. Ota, and L. Cai, "QoS-aware and load-balance routing for IEEE 802.11s based neighborhood area network in smart grid," Wireless Personal Communications, pp. 1–24.
[24] T. Hong, J. Wilson, and J. Xie, "Long term probabilistic load forecasting and normalization with hourly information," IEEE Transactions on Smart Grid, vol. 5, no. 1, pp. 456–462, 2014.
[25] C. Queiroz, A. Mahmood, and Z. Tari, "SCADASim: a framework for building SCADA simulations," IEEE Transactions on Smart Grid, vol. 2, no. 4, pp. 589–597, 2011.
[26] J. Wu, M. Dong, K. Ota, Z. Zhou, and B. Duan, "Towards fault-tolerant fine-grained data access control for smart grid," Wireless Personal Communications, vol. 75, no. 3, pp. 1787–1808, 2014.
[27] Z. Fan, P. Kulkarni, S. Gormus, C. Efthymiou, G. Kalogridis, M. Sooriyabandara, Z. Zhu, S. Lambotharan, and W. H. Chin, "Smart grid communications: Overview of research challenges, solutions, and standardization activities," IEEE Communications Surveys & Tutorials, vol. 15, no. 1, pp. 21–38, 2013.
[28] D. G. Photovoltaics and E. Storage, "IEEE guide for smart grid interoperability of energy technology and information technology operation with the electric power system (EPS), end-use applications, and loads," 2011.
[29] Z. Aung, "Database systems for the smart grid," pp. 151–168, 2013.
[30] P. D. Diamantoulakis, V. M. Kapinas, and G. K. Karagiannidis, "Big data analytics for dynamic energy management in smart grids," Big Data Research, vol. 2, no. 3, pp. 94–101, 2015.
[31] I. Song, S.-Y. Yun, S. Kwon, and N. Kwak, "Design of smart distribution management system for obtaining real-time security analysis and predictive operation in Korea," IEEE Transactions on Smart Grid, vol. 4, no. 1, pp. 375–382, March 2013.
[32] S. Rusitschka, K. Eger, and C. Gerdes, "Smart grid data cloud: A model for utilizing cloud computing in the smart grid domain," in Proc. First IEEE International Conference on Smart Grid Communications (SmartGridComm). IEEE, 2010, pp. 483–488.
[33] S. Bera, S. Misra, and J. J. Rodrigues, "Cloud computing applications for smart grid: A survey," IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 5, pp. 1477–1494, 2015.
[34] M. Ghamkhari and H. Mohsenian-Rad, "Energy and performance management of green data centers: A profit maximization approach," IEEE Transactions on Smart Grid, vol. 4, no. 2, pp. 1017–1025, 2013.
[35] B. Hayes, "Cloud computing," Communications of the ACM, vol. 51, no. 7, pp. 9–11, Jul. 2008.
[36] M. Yigit, V. C. Gungor, and S. Baktir, "Cloud computing for smart grid applications," Computer Networks, vol. 70, pp. 312–329, 2014.
[37] A. Kasper, "Legal aspects of cybersecurity in emerging technologies: Smart grids and big data," in Regulating eTechnologies in the European Union. Springer, 2014, pp. 189–216.
[38] T. Campbell, "Cloud computing security," in Practical Information Security Management. Springer, 2016, pp. 193–204.
[39] D. S. Markovic, D. Zivkovic, I. Branovic, R. Popovic, and D. Cvetkovic, "Smart power grid and cloud computing," Renewable and Sustainable Energy Reviews, vol. 24, pp. 566–577, 2013.
[40] A. Anwar, A. N. Mahmood, and M. Ahmed, "False data injection attack targeting the LTC transformers to disrupt smart grid operation," in Proc. International Conference on Security and Privacy in Communication Networks (SecureComm), 2014.
[41] R. Arritt and R. Dugan, "The IEEE 8500-node test feeder," in Proc. IEEE PES Transmission and Distribution Conference and Exposition, April 2010, pp. 1–6.


Zubair Shah received a B.S. degree in computer science from University of Peshawar (Pakistan), and an M.S. degree in computer system engineering from Politecnico di Milano (Italy). He is currently working toward a PhD degree at the University of New South Wales (Canberra, Australia). His research interests include Data Mining & Summarization, Big Data Analytics, and Machine Learning.



Adnan Anwar received a Master's in Engineering (Research) from the University of New South Wales (UNSW), Australia, and a Bachelor's degree (first class honours) from IUT. Currently, he is working towards his PhD at UNSW. His research interests include Smart Grid modelling and simulation, security of the Smart Grid, and intelligent methods and applications.

Abdun Naser Mahmood received the PhD degree from the University of Melbourne, Australia, in 2008, and the MSc (Research) degree in computer science and the BSc degree in applied physics and electronics from the University of Dhaka, Bangladesh, in 1999 and 1997, respectively. He has been working as an academic in computer science since 1999. He was a Lecturer from 2000 and an Assistant Professor from 2003 at the University of Dhaka, and received a Postdoctoral Research Fellowship at the Royal Melbourne Institute of Technology (2008-2011). In 2011, he joined UNSW as a Lecturer in Computer Science, and in 2017 he joined La Trobe University, Melbourne, where he is currently working as a Senior Lecturer in Cyber Security. His research interests include data mining techniques for scalable network traffic analysis, anomaly detection, and industrial SCADA security. He has published his work in various IEEE Transactions and A-tier international journals and conferences.

Zahir Tari is a full professor in Distributed Systems at RMIT University (Australia). He received a bachelor degree in Mathematics from the University of Algiers (USTHB, Algeria) in 1984, an MSc in Operational Research from the University of Grenoble (France) in 1985, and a PhD degree in Computer Science from the University of Grenoble (France) in 1989. Prof. Tari's expertise is in the areas of system performance (e.g., Web servers, P2P, Cloud) as well as system security (e.g., SCADA systems, Cloud). He is the co-author of six books (John Wiley, Springer) and has edited over 25 conference proceedings. Prof. Tari is also a recipient of over 9M$ in funding from the ARC (Australian Research Council) and was lately part of a successful 7th Framework AU2EU (Australia to European) bid on Authorisation and Authentication for Entrusted Unions. Finally, Zahir was an associate editor of the IEEE Transactions on Computers (TC), the IEEE Transactions on Parallel and Distributed Systems (TPDS), and the IEEE Magazine on Cloud Computing.

Albert Y. Zomaya is the Chair Professor of High Performance Computing & Networking in the School of Information Technologies, University of Sydney, and he also serves as the Director of the Centre for Distributed and High Performance Computing. Professor Zomaya has published more than 600 scientific papers and articles and is author, co-author or editor of more than 20 books. He is the Founding Editor in Chief of the IEEE Transactions on Sustainable Computing and serves as an associate editor for more than 20 leading journals. Professor Zomaya served as the Editor in Chief of the IEEE Transactions on Computers (2011-2014). He is the recipient of the IEEE Technical Committee on Parallel Processing Outstanding Service Award (2011), the IEEE Technical Committee on Scalable Computing Medal for Excellence in Scalable Computing (2011), and the IEEE Computer Society Technical Achievement Award (2014). He is a Chartered Engineer and a Fellow of AAAS, IEEE, and IET. Professor Zomaya's research interests are in the areas of parallel and distributed computing and complex systems.
