Querying on Encrypted Data

September 19, 2017 | Autor: Tanuj Jhamb | Categoría: N/A

Descripción

CSE-303 Hall, Department of CSE, NIT Calicut. April 21, 2014

Querying on Encrypted Data Tanuj Kumar Jhamb MTECH Ist Year DEPARTMENT OF Computer Science and Engineering NIT CALICUT

Abstract— We all know that networking and Internet technologies are emerging at very rapid rate. They are deploying the "Software as a Service" model for enterprise computing services. Some of the commercially deployed services are Spreadsheets, Electronic mail services, General storage services etc. Along with the SAAS they are also deploying the model "Database as a Service" which is used to create, store, modify and retrieve the data from anywhere in the world through Internet. While implementing or using these services, several issue can occurs, an important issue is "Data Privacy". And furthermore, the two main data privacy concerns are: First, the vendor of data needs to be assured that the data stored on the service-provider site must be secured from thefts by unauthorized authentication. Second, If the service-providers themselves are not trust worthy, then the data needs to be protected even from them also. "Data encryption" is one of the effective way to protect data integrity and privacy. In our paper, we will review the techniques that can execute queries over encrypted data. We will execute the encrypted query as much as possible at the server site, without decrypting the data. Decryption and the remainder of the query processing are performed at the client site. Different querying models are there, for querying encrypted data like we can construct different indexes to support different computations for different data types. Performing range query of numeric by constructing a bucket index, and complete fuzzy query of characters effectively by using the bloom filter as the encrypted index.

multiple Techniques & Technologies available for Database encryption. There are multiple flavors of Databases & some of the OEMs(Oracle Enterprise Manager) provide TDE (Transparent Data Encryption). In TDE same level of encryption can be achieved but TDE can make performance of database degraded if not configured accurately. Database encryption is applied to the tables, columns or rows in which sensitive data like credit card details, patients medical records are stored in a database. The encryption requirement is sometime controlled (governed) by the Regulatory Compliance or nature of Business or Data Privacy Laws specific to countries/continents. Using Database encryption can even limit DB Administrator in copying or seeing Business critical information.

Index Terms—Conjunctive Queries, Encrypted Graph-

Transparent data encryption (TDE) performs real-time I/O encryption and decryption of the data files and also to log entry files. The encryption uses a database encryption key (DEK), which is stored in the database boot record for the availability while the recovery process occurs. The DEK itself a symmetric key which uses a certificate stored in the master database of the server or an asymmetric key protected by an Extensible Key Management (EKM) module. TDE protects data "at rest", which means protecting the data files and log entry files. It also take care the laws, regulations, and guidelines established in various industries; while implementing . It also allow to various software engineers and

Structured Data, Character Data, Searchable Encryption, Analytical Queries, Broadcast Encryption

I. INTRODUCTION Database encryption is the process of transforming a data, that is inside a database from plain text format to cipher text, which is not meaningful. The transformation can be done by means of a suitable algorithms. The encryption of database help to protect the data that is stored on database. There are

Database stores huge amount of sensitive and important data which need to be protect from attacks, Cryptography support is one of the mechanism that is practically effective and efficient. However, a balance (tradeoff) should be maintain between the performance and the security because encryption and decryption greatly degrade the query performance.  Database-AS-a-Service model : It support a bucket method to store and search encrypted sensitive data on a remote server.  Order-preserving encryption schemes: It supports the range queries and aggregate queries over the encrypted data without decryption.

developers to encrypt the data with the help of AES and 3DES encryption algorithms without changing existing applications.

II. TECHNIQUES FOR STORING ENCRYPTED DATABASE There are various techniques for query processing over encrypted data, let us first discuss how the encrypted data is stored at the server. 1) Encrypted Data in the Database-Service-Provider Model[1] Suppose we have relation R(A1, A2, A3,...,An) we store on the server an encrypted relation Rs(etuple, A1s, A2s,...,Ans) where the etuple stores an encrypted string that corresponds to a tuple in relation R.Each attribute Ai corresponds to the index for the attribute Ai that will be used for query processing at the server. For example, consider a relation emp(eid, ename, salary, addr, did) contain information about employees. The emp table is mapped to a corresponding table at the server: e.g. emp s (etuple, eidS, enameS, salaryS, addrS, didS) It is only necessary to create an index for attributes involve in search and join predicates. In the above example, if we know that there would be no query that involves attribute addr in either a selection or a join, then the index on this attribute need not be created. Without loss of generality, we assume that an index is created over each attribute of the relation. There are many function that manages the encryption of data which are as follows: (i) Partition Function: We know what is stored in attribute A of R for each attribute Ai of R. We first map the domain of values, of attribute R.As into partitions { p1 , . . . ,pk}, such that: 1. these partitions taken together cover the whole domain 2. any two partitions do not overlap. Formally, we define a function partition as follows: partition(R.Ak) = {pl, p 2 , . . . , pk} Suppose the values of domain of this attribute lie in the range [0, 1000]. Assume that the whole range is divided into 5 partitions: a: [0,200], (200,400], (400,600], (600,800], and (800, 1000]. Different attributes may be partitioned using different partition functions. It should be clear that the partition of attribute Ai corresponds to a splitting of its domain into a set of bucket. (ii) Identity function: We define an identification function called ident to assign an identifier ident R.Ai(pj) to each partition pj of attribute. The ident function value for a partition is unique, that is, identR.Ai(pj) not equal to identR.Ai (Pz),

(iii) Mapping Functions We define a mapping function MapR.Ai that maps a value 'v' in the domain of attribute Ai to the identifier of the partition to which v belongs. We further classify two types of mapping functions: (a) Order preserving: A mapping function MapR.Ai is called order preserving. if for any two values vi and vj in the domain of Ai, if vi < vj, then Map R.Ai(Vi) < Map R.Ai(Vj) (b) Random: A mapping function is called random if it is not order preservin Storing Encrypted Data: We can store the encrypted relation Rs on the server. For each tuple t = (al,a2,... ,an) in R, the relation Rs stores a tuple: =

where encrypt is the function used to encrypt a tuple of the relation. The first column etuple contains the string corresponding to the encrypted tuples in emp. 2) Searchable Symmetric Encryption for Multi-user[2] Searchable encryption allow users to perform keyword based searches on an encrypted database. Ssearchable symmetric encryption scheme is constructed to make searches possible on encypted database. In searchable symmetric encryption (SSE) for multiusers, a single user places a document collectiion on a server and arbitrary users can submit queries to server to search in that document collection. In M-SSE(SSE for multiuser) scheme privacy of data and access privileges are managed with six polynomial time algorithms. M-Keygen (1k): A key generation algorithm which run by owner O of data to setup the scheme. Takes a security parameter k and returns an owner secret key KO. MBuildIndex(KO,D): Run by O to create Indexes. It takes owner's secret key and document collection D as input and returns the Index I. AddUser(KO,U): Run by O to add a user in group G. Takes owner's secret key KO and a user U as input and returns U's secret key KU. RevokeUser(Ko,U): Run by O to revoke a user from G. Inputs are owner's secret key KO and a user U and revokes the user's searching privileges.

MTrapdoor (KU,w): Run by a user (including O) in order to generate a trapdoor for a given word. It takes a user U's secret key KU and a word w as inputs, and returns a trapdoor TU,w. MSearch (ID,TU,w): It run by server S in order to search for the documets in D that contains word w. It takes the index I D for collection D and trapdoor TU,w for word w as inputs and returns D(w) if user U belongs to group G. CORRECTNESS OF M-SSE: Scheme SSE is correct only if a revoked user is no longer able to perform searches on the owner's data. Let D is document collection and ID is index of it, then a multi-user SSE scheme is correct if Pr [MSearch(ID,Tu,w ) = D(w) : U ∈ G ] =1 M-SSE makes use of a broadcast encryption (BE) scheme . In BE, a center encrypts a message m for a group G of privileged users who are allowed to access the message. Users in group are dynamically changing. When a user joins the group it receives a set of secrets called long-lived secrets.Given an encrypted message, the long-lived secrets allow a user to decrypt it if the user was non-revoked at that time. BE is used as building block in M-SSE construction in order to manage user revocation. In order to retrieve the document that contain word w, an authorized user U first computes a single-user trapdoor T U,W , and applies a pseudo-random permutation φ with secret key r on this trapdoor and sends it to server. This key r is known by the owner, by the set of currently authorized users and by the server. Upon receiving φ r( TU,W ) , server recovers the trapdoor as TU,w =φ-1 (φr (TU,W )) . Each time a user of group G is revoked, the owner picks a new r and stores it on the server such that only non-revoked users can decrypt it. With Broadcast encryption this r is distributed to the of nonrevoked users also. Revoked users cannot recover the current r so server will not provide access to them. M-SSE construction is very efficient on server side because server only needs to evaluate trapdoor and permutation to check whether user is revoked or not. Owner O should posses an additional secret to perform authentication with server when he wants to update D. This additional secret guarantees that only O can perform update to document. 3) Privacy Preserving Query Over Encrypted Graphstructured Data(PPGQ)[3] It establish a set of strict privacy requirements for secure cloud data utilization system to become a reality. As a general data structure to describe the relation between entities, the graph has been increasingly used to model complicated structures and schema less data, such as the personal social network (the social graph) . Querying on graph structured data is different from a query on a relational data base(tuple format) that we usually see. Considering the large amount of data centralized in the data center, it is a very challenging task to effectively utilize the graph-structured data after encryption.

The proposed technique utilizes the principle of “filtering and verification” and a pre-build feature based index is used efficiently for taking the inner product of query graph and the entire data graph which prunes the data set. And strict system wise privacy requirements are followed inorder to avoid security breaches. Filtering is used for reducing the size of the data retrieved from the server to the client so that the overhead on performing decryption on unwanted data can be reduced. In conventional method we perform the graph containment query directly which degrades the performance of the system to a large extent. In this technique we use graph data, feature set and searchable index. The searchable index and the graph data will be given to the database service provider (server). An authorized user acquires a trapdoor and upon receiving the trapdoor the server performs the query and retrieve the results to the client. Query graph and data graph are associated with binary vectors in which each bit represents whether corresponding feature is subgraph isomorphic to the data graph or query graph as well. The secure Euclidean distance computation technique in the secure kNN scheme is used for the performing inner product between data graph and the query graph. The inner product actually gives the super graph set of the query and this is further decrypted in the client side. The mining of the feature set and the building of the feature based index is done in the data owner side before outsourcing it to the server. The possible attack here is the security breaches at the server. Since server has the data set and the searchable index, so that feature frequency can be used to identify features in a query. The trapdoors generated and the searchable index are also vulnerable. The server should learn nothing from the query graph. The strict privacy requirements suggested in this paper are, Index Privacy, Feature Privacy, trapdoor Unlink ability, Access Pattern Privacy. The technique ensures this privacy requirements by extending the simple bit structure to more sophisticated structures which breaks the scale relationship between the final inner product and the number of features in the query graph. Random trapdoors are generated even for the same kind of queries at different times to ensure trapdoor unlink ability 4) Update Query Over Encrypted Data[4] It consist of a concrete example of different queries that we usually perform in a database. This is claimed to be an efficient method for Update query over encrypted data. In an Update query “false positive” hits cannot be tolerated at all. This makes UPDATE query different from SELECT or SEARCH queries where post processing is possible so that “false positive” are tolerated to an extent. The technique disc used facilitates us to update a column directly even though it is encrypted. Two tables namely Actual_Table and Additional_Table are used to implement the method. Actual_Table contains whole data along with the encrypted column. The additional table contains the encrypted data column of Actual_Table in the plain form and key (Primary key) of Actual_Table in the

encrypted form, which hides the relationship between Actua_Table and Additional_Table. If user issues a query it checks the conditioned column first. If the column is not encrypted then update directly in the column. If the conditioned column is encrypted then the system checks whether the user is authorized. If authorized then the encrypted key is retrieved from the Additional_Table first and then the key is decrypted. Update is performed in columns corresponding to this decrypted key values. The main advantage of this technique is that there is no false positive hits as the query decrypts and retrieves the required values only as per the given condition in WHERE clause. This makes the technique fast enough compared to other techniques. 5) Fast Query Over Encrypted Database on B+ Tree[5] Now a days a database stores huge amount of sensitive and important data which need to be protect from attacks, Cryptography support is an effective mechanism. However, a tradeoff must be made between the performance and the security because encryption and decryption greatly degrade the query performance. Database-AS-a-Service model : support a bucket method to store and search encrypted sensitive data on a remote server. Order-preserving encryption schemes :support the range queries and aggregate queries over the encrypted data without decryption were proposed by Agrawal But the order of plaintext is exposed due to encrypted data based on order-preserving encryption scheme. A major challenge in the encrypted data is how to represent indexing information. It can be solved in two ways: (i)The indexing information should be related with the data well enough to provide for an effective query execution mechanism. (ii)The relationship between indexes and data should not open the door to inference and linking that can comprise the protection granted by encryption Architecture of Encrypted Storage Based on B+ Tree It contains two relation table 1)Enc_fields 2)Enc_key Enc_fields(fid, DB_name, Rel_name, Field_name) Enc_keys(kid, fid, key) There is encrypted component which is used to encrypt the field value specified by the system table Step of storing encrypted data (i) After the parsing analysis SQL language are tranferred into DBMS through the ODBC(JDBC) . (ii)Verification:Whether storing data contains the field need to be encrypted or not with the help of the encrypted field in the relation Enc_field.

(iii)If need to be encrypted ,firstly call a function of creating index to create B tree; then call cryptographic function to encrypt the value of the specified field when data is written into the disk. (iv)Add a record to relation Enc_keys which record key corresponding to the encrypted fields. Creation and Encryption of B+ Tree It is a very common index structure, which consists of key and pointer pairs. In B+ tree, the key K points to the corresponding record through the pointer. Firstly, we find out the matching key in the B+ tree. Secondly, we search the record of the data file in term of the pointer. We can utilize B+ tree to achieve fast query storage space occupied by B+ tree is so far less because memory can hold only B+ tree not data file. B+ tree is arranged with ordering, we can quickly search the key K in B+ tree by 2-phase query Algorithm. Algorithm: Query over Encrypted Data. Input: SQL language which is used to query the encrypted data. Output: A collection of records satisfied with the query predicates. Method: Step 1: When a SQL language is transferred into DBMS by ODBC(JDBC), the parser analyzes the SQL query, verifies the query semantics, and generates the query plan. After optimized and rewritten the query plan is delivered to the executor. Step 2: Query B+ tree until find out all node satisfied with the query conditions. (i) Basis. If the present node is a leaf node, call the decryption component, decrypt the leaf node and match the key inside it. If the key meets with the conditions, then the corresponding pointer will be used to query the encrypted data file. (ii) Induction. If the present node is an inner node, first decrypt the root node of the subtree, then compare the key of the root node with query key in order to find appropriate next subtree. Repeat the process until the present node is a leaf node. Step 3: According to the pointer obtained in step 2, find out the right data block and the right records from the encrypted data files, and call the decryption component to decrypt the encrypted data. Step 4: Return the decrypted records. Encrypted Storage of B+ Tree: There are two way to encrypt storage B+ tree

(1)Encrypt the whole B+ tree:-This way can speed up the query since storage space occupied by B+ tree is so far less (less no of node)than that of the corresponding data file. (2) Encrypting each node:There are fewer nodes needed to be decrypted. In this way, the query performance will further improve. Query over encrypted data based on B tree: (1) We can locate the right record in the data file by B+ tree. we only need to decrypt the records related to the query predicates can farthest decrease the records need to be decrypted and dramatically enhance the performance. 2) B+ tree is encrypted on each node, we only decrypt a few nodes. In general, the depth of B+ tree is no more than 3 layers. If root node is in memory then there is only two node will have to decrypted which gives better performance. III. CONCLUSION Data to any organization is a most valuable and confidential property. The security of this data for an organization is always a big challenge. In today’s technological world, the database can get easily vulnerable for attacks due to lack of verification during the testing phase that can get attacked frequently. Major security issues are identified after the data integrity violated. Here in this paper, some encryption methods are discussed that can help to reduce the risks of attack and protect the sensitive data. Encryption has become a most vital thing for securing data because, once data is encrypted, if it is stolen, it will be still unreadable without the encrypted key required to decrypt the data.

IV. Future Work In Database-Service-Provider Model the encryption provides confidentiality but give no assurance of integrity unless we use some digital signature or Hash function. By Using strong encryption algorithms can give both confidentiality and Integrity and parallely increase performance of querying database. In MSS scheme only one user can update data, so we can extend the scheme such that all users in group can also update database. In B+tree encryption model for supporting queries on encrypted data, a secret key can produce tokens for testing any query requested by user. That can make the accessing of data inefficient and time consuming. We can present a general framework that can test the secret key without analyzing the token that will make the Database accessing efficient and optimized. V. ACKNOWLEDGEMENT I would like to thank Sujeet Prakash, Sithara EP, Vineethanand and Virendra Singh for their valuable suggestions during writing the paper.

REFERENCES [1] Hakan Hac , Bala lyer , Chen Li, Sharad Mehrotra 1, "Executing SQL over Encrypted Data in the Database-Service-Provider Model”, Department of Information and Computer Science, University of California, Irvine, CA 92697, USA, Proceedings-Nov 2011 in ACM conference on Computer and Database Security [2] R. Curtmola, J. Garay, S. Kamara, R. Ostrovsky"Searchable Symmetric Encryption:Improved Definitions and Efficient Constructions" Proce-eding Proceedings-Jan 2006 of the 13th ACM conference on Computer and communications security [3] Ning Cao , Zhenyu Yang, Cong Wang, Kui Ren, and Wenjing Lou, “Privacy-Preserving Query over Encrypted Graph-Structured Data in Cloud Computing ”, in 2011 31st International Conference on Distributed Computing Systems. [4] Shaukat Ali, Azhar Raut: Saeed Mahfooz , “UPDATE Query over Encrypted Data ”. September 2012, IEEE conference, University of Waterloo, Can-ada [5] Zheng-Fei Wang, Hunan Bus. Coll., Changsha, Ai-Guo Tang, Wei Wang "Fast query over encrypted data based on B+ tree" Published in: International Conference on Apperceiving Computing and Intelligence Analysis, 2009. ICACIA 2009. Date of Conference:23-25 Oct. 2009 E-ISBN: 978-1-4244-5206-4, Print ISBN: 978-1-42445204-0 INSPEC Accession Number: 11035373 [6] Dan Boneh1, and Brent Waters,"Conjunctive, Subset, and Range Queries on Encrypted Data" Stanford University, SRI International, audit log. ACM journal in Proceedings of NDSS ’04, 2004. [7] Michael Shawn, Ching Fon Fu, Jhang Xuing "Multi-Dimensional Range Query over Encrypted Data" Elaine Shi John Bethencourt T-H. Hubert Chan Dawn Song Adrian Perrig Carnegie Mellon University, 2007 IEEE Symposium on Security and Privacy(SP'07) 0-7695-28481/07 2007 [8] Stephen Tu M. Frans Kaashoek Samuel Madden Nickolai Zeldovich, "Processing Analytical Queries over Encrypted Data" International Journal of Computer Applications (0975 – 888) Volume 47– No.12, June 2012 [9] Dr. Anwar Pasha Abdul GafoorDeshmukh; Dr. Anwar Pasha Abdul GafoorDeshmukh; Transparent Data Encryption- Solution for Security of Database Contents; (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 2, No.3, March 2011 [10] https://www.net-security.org [11] http://en.wikipedia.org/wiki/Database_encryption

Lihat lebih banyak...

Querying on Encrypted Data

Descripción

Comentarios