
A dual (IPv4/IPv6) “durable storage” commercial service

Eduardo Jacob, Juan José Unzilla, Mª Victoria Higuero, Purificación Saiz, Marina Aguado, Christian Pinedo
Departamento de Electrónica y Telecomunicaciones, Universidad del País Vasco / Euskal Herriko Unibertsitatea
[email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract

In this article we describe a commercial service that uses distributed resources, such as storage and bandwidth, in a cooperative effort. We name this service “durable storage”: a service able to guarantee the existence of the data it contains in “any circumstance” at “any time”. Although the solution is based on the P2P paradigm, we use a centralized approach that matches the commercial side of the service. We have designed the system with anonymity, confidentiality and security in mind. The ISP, or service provider, not only performs several of the main functions, such as authentication and directory maintenance, but also constitutes the only billing point for every user. We describe the architecture of the service and the implementation of the prototype, which is a dual-stack IPv4/IPv6 application. Finally, we point out that the solution is not only IPv6 compatible, but could clearly benefit from an IPv6-only version.

1. Introduction

Several years ago our research group was contacted by an ISP to evaluate the feasibility of ASP (Application Service Provider) services. We began by studying the services being offered in other markets, mainly in North America at the time, and made some interesting findings: apparently there had been many problems with that model, and there seemed to be no real business built on it. We also found that our ISP's customers were very concerned about two aspects of the massive use of ASPs: first, that external connectivity would become a critical asset (at a time when email and the web were more a commodity than a real business tool), and second, the privacy of their data and of their use of the applications. In light of this study we changed our approach and began to consider the problem from a different point of view: which service would be desirable for current ISP users? We found that most of them used a flat-rate connection mainly during business hours, so a service that could make use of that paid but unused bandwidth outside business hours would be welcome. The services we devised were based on remote backup systems and network disks.

After that, the Grid and P2P paradigms became very popular, and a small firm that designed services for a mid-size ISP asked us whether we could devise a way of commercializing Grid services. We made a proposal [1] for a “durable storage” service, based on a central entity (easily recognizable as an ISP) which was the only billing point and which acted as a clearing house between participants. There were provisions for participants who only wanted to consume resources. The cost-sharing proposal was later reflected in the eInfrastructure Reflection Group White Paper for the EC [2]. Later, when designing the architecture, we found that the P2P paradigm could bring us valuable ideas. At about the same time, our research group became involved in IPv6-related activities, so it was natural to design this service with that technology in mind. We believe that IPv6 has the potential to ease both application deployment and the security model.

In the following pages, we first present the “durable storage” service. We then examine the peer-to-peer paradigm, continue with the architecture and implementation details, and finally present some conclusions.

2. The “durable storage” service

We define “durable storage” as a service able to guarantee the existence of the data it contains in “any circumstance”, including wide-area disasters. An approach with some similarity can be found in an article by Ross Anderson [3], although the final objective of that work was to assure the durability of a file through anonymity, to try to ensure that it could not be erased, not even by security incidents or by “Her Majesty's judges”. In [4], an available implementation of an anti-censorship approach, the Freenet Project, is described; however, only the most popular data are kept with that approach. In [5] the objective was to design a system offering a “global persistent data store designed to scale to billions of users. It provides a consistent, highly-available, and durable storage utility atop an infrastructure comprised of untrusted servers”. As we show below, our design shares some of the goals of that reference, but adds others.

Our approach offers “durable storage” as a cooperative effort between customers and the ISP, making it possible to define service level agreements for it. The parameters negotiated are: data size, number of replicas, location of replicas, time to complete replication and customer bandwidth employed (see the sketch at the end of this section). In our model there are two entity types: the ISP and the system users. The users can take part passively, by simply purchasing the service, or actively; in the latter case they are part of the network, sharing data storage and network communication resources.

The ISP, on the other hand, has several tasks. First, it is responsible for the directory service, which is used both for storing files and for recovering them from the system when necessary. Before accepting a file, the system should verify that it can maintain the service level agreement. Once a file is accepted, the system chooses the customers that will store the data. The ISP can also employ its own resources (storage and bandwidth) to maintain the quality of service when the system as a whole is temporarily unable to do so (e.g. during a communications outage). Other functions of the ISP include accounting and, specifically in this service, offsetting the cost of the service against the actual cost of the resources each customer shares with others. The SLA is based on storage size, number of replicas, geographical location requirements and hourly bandwidth utilization.

There are many constraints involved in the design of a proper security model. For example, customers should know neither the content nor the owner of the data stored on their systems. This involves anonymizing not only the data, but also the connections. As already mentioned, authentication and confidentiality are clearly part of future clients' requirements.
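To make these negotiated parameters concrete, the following minimal Python sketch shows an SLA record and a trivial feasibility check of the kind the system must run before accepting a file. Field and function names are our own illustration, not taken from the prototype.

```python
from dataclasses import dataclass

@dataclass
class StorageSLA:
    """Parameters negotiated between customer and ISP (hypothetical names)."""
    data_size_mb: int             # size of the data to protect
    replica_count: int            # number of replicas to keep in the system
    replica_locations: list       # geographical constraints, e.g. ["EU", "US"]
    replication_deadline_h: int   # hours allowed to complete full replication
    customer_bandwidth_kbps: int  # customer uplink reserved for the service

def replication_feasible(sla: StorageSLA) -> bool:
    """Rough check: can the customer uplink push all replicas out in time?"""
    total_kbits = sla.data_size_mb * 1024 * 8 * sla.replica_count
    available_kbits = sla.customer_bandwidth_kbps * 3600 * sla.replication_deadline_h
    return total_kbits <= available_kbits
```

For example, pushing three replicas of 100 MB over a 512 kbps uplink needs roughly 2,457,600 kbit / 512 kbps ≈ 80 minutes, so a two-hour replication deadline would be accepted.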

3. The peer-to-peer paradigm and the commercial environment

Although there is much expectation and there are many announced applications [6], very few commercial P2P-based applications are actually in operation. Skype is probably one of them, although others are being developed, mainly in the field of cooperative work. The objective of this part of our work was to evaluate whether our architecture could be adapted to the P2P paradigm.

The first approach we considered was very close to the strict peer-to-peer paradigm: offer customers remote backup using P2P technology, with software installed at each of a customer's premises. The data would be replicated among the different locations, normally on a best-effort basis. This is adequate for companies with multiple premises and, from the users' point of view, good for the confidentiality of the data, as the information never leaves the corporate network. However, there are also problems: the classic P2P approach does not by itself guarantee tight control of the replication process. And there is a more subtle problem: the P2P paradigm by definition does not need a third party, whereas we were exploring services to be offered by an ISP.

In [7] we studied the suitability of different P2P protocols for this service. There are basically decentralized and centralized approaches: Gnutella and Gnutella2 are examples of the first type, Napster and BitTorrent of the second. For our purposes, BitTorrent is the clear candidate; its centralized, tracker-based approach is very well suited to our needs. We concluded that we could design this service with P2P technology.

4. Service Architecture

Before defining the architecture we studied several options. The final solution should fulfil the following objectives:

- A modular design that can easily accommodate both types of client.
- A sound security model, in which authentication and anonymity are needed at different stages.
- Clearly separated management and replication responsibilities.
- A centralized design.
- Efficiency in operation.

After some work we decided to propose an architecture based on three key elements. The first element is the “Control Server Group” or CSG, a group of Control Servers (CSs). The second element is the “Data Server Group” or DSG, which comprises several Data Servers (DSs). The third element is the “Users” (Us), the users of the durable storage service. The functionalities of each element are described hereafter. These entities and their interactions when a user inserts a file into the system are represented in figure 1 (a short sketch of this flow follows the figure).

Figure 1: System Architecture
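As an illustration of the division of labour just described, this hedged sketch traces a file insertion across the three entity types; the step descriptions paraphrase figure 1 and sections 4.1-4.4, and the names are ours.

```python
from enum import Enum

class Entity(Enum):
    USER = "U"              # customer inserting or retrieving files
    CONTROL_SERVER = "CS"   # member of the Control Server Group (CSG)
    DATA_SERVER = "DS"      # member of the Data Server Group (DSG)

# Simplified insertion flow: the user never talks to a DS directly;
# the CS authorizes the request and the data goes through a proxy.
INSERTION_FLOW = [
    (Entity.USER, Entity.CONTROL_SERVER, "authenticate and request insertion"),
    (Entity.CONTROL_SERVER, Entity.CONTROL_SERVER, "check SLA, select DSs"),
    (Entity.USER, Entity.CONTROL_SERVER, "upload ciphered file via proxy"),
    (Entity.CONTROL_SERVER, Entity.DATA_SERVER, "store replicas on chosen DSs"),
    (Entity.CONTROL_SERVER, Entity.USER, "return capability (random file ID)"),
]

for src, dst, action in INSERTION_FLOW:
    print(f"{src.value} -> {dst.value}: {action}")
```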

4.1. Control Server

A Control Server, or CS, is the key element in the administration and control of the service. A group of Control Servers is desirable because it increases the resilience of the system through the replication of administrative data and tasks. The CSG is clearly under the control of the ISP offering the service. The tasks associated with a CS are the following (a sketch of this authorization path is given at the end of this subsection):

- Authentication of Users (U) and Data Servers (DS). This is necessary to protect the system from unauthorized access to its resources, and from substitution or impersonation attacks.
- Application of the data policy of each system user. This policy reflects the security and availability settings for each user's data. It is defined by the Service Level Agreement, which determines the number of data replicas, the hourly bandwidth used for replication and the speed of the replication process.
- Processing of user requests. Each request made by a user must be authorized by the CS or, more generally, by the CSG.
- Accounting. As the central point of the architecture, where every transaction is authorized, the CS is also the natural point for accounting tasks.
- DS management and control. The CS must be able to manage and control the resources of the DSG. DSG control is fully centralized with respect to file admission and retrieval.
- Temporary support for fulfilling the Service Level Agreement. The CSG also has the mission of assuring the SLA when the DSG is temporarily unable to cope with it, for example by providing temporary storage.

Since everything except data transfer is handled through the CS, the convenience of a resilient system is easy to see, and hence the adequacy of employing a CSG rather than a single CS. At this point we can conclude that most of the complexity of the solution is concentrated in the Control Server. This is very convenient for a commercial solution, as it leaves very little complexity on the user side and on the DS side. The architecture presents a single point of failure in the CS, but this eventuality is handled through CS replication in the CSG, which also benefits from the physical protection of the ISP's premises.

The functional architecture of a CS is depicted in figure 2. A number of databases store administrative data as well as file storage information, and are used by several modules. The figure shows the process of a successful file insertion into the system by a user. As can be seen, the data storage interaction is done through a proxy.

Figure 2: Control Server Architecture

The replication of files is handled through another module that runs according to the CS policy.
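A minimal sketch of the authorization path just described, assuming simple in-memory structures; class, attribute and method names are illustrative, not the prototype's.

```python
import secrets
from dataclasses import dataclass

@dataclass
class User:
    name: str
    is_authenticated: bool = False

@dataclass
class DataServer:
    address: str
    free_mb: int

class ControlServer:
    """Authorizes requests, keeps the directory, and does the accounting."""

    def __init__(self, data_servers):
        self.directory = {}       # file ID -> Data Servers holding replicas
        self.accounting = {}      # user name -> MB stored (billing input)
        self.data_servers = data_servers

    def insert_file(self, user: User, size_mb: int, replica_count: int) -> str:
        if not user.is_authenticated:
            raise PermissionError("unauthenticated user")
        # Admission control: refuse the file if the SLA cannot be met.
        candidates = [ds for ds in self.data_servers if ds.free_mb >= size_mb]
        if len(candidates) < replica_count:
            raise RuntimeError("SLA cannot be met: not enough Data Servers")
        chosen = candidates[:replica_count]
        for ds in chosen:
            ds.free_mb -= size_mb
        file_id = secrets.token_hex(32)   # opaque, capability-style ID (see 4.3)
        self.directory[file_id] = chosen
        self.accounting[user.name] = self.accounting.get(user.name, 0) + size_mb
        return file_id
```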

4.2. Data Server Group

The next element in the architecture is the DSG or Data Server Group. This entity comprises several Data Servers distributed, ideally, over a wide geographical area. Note that the minimum number of Data Servers is related to the SLA offered. At this point a DS is simply a system used for storage and, as such, it can also be located at a User's premises. The main tasks of the DSG are the following (a sketch of a DS interface follows this list):

- Data storage. This includes reading, writing and deleting information.
- Information replication. Following the CSG's instructions, the DSG takes part in the process of properly replicating data over the required DSs, thus relieving the CSG of the replication task.
- Notification of special circumstances to the CSG. A DS should be able to report exceptions to the CSG (bandwidth changes, hardware/software changes, etc.).
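The sketch below illustrates these three duties; the method names and the notification callback are our own assumptions, not the prototype's API.

```python
import os

class DataServer:
    """Stores opaque files, replicates on instruction, reports exceptions."""

    def __init__(self, storage_dir: str, notify_csg):
        self.storage_dir = storage_dir
        self.notify_csg = notify_csg      # callback toward the CSG

    def write(self, file_id: str, data: bytes) -> None:
        with open(os.path.join(self.storage_dir, file_id), "wb") as f:
            f.write(data)

    def read(self, file_id: str) -> bytes:
        with open(os.path.join(self.storage_dir, file_id), "rb") as f:
            return f.read()

    def delete(self, file_id: str) -> None:
        os.remove(os.path.join(self.storage_dir, file_id))

    def replicate_to(self, file_id: str, peer: "DataServer") -> None:
        # On CSG instruction, DSs copy data among themselves, relieving
        # the CSG of the bulk transfer.
        peer.write(file_id, self.read(file_id))

    def report_exception(self, event: str) -> None:
        # E.g. bandwidth changes or hardware/software problems.
        self.notify_csg(event)
```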

4.3. User

The last element of the architecture is the User, U. The user is the entity that signs up for the service. As explained before, a user of the durable storage system can participate either purely as a consumer or by also offering bandwidth and storage; the invoicing of the service takes this into account. The main tasks of the customer are:

- Inserting, retrieving and deleting data from the system. This is the service itself. The method employed to interact with the service should be kept as simple as possible.
- Data ciphering. The only way to give users enough confidence to actually use the system is to guarantee, visibly, that they retain full control over the secrecy of their data. This implies that the data must be ciphered before leaving the customer's premises.

As may already be inferred from the description above, files are treated as opaque elements, and the access model is based on capabilities: lengthy random file IDs, meaningful only for the duration of an operation, are employed (a sketch follows).
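A minimal sketch of that capability model, assuming a simple in-memory table; helper names and the expiry policy are hypothetical, but the principle is the one stated above: holding the long, unguessable, operation-scoped ID is the authorization.

```python
import secrets
import time

_active_operations = {}   # operation-scoped ID -> (internal file ref, expiry)

def open_operation(internal_ref: str, ttl_s: int = 900) -> str:
    """Issue a random file ID valid only for the current operation."""
    op_id = secrets.token_urlsafe(48)   # 64 URL-safe characters, ~384 bits
    _active_operations[op_id] = (internal_ref, time.time() + ttl_s)
    return op_id

def resolve_operation(op_id: str) -> str:
    """Map an ID back to the internal reference while it is still valid."""
    ref, expires = _active_operations[op_id]
    if time.time() > expires:
        del _active_operations[op_id]
        raise KeyError("capability expired")
    return ref
```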

4.4. Security model

From the security point of view, a number of features are highly desirable and must be carefully introduced into the system design:

- Authentication of every entity involved in the service. The solution is based on SSL and certificates for identifying Control Servers and Data Servers, and on user/password pairs for client identification. Although there is no technical obstacle to using certificates for client authentication, we feel that at this time we should keep the burden of certificate management off the client side. A system like the one presented by some of the authors of this paper in [8] could be a solution to this problem.
- Anonymity. This feature matters to a system user in two ways: making the actual data storage location opaque to the user, and not letting the DS learn the origin of the data. It is implemented by carefully proxying the CS and DS flows.
- Confidentiality of data. Data in its original format must be available only to its owner. This can be implemented straightforwardly with symmetric ciphering. To guarantee the restricted availability of the original data, the key must be available only to the data's owner and must never leave his premises (see the sketch below). Whether this ciphering should be managed and applied by the durable storage client software or offline by the system user is still under consideration, as there is a trade-off between usability and trust.
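As an illustration of the confidentiality principle only: the prototype relies on M2Crypto (see section 5), but the sketch below uses the modern `cryptography` package to show symmetric ciphering applied before the data leaves the customer's premises. The file name is hypothetical.

```python
from cryptography.fernet import Fernet

# The key is generated at the customer's premises and never leaves them;
# the durable storage system only ever sees ciphertext.
key = Fernet.generate_key()            # store locally and safely
cipher = Fernet(key)

with open("report.pdf", "rb") as f:        # hypothetical local file
    ciphertext = cipher.encrypt(f.read())  # this is what gets uploaded

# Later, after retrieving a replica from the system:
plaintext = cipher.decrypt(ciphertext)
```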

5. Implementation

At this time we have developed a proof-of-concept version of this service that runs over IPv4 and IPv6. It is a dual-stack application, and every relevant function and associated data record can store addresses in both formats. The interaction between IPv4 and IPv6 nodes in a concrete deployment where both types of networks and nodes coexist should be studied, and adequate transition mechanisms used.

We chose Python as the programming language for several reasons: first, BitTorrent itself is programmed in Python; second, we had already tweaked Python 2.3.4 to better support IPv6 when porting Zope to IPv6 [9], so we had experience using IPv6 with Python; and third, the language is well suited to rapid application development. As said before, security is implemented using SSL and certificates for identifying both server types, and user/password pairs for User identification. The interaction between Data Servers (DS), Control Servers (CS) and Users is implemented with SOAP-RPC. We added IPv6 support to the M2Crypto 0.13 module (httpslib), and we also modified the SOAPpy 0.11.3 package to permit the use of the M2Crypto library by the client. As noted above, BitTorrent [10] is developed in Python, and there is a patch that makes the original BitTorrent 3.3 client IPv6 compliant; sadly, version 3.4 of the protocol is no longer IP/transport neutral [11]. Some functionalities, mainly massive base-64 conversions and MD5 checksum calculations, are performed by spawning external applications from the main Python code, for performance reasons.
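The paper does not reproduce the prototype's networking code; the following sketch shows the standard dual-stack client pattern in Python, in the same spirit: getaddrinfo() returns both IPv6 and IPv4 addresses for a name, and the client tries each in turn.

```python
import socket

def connect_dual_stack(host: str, port: int) -> socket.socket:
    """Connect over IPv6 or IPv4, whichever works first."""
    last_error = None
    # AF_UNSPEC makes getaddrinfo return both AAAA and A results.
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        try:
            sock = socket.socket(family, socktype, proto)
            sock.connect(sockaddr)
            return sock
        except OSError as err:
            last_error = err
    raise last_error if last_error else OSError("no addresses found")
```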


The interaction between the modules is quite complex and we consider it outside the scope of this article. As we use a full SOAP-RPC approach for the communication between clients and servers, integration into other applications should be straightforward (a hypothetical client call is sketched below). For the tests, we chose the text-based interface shown in figure 3.

Figure 3: Text-based interface
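As a hypothetical example of that integration (the endpoint URL and the remote method name are assumptions, not the prototype's actual interface), a SOAPpy client call looks like this:

```python
# SOAPpy exposes remote procedures as ordinary Python method calls.
from SOAPpy import SOAPProxy

cs = SOAPProxy("https://cs.example.net:8443/")  # hypothetical CS endpoint
file_id = cs.insert_file(user="alice", size_mb=10, replicas=3)
print("stored as", file_id)
```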

At this point the prototype is fully functional in IPv4/IPv6 environments. It supports file insertion, retrieval, replication, local and remote listing, and attribute/ownership changes. Figure 4 shows a file insertion, retrieval and listing. The only parts missing at this stage of the project are the logic for the SLA-based selection of DSs when there is a large number of participants, and the bandwidth management and enforcement system. The databases are already populated with the relevant data, and the source code contains references to these modules. We expect to finish the coding before the end of the year.

Figure 4: File insertion, retrieval and listing

6. Conclusions and future work

At present we have a working version of our durable storage service. The functional tests over a LAN with a small number of DSs are finished. The next tests will measure the performance of the system. We do not expect many surprises in the performance of the file replication subsystem, which has already been studied in [12], although our approach is based on a specialized setup. The test bed will also measure the performance of operations such as concurrent file insertion and removal. We have conducted these tests over our corporate IPv4 network and our experimental IPv6 network. We would now like to begin tests with remote sites. While this is easily achievable over IPv4, it is somewhat more difficult over IPv6, due to the lack of partners with IPv6 connectivity.

Our dual-stack solution is designed to be useful in an IPv4 Internet with some experimental IPv6 nodes or networks, which is the dominant situation in our environment. Because of this, some design decisions are suboptimal from the point of view of a predominantly IPv6 network. We feel we could tune the application better for a mainly IPv6 scenario, and this is what the next major version of the service will address. The improvements we could expect from an IPv6-only version touch several areas. One is the simplification of inbound NAT traversal, which is now handled through port redirection and proxies; that would in turn ease the setup when several Users or Data Servers share the same network. We could also simplify the security model and gain security through the use of AH and ESP headers. Sadly, as in many other cases, although the technical advantages of an IPv6-only version are easily perceived, they are not by themselves sufficient to drive IPv6 adoption. We hope this work will be one more reason to begin the transition.

Nowadays society expects new uses for already known or available technologies, but we think the future is tied more to new paradigms and new services than to new versions of old ones. The commercial approach we have devised, in which individuals are allowed to participate with more than just monetary resources, is in our opinion a step in that direction. In this sense, we would like to point out that we think IPv6 is the path for the subsequent steps.

7. Acknowledgments

These ongoing efforts are co-funded by the University of the Basque Country (Euskal Herriko Unibertsitatea / Universidad del País Vasco) and Contec S.L., a private company. We would also like to thank the European IST project Euro6ix for giving our experimental network native IPv6 connectivity for the tests. Our project will finish in December 2004.

8. References

[1] E. Jacob, J.J. Unzilla, M.V. Higuero, P. Saiz, “Design of a commercial service based on Grid technologies”, TERENA Networking Conference, 2004.

[2] eInfrastructure Reflection Group White Paper, version 5.51, available at http://www.heanet.ie/einfrastructures/White_Paper_version_5.51.pdf

[3] R.J. Anderson, “The Eternity Service”, Proceedings of Pragocrypt '96, Prague, 1996, pp. 242-252.

[4] I. Clarke, O. Sandberg, B. Wiley, T.W. Hong, “Freenet: A Distributed Anonymous Information Storage and Retrieval System”, Lecture Notes in Computer Science, Vol. 2009, 2001.

[5] J. Kubiatowicz et al., “OceanStore: An Architecture for Global-Scale Persistent Storage”, Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), ACM, 2000.

[6] A. Oram (ed.), “Peer-to-Peer: Harnessing the Power of Disruptive Technologies”, O'Reilly, 2001.

[7] C. Pinedo, E. Jacob, J.J. Unzilla, M.V. Higuero, “Análisis de los protocolos P2P y su aplicación al almacenamiento perdurable” (An analysis of P2P protocols and their application to durable storage), Proceedings of the URSI 2004 Conference.

[8] E. Jacob, F. Liberal, J.J. Unzilla, “PKIX-based certification infrastructure implementation adapted to non-personal end entities”, Future Generation Computer Systems, Elsevier, ISSN 0167-739X, Vol. 19, No. 2, Feb. 2003, pp. 263-275.

[9] J. Matías, E. Jacob, J.J. Unzilla, M.V. Higuero, “Transición a IPv6: Un ejemplo con aplicación real” (Transition to IPv6: an example with a real application), Proceedings of the URSI 2004 Conference.

[10] B. Cohen, BitTorrent protocol specification, http://bitconjurer.org/BitTorrent/protocol.html

[11] J. Mohácsi, “Peer-to-peer file transfer protocols and IPv6”, NIIF/HUNGARNET TF-NGN meeting, 1 October 2004, available at http://www.terena.nl/tech/task-forces/tf-ngn/presentations/tf-ngn15/20040930_jm_p2p_ipv6.pdf

[12] D. Qiu and R. Srikant, “Modeling and Performance Analysis of BitTorrent-Like Peer-to-Peer Networks”, Proc. ACM SIGCOMM, Portland, OR, 2004.
