IJES0602-0309 JIA

September 14, 2017 | Autor: Zhaoyan Shen | Categoría: Computer Architecture

Descripción

Int. J. Embedded Systems, Vol. 6, Nos. 2/3, 2014

167

A real-time flash translation layer via adaptive partial garbage collection Zhaoyan Shen, Xin Li, Lei Ju and Zhiping Jia* School of Computer Science and Technology, Shandong University, Shunhua Road, NO.1500, Jinan, 250101, China E-mail: [email protected] E-mail: [email protected] E-mail: [email protected] E-mail: [email protected] *Corresponding author Abstract: In real-time embedded systems with flash storage memory, partial garbage collection is used to avoid long response latency and provide guaranteed response time upper bound. However, existing partial garbage collection schemes involve too many valid page copy and erase operations, which results in degradation in both system average performance and endurance of flash memory. To solve this problem, in this paper, we propose a real-time hybrid-level flash translation layer (FTL) called RAFTL – a real-time FTL using an adaptive partial garbage collection policy. RAFTL allocates physical blocks mapped to the same logical block on demand to postpone the partial garbage collection process. Moreover, we adopt an adaptive partial garbage collection policy, which combines the centralised and the distributed partial garbage collection policies. The experimental results show that our scheme reduces the valid page copy and erase operations significantly. Also, on the basis of providing a guaranteed worst case system response time, RAFTL provides an average write request response time which is approximate to the time to write one page. Keywords: NAND flash; embedded systems; real-time; partial garbage collection; adaptive policy. Reference to this paper should be made as follows: Shen, Z., Li, X., Ju, L. and Jia, Z. (2014) ‘A real-time flash translation layer via adaptive partial garbage collection’, Int. J. Embedded Systems, Vol. 6, Nos. 2/3, pp.167–175. Biographical notes: Zhaoyan Shen is a Master student at the School of Computer Science and Technology, Shandong University. His main research interests include embedded system, trust models and trust computing. Xin Li is an Associate Professor at the School of Computer Science and Technology of Shandong University. His research interests include energy efficient scheduling, non-volatile memory and embedded system, where he has over 30 publications. Lei Ju is an Associate Professor at the School of Computer Science and Technology, Shandong University. His research interests focus on design, analysis and optimisation of real-time and embedded systems. He has authored a number of refereed publications, and served as the technical programme committee of several international conferences. Zhiping Jia is a Professor at the School of Computer Science and Technology, Shandong University. His research interests include design of high performance and trustworthy distributed and embedded systems. He has published more than 40 research papers in refereed international conferences and premier journals. His research is currently supported by the Natural Science Foundation of China (NSFC) and National 863 Program.

1

Introduction

NAND flash has been widely used in recent years, especially in some real-time systems. However, NAND flash cannot commit in-place updates. An update to a page (the minimum of write) is not available unless a larger region containing the page (known as a block) is erased

Copyright © 2014 Inderscience Enterprises Ltd.

firstly. Erasing one block includes one erase and several valid page copy operations. And erase operation on a block is slower by an order of magnitude. These characters may result in long and uncertain inter-operation latencies that are unacceptable for real-time systems. To solve this problem,

168

Z. Shen et al.

several real-time FTL schemes (Chang et al., 2004) using partial garbage collection have been proposed. FTL is an intermediate layer between the file system and the flash chip. FTL cannot only translate a logical address given by file systems to a physical address in flash memory but also reclaim spaces by erasing obsolete blocks in flash, known as garbage collection (Chung et al., 2009). During garbage collection, several valid page copy operations may be done first to copy the valid pages in the victim block which is going to be reclaimed to an empty block before erasing the block. But during this process, some unnecessary erase and valid page copy operations may not only reduce endurance (Wang et al., 2010, 2011) of the flash memory but also result in a large system average response time and the uncertain inter-operation. Garbage collection policies can be divided into two kinds: non-preemptive and partial. Non-preemptive garbage collection (Qin et al., 2011b; Gupta et al., 2009) is to centralise all the victim blocks and then collect them all at one time. While partial garbage collection is to divide the process into several steps and interleave these steps with read/write requests. Centralised partial garbage collection cannot reclaim a new victim block until the current reclaim process is done. Distributed partial garbage collection is free of this restriction. Several schemes (Choudhuri and Givargis, 2008; Qin et al., 2012; Chang and Kuo, 2002) have been proposed to provide predictable performance with partial garbage collection. Two such schemes are GFTL (Choudhuri and Givargis, 2008) and RFTL (Qin et al., 2012). GFTL maintains a garbage collection queue (GCQ) to conduct the centralised partial garbage collection. And RFTL conducts the distributed partial garbage collection by dividing the physical blocks in system into three groups. However, both of these schemes take too many efforts to provide a response time upper bound, which imports a mass of unnecessary erase and valid page copy operations and at last result in degradation in system average response time. In this paper, we design a real-time flash translation layer using an adaptive garbage collection policy called RAFTL. On the premise of providing a guaranteed real-time service, RAFTL improves the system average response time and endurance of the flash significantly by reducing the valid page copy and erase operations. Based on the theory that a postponed reclamation of a victim block may reduce the valid pages within it, since a later rewrite operation may make the original valid page invalid, RAFTL adopts a hybrid-level address mapping policy in which we allocate various physical blocks for a logical block on demand to postpone the garbage collection process (Qin et al., 2011a, 2011b). More importantly, we propose an adaptive partial garbage collection policy which combines the centralised and distributed partial garbage collection policies. Using this policy, RAFTL reduces the overhead consumed by garbage collection significantly thus providing an average write response time close to the time used to write one page. We evaluate our scheme with a set of traces running on a NAND flash memory simulator. The results show that,

compared with RFTL, RAFTL achieves a 48.1% decrease in block erase operations and a 98.1% decrease in valid page copy operations. Moreover, we get a 43.5% improvement in the average system response time compared with RFTL, and an 83.2% improvement compared with GFTL. The main contributions of this paper are summarised as follows: 1

We adopt a real-time FTL scheme that allocates various physical blocks for a logical block on demand. This scheme reduces valid page copy and erase operations remarkably while providing strict service time guarantees for read and write requests.

2

We propose an adaptive partial garbage collection policy, which combines the centralised and the distributed partial garbage collection policies. Distributed partial garbage collection is adopted when there are enough empty blocks in NAND flash memory. Once the number of empty blocks declines to a threshold, centralised partial garbage collection policy takes over.

The rest of this paper is organised as follows. Section 2 provides background and related work. Section 3 gives the problem formulation. Section 4 demonstrates the RAFTL technique. Section 5 presents the experimental results. In Section 6, we give the conclusion and future work.

2

Background and related work

2.1 NAND flash storage system architecture NAND flash is organised as blocks, and each block is divided into multiple pages. A page is made up of the data area and the spare area also known as the out of band (OOB) area. The OOB area is used to store meta-information such as error correction code of the corresponding page. A read/write operation can be committed in an entire page or an OOB area. Table 1 displays the basic operation specifications for NAND flash (Gupta et al., 2009). Table 1

NAND flash specifications Samsung 16 MB small block

Samsung 128 MB large block

Block size

16 KB

64 KB

Page size

512 B

2 KB

Characteristics

OOB size

16 B

64 B

Read page

36 μs

25 μs

Read OOB

10 μs

25 μs

Write page Erase

200 μs

300 μs

2,000 μs

2,000 μs

Figure 1 gives a typical architecture of flash memory storage systems. As shown, flash translation layer (FTL) is

A real-time flash translation layer via adaptive partial garbage collection an intermediate layer between the memory technology device (MTD) layer (Wang et al., 2010) and file system used to simulate NAND flash as a hard disk. FTL is mainly in charge of three functions: address mapping, wear-levelling and garbage collection. Address mapping is responsible for translating the logical address issued from the file system to a physical address adopted in flash memory chip. Wear-levelling function is to wear down memory blocks as evenly as possible. Garbage collection process reclaims spaces by erasing obsolete blocks, and the process can be divided into three steps. First, choose the victim block. And then copy the valid pages in the victim block to a new block. At last erase the victim block (Chung et al., 2009; Ji and Shin, 2012; Jang and Han, 2010). Figure 1

Architecture for NAND flash storage system

2.2 Related work In the past decades, many FTL techniques have been proposed to improve the system performance of NAND flash storage systems (Qin et al., 2010, 2012, 2011b; Choudhuri and Givargis, 2008; Kim et al., 2008). GFTL (Choudhuri and Givargis, 2008) is the first scheme proposing the partial garbage collection policy to provide strict service time guarantees for both read and write requests. Partial garbage collection policy (Choudhuri and Givargis, 2008; Qin et al., 2012; Choudhuri and Givargis, 2008) is to divide the garbage collection into several steps and then interleave them with read/write requests. In previous works, two kinds of partial garbage collection policies have been proposed: the centralised partial garbage collection policy (Choudhuri and Givargis, 2008) and the distributed partial garbage collection (Qin et al., 2012) policy. Figure 2 gives the difference of these two kinds of policies. Figure 2

Page-level FTL (Chung et al., 2009; Ban, 1995; Ma et al., 2011), block-level FTL (Chung et al., 2009) and hybrid-level FTL (Chung et al., 2009; Choudhuri and Givargis, 2008; Guan et al., 2013; Kim et al., 2002) are the three main mapping schemes in FTL designs. The page-level FTL maps one logical address to one physical address in units of pages, and it realises a high utilisation. But the mapping table in RAM is large. The block-level FTL’s mapping unit is block. The basic idea of block mapping is that the logical page offset within a logical block is identical to the physical page offset within the physical block. Its mapping table is much smaller in comparison but the block utilisation is lower. A hybrid-level FTL uses a block mapping technique to obtain the corresponding physical block. Then it uses a page mapping technique to locate an available empty page or the physical page which stores the requested data within the physical block. Hybrid-level FTL shows better address translation efficiency in cost of a longer worst case response time (Choudhuri and Givargis, 2008; Qin et al., 2012). In this paper, we focus on the hybrid-level FTL scheme. In this study, system response time refers to the time duration from the time when a request is issued from file system to the finishing time of the requested operation. Average system response time refers to the total response time divided by the total number of requests.

169

Distributed and centralised garbage collection policy

Both the centralised and distributed garbage collection policies conduct garbage collection in several steps. For the centralised partial garbage collection policy, different physical blocks must be reclaimed sequentially. In other words, centralised partial garbage collection cannot reclaim another victim block until the current reclaim is done. While for the distributed partial garbage collection policy, the reclamation of several physical blocks can be intersected, the copy and erase operations of several victim block can be done interleaved. GFTL (Choudhuri and Givargis, 2008) uses the centralised partial garbage collection policy. Full blocks in NAND flash memory are centrally organised in a GCQ, and garbage collections are consecutively performed as long as the queue is not empty. The partial garbage collection process is easy to control in GFTL. However, in GFTL, garbage collection is triggered once a physical block is full, and it introduces a large amount of extra valid page copy and erase operations. RFTL (Qin et al., 2012) adopts the distributed partial garbage collection policy. Blocks in RFTL are divided into three kinds: primary blocks, replacement blocks and buffer blocks. And a logical block owns three physical blocks belonging to the three different kinds of blocks respectively. Garbage collection in RFTL is invoked once a write request is issued to a full primary block. RFTL postpones the garbage collection process to some extent. But for logical blocks with frequently write

170

Z. Shen et al.

requests, erase operations are still too many. Besides, for logical blocks rarely written, RFTL leads to waste of some physical blocks. As a good supplement for above schemes, we focus on designing a scheme taking advantages of both the centralised and the distributed partial garbage collection to improve the performance.

3

by read and write operation respectively. A lower bound on p (denoted as L(p)), gives the maximum request arrival rate that an FTL can handle. The other symbols and their definitions used in this paper are listed in Table 2. Table 2

Problem formulation

The variable valid page copy operations can result in long and uncertain garbage collection latencies. By postponing the garbage collection process, we can decrease the valid page copy and erase operations effectively, and further improve the performance of the system (Qin et al., 2011b; Chang and Kuo, 2002). Figure 3 gives an example of garbage collection process. For simplicity, we assume that each block owns eight pages. As shown in Figure 3(a), a victim block owns six valid pages. To reclaim this block, six valid page copy and one erase operations are needed. One valid page copy operation can be treated as one read operation plus one write operation. Based on the specifications of the 128 MB NAND flash shown in Table 1, the time overhead to reclaim the victim block is 6*(25 + 300) + 2,000 = 3,950 μs. However, if the process was postponed by n requests, as shown in Figure 3(b), only 2 valid pages exist in the victim block, and the time overhead is 2,650 μs, which is much less than the overhead in Figure 3(a). Figure 3

Process of garbage collection

Symbol

Definition

Ter

Time to erase a block

Trdoob

Time to read an OOB area

Trdpg

Time to read a page

Twrpg

Time to write a page

π

Pages per block

Ln

Logical block numbers

Pn

Physical block numbers

UM

Upper bound of physical blocks to one physical block

Thgc

Threshold of empty block for garbage collection

Note that, Ter is the longest atomic operation in flash memory since the erasure of one block cannot be interrupted (Chang and Kuo, 2002; Qin et al., 2011a). So Ter is the minimum time a request will be blocked in the worst case and L(p) should be Ter in the ideal case. Base on the real-time task model, we formulate the problem as follows. Given an NAND flash memory chip and a task set V = {T1, T2, …, Tn}, how to design a FTL scheme that can reduce the unnecessary valid page copy and erase operations to the largest extend during the garbage collection process while providing deterministic service with an upper bound L(p) that is close to Ter.

4

In order to study the process formally, we model the NAND flash storage system as follows. We treat each I/O request coming from the file system to the FTL as an independent real-time task T = {p, e, d}, where p is the minimum separation time between two consecutive disk operations, e is the execution time for read/write operations and d is the deadline. Without loss of generality, we assume that p is equal to d. Two kinds of tasks exist in the system: read request tasks Tr = {pr, er}, and write request tasks Tw = {pw, ew}. pr and pw mean how often requests are issued from the file system. er and ew represent the time consumed

Symbols and definitions

RAFTL scheme

We have seen that when empty blocks in a NAND flash system are adequate, there is no need to do garbage collection too early. Since earlier garbage collection includes more valid page copy and erase operations. In this section, we describe the technique details on how RAFTL shrinks these operations with an adaptive garbage collection policy. Firstly, we demonstrate the address mapping approach in Section 4.1. Then we show the read/write operation and the adaptive garbage collection policy in Section 4.2 and Section 4.3, respectively. Finally, we give the garbage collection threshold (Thgc) analysis in Section 4.4.

4.1 Hybrid-level address mapping In RAFTL, one logical page number (LPN) is divided into a logical block number (LBN) and a block offset (BO). Each logical block owns a mapping slot as shown in Figure 4. Mi physical blocks are allocated to the ith logical block to serve the write requests, and Mi varies in an on-demand way (Qin et al., 2011b) with an upper bound UM. The indices of these Mi physical blocks are maintained in these mapping slots

A real-time flash translation layer via adaptive partial garbage collection (i.e., PBN_1, PBN_2, …). Valid page counts for the Mi physical blocks are stored as VP. For each logical block, a page-level mapping table is used to map one logical page to any physical page in its corresponding Mi physical blocks. The page mapping table is divided into N sub-tables (i.e., PMT_1, PMT_2, …, PMT_N), and each sub-table is stored in the OOB area of the newly allocated page just like MNFTL and RFTL (Qin et al., 2011b, 2012). Suppose each block contains π pages, and each OOB area can store m (0 < m ≤ π) page mapping entries, then N = ⎡π / m⎤. The N indices of the pages that contain the sub-tables are recorded in the mapping slot correspondingly. In order to obtain the address of a page, we just need to read one OOB area using the page-level mapping sub-table indices. Figure 4

RAFTL mapping slots

Notes: LBN: logical block number; LPN: logical page number; PMT: page mapping table; PBN: physical block number; VP: valid page count; BO: block offset

4.2 Read/write operation A write request is issued from the file system with data and a LPN, e.g., write(data, lpn), then the LPN is translated into a LBN and a BO. The first write request to a given LBN is served by the first free page in one of the physical blocks that are mapped to the logical block. Once a physical block is distributed to a logical block, pages in the physical block are allocated sequentially independent of whether a write or a rewrite operation is being handled. After a new physical page is allocated, the corresponding sub-table should be updated with the page index firstly, and then write the new physical page number to the OOB area with the data to the data area. The new allocated page index should be stored in the mapping slot to record the mapping information. For a rewrite operation, firstly read out the old mapping sub-table from the OOB area of the page pointed to by the indices in the mapping slot, then update the corresponding mapping sub-table and at last write it to the OOB area of the new page. The page table index in the mapping slot should also be updated with the new physical page. After π write requests, the physical block becomes full, if the number of physical blocks mapped to the logical block is less than the threshold UM and the number of free blocks in the NAND flash storage system are greater than Thgc (the threshold of empty blocks, to be explained in Section 4.4), a new free physical block will be allocated to the logical block. Otherwise, the partial garbage collection will be invoked before the allocation of a new free physical block. When the very first write request to a logical block comes, the page mapping table index in its mapping slot is null, with no garbage collection to be invoked and no OOB

171

area to be searched under this circumstance. This is the best case for a write request, and the response time is Twrpg. However, in the worst case that the partial garbage collection operation is triggered, the write request will be followed by an erase operation. Then the response time is Ter + Trdoob + Twrpg. A read request is issued from the file system with a LPN, e.g., read(lpn). Then the LPN is translated into a LBN and a BO. With the LBN and BO, we can get the physical page which contains the page mapping sub-table, and from the sub-table, we can get the physical page number. It is obvious that both the best case response time and the worst case response time for a read request are Trdoob + Trdpg.

4.3 The adaptive garbage collection policy RAFTL adopts an adaptive partial garbage collection policy that combines both centralised partial garbage collection and the distributed partial garbage collection policies. In RAFTL, when there are enough empty blocks in the NAND flash memory system, we use the distributed partial garbage collection policy. And the distributed garbage collection operation is invoked once the number of the physical blocks mapped to one logical block to be written is equal to UM. On the other hand, the centralised partial garbage collection policy is carried out when the empty blocks in the NAND flash memory system are less than Thgc. And during the centralised partial garbage collection process, each request is followed by one step of garbage collection operation. Figure 5 gives the process of the distributed partial garbage collection. Figure 5

The distributed partial garbage collection

In RAFTL, if one logical block is written frequently, the physical blocks mapped to it are allocated on demand as

172

Z. Shen et al.

shown in Figures 5(a), 5(b) and 5(c). When the first write request w1 to the logical block comes, the physical block B1 is used to serve it, and then after the following write requests’ (w2, w3, …, w8) arrival, the block B1 is full, and a new block B2 is allocated. Notice that the LPN of (w2, w3 …, w8) may be the same, so as Figure 5(a) shows, some pages of B1 become invalid. Once the number of physical blocks (B1, B2, …, BM) mapped to this logical block reaches UM, and the last physical block has only k free pages as shown in Figure 5(c), the distributed garbage collection will be triggered. These k free pages are used to store the valid pages in the victim block and to serve the following write requests (Wang et al., 2010). As the garbage collection process begins, firstly, we chose a victim block from the physical blocks mapped to this logical block with least valid pages. In Figure 5(c), suppose B1 owns the largest number of invalid pages, so the victim block should be B1. As the logical block owns UM physical blocks, the victim block contains α = ⎣π/UM⎦ valid pages at most, with the symbols shown in Table 2, we can partition the garbage collection operation into β steps:

β = ⎡⎢{α × (Trdpg + Twrpg + Trdoob ) + Ter } Ter ⎤⎥

(1)

With α pages for valid page copy operations, and β pages to serve the following write requests, we get k = α + β. After β steps, the valid pages are copied to the last physical block which is mapped to the logical block, and the victim block is erased and removed from the mapping slot corresponding to the logical block as Figure 5(d) shows. Then RAFTL chooses an empty block with the least erase time to serve the following write request to the logical block, and update the mapping slot in RAM. Once the empty blocks in the NAND flash memory system are less than Thgc, the centralised partial garbage collection policy is used instead of the distributed partial garbage collection policy. Different from GFTL, we do not have a GCQ to manage the blocks to be reclaimed, we just choose a block with the most invalid pages as the victim block after one block is erased. As Figure 2 shows, during the centralised partial garbage collection, each read/write request is followed by one part of the garbage collection operation. It is obvious that the empty blocks in the NAND flash system will grow with the centralised partial garbage collection process being done, and when the number of empty blocks gets to 2*Thgc, the centralised partial garbage collection policy is replaced by the distributed partial garbage collection policy again.

4.4 The analysis of Thgc Thgc is a lower bound on empty blocks in the NAND flash storage system to trigger the centralised partial garbage collection. So the following requests should be well served by the Thgc empty blocks in any case or say in the worst case. In this subsection, we will analyse the limit on Thgc.

Suppose the total LBN in the NAND flash storage system is Ln, and the total physical block number is Pn. In extreme cases, when the centralised partial garbage collection begins, all pages in the allocated physical blocks are full in the NAND flash storage system. The following scenario is a worst case write arrival sequence when the centralised partial garbage collection policy is employed: Ln write requests arrive such that each request is to a unique logical block. In this case, every following write request needs an empty block to be allocated, and this will lead to a persistent reduction in empty blocks in the NAND flash storage system. After the Ln write requests are served, Ln empty blocks are allocated to the logical blocks, and during the Ln write requests, ⎣Ln/β⎦ victim blocks are reclaimed. So during the centralised partial garbage collection, to guarantee enough empty blocks, we set the threshold Thgc to the difference of Ln and ⎣Ln/β⎦. Thgc = Ln − ⎣⎢ Ln / β ⎥⎦

(2)

To give an example of Thgc, suppose that Pn = 3Ln, π = 64 and UM = 8. With the parameters shown in Table 1, it is easy to get that β = 3. Put β in the equation (2), Thgc = (2/9)Pn. So based on this scenario, Thgc is equal to or greater than (2/9)Pn.

5

Evaluation

In this section, we present the experimental results with analysis. We implemented three FTL schemes: RAFTL, RFTL and GFTL on a NAND flash memory simulator we developed. Both RFTL and GFTL are representatively deterministic FTL schemes. GFTL adopts the centralised partial garbage collection policy and RFTL adopts the distributed partial garbage collection policy during the garbage collection process. Firstly, we introduce the experimental setup, and then we present the experimental results and give the analysis.

5.1 Experimental setup Table 3 summarises our experimental platform. As shown, three NAND flash memory chips with the capacity of 128 MB, 256 MB and 512 MB separately are simulated. Table 3

Experimental step

Hardware

Baseline FTLs

CPU

Intel Core i5 3 GHZ

Disk space

500 GB

RAM

4 GB

Flash size

128/256/512 MB

Flash simulator

NAND flash simulator

A real-time flash translation layer via adaptive partial garbage collection To evaluate the performance of different FTL schemes, following benchmarks are used. Media is a real-world trace we collect using DiskMon from a PC with Window 7 on NTFS file system downloading, playing and deleting multimedia files (e.g., MP3). It is consist of 100,477 read requests and 431,996 write requests. Financial (‘OLTP trace from umass trace repository’, http://traces.cs.umass.edu/ index.php/storage/storage) is a well-known write-dominant trace, it is made up of 1,141,778 read requests and 3,858,222 write requests, with each request visiting variable count of pages. The block size, page size, number of pages in a block, and the size of the OOB for each page are set to 128 KB, 2 KB, 64 and 64 Bytes, respectively. Besides, in our experiments, the parameters are based on the flash memory data sheet values shown in Table 1, and we assume the number of physical blocks is twice larger than that of the logical blocks.

5.2 Results and analysis In this section, we present the simulation results of the proposed RAFTL scheme, RFTL scheme and GFTL scheme in terms of real-time, count of valid page copy and erase operations, system average response time and average write request response time. Table 4 presents the performance of RAFTL scheme for the two traces based on varying trace utilisation and physical block upper bound UM. The column under T% means how many records we used in the trace file. The columns under Rbest, Ravg, Rworst means the best case, the average case and the worst case response time for read requests, respectively. The next three columns Wbest, Wavg and Wworst represent the best case, the average case and the Table 4

173

worst case response time for write requests. The average response time for all requests, the total number of valid page copy operations and the total number of erase operations are denoted by Tavg, Σcp and Σer, respectively. From Table 4, we see that our scheme provides guaranteed service for both read and write operations. The best case response time, worst case response time and average response time for a read request are all 50 μs, which is equal to Trdoob + Trdpg. If a write request is not a rewrite operation and no garbage collection is triggered, the best response time for the write request is 300 μs. However, when the garbage collection operation is invoked, the worst case response time for the write request is 2,325 μs, which is equal to Ter + Trdoob + Twrpg. Based on Table 4, we also see that the average response time and the count of valid page copy and erase operations vary according to the upper bound of the physical blocks mapped to one logical block UM. And it is obvious that when UM equals to 4, the system performance is not as good as that when UM equals to 8 or 12. This is because when UM is small (e.g., 4), the physical blocks mapped to one logical block are few, and this will lead to earlier invocation of the distributed partial garbage collection process, and then increase the unnecessary valid page copy and block erase operations, at last result in the average response time to rise. On the other hand, when UM gets to 12, although the distributed partial garbage collection operation is postponed, the empty blocks in system decrease rapidly, then the centralised partial garbage collection is triggered in advance. And in this case, the valid page copy operations decrease, while the erase operations rise, so we can see that the system response time changes little no matter the upper bound UM is 8 or 12.

RAFTL performance

Benchmarks

T%

UM

Rbest (μs)

Ravg (μs)

Rworst (μs)

Wbest (μs)

Wavg (μs)

Wworst (μs)

Tavg (μs)

Σcp

Σer

Media

50

4

50

50

50

300

351

2,325

296

14,850

14,646

50

8

50

50

50

300

344

2,325

291

12,490

12,736

50

12

50

50

50

300

345

2,325

291

12,387

12,854

100

4

50

50

50

300

355

2,325

310

25,004

23,501

100

8

50

50

50

300

347

2,325

304

20,665

20,260

100

12

50

50

50

300

347

2,325

304

20,536

20,272

50

4

50

50

50

300

360

2,325

330

90,889

82,652

50

8

50

50

50

300

349

2,325

321

47,586

71,184

50

12

50

50

50

300

349

2,325

321

47,569

71,309

100

4

50

50

50

300

361

2,325

328

131,014

163,860

100

8

50

50

50

300

348

2,325

318

58,376

136,550

100

12

50

50

50

300

348

2,325

318

57,928

136,554

Financial

174

Z. Shen et al.

Also, we can figure out that the value of Wavg and Tavg do not change much when the utilisation of the trace files rises from 50% to 100% from Table 4, this indicates that RAFTL scheme can work stably no matter how many read or write requests are issued. Figure 6

Copy and erase operations (see online version for colours)

case, but in RAFTL and RFTL, the address mapping information can be obtained by reading only one OOB area. By employing the adaptive partial garbage collection, RAFTL saves 98.1% of valid page copy operations and 48.1% of erase operations compared with RFTL. And compared with GFTL, RAFTL achieves a 99.7% improvement in valid page copy operations and an 82.3% in erase operations. In terms of system average response time, RAFTL gets a 43.5% reduction compared with RFTL and a 70.3% reduction compared with GFTL. Table 5 Traces

Metrics

Media

Financial

Figure 6 is the slip chart of the number of valid page copy and erase operations with regard to the value of UM which varies from 4 to 16. In the first phase in which UM grows from 4 to 8, the numbers of valid page copy and erase operation declines greatly. The reason is that, with the increase of UM, the garbage collection process is postponed substantially. This leads to a reduction of valid pages in a victim block. At last, the total count of valid page copy and erase operations cuts down drastically. During the second phase in which UM continues growing to 14, the profile changes little. The reason is as follows. During this phase, the number of valid pages when garbage collection is performed is small. As UM further grows, the reduction of valid pages in the reclaimed blocks is quite limited. While at the same time, centralised partial garbage collection will be triggered earlier, which leads to the number of garbage collections increases. Thus the number of block erase operations is then increased. In the last phase, UM becomes unreasonably large (e.g., 16), and then valid pages in one victim block cannot be reduced any more. Meanwhile, with the triggering time of centralised partial garbage being further moved ahead, the number of garbage collections is continuously increased. As a result, both the valid page copy and erase operations rise. Table 5 compares the system performances of RAFTL, RFTL and GFTL using the same trace and flash size. For the financial trace, we use the first one million rows in it. From Table 5, we can read that all the three FTL schemes can provide a worst case response time upper bound, but RAFTL and RFTL achieve a 40.38% improvement in the worst case response time compared with GFTL. And this results from that GFTL requires searching all the OOB areas of one block in order to read the valid page in the worst

Performance comparison FTL schemes RAFTL

RFTL

GFTL

Wworst (μs)

2,325

2,325

3,900

Wavg (μs)

347

652

1,966

Tavg (μs)

304

558

1,828

Σcp

2.06e4

1.07e6

1.77e6

Σer

2.02e4

4.13e4

3.28e4

Σoob

1.84e6

3.95e6

7.73e6

Wworst (μs)

2,325

2,325

3,900

Wavg (μs)

349

628

2,099

Tavg (μs)

319

565

1,902

Σcp

2.70e4

1.42e6

9.08e6

Σer

3.03e4

5.84e4

1.71e5

Σoob

2.52e6

5.31e6

2.91e7

By employing the adaptive partial garbage collection, RAFTL saves 98.1% of valid page copy operations and 48.1% of erase operations compared with RFTL. And compared with GFTL, RAFTL achieves a 99.7% improvement in valid page copy operations and an 82.3% in erase operations. In terms of system average response time, RAFTL gets a 43.5% reduction compared with RFTL and a 70.3% reduction compared with GFTL. All the improvements are based on the fact that, through the adaptive partial garbage collection, RAFTL postpones the garbage collection process. As the garbage collection process is postponed the valid pages in the victim block shrink and the utilisation of blocks improves, at last the time overhead is reduced responsively. So by reducing the unnecessary valid page copy and block erase operations, the adaptive partial garbage collection improves the system performance remarkably.

6

Conclusions and future work

In this paper, a real-time flash translation layer using adaptive partial garbage collection policy (called RAFTL) for NAND flash memory storage systems is proposed. On the basis of providing a real-time service guarantee, it reduces the valid page and erase operations significantly during the garbage collection process. We introduce a novel hybrid-level address mapping approach. In this approach, we allocate physical blocks to the same logical block on

A real-time flash translation layer via adaptive partial garbage collection demand. Besides, we also devise an adaptive partial garbage collection policy which combines the centralised and the distributed partial garbage collection policies. Experimental results show that, compared with RFTL, our scheme achieves a 48.1% improvement in erase operations and a 98.1% improvement in valid page copy operations. In the future, we will improve RAFTL scheme taking use of hot data. Further, employing RAFTL in 3D flash (Wang et al., 2012) and solid-state disks (SSDs) is also a challenging task to complete.

Acknowledgements This research is sponsored by the Natural Science Foundation of China (NSFC) under grants No. 60903031, 61070022, and 61202015, Research Fund for the Doctoral Programme of Higher Education of China (RFDP) grant 20120131120033, National 863 Programme 2013AA013202, Shandong Provincial Natural Science Foundation grant ZR2011FQ036, Outstanding Young Scientist Foundation of Shandong Province grant BS2010DX017, and Chongqing cstc2012ggC40005.

References ‘OLTP trace from umass trace repository’ [online] http://traces.cs.umass.edu/index.php/storage/storage (accessed 21 April 2013). Ban, A. (1995) ‘Flash file system’, US patent 5, 404, 485, 4 April. Chang, L. and Kuo, T. (2002) ‘A real-time garbage collection mechanism for flash-memory storage systems in embedded systems’, in RTCSA’02, March. Chang, L., Kuo, T. and Lo, S-W. (2004) ‘Real-time garbage collection for flash-memory storage systems of real-time embedded systems’, TECS, Vol. 3, No. 4, pp.837–863. Choudhuri, S. and Givargis, T. (2008) ‘Real-time access guarantees for NAND flash using partial block cleaning’, in SEUS’08, pp.138–149. Choudhuri, S. and Givargis, T. (2008). ‘Deterministic service guarantees for NAND flash using partial block cleaning’, in CODES+ISSS’08, pp.19–24. Chung, T., Park, D., Park, S., Lee, D., Lee, S. and Song, H. (2009) ‘A survey of flash translation layer’, Journal of Systems Architecture, Vol. 55, No. 2009, pp.332–343.

175

Guan, Y., Wang, G., Wang, Y., Chen, R. and Shao, Z. (2013) ‘Blog: block-level log-block management for NAND flash memory storage system’, LCTES’13, 14th, ACM. Gupta, A., Kim, Y. and Urgaonkar, B. (2009) ‘DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings’, in ASPLOS’09, pp.229–240. Jang, K-H. and Han, T-H. (2010) ‘Efficient garbage collection policy and block management method for NAND flash memory’, in ICMEE 2010 2nd, pp.V1-327–V1-331. Ji, S. and Shin, D. (2012) ‘An efficient garbage collection for flash memory-based virtual memory systems’, IEEE Transactions on Consumer Electronics, November, Vol. 56, No. 4, pp.2355–2363. Kim, J., Kim, J.M., Noh, S., Min, S.L. and Cho, Y. (2002) ‘A space-efficient flash translation layer for compact flash systems’, IEEE Transactions on Consumer Electronics, May, Vol. 48, No. 2, pp.366–375. Kim, J., Lee, D., Lee, C-G., Kim, K. and Ha, E.Y. (2008) ‘Real-time program execution on NAND flash memory for portable media players’, in RTSS ‘08, pp.244–255. Ma, D., Feng, J. and Li, G. (2011) ‘LazyFTL: a page-level flash translation layer optimized for NAND flash memory’, in SIGMOD’11, pp.1–12. Qin, Z., Wang, Y., Liu, D. and Shao, Z. (2010) ‘Demand-based block-level address mapping in large-scale NAND flash storage system’, CODES+ISSS’10. Qin, Z., Wang, Y., Liu, D. and Shao, Z. (2011a) ‘A two-level caching mechanism for demand-based page-level address mapping in NAND flash memory storage system’, RTAS’11. Qin, Z., Wang, Y., Liu, D. and Shao, Z. (2011b) ‘MNFTL: an efficient flash translation layer for MLC NAND flash memory storage systems’, in DAC’11, pp.17–22. Qin, Z., Wang, Y., Liu, D. and Shao, Z. (2012) ‘Real-time flash translation layer for NAND flash memory storage systems’, in RTAS, pp.35–44. Wang, Y., Bathen, L., Shao, Z. and Dutt, N. (2012) ‘3D-FlashMap: a physical-location-aware block mapping strategy for 3D NAND flash memory’, DATE’12, 15th Design. Wang, Y., Liu, D., Qin, Z. and Shao, Z. (2011) ‘An endurance-enhanced flash translation layer via reuse for NAND flash memory storage systems’, in DATA’11 14th Design. Wang, Y., Liu, D., Wang, M., Qin, Z., Shao, Z. and Guan, Y. (2010) ‘RNFTL: a reuse-aware NAND flash translation layer for flash memory’, in LCTES’10, pp.163–172.

Lihat lebih banyak...

IJES0602-0309 JIA

Descripción

Comentarios