A cost model for long-term compressed data retention

Kewen Liao, Alistair Moffat, Matthias Petri, Anthony Wirth

Research output: Chapter in Book/Report/Conference proceeding; Conference Paper published in Proceedings; Research; peer-review

Abstract

Vast amounts of data are collected and stored every day, as part of corporate knowledge bases and in response to legislative compliance requirements. To reduce the cost of retaining such data, compression tools are often applied. But simply seeking the best compression ratio is not necessarily the most economical choice; other factors also come into play, including compression and decompression throughput, the main memory required to support a given level of ongoing access to the stored data, and the types of storage available. Here we develop a model for the total retention cost (TRC) of a data archiving regime and, by applying the charging rates of a cloud computing provider, derive dollar amounts for a range of compression options, and hence guide the development of new approaches that are more cost-effective than current mechanisms. In particular, we describe an enhancement to the Relative Lempel-Ziv (RLZ) compression scheme and show that, measured by TRC, it provides more economical long-term data retention than previous approaches.
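The trade-off described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's TRC model, which the abstract does not spell out; it only assumes that cost is the sum of a storage charge on the compressed bytes and a compute charge for compression and for later reads. The names `Codec` and `retention_cost`, both codecs, and every rate and throughput figure are hypothetical values chosen for illustration, and main memory and storage tiers are left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Codec:
    """Hypothetical compressor profile; all figures are illustrative."""
    name: str
    ratio: float            # compressed size / original size (smaller is better)
    compress_mb_s: float    # compression throughput, MB of input per second
    decompress_mb_s: float  # decompression throughput, MB of output per second


def retention_cost(codec, data_gb, months, reads_gb_per_month,
                   storage_usd_gb_month, compute_usd_hour):
    """Toy retention cost in dollars: storage charge plus compute charge.

    This is an assumed two-term model, not the formula from the paper.
    """
    # Storage: pay for the compressed bytes for the whole retention period.
    storage = data_gb * codec.ratio * storage_usd_gb_month * months
    # Compute: one-off compression of everything, plus decompression of
    # whatever is read back each month.
    compress_hours = data_gb * 1024.0 / codec.compress_mb_s / 3600.0
    read_mb = reads_gb_per_month * 1024.0 * months
    read_hours = read_mb / codec.decompress_mb_s / 3600.0
    return storage + (compress_hours + read_hours) * compute_usd_hour


# Two made-up codecs: one fast with a modest ratio, one slow but dense.
fast = Codec("fast", ratio=0.40, compress_mb_s=400.0, decompress_mb_s=1000.0)
dense = Codec("dense", ratio=0.25, compress_mb_s=5.0, decompress_mb_s=50.0)

if __name__ == "__main__":
    # 1000 GB kept for 12 months at assumed rates of $0.02 per GB-month
    # and $0.10 per compute hour, under light and heavy read workloads.
    for reads in (100.0, 50000.0):
        for codec in (fast, dense):
            cost = retention_cost(codec, 1000.0, 12, reads, 0.02, 0.10)
            print(f"reads={reads:>8.0f} GB/month  {codec.name:<5}  ${cost:,.2f}")
```

Under these assumed figures the dense codec is cheaper when the archive is rarely read, while the fast codec is cheaper once reads dominate, which is the kind of reversal that makes compression ratio alone a poor guide.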

Original language: English
Title of host publication: WSDM 2017 - Proceedings of the 10th ACM International Conference on Web Search and Data Mining
Publisher: Association for Computing Machinery, Inc
Pages: 241-249
Number of pages: 9
ISBN (Electronic): 9781450346757
DOI: https://doi.org/10.1145/3018661.3018738
Publication status: Published - 2017
Externally published: Yes
Event: 10th ACM International Conference on Web Search and Data Mining, WSDM 2017 - Cambridge, United Kingdom
Duration: 6 Feb 2017 to 10 Feb 2017

Conference

Conference: 10th ACM International Conference on Web Search and Data Mining, WSDM 2017
Country: United Kingdom
City: Cambridge
Period: 6/02/17 to 10/02/17

Fingerprint

Costs
Data compression
Cloud computing
Throughput
Data storage equipment
Compliance

Cite this

Liao, K., Moffat, A., Petri, M., & Wirth, A. (2017). A cost model for long-term compressed data retention. In WSDM 2017 - Proceedings of the 10th ACM International Conference on Web Search and Data Mining (pp. 241-249). Association for Computing Machinery, Inc. https://doi.org/10.1145/3018661.3018738
Liao, Kewen ; Moffat, Alistair ; Petri, Matthias ; Wirth, Anthony. / A cost model for long-term compressed data retention. WSDM 2017 - Proceedings of the 10th ACM International Conference on Web Search and Data Mining. Association for Computing Machinery, Inc, 2017. pp. 241-249
@inproceedings{60a759c74f6e4cdf9d864614ece103bb,
title = "A cost model for long-term compressed data retention",
abstract = "Vast amounts of data are collected and stored every day, as part of corporate knowledge bases and in response to legislative compliance requirements. To reduce the cost of retaining such data, compression tools are often applied. But simply seeking the best compression ratio is not necessarily the most economical choice; other factors also come into play, including compression and decompression throughput, the main memory required to support a given level of ongoing access to the stored data, and the types of storage available. Here we develop a model for the total retention cost (TRC) of a data archiving regime and, by applying the charging rates of a cloud computing provider, derive dollar amounts for a range of compression options, and hence guide the development of new approaches that are more cost-effective than current mechanisms. In particular, we describe an enhancement to the Relative Lempel-Ziv (RLZ) compression scheme and show that, measured by TRC, it provides more economical long-term data retention than previous approaches.",
author = "Kewen Liao and Alistair Moffat and Matthias Petri and Anthony Wirth",
year = "2017",
doi = "10.1145/3018661.3018738",
language = "English",
pages = "241--249",
booktitle = "WSDM 2017 - Proceedings of the 10th ACM International Conference on Web Search and Data Mining",
publisher = "Association for Computing Machinery, Inc",

}

Liao, K, Moffat, A, Petri, M & Wirth, A 2017, A cost model for long-term compressed data retention. in WSDM 2017 - Proceedings of the 10th ACM International Conference on Web Search and Data Mining. Association for Computing Machinery, Inc, pp. 241-249, 10th ACM International Conference on Web Search and Data Mining, WSDM 2017, Cambridge, United Kingdom, 6/02/17. https://doi.org/10.1145/3018661.3018738

A cost model for long-term compressed data retention. / Liao, Kewen; Moffat, Alistair; Petri, Matthias; Wirth, Anthony.

WSDM 2017 - Proceedings of the 10th ACM International Conference on Web Search and Data Mining. Association for Computing Machinery, Inc, 2017. p. 241-249.


TY - GEN

T1 - A cost model for long-term compressed data retention

AU - Liao, Kewen

AU - Moffat, Alistair

AU - Petri, Matthias

AU - Wirth, Anthony

PY - 2017

Y1 - 2017

AB - Vast amounts of data are collected and stored every day, as part of corporate knowledge bases and in response to legislative compliance requirements. To reduce the cost of retaining such data, compression tools are often applied. But simply seeking the best compression ratio is not necessarily the most economical choice; other factors also come into play, including compression and decompression throughput, the main memory required to support a given level of ongoing access to the stored data, and the types of storage available. Here we develop a model for the total retention cost (TRC) of a data archiving regime and, by applying the charging rates of a cloud computing provider, derive dollar amounts for a range of compression options, and hence guide the development of new approaches that are more cost-effective than current mechanisms. In particular, we describe an enhancement to the Relative Lempel-Ziv (RLZ) compression scheme and show that, measured by TRC, it provides more economical long-term data retention than previous approaches.

UR - http://www.scopus.com/inward/record.url?scp=85015345570&partnerID=8YFLogxK

U2 - 10.1145/3018661.3018738

DO - 10.1145/3018661.3018738

M3 - Conference Paper published in Proceedings

SP - 241

EP - 249

BT - WSDM 2017 - Proceedings of the 10th ACM International Conference on Web Search and Data Mining

PB - Association for Computing Machinery, Inc

ER -

Liao K, Moffat A, Petri M, Wirth A. A cost model for long-term compressed data retention. In WSDM 2017 - Proceedings of the 10th ACM International Conference on Web Search and Data Mining. Association for Computing Machinery, Inc. 2017. p. 241-249 https://doi.org/10.1145/3018661.3018738