Archiving data - more than backup

Some documents must be stored for a long time. Dutch law has no specific regulations about how long data must be stored in archives, but it does require companies to state how long their data must be kept available.

As a result, every company uses its own standards for archiving data, and retention periods also differ per type of information. For instance, the Dutch tax agency requires companies to keep tax records for at least 7 years, while the retention time of sales records might be 2 years, depending on the business. In some circumstances, however, data must be kept much longer, sometimes even longer than a human lifetime.

Some examples are:

  • (Pension) insurance companies must keep records of the history of insured people and their claims;
  • Hospitals store medical information for the lifetime of a patient;
  • The justice department keeps records of crimes (especially unsolved ones) for a long time;
  • Newspapers, television networks and governments keep archives, such as image and sound archives.
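Retention periods like these can be made explicit in a simple retention schedule. The sketch below is a minimal illustration, not a complete policy: it assumes a hypothetical schedule containing only the two periods mentioned earlier (7 years for tax records, 2 years for sales records), and the record-type names and functions are hypothetical. Records that must be kept for a lifetime or longer are deliberately left out, so asking about them raises an error instead of allowing destruction.

```python
from datetime import date

# Hypothetical retention schedule: record type -> minimum retention in years.
# Only the two example periods from the text are included; a real schedule
# depends on the business and on the applicable law.
RETENTION_YEARS = {
    "tax": 7,    # Dutch tax records: at least 7 years
    "sales": 2,  # sales records: e.g. 2 years, depending on the business
}


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later; 29 Feb maps to 28 Feb."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only 29 February can fail, when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def retention_end(record_type: str, created: date) -> date:
    """First day on which the minimum retention period has passed.

    Raises KeyError for record types missing from the schedule, so that
    unknown (possibly long-lived) records are never silently destroyed.
    """
    return add_years(created, RETENTION_YEARS[record_type])


def may_destroy(record_type: str, created: date, today: date) -> bool:
    """True once the record has been kept for its full minimum period."""
    return today >= retention_end(record_type, created)


print(retention_end("tax", date(2007, 7, 5)))  # 2014-07-05
print(may_destroy("sales", date(2007, 7, 5), date(2009, 7, 4)))  # False
```

Keeping the schedule as data rather than scattering periods through code makes it easy to document which retention period applies to which type of information.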

Digital format

Data must be kept in such a way that it can still be read after a long time. This means the digital format (like a Word file or a JPG file), the physical format (like a CD or a magnetic tape) and the storage environment (temperature, humidity) must be chosen so that the data can still be retrieved after several decades.

This is not a simple task.

Let's look back a few decades. In those years, mainframes were common in the IT industry. Data was stored on reel-to-reel tapes that can no longer be read with today's equipment (they used 7 or 9 parallel tracks, and the data was usually encoded in EBCDIC instead of ASCII). Data was also kept in proprietary formats, like mainframe database tables.

Can you guarantee this data can be read and interpreted today?
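Even when the bytes themselves survive, they are only meaningful if you know the character encoding. EBCDIC and ASCII assign different byte values to the same characters. A minimal sketch using Python's built-in codecs, assuming the data is plain text in EBCDIC code page 037 (US/Canada); other EBCDIC code pages (such as cp500) map some characters differently, and binary or packed-decimal fields would need separate handling. The function name `ebcdic_to_ascii` is hypothetical.

```python
def ebcdic_to_ascii(data: bytes, codepage: str = "cp037") -> bytes:
    """Re-encode EBCDIC character data as ASCII.

    Assumes `data` holds plain text in the given EBCDIC code page;
    raises UnicodeEncodeError for characters that ASCII cannot represent.
    """
    return data.decode(codepage).encode("ascii")


# The word "ARCHIVE" as stored in EBCDIC code page 037 (US/Canada).
record = bytes([0xC1, 0xD9, 0xC3, 0xC8, 0xC9, 0xE5, 0xC5])

# Read as ASCII, these bytes are invalid: ASCII only defines 0x00-0x7F.
try:
    record.decode("ascii")
except UnicodeDecodeError as err:
    print("not ASCII:", err.reason)

print(ebcdic_to_ascii(record))  # b'ARCHIVE'
```

The same bytes read with the wrong assumption produce either an error or garbage, which is why the encoding has to be documented together with the archived data.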

After mainframes, PCs became the norm, with applications and formats like WordStar, Lotus 1-2-3, WordPerfect, MS-Word (how many versions?), Lotus Notes, PDF, etc. Furthermore, there are now image, audio and video formats, like BMP, GIF, TIFF, MP3, WAV and MPEG (several versions). The list is almost endless, and all of this was developed in the last 30 years.

Here is an interesting list of file formats and their specifications (47 pages long). It will not be easy to make sure all these file formats can still be read in 30 years' time, and more file formats will probably be developed in the years to come.

I recommend using open standards for storing data as much as possible. These standards should not be too complex; the ODF format is a good example. An ODF file is a ZIP archive containing plain-text XML files (encoded in UTF-8, a superset of ASCII), which will be readable for a long time because the format is well documented and not too complicated.
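To illustrate why such a format ages well: the text of an ODF document can be recovered with nothing more than a ZIP reader and an XML parser. The sketch below assumes the document text lives in `content.xml` as `text:p` and `text:h` elements (paragraphs and headings) in the ODF text namespace; it is a simplified extractor, not a full ODF reader. The helper `make_minimal_odt` is hypothetical and builds a tiny stand-in archive in memory so the example runs without a real `.odt` file; it is not a complete, valid ODF document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of ODF text elements such as paragraphs and headings.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odf_paragraphs(source) -> list[str]:
    """Return the text of every paragraph and heading in an ODF file.

    `source` may be a file path or a binary file-like object.
    Only the standard library is used: zipfile plus an XML parser.
    """
    with zipfile.ZipFile(source) as zf:
        content = zf.read("content.xml")
    root = ET.fromstring(content)
    paragraphs = []
    for elem in root.iter():  # document order
        if elem.tag in (f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"):
            # itertext() also collects text nested in spans and similar elements.
            paragraphs.append("".join(elem.itertext()))
    return paragraphs


def make_minimal_odt() -> bytes:
    """Build a tiny ODF-style archive in memory (a stand-in for a real .odt)."""
    content = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<office:document-content"
        ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
        f' xmlns:text="{TEXT_NS}">'
        "<office:body><office:text>"
        "<text:h>Archiving data</text:h>"
        "<text:p>More than <text:span>backup</text:span>.</text:p>"
        "</office:text></office:body></office:document-content>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # ODF packages store "mimetype" first and uncompressed.
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", content, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


print(odf_paragraphs(io.BytesIO(make_minimal_odt())))
# ['Archiving data', 'More than backup.']
```

Because both ZIP and XML are openly specified and widely implemented, a future reader does not depend on the application that originally wrote the file.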

Physical format

Many physical storage formats exist. Besides the previously mentioned reel-to-reel tapes, there are tape cartridges in many formats (DLT, SDLT, LTO, DDS), CDs, DVDs, floppy disks (3.5", 5.25", 8"), etc.

Many archiving technologies store data on optical media. While this is much better than magnetic storage on disk or (even worse) on tape, it is not obvious that media like CDs or DVDs will still be readable many years from now. How long will it take for Blu-ray discs or HD DVDs to become common media? And what will follow?

Therefore, it is advisable to transfer data that must be kept for a long time to the latest storage media standard every 10 years (for example, from floppy disks to CDs), or at least to copy the data to new media of the same type (for example, copy a 10-year-old CD to a new CD).
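Copying to a new medium only helps if the copy is identical to the original. A minimal sketch of one migration step, assuming the old and the new medium are both mounted as ordinary directories: it copies a file and compares SHA-256 checksums of the original and the copy. The checksum comparison, the function names and the directory layout are assumptions for illustration; the text itself does not prescribe a verification method.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_file(src: Path, dest_dir: Path) -> str:
    """Copy an archived file to new media and verify the copy.

    Returns the SHA-256 checksum of the verified copy; raises OSError
    if the copy differs from the original.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    original = sha256_of(src)
    shutil.copy2(src, dest)  # copy2 also preserves timestamps where possible
    copied = sha256_of(dest)
    if copied != original:
        raise OSError(f"copy of {src} does not match the original")
    return copied


# Demo: two temporary directories stand in for the old and the new medium.
with tempfile.TemporaryDirectory() as tmp:
    old_medium = Path(tmp) / "old_cd"
    old_medium.mkdir()
    archived = old_medium / "records.odt"
    archived.write_bytes(b"archived content")
    print(migrate_file(archived, Path(tmp) / "new_cd"))
```

Storing the returned checksum next to the archived file would also let the next migration, ten years later, detect whether the data was corrupted in the meantime.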


This entry was posted on Thursday 05 July 2007

Disclaimer

The postings on this site are my opinions and do not necessarily represent CGI’s strategies, views or opinions.

 

Copyright Sjaak Laan