First off, we worked with the lab team to determine that there were three distinct kinds of data coming from three different types of clients whose data had three different possible “shelf lives.” The first was very transient, short-lived data from walk-in customers looking to have images or entire rolls of film scanned, usually with some additional photo-retouching needs. The workflow for this kind of data meant it was usually in the lab for less than a week and the customer would not be paying for long-term storage. The second type was from customers bringing in 300GB to 1TB of data from larger-scale photo shoots that needed to be batch processed and possibly color corrected and/or retouched. Storage time for this type could be up to 30 days in the lab. Finally, the last kind of data was from customers paying for the long-term storage of millions of images, thus allowing them to call up and order prints on demand.
Now that we understood the life cycle of the data, we proceeded to design a backup method that treated each of the three data types individually, rather than the existing method of trying to pull all of it through a single network connection to the backup server and then finally writing it all to expensive tapes. With our new system, far less tape would be consumed and the data would never actually leave the servers; instead, it would be backed up by the same server that was hosting it.
To achieve this, we developed new policies for the three data types. The first kind of data, from customers that did not have long-term storage needs, was now backed up to three rotating RAID sets instead of tape. Each set would be used for a week in a rotating pattern of Backup A, Backup B, and Backup C. On the 22nd day, the system would go back to Backup A and reuse it, thus overwriting the existing backup data. The end result was fully automated data backup for up to 21 days (or 14 days minimum, for data backed up at the end of a set's week), which is more than enough time, given the nature of the data and its expected life cycle. Additionally, since no tapes needed to be swapped, this cut overall tape usage by roughly 50 percent.
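The weekly rotation described above is simple enough to sketch in a few lines. This is a hypothetical illustration, not the lab's actual tooling; the set names and 1-based day numbering are assumptions:

```python
# Hypothetical sketch of the weekly A-B-C rotation: each RAID set is
# used for 7 days, and on day 22 the cycle wraps back to Backup A,
# overwriting that set's previous contents.

def backup_set_for_day(day: int) -> str:
    """Return which RAID set receives the backup on a given day (1-based)."""
    sets = ["Backup A", "Backup B", "Backup C"]
    week = ((day - 1) // 7) % 3  # 0, 1, 2, then wraps around
    return sets[week]
```

For example, day 1 lands on Backup A, day 8 on Backup B, day 15 on Backup C, and day 22 wraps back to Backup A, which is where the old data gets overwritten.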
The second kind of data, from larger photo shoots, was also backed up with rotating A-B-C RAIDs, but because the customers often wanted to go back and revisit this data later, it was also written out to tape at the end of the A-B-C rotation. Again, the vast majority of this was completely automated, and tape usage was further reduced by roughly 25 percent.
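The only scheduling difference for this second data type is the tape run at the end of each rotation. A minimal sketch of that trigger, assuming the tape is cut on the last day of each 21-day cycle (the exact day is an assumption; the source only says "at the end of the A-B-C rotation"):

```python
CYCLE_DAYS = 21  # one full A-B-C rotation at 7 days per set

def cut_archive_tape(day: int) -> bool:
    """True on the last day of each rotation, when the second data
    type is also written out to tape per the policy above."""
    return day % CYCLE_DAYS == 0
```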
The final data set, requiring long-term storage, was handled in a slightly different manner. Because this kind of data was live in the lab for months on end and didn’t change very often, it was critical to ensure it was protected in case of fire, earthquake, etc. Therefore, it needed to be copied to tape and taken off-site on a regular basis. Since the process of scanning such large volumes of data and writing it to tape can take a significant amount of time, a main concern was that the process might not finish while the lab was closed and could impact workflow the following morning. To avoid this, we designed a staged backup system where each night—and in only a few hours—the RAIDs containing the live data were synchronized to a matching set of RAIDs. We would then have the backup system scan the secondary set of RAIDs and, from there, copy the data onto tape. This meant the lab could be working off the primary RAIDs at full speed while the backup system was making a copy of this data from the secondary RAIDs.
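The two-stage flow could be driven by a small nightly script. The sketch below is an illustration under several assumptions: the mount points are hypothetical, rsync stands in for whatever synchronization tool the lab used, and tar to a tape device stands in for the actual backup software's tape step:

```python
# Hypothetical sketch of the staged backup: a fast nightly sync of the
# live RAIDs to a secondary set, followed by a tape copy that reads
# only from the secondary set, leaving the primary RAIDs free for work.
import subprocess

PRIMARY = "/raid/primary/"      # hypothetical mount point of the live RAIDs
SECONDARY = "/raid/secondary/"  # hypothetical mount point of the mirror

def build_sync_cmd(src: str = PRIMARY, dst: str = SECONDARY) -> list[str]:
    # --archive preserves permissions and timestamps; --delete keeps
    # the secondary set an exact mirror of the primary.
    return ["rsync", "--archive", "--delete", src, dst]

def nightly_staged_backup(run=subprocess.run) -> None:
    # Stage 1: fast sync while the lab is closed (a few hours).
    run(build_sync_cmd(), check=True)
    # Stage 2: write tape from the *secondary* RAIDs; tar to a tape
    # device is shown only as a stand-in for the real tape software.
    run(["tar", "-cf", "/dev/st0", SECONDARY], check=True)
```

The key design point is that stage 2 can run long into the next business day without slowing the lab down, because it never touches the primary RAIDs.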