Over the past couple of years the terms Backup and DeDuplication have become synonymous. The vast majority of organizations today, regardless of size, have either implemented DeDuplication within their environment, are considering implementing it, or at a minimum are wondering if they would benefit from it. This is in large part because the industry as a whole has been force-feeding the need for DeDuplication in an effort to grab a piece of the pie. Heck, the term DeDupe itself has all but become a household name. Believe me; I’d be sitting on a goldmine in the Caribbean enjoying an endless supply of margaritas if I had an algorithm that would allow me to “dedupe” my wife’s shoe collection. But are the shoes really the problem?
The industry in general has conditioned everyone to believe that DeDuplication is THE ANSWER. The fact is, DeDuplication addresses the result of the problem, not the problem itself. Don’t get me wrong, DeDuplication is a very powerful technology and can offer significant benefits to many organizations. But DeDuplication was designed to target a very specific set of requirements. For instance:
- Historically, it was always very difficult, and for most companies impossible, to replicate backup data offsite. DeDuplication now enables a very efficient way to accomplish this by analyzing the data at a block level. (Now keep in mind it was not necessarily the “data” itself that made this task difficult in the past; it was the proprietary format that backup applications used that was the issue.)
- Many companies now want to minimize the use of tape, or “redefine” the role of tape within their environment. The role of tape has been changing from a routine backup role to more of a dedicated long-term archive role. With the evolution of DeDuplication and disk-based backups in general, companies can now use DeDuplication to store more near-term copies of backups on disk, and then leverage tape exclusively to satisfy long-term retention requirements.
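To make the block-level analysis in the first point concrete, here is a minimal Python sketch. It is an illustration only, not how any particular product works: the fixed 4 KB block size, the SHA-256 fingerprint, and the function names (`dedupe_blocks`, `restore`, `blocks_to_send`) are all assumptions made for the example, and real products often use variable-size chunks.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, chosen only for illustration


def dedupe_blocks(data: bytes, store: dict) -> list:
    """Split data into blocks, keep each unique block once in `store`,
    and return the list of fingerprints needed to rebuild the data."""
    recipe = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)  # only unseen blocks use space
        recipe.append(fingerprint)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original data from its list of fingerprints."""
    return b"".join(store[fp] for fp in recipe)


def blocks_to_send(recipe: list, remote_fingerprints: set) -> set:
    """Fingerprints the offsite copy does not hold yet."""
    return {fp for fp in recipe if fp not in remote_fingerprints}
```

If Tuesday’s backup shares most of its blocks with Monday’s, `blocks_to_send` returns only the handful of new fingerprints, which is why replicating deduplicated backup data offsite becomes practical.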
If your objective is to achieve longer retention of backup data on disk, reduce dependence upon tape, and gain the ability to more efficiently move backup data offsite then DeDuplication alone can certainly enable this. However, the truth is there is so much more we can be doing to address the root problems around backup that are causing the need to even consider a technology such as DeDuplication.
Intelligent active archiving is a good example. When you look at the data in your production environment, it is very common to find that 60% – 80% of that data is either inactive or redundant. Archiving can intelligently and transparently remove “static” and duplicate data from the production environment while keeping this data fully accessible. As a result, this archived data no longer has to be backed up on a regular basis. This single function alone could eliminate 60% – 80% (if not more) of your daily, weekly, and monthly backup requirements. Now ask yourself: if I reduced my backup requirements by 80%, do I still need DeDuplication?
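The arithmetic behind that question is simple enough to sketch. The 100 TB production figure below is a made-up number used purely for illustration, and `backup_volume_after_archiving` is a hypothetical helper, not part of any tool.

```python
def backup_volume_after_archiving(production_tb: float,
                                  archivable_fraction: float) -> float:
    """Data that still needs routine backup once the static and
    redundant share has been moved into an archive."""
    if not 0.0 <= archivable_fraction <= 1.0:
        raise ValueError("archivable_fraction must be between 0 and 1")
    return production_tb * (1.0 - archivable_fraction)


# Hypothetical 100 TB environment, using the 60% and 80% figures above.
for fraction in (0.60, 0.80):
    remaining = backup_volume_after_archiving(100.0, fraction)
    print(f"{fraction:.0%} archived: about {remaining:.0f} TB per backup cycle")
```

The same reduction applies to every daily, weekly, and monthly cycle, since the archived data drops out of all of them.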
Recovery is another issue that archiving can help address. Most would agree that the restore process using traditional backup applications is unacceptable for the competitive nature of business operations today. Sure, we can now restore from disk, but your data is still held hostage by some proprietary format that requires a restore “process” just to be able to access your own data. During this process you reaffirm your faith by praying to the backup gods that the tapes are good and the data is not corrupted. By leveraging an archive system as part of your backup strategy, the data is immediately available in its native format to the end users and applications, completely eliminating the tedious restore process we have all regretfully come to accept as a part of life.
Part of the reason archiving is grossly underutilized by companies today is fear. There is a perception (and to some degree, lack of knowledge) that implementing an archive solution is complex and very costly. The fact that everyone has also become so accepting of the normal “backup” process has also made it very difficult for many to grasp the concept and benefits that archiving can bring to the table. For those who are considering or evaluating an archive solution, there are 5 key components that should be carefully considered. These are as follows:
True Active Archive
True active archive solutions provide the ability to maintain an online and/or nearline active archive of data. A true active archive solution can act as the primary device and enable direct access to the data from within the archive repository, making it immediately available without the need to stage the data back to primary disk. Many companies have self-branded their solutions as an active archive but do not provide this capability.
Active Content Validation and Self Healing
This should be an absolute essential when evaluating any archive technology. Content validation and self healing enable the solution to automatically and transparently perform file integrity audits, along with advanced features such as block-level file repair on disk or tape, block-level digital fingerprinting, and individual file deletion on disk or tape with media reclamation capabilities.
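As a rough illustration of block-level fingerprinting and repair, consider the Python sketch below. It assumes fixed 4 KB blocks, SHA-256 fingerprints recorded at archive time, and a second good copy (`replica`) to heal from; those choices and the function names are assumptions for the example, not a description of any vendor’s implementation.

```python
import hashlib
from itertools import zip_longest

BLOCK_SIZE = 4096  # assumed block size for the example


def fingerprint_blocks(data: bytes) -> list:
    """One SHA-256 fingerprint per fixed-size block."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]


def audit(data: bytes, expected: list) -> list:
    """Indexes of blocks whose fingerprint differs from the one recorded
    at archive time (missing or extra blocks are flagged too)."""
    pairs = zip_longest(fingerprint_blocks(data), expected)
    return [i for i, (actual, wanted) in enumerate(pairs) if actual != wanted]


def repair(data: bytes, bad_blocks: list, replica: bytes) -> bytes:
    """Overwrite only the damaged blocks using a known-good second copy."""
    healed = bytearray(data)
    for i in bad_blocks:
        start = i * BLOCK_SIZE
        healed[start:start + BLOCK_SIZE] = replica[start:start + BLOCK_SIZE]
    return bytes(healed)
```

Running `audit` on a schedule and passing its result to `repair` gives the basic audit-then-heal loop: only the damaged blocks are rewritten, not the whole file.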
Metadata Content Aware
This is a key component for any company that wants or needs to maintain a history of related content (e.g., the medical industry with years of patient information, a law firm with client data, a manufacturer with years of drawings and product data). This function provides the ability to analyze metadata and transparently consolidate related data and files into groups. For example, a manufacturer may have years’ worth of design data for a given product. This function can consolidate all of that data (based upon metadata characteristics) down to 2, 3, or X number of LTO tape cartridges, rather than having this data spread across hundreds of LTO-2, LTO-3, and LTO-4 tapes that the company has used over the years.
Many archive solutions today are designed specifically for email only, or for standard file data. The vast majority of companies, however, are also running some form of ODBC application such as SQL, Oracle, SAP, Sybase, etc., and those applications oftentimes present the most challenges when it comes to performance, database maintenance, and so on. The ability for a solution to transparently analyze and archive data at the row and column level within an application is extremely powerful. Take SQL for instance. Any technical person who understands SQL will tell you that as the database grows the performance degrades substantially and the ongoing maintenance becomes a real burden. By archiving the static row and column data out of the database and into an “active archive”, you substantially improve the performance and manageability of the database while ensuring that all of the archived data is still 100% visible to the application and end users.
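The idea of moving static rows out of the working table while keeping them visible can be sketched with Python’s built-in `sqlite3` module. This is a simplified illustration: the `orders_active` and `orders_archive` tables, the `orders` view, the sample rows, and the “closed before a cutoff date” rule are all invented for the example, and a real active archive would hold the archived rows in a separate repository rather than in a second table of the same database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders_active  (id INTEGER PRIMARY KEY, customer TEXT, closed_on TEXT);
CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, customer TEXT, closed_on TEXT);
-- Applications query this view, so archived rows stay visible to them.
CREATE VIEW orders AS
    SELECT * FROM orders_active
    UNION ALL
    SELECT * FROM orders_archive;
""")
conn.executemany(
    "INSERT INTO orders_active VALUES (?, ?, ?)",
    [(1, "Acme", "2007-03-14"), (2, "Globex", "2008-11-02"), (3, "Initech", None)],
)


def archive_static_rows(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move rows closed before `cutoff` into the archive table.
    Returns the number of rows moved."""
    condition = "closed_on IS NOT NULL AND closed_on < ?"
    with conn:  # both statements commit together or roll back together
        conn.execute(
            f"INSERT INTO orders_archive SELECT * FROM orders_active WHERE {condition}",
            (cutoff,),
        )
        deleted = conn.execute(
            f"DELETE FROM orders_active WHERE {condition}", (cutoff,)
        )
        return deleted.rowcount
```

Calling `archive_static_rows(conn, "2009-01-01")` shrinks the working table to the single open order, while a query against the `orders` view still returns all three rows.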
Storage Virtualization and Intelligent Tiering
A key consideration of any archive solution should be the ability to virtualize various types of storage and manage different tiers of storage, including disk and tape. Many solutions today can do this. However, taking this a step further, the archive solution should be able to provide intelligent management of the storage tiers based upon conditions around performance, priority requests, and the type of physical storage being used, such as MAID disk and others. For example, a good archive solution should be able to use disk performance buffers to provide fast access to the most frequently accessed data, as well as perform pre-emptive priority processing for transaction-based data. Additionally, these solutions should be able to take advantage of the energy savings that MAID disk can provide by managing the spin cycles of the disk without sacrificing performance.
There are many additional benefits that a good archive solution can provide to complement an organization’s backup strategy, such as Replication, File-based DeDuplication, data availability “during” a recovery, 100% hardware “freedom”, compliance, and transparent migration support to protect against technology obsolescence. More and more organizations of all shapes and sizes will begin to recognize the value of archiving, and the industry as a whole will have no choice but to adopt archiving as a necessity (rather than a luxury) to get control of the real issue, THE DATA.
To discuss any specific requirements or solutions, or if you simply need a shoulder to cry on over last night’s backup, feel free to email me or call me at 440-498-2300 x225.