Think “Outside the DeDuplication Box” for Backup

Over the past couple of years the terms Backup and DeDuplication have become synonymous.  The vast majority of organizations today, regardless of size, have either implemented DeDuplication within their environment, are considering implementing it, or at a minimum are wondering whether they would benefit from it.  This is in large part because the industry as a whole has been force-feeding the need for DeDuplication in an effort to grab a piece of the pie.  Heck, the term DeDupe itself has all but become a household name.  Believe me, I'd be sitting on a goldmine in the Caribbean enjoying an endless supply of margaritas if I had an algorithm that would let me "dedupe" my wife's shoe collection.  But are the shoes really the problem?

The industry in general has conditioned everyone to believe that DeDuplication is THE ANSWER.  The fact is that what DeDuplication really addresses is the result of the problem, not the problem itself.  Don't get me wrong: DeDuplication is a very powerful technology and can offer significant benefits to many organizations.  But it was designed to target a very specific set of requirements.  For instance:

  • Historically, it was always very difficult, and for most companies impossible, to replicate backup data offsite.  DeDuplication now enables a very efficient way to accomplish this by analyzing the data at a block level.  (Keep in mind, it was not necessarily the "data" itself that made this task difficult in the past; it was the proprietary format used by backup applications.)
  • Many companies now want to minimize the use of tape, or "redefine" the role of tape within their environment.  That role has been shifting from routine backup to dedicated long-term archive.  With the evolution of DeDuplication and disk-based backup in general, companies can now use DeDuplication to keep more near-term backup copies on disk, and leverage tape exclusively to satisfy long-term retention requirements.

If your objective is to achieve longer retention of backup data on disk, reduce dependence upon tape, and move backup data offsite more efficiently, then DeDuplication alone can certainly enable this.  The truth, however, is that there is much more we can do to address the root problems around backup, the very problems that create the need to even consider a technology such as DeDuplication.

Intelligent active archiving is a good example.  When you look at the data in your production environment, it is very common to find that 60%–80% of it is either inactive or redundant.  Archiving can intelligently and transparently remove this "static" and duplicate data from the production environment while keeping it fully accessible.  As a result, the archived data no longer has to be backed up on a regular basis.  This single function alone could eliminate 60%–80% (if not more) of your daily, weekly, and monthly backup requirements.  Now ask yourself: if I reduced my backup requirements by 80%, would I still need DeDuplication?
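To put rough numbers behind that claim, here is a minimal sketch of the arithmetic; every figure below (data size, static fraction) is a hypothetical assumption, not a measurement:

```python
# Hypothetical environment: 50 TB of production data, 70% of which is
# static or duplicate and can be moved to an active archive.
production_tb = 50.0
static_fraction = 0.70  # assumed share the archive removes from the backup set

active_tb = production_tb * (1 - static_fraction)

print(f"Full backup before archiving: {production_tb:.1f} TB")
print(f"Full backup after archiving:  {active_tb:.1f} TB")
print(f"Backup requirement reduced by {static_fraction:.0%}")
```

At an 80% static fraction, the same arithmetic leaves only 10 TB in the backup set, which is exactly why the question above is worth asking.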

Recovery is another issue that archiving can help address.  Most would agree that the restore process of traditional backup applications is unacceptable given the competitive nature of business operations today.  Sure, we can now restore from disk, but your data is still held hostage by some proprietary format that requires a restore "process" just to access your own data.  During this process you reaffirm your faith by praying to the backup gods that the tapes are good and the data is not corrupted.  When you leverage an archive system as part of your backup strategy, the data is immediately available in its native format to end users and applications, completely eliminating the dreaded restore process we have all come to regretfully accept as a part of life.

Part of the reason archiving is grossly underutilized by companies today is fear.  There is a perception (and to some degree, a lack of knowledge) that implementing an archive solution is complex and very costly.  The fact that everyone has become so accepting of the normal "backup" process has also made it difficult for many to grasp the concept and the benefits that archiving can bring to the table.  For those who are considering or evaluating an archive solution, there are five key components that should be carefully considered.  These are as follows:

True Active Archive

True active archive solutions provide the ability to maintain an online and/or nearline active archive of data.  A true active archive solution can act as the primary device and enable direct access to the data from within the archive repository, making it immediately available without the need to stage it back to primary disk.  Many companies have self-branded their solutions as an active archive but do not provide this capability.

Active Content Validation and Self Healing

This should be an absolute essential when evaluating any archive technology.  Content validation and self-healing enable the solution to automatically and transparently perform file integrity audits, along with advanced features such as block-level file repair on disk or tape, block-level digital fingerprinting, and individual file deletion on disk or tape with media reclamation capabilities.

Metadata Content Aware

This is a key component for any company that wants or needs to maintain a history of related content (e.g., the medical industry with years of patient information, a law firm with client data, a manufacturer with years of drawings and product data).  This function provides the ability to analyze metadata and transparently consolidate related data and files into groups.  For example, a manufacturer may have years' worth of design data for a given product.  This function can consolidate all of that data (based upon metadata characteristics) down to 2, 3, or X LTO tape cartridges, rather than having it spread across the hundreds of LTO-2, LTO-3, and LTO-4 tapes the company has used over the years.

Application Support

Many archive solutions today are designed specifically for email only, or for standard file data.  The vast majority of companies, however, are also running ODBC applications such as SQL Server, Oracle, SAP, Sybase, etc., and those applications often present the greatest challenges when it comes to performance, database maintenance, and so on.  The ability of a solution to transparently analyze and archive data at the row and column level within an application is extremely powerful.  Take SQL Server, for instance.  Any technical person who understands it will tell you that as the database grows, performance degrades substantially and the ongoing maintenance becomes a real burden.  By archiving the static row and column data out of the database and into an "active archive," you substantially improve the performance and manageability of the database while ensuring that all of the archived data is still 100% visible to the application and end users.
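To make the row-level idea concrete, here is a hedged sketch using Python's built-in SQLite as a stand-in database.  The table names, cutoff date, and figures are all illustrative; real archive products perform this transparently inside the production database rather than with hand-written SQL:

```python
import sqlite3

# In-memory stand-in for a production database; "orders" and its columns
# are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, placed TEXT, total REAL)")
conn.execute("CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, placed TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2005-03-01", 100.0), (2, "2006-07-15", 250.0), (3, "2009-11-30", 80.0)],
)

CUTOFF = "2008-01-01"  # rows older than this are treated as static

# Move static rows into the archive table in a single transaction, so a
# row is never in both places (or neither).
with conn:
    conn.execute(
        "INSERT INTO orders_archive SELECT * FROM orders WHERE placed < ?", (CUTOFF,)
    )
    conn.execute("DELETE FROM orders WHERE placed < ?", (CUTOFF,))

# A view keeps archived rows fully visible to applications and end users.
conn.execute(
    "CREATE VIEW orders_all AS "
    "SELECT * FROM orders UNION ALL SELECT * FROM orders_archive"
)

print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])      # live rows: 1
print(conn.execute("SELECT COUNT(*) FROM orders_all").fetchone()[0])  # visible rows: 3
```

The live table now holds only the active rows (so backups, index maintenance, and integrity checks shrink accordingly), while queries against the view still see every row.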

Storage Virtualization and Intelligent Tiering

A key consideration for any archive solution should be the ability to virtualize various types of storage and manage different storage tiers, including disk and tape.  Many solutions today can do this.  Taking it a step further, however, the archive solution should provide intelligent management of the storage tiers based upon conditions such as performance, priority requests, and the type of physical storage being used, such as MAID disk.  For example, a good archive solution should be able to use disk performance buffers to provide fast access to the most frequently accessed data, and to perform pre-emptive priority processing for transaction-based data.  Additionally, these solutions should be able to take advantage of the energy-saving benefits of MAID disk by managing the spin cycles of the drives to optimize performance.

There are many additional benefits a good archive solution can provide to complement an organization's backup strategy, such as replication, file-based DeDuplication, data availability "during" a recovery, 100% hardware "freedom," compliance, and transparent migration support to protect against technology obsolescence.  More and more organizations of all shapes and sizes will begin to recognize the value of archiving, and the industry as a whole will have no choice but to adopt archiving as a necessity (rather than a luxury) to get control of the real issue: THE DATA.

To discuss any specific requirements, solutions, or if you simply need a shoulder to cry upon over last night’s backup feel free to email me or call me at 440-498-2300 x225.

-Rob Oddo

Faster AS/400 (iSeries) Backup and Disaster Recovery

Somewhere in a server room near you lurks the sometimes ominous OS/400 system running on iSeries server hardware. Typically, company "lifeblood" accounting applications such as those from JDE (J.D. Edwards) run on these systems.  Perhaps I am being a little biased here by my chosen profession, but the words "lifeblood," "mission critical," and even "accounting" all put the brain on automatic with the following questions:

  • If the system goes down what are the consequences?
  • How will workflow be interrupted?
  • What percentage of the company will be affected?
  • Ultimately, what are the dollars associated with downtime?
  • How do we address the above and ensure that the inevitable downtime has a minimal impact on the corporation and its customers?

Of course, as with any application or system, we first turn to the corporate backup team.  In a modern implementation it is likely that the backup team is using BRMS (Backup Recovery and Media Services) to back up to physical tape.  Typically, the physical tape infrastructure consists of one or more IBM 3584 libraries with one or more IBM Ultrium tape drives (LTO-1, LTO-2, LTO-3, or LTO-4).  These technologies do provide some peace of mind, but they are only a piece of the bigger picture.

The full picture for disaster recovery includes several items (examples below):

  1. How long does the backup take to complete?
  2. What type of backups are used and how often are they run?
  3. When do tape(s) go off-site in relation to when the backup runs?
  4. How fast is the disaster identified?
  5. How fast can the tapes be retrieved to the DR location?
  6. How long does it take to restore?
  7. How long does it take to make the application accessible to the corporation?

Every step above directly or indirectly corresponds to one element in the disaster recovery equation, and ultimately feeds the two most important factors for validating (or invalidating) your current backup scheme: RPO and RTO.

For our purposes we'll define the recovery point objective (RPO) as the point in time to which data must (or can) be restored to successfully resume processing of business application data.  In other words, after the disaster, what is the age of the data available when business functions and applications are brought into an operational state?

For our purposes we'll define the recovery time objective (RTO) as the time within which business functions or applications must (or can) be restored, including the time before a disaster is declared and the time it takes to perform the tasks that restore business functions.  In other words, after the disaster, how long will it take to bring business functions and applications to an operational state?

If you're looking for an RPO of 36 hours, it could feasibly be addressed by a simple scheme of one full tape backup per day.  Suppose that the backup takes 2 hours to complete and the tapes are taken off-site within 4 hours.  The ideal disaster (an oxymoron?) would happen moments after the 2-hour backup and allow you to restore directly to the production system using the tapes (before they go offsite) or the local tape copy.  Worst case, the disaster is identified soon after it occurs, and for some reason your production system is not available for the restore. The disaster recovery process now kicks in and your backup team makes the appropriate phone calls to retrieve tapes to the off-site location. Having practiced the disaster recovery process many times (sarcasm), they bring the system online as it existed from 2 (ideal, but dreaming) to 36 (desired) hours ago.  The time from identifying the disaster to the application once again servicing the organization is the "RTA" (recovery time achieved), and hopefully that matches the business's RTO or better (RTA <= RTO).
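The components of that scenario can be added up in a few lines. Every timing figure below is an assumption chosen to match the example, not a benchmark:

```python
# Illustrative timings in hours; every value is an assumption.
backup_duration = 2.0       # nightly full backup window
backup_interval = 24.0      # one full backup per day
identify_disaster = 1.0     # time to declare the disaster
retrieve_tapes = 6.0        # courier the off-site tapes to the DR location
restore_and_bringup = 8.0   # restore data and bring applications online

# Worst-case data age: the disaster strikes just before the next backup
# completes, so the newest restorable data is a full cycle plus one
# backup window old.
worst_case_age = backup_interval + backup_duration
print(f"Worst-case restorable data age: {worst_case_age:.0f} h (RPO target: 36 h)")

# Recovery time achieved (RTA): everything that happens after the
# disaster is identified.
rta = identify_disaster + retrieve_tapes + restore_and_bringup
print(f"RTA: {rta:.0f} h -> acceptable only if RTA <= RTO")
```

Plugging in your own figures for each step is a quick way to see whether the current scheme can ever meet the stated RPO and RTO, or whether one of the steps (usually tape retrieval) dominates.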


What if, soon after the backup completed (with a small delay for transfer), your backup data was already off-site at the same site where your DR iSeries hardware waits?

What if the backup replica is already sitting in a library pre-attached to the DR iSeries?

What if you could simply recover the iSeries LIC directly from the backup replica?  ("Boot from SAN," or in this case "boot from tape"; the iSeries nomenclature is "IPL," initial program load.)

To meet these "what if" scenarios you would need a virtual tape library (VTL) system that could emulate one or more IBM 3584 libraries with one or more IBM Ultrium tape drives (LTO-1, LTO-2, LTO-3, or LTO-4).  The VTL would need to provide a means to replicate virtual tapes to an off-site VTL.  Additionally, it would need to allow a tape set to be presented over Fibre Channel to the awaiting DR iSeries hardware, and allow for LIC recovery directly from the replica backup.

Chi Corporation can make this VTL design a reality and allow you to drastically cut backup times, tape-handling logistics, human error, and both RPO and RTO for your company's iSeries implementation.

Have more questions about OS/400 and iSeries Backup and Disaster Recovery? Feel free to email me or call me at 440-498-2300 x232.

-Rob Kinney

Drobo – A Glimpse into the Future of Storage

Some may disagree (or refuse to admit) that today we are smack in the midst of a significant paradigm shift as it relates to data storage.  What I am referring to primarily is "general purpose" data storage, the type of storage that 60%–70% of the data at any given company could reside upon.

For years the storage industry was driven by the manufacturers.  As drive technologies improved, capacities increased, and companies began listening to the iSCSI story, storage manufacturers engaged in a bloody battle to develop and bring to market more intelligent, user-friendly storage solutions.  Then we had the SCSI and Fibre Channel drive camps touting that SATA was not reliable enough for primary storage, so more advanced levels of RAID were introduced to offer increased protection against disk failures.  Then along came the "feature set" wars over whose technology had better snapshots, more efficient thin provisioning, dynamic expansion, replication, etc.  This was coupled with the arguments over which protocol was "faster," Fibre Channel or iSCSI.

Well, as the dust settled, what did we learn?  We learned that there are billions of dollars' worth of end users who are tired of paying premiums for overkill storage solutions and want lower-cost, simpler storage.  And the evolution of iSCSI and ATA/SATA drive technology offered that, somewhat.  After all, bundled in that "low cost, all inclusive" iSCSI SAN array you purchased back in 2005 were "inexpensive" $2,000 500GB SATA disk drives.

Today, there are two significant events quietly developing within the industry:

  1. Storage manufacturers are losing control to the end user.
  2. As disk drives continue to get larger and larger, the concept and benefits of traditional RAID as we know it today are diminishing.

End users, more than ever, are demanding lower-cost storage.  They are beginning to recognize that the big picture is more about the management of the data than about the storage itself.  Because of this, manufacturers have lost the control they once had, and companies are paying less for storage.  Storage has become a commodity.

The other significant event is that with drive capacities already as large as 2TB, and with 3TB and 4TB drives just around the corner, traditional RAID is presenting some great challenges for storage administrators.

Take a typical 14-drive RAID set, for instance.  With 3TB drives you are looking at 42TB of storage.  Can you imagine how long it would take to rebuild that 42TB RAID set?  A week, maybe two?  And you haven't even begun to restore data yet!
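A back-of-envelope calculation shows the scale of the problem. Only the failed drive is rewritten, but every surviving member must be read in full, and the effective rates below are assumptions; rebuilds on a busy production array often run far slower than a drive's raw streaming speed:

```python
# Time to rewrite one failed 3 TB member at an assumed effective rate.
drive_tb = 3.0

for rate_mb_s in (50.0, 10.0):  # optimistic vs. heavily loaded array
    hours = drive_tb * 1e12 / (rate_mb_s * 1e6) / 3600
    print(f"{rate_mb_s:>4.0f} MB/s -> rebuild in {hours:.0f} h ({hours / 24:.1f} days)")
```

Even the optimistic case means most of a day of degraded operation per failure, and in a single-parity set a second failure during that window means data loss.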

At today's drive sizes, the benefits of traditional RAID are going away.  This is why it is an absolute must for companies to begin taking a closer look at the management of their data, and to depend less upon the reliability of the storage itself.

For example, if you are running an active archive solution such as FileTek's StorHouse, which enables functions such as transparent file integrity audits and block-level file repair, you don't have to rely as heavily upon RAID because StorHouse is protecting the data.  Furthermore, if a production disk should fail, you have immediate access to a copy of the data directly from within the active archive; you completely eliminate the need to rebuild the RAID and then perform a restore procedure just to access the data.  The data is available immediately while you rebuild the system.


The industry buzz (or perhaps better described as rage) over Data Robotics' DroboElite exemplifies this change in the storage landscape, and offers a glimpse into where storage technologies are heading.

When I first talked to one of my customers who was interested in the Drobo, I really knew nothing about the product.  He had seen the system at a trade show, and a few days later he called me up wanting to purchase one.  So to be truthful, I was a bit of a skeptic.  It was actually somewhat odd, because here I am talking to a customer and he is selling me on the box that I am about to sell him!

As I started talking with other customers I learned that there is a huge, cult-like following around this Drobo appliance.  I started learning more about the system and took a look at it myself, and I immediately "got it."  Here is a closer look…

First, let's get the important stuff out of the way.  A fully loaded 16TB DroboElite has an MSRP of less than $6,300.  No, that is not a mistake!  At that price, there isn't a company in the world that doesn't have some use case for a DroboElite, whether it's departmental storage, cheap D2D backup, public file storage, archive, VMware, video storage, or many others.

The DroboElite is an 8-bay storage device, available in rackmount or desktop form, with dual iSCSI interface ports.  The system supports up to 16 iSCSI clients and up to 255 volumes (Smart Volumes).  Smart Volumes pull storage from the common pool of disk rather than from specific physical drives.  This eliminates the need for features such as thin provisioning (mentioned above), because you no longer need to manage capacity at the volume level.
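Conceptually, pooled volumes behave like the toy model below. This is a sketch of the general idea only, not Drobo's implementation, and every name and number is invented:

```python
# Toy model of pooled volumes: each volume draws blocks from one shared
# pool only when data is actually written, so no per-volume capacity
# has to be pre-committed.
pool_capacity_tb = 16.0
pool_used_tb = 0.0
volumes = {}

def write(volume, tb):
    """Allocate capacity from the shared pool at write time."""
    global pool_used_tb
    if pool_used_tb + tb > pool_capacity_tb:
        raise RuntimeError("pool exhausted")
    pool_used_tb += tb
    volumes[volume] = volumes.get(volume, 0.0) + tb

# Three volumes of very different sizes share one pool; none of them
# reserved any capacity up front.
write("backups", 6.0)
write("archive", 2.5)
write("scratch", 0.5)
print(f"Pool used: {pool_used_tb:.1f} / {pool_capacity_tb:.0f} TB")
```

The point of the model: capacity planning happens once, at the pool level, instead of per volume, which is why a separate thin-provisioning feature becomes unnecessary.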

The system is built upon Data Robotics' BeyondRAID technology.  This is the "secret sauce."  BeyondRAID eliminates many of the inherent shortcomings of traditional RAID that I mentioned earlier by taking standard RAID algorithms and applying them on top of a flexible storage virtualization architecture.  For instance, with BeyondRAID you can mix and match various drive capacities in the same system and the Drobo utilizes the full capacity of each.  With traditional RAID this is impossible: all of the drives would be formatted down to the smallest drive size in the group.
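A rough capacity comparison illustrates the difference. The BeyondRAID figure below uses the commonly cited rule of thumb that single-redundancy usable space is approximately the sum of all drives minus the largest one; treat it as an approximation for illustration, not a vendor formula:

```python
# Illustrative mixed drive set, capacities in TB.
drives = [2.0, 2.0, 1.0, 0.5]

# BeyondRAID rule of thumb: usable capacity with single redundancy is
# roughly the total minus the largest drive.
beyond_raid = sum(drives) - max(drives)

# Traditional RAID 5: every drive is truncated to the smallest member,
# and one drive's worth goes to parity.
raid5 = (len(drives) - 1) * min(drives)

print(f"BeyondRAID (approx.): {beyond_raid:.1f} TB usable")
print(f"Traditional RAID 5:   {raid5:.1f} TB usable")
```

With this mixed set the pooled approach yields more than twice the usable space, which is exactly the scenario traditional RAID handles worst.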

The DroboElite also has self-healing technology built in.  The Drobo will monitor and repair bad drive blocks transparently, without any performance degradation.  Additionally, if a drive should fail, the Drobo automatically redistributes the data across the remaining drives to maintain the highest level of protection and performance.  NOTE: This was very cool.  I was watching a movie and pulled a drive out right in the middle of it and didn't miss a beat.  In fact, the only difference was the flashing LEDs indicating that the drive had gone bad and the data was being redistributed on the fly.

Another cool feature is on-the-fly expansion.  You can insert a new drive, or even replace a drive with a higher-capacity one, and the system automatically expands the available storage and redistributes the data across the available drives.  Not even a single mouse click!

BeyondRAID supports single or dual drive redundancy (equivalent to RAID 5 and RAID 6).  However, unlike traditional RAID, where you need to reformat the array to change parity levels, the DroboElite allows you to switch between single and dual drive redundancy with the data intact, with the click of a button.

The bottom line is that the DroboElite is a very cool product at an unbelievable price point.  I realize it may sound as though I'm blowing some smoke (or perhaps inhaling).  Don't get me wrong: like any other storage device, the DroboElite has its target market, and I'm certainly not suggesting that you go out and replace your tier-one primary SAN with a bunch of DroboElites.  However, the concept behind the Drobo technology is very appealing, and its simplicity is what the future holds for general purpose storage solutions.  If you don't believe me, call me and I would be happy to send you a free demo system to evaluate for yourself.

For the record, I would bet the farm you have some data that is taking up disk space on your expensive tier one storage that can absolutely be “Drobo’d”!

Rob Oddo

(800) 828-0599 Ext 225


Skype: rob.oddo

Extending the Value of Data DeDuplication

Most IT departments understand the benefits of data deduplication, but we'll recap them here:

Less Disk – Data deduplication reduces disk utilization by 90 percent or more.

Less Cost – With less disk, you logically spend less on disk, but the savings extend to reduced power, space, and cooling requirements… and backup administrators spend less time managing backups.

Faster backup, faster recovery, fewer media failures – For backups and virtualized environments, deduplication reaps huge, obvious benefits, as backups and VMDK files are typically copies of data that already exists. Since writing to and reading from disk is faster than tape, backup windows shrink and recovery is much faster.

Faster replication – Deduplication reduces data locally, so your replication will also be faster since you are sending less data.

These are the common benefits of deduplication no matter which vendor you choose. Now consider how you can extend the benefits of deduplication and further reduce cost and complexity:

AutoMAID: For most backup-to-disk environments, the backup window is no more than 8–12 hours. Does it make sense to spin that disk at full power 24 hours a day? AutoMAID (Automatic Massive Array of Idle Disks) energy-saving technology, which is included at no extra cost with the Nexsan DeDupe SG, transparently places disk drives into an idle state to vastly reduce power and cooling costs. AutoMAID delivers the cost-effective benefits of MAID 2.0 without the limitations of slow access times and special host software.

Global DeDuplication: Many IT environments have demanding backup schedules with narrow backup windows. The leading backup deduplication vendors address this by simply adding more deduplication appliances to handle the greater throughput. The problem with this approach is that each appliance is only aware of its own data, not the data in the other appliances. If you add more appliances, each one is an independent silo of deduped data, and the consequence is a dramatically reduced deduplication ratio. For instance, if your effective deduplication ratio on a single appliance is 20:1 and you add a second appliance, your total dedupe ratio could drop to 10:1.  Add a third appliance and it could drop to 6.6:1. Global DeDuplication is a feature included with the Nexsan DeDupe SG that addresses this challenge. Each node is aware of the data in the others, which provides true global deduplication.
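The erosion follows from a simple worst-case silo model: with no awareness between appliances, each one can end up storing its own copy of the globally unique data, so the effective ratio divides by the appliance count. A sketch of that model, using the 20:1 figure from the example:

```python
# Worst-case silo model: N independent appliances each store their own
# copy of the globally unique data, so physical storage grows with N
# while the logical data stays the same.
single_appliance_ratio = 20.0

for appliances in (1, 2, 3, 4):
    effective = single_appliance_ratio / appliances
    print(f"{appliances} siloed appliance(s): ~{effective:.1f}:1 effective ratio")
```

Real workloads land somewhere between this worst case and no loss at all, depending on how much duplicate data crosses appliance boundaries; globally deduplicating nodes avoid the divide-by-N penalty entirely.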

Have more questions about deduplication? Feel free to email me or call me at 440-498-2300 x223.

-John Thome, Jr.