Put Your Data on the Storage & Backup Systems Diet


Data is growing at a rate of 50% or more per year, which requires you to buy and consume more and more disk. You may now be under considerable pressure to manage these huge increases in data stored in production environments and to stop capital spending on additional disks.

Using storage as efficiently as possible should now be a critical goal for your organization. Follow this five-step plan to regain control of your data storage.

  1. Remove any non-business-related files
  2. Archive old or rarely used data
  3. Remove duplicate files
  4. Remove duplicate data
  5. Compress the remaining data you have

1) Remove any non-business-related files

As a 7-day detox, this step can provide a very quick win. Use a tool such as File Insights from Brocade (brocade.com/support/fileinsight.jsp), which scans both Windows file servers and non-Windows NAS devices and can generate reports on file count, age, size, type and other file metadata statistics to identify candidates for removal. However, be aware that deleting all MP3s could also remove your CEO’s favorite podcast or HR’s training library; the key is to understand and identify the various types of data. Analyze your findings and make sure you understand your users’ data requirements before applying any company-wide policy. Discuss with your system integrator which Storage Resource Management (SRM) or Hierarchical Storage Management (HSM) tools can help you automate the process.
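To illustrate the kind of report such a scan produces, here is a minimal Python sketch that walks a directory tree and totals file count, size and oldest modification age per file extension. It is not how any vendor tool works; the function name `summarize_tree` and the report fields are assumptions made for this example.

```python
import os
import time
from collections import defaultdict


def summarize_tree(root, now=None):
    """Total file count and bytes per extension under `root`.

    Also records the age in days of the oldest file seen per extension,
    based on last-modified time.
    """
    now = time.time() if now is None else now
    report = defaultdict(lambda: {"files": 0, "bytes": 0, "oldest_days": 0.0})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            ext = os.path.splitext(name)[1].lower() or "(none)"
            entry = report[ext]
            entry["files"] += 1
            entry["bytes"] += info.st_size
            age_days = (now - info.st_mtime) / 86400
            entry["oldest_days"] = max(entry["oldest_days"], age_days)
    return dict(report)


if __name__ == "__main__":
    # "." is a placeholder; point this at a share you are allowed to scan.
    rows = sorted(summarize_tree(".").items(),
                  key=lambda kv: kv[1]["bytes"], reverse=True)
    for ext, entry in rows:
        print(f"{ext:10} {entry['files']:8d} files "
              f"{entry['bytes']:14d} bytes "
              f"oldest {entry['oldest_days']:.0f} days")
```

A report like this only identifies candidates. As noted above, a large total for `.mp3` does not by itself mean those files are non-business data.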

2) Archive old or rarely used data

Once you have analyzed which data is old, orphaned (i.e., files left behind by applications that have been withdrawn), rarely accessed, non-mission-critical, etc., create an archiving policy to move this data to cheaper tier 2 disk or tape. File and email archiving is well served by market-proven solutions such as EMC DiskXtender and EmailXtender, Symantec Enterprise Vault, etc. But before choosing a solution, it is important to perform the following tasks:

  1. Specify a data retention policy
  2. If possible, integrate the data retention policy into an archiving system
  3. Enforce data retention based on a published central retention document
  4. Store data in classes appropriate to the data's age and its access requirements
  5. Move inactive data to the archive
  6. Keep applications transparent for users wherever the data is located
  7. Get your archiving methodology approved as legal by the appropriate agency in each country you operate in.

Once you have defined the above, work with your system integrator to ensure that the solution offered meets your exact requirements.
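As a rough illustration of how "old or rarely used" can be turned into a selection rule, the following Python sketch lists files that have been neither read nor modified within a given number of days. The function name `archive_candidates` and the 365-day default are assumptions for this example, not a recommended retention period. Note also that last-access times are unreliable on volumes where access-time updates are disabled, so treat the output as a starting point for review.

```python
import os
import time


def archive_candidates(root, max_idle_days=365, now=None):
    """Yield (path, idle_days) for files under `root` whose last access
    and last modification are both older than `max_idle_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # unreadable or removed during the scan
            # Use whichever of the two timestamps is more recent.
            last_used = max(info.st_atime, info.st_mtime)
            if last_used < cutoff:
                yield path, (now - last_used) / 86400


if __name__ == "__main__":
    # "." is a placeholder path for the file share being assessed.
    for path, idle_days in archive_candidates(".", max_idle_days=365):
        print(f"{idle_days:8.0f} days idle  {path}")
```

The sketch only reports candidates. Actually moving data to tier 2 disk or tape, and leaving it transparently reachable for users, is the job of the archiving product you select.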

3) Remove duplicate files

A simple solution is to remind users to store documents in a shared area rather than keeping a personal copy of a commonly used document in their own user directory. How many copies of the same organizational chart, business overview PowerPoint or expenses spreadsheet template do you have?

Your future investment in storage infrastructure should consider how it handles file deduplication, also called Single Instance Storage (SIS), which compares a file to be saved, backed up, or archived with those already stored by checking its attributes against an index. If the file is unique, it is saved and the index is updated; if not, only a pointer to the existing file is saved. The result is that only one instance of the file is stored, and subsequent copies are replaced with a “stub” pointing to the original file. Vendors such as NetApp now add this feature to their file systems as a free option.
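The index lookup described above can be sketched in a few lines of Python. This example assumes the index key is a SHA-256 hash of the file contents, with file size used as a cheap first filter; real SIS products may use different attributes. The names `file_digest` and `find_duplicates` are made up for this illustration, and the sketch only reports duplicates rather than replacing them with stubs.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return groups of paths under `root` that have identical content."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for path in paths:
            by_digest[file_digest(path)].append(path)

    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


if __name__ == "__main__":
    # "." is a placeholder; use the share you want to audit.
    for group in find_duplicates("."):
        print("identical content:", *group, sep="\n  ")
```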

4) Remove duplicate data

Advanced block-level data reduction and data deduplication techniques have been used in the secondary storage environment as a cost-effective answer to the avalanche of backup and archive data that organizations need to store, providing more than 2x reduction and in many cases over 20x reduction for backup and archive data. However, the algorithms and techniques used are not well suited to the primary storage environment.

Primary storage requires something very different: real-time compression and optimization without degrading performance and without the need to change existing infrastructure. New approaches and algorithms are required to bring the same level of optimization to the production storage environment.

When evaluating deduplication at the block level, it is important to understand the various methods, which include:

(1) In-line processing data deduplication

(2) Post-processing data deduplication

(3) Parallel processing data deduplication

Make sure your system integrator explains which method is best suited to your requirements before recommending a particular vendor solution.
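To make the idea of block-level deduplication concrete, here is a minimal Python sketch, assuming fixed-size blocks identified by their SHA-256 hash and deduplicated at write time (that is, in-line). The class `BlockStore` and its methods are hypothetical; commercial products typically use more sophisticated chunking and indexing.

```python
import hashlib


class BlockStore:
    """Toy in-memory store that keeps each unique block only once."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}  # digest -> block bytes

    def write(self, data):
        """Store `data`; return its recipe, the ordered list of block digests."""
        recipe = []
        for start in range(0, len(data), self.block_size):
            block = data[start:start + self.block_size]
            digest = hashlib.sha256(block).digest()
            # Only blocks not seen before consume new space.
            self.blocks.setdefault(digest, block)
            recipe.append(digest)
        return recipe

    def read(self, recipe):
        """Rebuild the original data from a recipe."""
        return b"".join(self.blocks[digest] for digest in recipe)

    def stored_bytes(self):
        """Bytes actually held after deduplication."""
        return sum(len(block) for block in self.blocks.values())


if __name__ == "__main__":
    store = BlockStore(block_size=4)
    data = b"AAAABBBBAAAA"          # three blocks, two of them identical
    recipe = store.write(data)
    print(len(data), "bytes written,", store.stored_bytes(), "bytes stored")
    assert store.read(recipe) == data
```

A post-processing design would instead write the data unchanged first and run the same hash comparison later as a background job, trading temporary extra capacity for lower write latency.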

5) Compress the remaining data you have

The data you need to keep available on primary storage can be slimmed down further by compressing it to take up less space.

There are two main types of data compression: lossy and lossless.

(1) Lossy data compression. After lossy data compression has been applied, the file can never be restored exactly, because data has been discarded. This type of compression has its applications in audio, video, graphics and image files. An example of lossy compression is the MP3 format, which removes high and low frequencies that the human ear cannot hear to reduce the file size. This compression method is clearly not acceptable for text-based files.

(2) Lossless data compression. Lossless data compression works by finding repetitive patterns in a message and encoding those patterns efficiently, so the original can be restored exactly. Lossless data compression is ideal for text.
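The effect of repetitive patterns on lossless compression can be seen with Python's standard `zlib` module. This is only an illustrative sketch: the helper `compression_ratio` and the sample data are made up for this example, and the ratios say nothing about what a particular storage product will achieve.

```python
import os
import zlib


def compression_ratio(data, level=6):
    """Compress `data` losslessly and return original size / compressed size."""
    compressed = zlib.compress(data, level)
    # Lossless means the round trip restores every byte exactly.
    assert zlib.decompress(compressed) == data
    return len(data) / len(compressed)


if __name__ == "__main__":
    repetitive = b"expense report template\n" * 1000
    random_like = os.urandom(len(repetitive))  # no patterns to exploit
    print(f"repetitive text: {compression_ratio(repetitive):.1f}:1")
    print(f"random bytes:    {compression_ratio(random_like):.2f}:1")
```

Highly repetitive text shrinks dramatically, while data with no repeating patterns (including files that are already compressed) barely shrinks at all.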

Traditionally, capacity optimization has focused on secondary storage techniques such as deduplication and on high-density disk architectures such as SATA. But a new class of software is changing the old equation by optimizing primary storage as well as secondary. Products such as the Storwize STN-6000 can compress primary storage data by up to 20:1.

Benefits of the storage and backup system diet

Not only will you now be able to manage the increases in data stored in your production environment and stop the capital expenditure on extra disk, you will also be using your storage as efficiently as possible. You will also benefit from:

  • Reduced footprint
    • Reduces data center costs: rack space, hosting, rental
  • Reduced power consumption (disk and air conditioning)
    • Fewer disks require less power, create less heat and reduce cooling requirements
  • Potentially faster data recovery
    • Smaller amounts of better-structured data allow for faster recovery in case of loss or disaster.


All of these steps are essentially the foundation of a formal data management policy. In practice, some of the steps can get quite complicated and require a great deal of thought and planning.

To ensure this is a one-time exercise that also provides a platform for future data storage and management acquisitions, you need to work with a partner who will take the time not only to perform a capacity-based assessment of your storage infrastructure, but also to understand your complete environment and develop a strategy that can be implemented according to your business requirements.