Factors for data-transfer performance

This article provides insights into the available technology for storage devices and interfaces and explains the possibilities and consequences of different configurations.

Storage technology

In practice, four storage technologies are commonly used:

  • hard disk drives (HDD)
  • solid-state drives (SSD)
  • flash memory cards (CF, SDHC, SxS etc.)
  • RAID systems

HDDs have been and still are the main storage devices for computers. They consist of several stacked magnetic platters that rotate in a metal enclosure. A movable arm reads and writes information by detecting and changing the magnetic properties of tiny areas of the platters. HDDs are available in 3.5” and 2.5” sizes, which mainly differ in capacity, speed and power consumption. They are cheap and offer large capacities, and are thus often used for transporting large amounts of data. Typical HDDs have a read and write performance of around 100 MB/sec (megabytes per second). Higher rotation speeds of the platters (e.g. 7200 rpm instead of 5400 rpm) lead to higher performance.

SSDs have no mechanically moving parts and instead use electronic circuits (mainly flash memory) to store information. SSDs offer the same interfaces as HDDs, such as SATA, so that they are compatible and interchangeable in existing devices. They are much faster than HDDs in most scenarios but are also more expensive. Typical SSDs can be as fast as 300 MB/sec.

Flash memory cards use technology similar to SSDs, but have a form factor and interface made for use in non-computer devices such as cameras. Depending on price, components and interface, these memory cards offer a wide range of performance. The card readers and their interfaces also influence the performance. For example, a typical SxS card used in an Alexa camera is as fast as 150 MB/sec, but there is a card reader by Sony that reads only 30 MB/sec from SxS cards because it uses a USB2 interface.
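The card reader example follows a simple rule: a transfer chain is only as fast as its slowest link. The following minimal sketch illustrates this with figures quoted in this article. The function name `effective_rate` is hypothetical, and pairing the card with a 100 MB/sec HDD as destination and with a 350 MB/sec USB3 reader is an assumption made for illustration.

```python
def effective_rate(*rates_mb_per_sec):
    """Return the throughput of a transfer chain: the slowest link wins."""
    return min(rates_mb_per_sec)


sxs_card = 150          # MB/sec, figure from the text
usb2_reader = 30        # MB/sec, figure from the text
usb3_reader = 350       # MB/sec, assumed faster reader (USB3 figure)
destination_hdd = 100   # MB/sec, assumed destination (typical HDD figure)

# With the USB2 reader, the reader is the bottleneck.
print(effective_rate(sxs_card, usb2_reader, destination_hdd))  # 30

# With a faster reader, the bottleneck moves to the destination HDD.
print(effective_rate(sxs_card, usb3_reader, destination_hdd))  # 100
```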

RAID systems (“redundant array of independent disks”) integrate multiple HDDs or SSDs into one logical device to increase the capacity, speed and/or reliability of the entire system. Standalone RAID systems offer the same interfaces as HDDs or SSDs. They are used if a very large amount of data has to be stored or if the safety of the data has to be improved. In theory, a RAID system can add up the performance of its drives, but the RAID level and the power of the controller can limit that.

Differences between HDDs and SSDs

The first difference between SSDs and HDDs is the price. Currently SSDs still cost around 10 times more than HDDs for the same capacity. SSDs are also usually much smaller in capacity than HDDs: common HDD sizes go up to 4 TB, while SSDs are available at up to 256 GB (in 2012).

The most important difference is the data-transfer performance: SSDs are much faster than HDDs in many respects, all of which can influence the overall performance of an on-set copy setup.

HDDs offer the best performance when processes read or write blocks linearly. In this case the performance can come close to that of SSDs. But the performance of an HDD drops heavily (by a factor of 10 or more) if either

  • several parallel processes read and write data, or
  • the data of one file is fragmented over different places on the disk,
as the head physically has to move across the platters between read or write accesses.
The performance of an HDD also decreases as the drive fills up, because the inner parts of the platters hold less data per rotation than the outer parts (HDDs are filled from the outside to the inside of the platters). SSDs, in contrast, offer almost the same performance for random and parallel access as for linear access.
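The drop for fragmented or parallel access can be estimated with a rough model in which every chunk of data costs one head movement plus the transfer time. This is only a sketch: the 10 ms repositioning time and the chunk sizes are assumed values, not measurements, and `hdd_rate` is a hypothetical helper.

```python
def hdd_rate(chunk_mb, linear_mb_per_sec=100.0, seek_ms=10.0):
    """Effective MB/sec when the head must reposition before every chunk.

    linear_mb_per_sec: linear rate (100 MB/sec is the figure from the text)
    seek_ms: assumed average head repositioning time per chunk
    """
    transfer_sec = chunk_mb / linear_mb_per_sec
    seek_sec = seek_ms / 1000.0
    return chunk_mb / (transfer_sec + seek_sec)


# Large chunks behave almost linearly, small chunks are dominated by seeks.
for chunk_mb in (64, 1, 0.125):
    print(chunk_mb, round(hdd_rate(chunk_mb), 1))
# 64 98.5
# 1 50.0
# 0.125 11.1
```

With 128 KB chunks this model lands at roughly one ninth of the linear rate, which is in the same order of magnitude as the drop described above.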

Last but not least, SSDs are much more robust when exposed to environmental factors such as shock and vibration. SSDs can even be dropped from table height without problems (if housed properly), while HDDs may suffer severe damage from much lighter shocks.

A comprehensive list of differences between SSDs and HDDs can be found here:
http://en.wikipedia.org/wiki/Solid-state_drive#Comparison_of_SSD_with_hard_disk_drives

RAID levels

Depending on the number of HDDs/SSDs that are used in a RAID system, the RAID system can offer different improvements over a standalone drive.

1 drive:

  • no RAID possible

2 drives or more:

  • RAID 0 (“striping”): Improvement of speed and capacity (no increase in reliability). Information is divided into small blocks, and the blocks are stored alternately on the drives, each block on only one drive. This increases the performance, as all drives can be used in parallel to access the data of one file.
  • RAID 1 (“mirroring”): Improvement of reliability (no increase in speed or capacity). Information is divided into small blocks and each block is stored on every drive. If one drive fails, the entire information is still available on the other drive.
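The difference between striping and mirroring can be illustrated by looking at where the blocks of a file end up. This is a simplified sketch: the function names are hypothetical, and real RAID implementations work on fixed-size blocks of raw data rather than on Python lists.

```python
def raid0_layout(blocks, drives=2):
    """Striping: block i is stored only on drive i % drives."""
    layout = [[] for _ in range(drives)]
    for i, block in enumerate(blocks):
        layout[i % drives].append(block)
    return layout


def raid1_layout(blocks, drives=2):
    """Mirroring: every drive holds a full copy of all blocks."""
    return [list(blocks) for _ in range(drives)]


blocks = ["A", "B", "C", "D"]
print(raid0_layout(blocks))  # [['A', 'C'], ['B', 'D']]
print(raid1_layout(blocks))  # [['A', 'B', 'C', 'D'], ['A', 'B', 'C', 'D']]
```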

3 drives or more:

  • RAID 5 (“striping with parity”): Improvement of speed, capacity and reliability. Information is divided into small blocks, and the blocks are stored alternately on the drives, each block on only one drive. In addition, parity information is stored distributed across the drives. The capacity of one drive is used for the parity information (e.g. a RAID 5 of 4x 1TB drives can hold 3TB of data while 1TB is used for parity), while the striping still increases the performance. If one drive fails and the faulty drive is replaced by a new one, the lost information can be recreated (“rebuilt”) from the stored parity information.
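How a rebuild from parity can work is sketched below. The text above does not say how the parity is calculated; this example assumes a byte-wise XOR, which is the operation commonly used for single parity. It covers only one stripe and ignores how the parity blocks are distributed over the drives. `xor_blocks` is a hypothetical helper.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 4-drive RAID 5: three data blocks, one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second drive and rebuilding its block from the rest.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'BBBB'
```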

4 drives or more:

  • RAID 6 (“striping with double parity”): Improvement of speed, capacity and reliability. Similar to RAID 5, but the capacity of two drives is used for parity. Up to two drives can therefore fail without losing the data of the RAID.
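The capacity rules of the RAID levels above can be summarized in a small helper. This is a simplified sketch that assumes equally sized drives; `usable_tb` is a hypothetical name, and a RAID 1 is modeled as offering the capacity of a single drive regardless of the number of mirrors.

```python
def usable_tb(level, drives, drive_tb):
    """Usable capacity in TB for equally sized drives (simplified)."""
    minimum_drives = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in minimum_drives:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < minimum_drives[level]:
        raise ValueError(f"RAID {level} needs at least {minimum_drives[level]} drives")
    if level == 0:
        return drives * drive_tb          # striping uses all capacity
    if level == 1:
        return drive_tb                   # every drive holds the same data
    parity_drives = 1 if level == 5 else 2
    return (drives - parity_drives) * drive_tb


print(usable_tb(5, 4, 1))  # 3, matches the 4x 1TB example in the text
print(usable_tb(6, 4, 1))  # 2
print(usable_tb(0, 2, 1))  # 2
```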

More complex setups combine several RAID systems into a new RAID system by striping. For example, three RAID 5 systems that are striped together are called a RAID 50 (“5+0”). More on RAID systems and technologies can be found here:
http://en.wikipedia.org/wiki/RAID#Problems_with_RAID_5_in_enterprise_environments

The RAID functionality (creating parity etc.) can be implemented either in hardware (for external RAID devices) or in software by the operating system. For hardware RAIDs, the controller heavily influences the performance of the RAID system. Cheaper RAID controllers can considerably limit the overall performance in comparison to the combined theoretical data rate of the drives.

Interfaces

  • USB2: Around for some time, very high compatibility, very low performance of around 25 MB/sec.
  • Firewire (FW) 400: Mostly available on Macs, can be daisy chained, performance around 40 MB/sec.
  • Firewire (FW) 800: Mostly available on Macs, can be daisy chained, performance around 80 MB/sec.
  • SATA 2: Internal interface for drives, up to 300 MB/sec.
  • USB3: Quite new, same plug as USB2, performance around 350 MB/sec.
  • SATA 3: Internal interface for drives, up to 600 MB/sec.
  • Thunderbolt: Mostly available on Macs, can be daisy chained, theoretical performance up to 1 GB/sec.
  • FibreChannel: Optical technology, for large RAID / SAN systems, up to 1.6 GB/sec with one 16 Gbit/sec link.
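To get a feeling for what these rates mean in practice, the copy time for a given amount of data can be estimated. The sketch below uses three of the rates from the list above and assumes that the interface is the only bottleneck. The 100 GB data size and the name `copy_minutes` are chosen for illustration only.

```python
# Practical rates in MB/sec, taken from the list above.
INTERFACES = {"USB2": 25, "Firewire 800": 80, "USB3": 350}


def copy_minutes(size_gb, rate_mb_per_sec):
    """Estimated copy time in minutes, taking 1 GB as 1000 MB."""
    return size_gb * 1000 / rate_mb_per_sec / 60


# Hypothetical 100 GB of footage copied over each interface.
for name, rate in INTERFACES.items():
    print(f"{name}: {copy_minutes(100, rate):.0f} min")
# USB2: 67 min
# Firewire 800: 21 min
# USB3: 5 min
```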

Daisy-chaining means that you can attach a chain of devices to just one port on the computer. The performance of the interface is shared among the attached devices. For example, writing to two daisy-chained Firewire 800 drives simultaneously results in a maximum write performance of 40 MB/sec per drive (80 MB/sec divided by two drives).
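The sharing of interface performance can be written as a one-line calculation. This sketch assumes an idealized, equal split between all devices that are active at the same time; `per_device_rate` is a hypothetical name.

```python
def per_device_rate(interface_mb_per_sec, active_devices):
    """Share of the interface performance per simultaneously active device."""
    if active_devices < 1:
        raise ValueError("at least one active device is required")
    return interface_mb_per_sec / active_devices


print(per_device_rate(80, 2))            # 40.0, the Firewire 800 example
print(round(per_device_rate(80, 3), 1))  # 26.7 with a third active drive
```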

Within the computer, several interfaces of the same type are usually handled by one controller. This means that attaching several devices to interfaces of the same controller might also limit the maximum overall performance.

For strategies on how to analyze and improve data-transfer performance, see the document “Analyzing and improving data-transfer performance”: http://kb.pomfort.com/?p=1581