HPR4341: Transferring Large Data Sets
Hacker Public Radio - A podcast by Hacker Public Radio
 
   Categories:
Transferring Large Data Sets Very large data sets present their own problems. Not everyone has directories with hundreds of gigabytes of project files, but I do, and I assume I'm not the only one. For instance, I have a directory with over 700 radio shows, many of these directories also have a podcast, and they also have pictures and text files. Doing a properties check on the directory I see 450 gigabytes of data. When I started envisioning Libre Indie Archive I wanted to move the directories into archival storage using optical drives. My first attempt at this didn't work because I lost metadata when I wrote the optical drives since optical drives are read only. After further work and study I learned that tar files can preserve meta data if they are created and uncompressed as root. In fact, if you are running tar as root preserving file ownership and permissions is the default. So this means that optical drives are an option if you write tar archives onto the optical drives. I have better success rates with 25 GB Blue Ray Discs than with the 50 GB discs. So, if your directory breaks up into projects that fit on 25 GB discs, that's great. My data did not do this easily but tar does have an option to write a data set to multiple tar files each with a maximum size, labelling them -0 -1, etc. When using this multi volume feature you cannot use compression. So you will get tar files, not tar.gz files. It's better to break the file sets up in more reasonable sizes so I decided to divide the shows up alphabetically by title, so all the shows starting with the letter a would be one data set and then down the alphabet, one letter at a time. Most of the letters would result in a single tar file labeled -0 that would fit on the 25 GB disc. Many letters, however, took two or even three tar files that would have to be written on different disks and then concatenated on the primary system before they are extracted to the correct location in primaryfiles. There is a companion program to tar, called tarcat, that I used to combine 2 or 3 tar files split by length into a single tar file that could be extracted. I ran engrampa as root to extract the files. So, I used a tar command on the working system where my Something Blue radio shows are stored. Then I used K3b to burn these files onto a 25 GB Blu Ray Disc carefully labeling the discs and writing a text file that I used to keep up with which files I had already copied to Disc. Then on the Libre Indie Archive primary system I copied from the Blu Ray to the boot drive the file or files for that data set. Then I would use tarcat to combine the files if there was more than one file for that data set. And finally I would extract the files to primaryfiles by running engrampa as root. Now I'm going to go into details on each of these steps. First make sure that the Libre Indie Archive program, prep.sh, is in your home directory on your workstation. Then from the data directory to be archived, in my case the something_blue directory run prep.sh like this. ~/prep.sh This will create a file named IA_Origin.txt that lists the date, the computer and directory being archived, and the users and userids on that system. All very helpful information to have if at some time in the future you need to do a restore. Next create a tar data set for each letter of the alphabet. (You may want to divide y
