Keeping historical data. How do you approach accessing things that you "have backed up somewhere around here"?

I’ve been doing computers for a while now. Each machine has had one to five (or so) hard drives. As my life gets more and more digital, I’ve become concerned about the life of hard drives: mechanical or solid state, they can all fail. I’m also concerned about viruses, ransomware, and all other kinds of malware. So I have tried to make backup copies of the machines that I use.

My current strategy is to keep a stack of 1 TB external hard drives next to each machine, only one of which is connected to the machine at a time. I’m running Acronis True Image, scheduled to do a backup in the middle of the night, when the machines are normally not doing much except gathering e-mail and maybe running a few other continuous processes. When the backup drive gets full, I cycle to the next one in the pile. I’ve had good luck restoring from True Image the few times that I needed to, and I feel pretty good about the quality of the backups.

In a good year, I might upgrade (essentially replace) the hardware or software of one or more computers, so I now have a pile (well, several piles, really) of hard drives, both internal and external.

Since I retired, I theoretically have more time to do the things I want to do. One thing I’m still dreaming about is resuming my refactoring and study of BCX, an open source BASIC-to-C translator. About 15 years ago, I started diving into BCX and wrote a few utilities to split a BCX BASIC source file into many pieces and then reassemble them in a different arrangement. I would like to build on all that programming effort, but first I need to find the files.

I recently purchased a 6 TB external hard drive and copied all the files from most of the drives in the piles onto it. Some were Serial ATA, some were PATA, and some were USB 2.0. I ran out of room on the new drive and could not access a few of the drives, but I was able to copy the majority of them. I copied each drive to its own folder. I’m sure there were a lot of duplicate copies of files, which I don’t really need.

I searched the internet and found a free (open source, I think) duplicate-file finder. It compares files byte by byte and can generate a report listing each set of duplicates and where each copy sits in the directory structure of its drive. It can also mark and delete files, but with as many duplicates as I have, selecting and deleting each one by hand would take a very long time. So I exported the duplicate data to a text file and wrote some VBA code to delete all but one copy of each duplicated file.
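For anyone who would rather do the finding and the deleting in one script, here is a minimal Python sketch of the same idea. It is not the tool or the VBA described above; the function names `find_duplicates` and `delete_extras` are hypothetical. It groups files by size first (cheap), hashes only same-sized files with SHA-256, and then confirms each match byte by byte with `filecmp.cmp(..., shallow=False)` before treating it as a duplicate. Deletion defaults to a dry run so the list can be reviewed first.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def file_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return sorted lists of paths under root whose contents are identical."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_hash(path)].append(path)
        for candidates in by_hash.values():
            keeper = candidates[0]
            # Confirm each hash match with a full byte-by-byte comparison.
            same = [p for p in candidates[1:] if filecmp.cmp(keeper, p, shallow=False)]
            if same:
                groups.append(sorted([keeper] + same))
    return groups


def delete_extras(groups, dry_run=True):
    """Keep the first path in each group and remove the rest.

    With dry_run=True (the default) nothing is deleted; the paths that
    would be removed are returned so they can be reviewed first.
    """
    removed = []
    for group in groups:
        for path in group[1:]:
            if not dry_run:
                os.remove(path)
            removed.append(path)
    return removed
```

Running `delete_extras(find_duplicates(r"X:\old_drives"))` (a hypothetical path) first and reading the returned list before passing `dry_run=False` is the cautious order.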

I also wrote some VBA code to produce a recursive directory listing of the files on the drive, sorted by file extension. For the task mentioned above, I was mostly interested in .bas files, so that helped a lot.
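A rough Python counterpart to that recursive listing might look like this. It is a sketch, not a port of the VBA; `list_by_extension` and `files_with_extension` are made-up names. Extensions are lower-cased so `.BAS` and `.bas` sort together, and files with no extension sort first under an empty string.

```python
from pathlib import Path


def list_by_extension(root):
    """Return (extension, path) pairs for every file under root,
    sorted by lower-cased extension and then by path."""
    entries = []
    for path in Path(root).rglob("*"):  # recursive walk
        if path.is_file():
            entries.append((path.suffix.lower(), str(path)))
    entries.sort()
    return entries


def files_with_extension(root, ext):
    """Return only the paths whose extension matches ext, e.g. ".bas"."""
    ext = ext.lower()
    return [p for e, p in list_by_extension(root) if e == ext]
```

For example, `files_with_extension(r"X:\old_drives", ".bas")` (hypothetical path) would pull out just the BASIC sources.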

More recently, I purchased another 6 TB drive to continue the process. This time, I want to expand the True Image backups and remove the duplicate files from them, or just save the unique files. I’m hoping this will weed out a lot of program and operating system files, as it did in the effort described above.
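For the "just save the unique files" route, one approach is to copy from each expanded backup while remembering the SHA-256 of every file already saved, so content seen in an earlier backup is skipped even if it was renamed. This sketch assumes the backup contents are already reachable as an ordinary folder; it does not read Acronis image files itself, and `copy_unique` is a hypothetical name.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_unique(source, dest, seen=None):
    """Copy files from source into dest, preserving relative paths, but skip
    any file whose contents (by SHA-256) are already in `seen`.
    Returns the updated set so it can be shared across several backups."""
    seen = set() if seen is None else seen
    source, dest = Path(source), Path(dest)
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        digest = sha256_of(path)
        if digest in seen:
            continue  # identical content has already been saved
        seen.add(digest)
        target = dest / path.relative_to(source)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 keeps the modification time
    return seen
```

Giving each backup its own subfolder of the destination (for example `seen = copy_unique(backup1, archive / "backup1")`, then `copy_unique(backup2, archive / "backup2", seen)`) avoids two different files with the same relative path overwriting each other.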

I’d also like to gather all my old e-mail files and process them to extract personal messages and pictures. I used Outlook for a number of years, and for the last few years I’ve been using FossaMail. I guess having two different mail formats will complicate the process somewhat.
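For the FossaMail side, a starting point could be Python's standard `mailbox` module. This sketch assumes FossaMail, like the Thunderbird code it is derived from, stores each mail folder as a plain mbox file in the profile directory; it does not handle Outlook's .pst files, which are a different format the standard library cannot read. `extract_images` and the list of image types are illustrative choices, not anything built into FossaMail.

```python
import mailbox
from pathlib import Path

# Content types treated as "pictures"; extend as needed.
IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/bmp", "image/tiff"}


def extract_images(mbox_path, out_dir):
    """Save every image attachment found in an mbox file to out_dir.

    Returns the list of files written. Output names are prefixed with the
    message and part numbers so attachments with the same name don't collide.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    box = mailbox.mbox(str(mbox_path), create=False)  # error if the file is missing
    try:
        for i, message in enumerate(box):
            for j, part in enumerate(message.walk()):
                if part.get_content_type() not in IMAGE_TYPES:
                    continue  # skips text bodies and multipart containers
                data = part.get_payload(decode=True)  # undoes base64 encoding
                if not data:
                    continue
                # Keep only the final path component of the attachment name.
                name = Path(part.get_filename() or "image").name
                target = out / f"{i:05d}_{j:02d}_{name}"
                target.write_bytes(data)
                written.append(target)
    finally:
        box.close()
    return written
```

Pointing it at each folder file in the mail profile (the exact location varies by installation) and then running the duplicate finder over the output folder would also catch pictures that were forwarded more than once.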

I’m wondering if anyone has a better approach to saving and being able to locate data from years ago.

I have a vague idea that DevonThink3 might be helpful here, but I’m unsure, as I haven’t used it. I’ll mention this to regular users to see if they have any ideas. Hope you find your solution soon!

This is a damn good question and I appreciate the level of detail you put into how you’ve experienced the problem - mainly because it’s a problem a lot of people face.

Firstly, I’m going to add a ‘Backups’ section to the wiki if we don’t already have one, because data storage is the rubber where knowledge management hits the road.

Secondly, if I couldn’t find any decent software for the kinds of data loads we’re talking about (too big for regular file managers, too small for enterprise solutions), I’d try using Python to build a file and directory inventory in JSON or JL (JSON Lines) and then use Elasticsearch to give me a decent ‘Find’ function for it. This is definitely an “I would”, not a “you should” situation. I need the practice with Python, lol.
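The inventory half of that idea could be as small as the sketch below: one JSON object per file, one per line (JSON Lines). Elasticsearch is not used here; a plain case-insensitive name search stands in for it, and the same .jsonl file could later be loaded into a proper search engine. The field names and the `write_inventory`/`find` functions are assumptions for illustration.

```python
import json
import os
from datetime import datetime, timezone


def write_inventory(root, out_path):
    """Walk root and write one JSON record per file to out_path (JSON Lines).
    Returns the number of records written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                except OSError:
                    continue  # unreadable or vanished file; skip it
                record = {
                    "path": path,
                    "name": name,
                    "ext": os.path.splitext(name)[1].lower(),
                    "size": st.st_size,
                    "modified": datetime.fromtimestamp(st.st_mtime, timezone.utc).isoformat(),
                }
                out.write(json.dumps(record) + "\n")
                count += 1
    return count


def find(inventory_path, term):
    """Return records whose file name contains term (case-insensitive)."""
    term = term.lower()
    with open(inventory_path, encoding="utf-8") as f:
        return [r for r in map(json.loads, f) if term in r["name"].lower()]
```

Writing the inventory file outside the folder being scanned keeps it from listing itself, and one inventory per old drive makes it easy to see which pile a file came from.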

Back in the day, when my data was backed up on floppies, I purchased a shareware program that would read and catalog the contents of each disk. I named the disk and printed a matching adhesive label to affix to it, so I could physically identify and organize the disks.

I recall using the program to inventory backup tapes as well. I don’t recall the details of how that worked.

I’m interested in practicing programming as well. I’m not sure Python would be my first choice, but it’s on the list. 😉

I would seriously consider Python as your first language. It covers every aspect of what you might want to do, gives instant feedback, has the broadest range of libraries and has a syntax that is more readable than almost any other language.

Full disclosure: I’m a Python dev.