Sunday, March 17, 2019

Excelling at data Access

Well, it's been a bit. Since the last post, I've been doing a lot of thinking about music- my music collection specifically, and how to best quantify it. I had decided that I would use either Access or Excel for this project.  After much thought, and several trial-and-error tests, I came to the conclusion that the best solution for my particular application would be to use both.

Why? After all, it seems to be a bit counter-intuitive to create multiple copies of the same dataset and have it in several places. However, in terms of music, when quantifying, I not only mean artists and albums and similar facts (which belong in 42, my database), but I also want to create a sort of "Top 42" group of lists which will track usage, rather than ownership. In other words, Access will store data on each album- title, artist, format (CD, mp3, etc.), UPC (if applicable) and related inventory type data. The flat database (spreadsheets) will be used to store the usage data, such as number of plays, last play, plays per artist... and any other minutiae that pretty much only I would be interested in.

As for duplicated data, I expect there will only be two datasets that are duplicated- one will be the artist/album dataset, and the other will be the album/song dataset. The former will be my master list of albums, and the latter will be a list of those songs that are a part of a collection, compilation or soundtrack, or individual mp3's that were not part of an album when acquired.

I'm very much looking forward to the results of course, but before this grand Top 42 launches, I've got a great deal of data entry and design to do. The data entry is proceeding slowly but steadily, with twenty-one albums and their tracks. Track (song title) and total times still need to be added.

A corollary project is the cleaning of the music players on two computers. Over the years, several standards were partially applied to these databases, and as a result, the organization is... well, it isn't very organized. As these will be major sources of quantitative input for the Top 42 flat database, they need to be cleaned up, standardized and partially deduped.

As always, I am hochspeyer, blogging data analysis and management so you don't have to.