Monday, September 29, 2014

A cautionary tale of fixed values used in calculating extrapolated fields

Since writing A Tale of Two Databases, I've dusted off my Excel worksheet that I use to keep tabs on blog readership. Yes, deep, deep, way deep down inside of me- somewhere- there's a tiny bit of statistician. And this statistician, who lives somewhere on the Continuum-, er, that is, the Spectrum- likes to keep track of statistics about the blog.

A bit of background is in order at this juncture. I've long been a fan of collecting facts and figures- at least since high school. The year I was a freshman in high school, a new store opened that was on my way to school. The shop was named "The Sutler's Wagon", and Jeff Urquhart (the store has long since closed) was the proprietor. It was a hobby shop. Even back then, I LOVED hobby shops.

Prior to The Sutler's wagon opening, there were three shops I frequented whenever I could get my Dad to take me to them. One was Pilot. I don't remember the whole name of the business, but Pilot was more of a hardware store with a pretty bustling O scale and S scale tinplate selection and HO scale model railroading. Pilot was actually pretty close to Jeff's shop, and was derisively referred to as "the Pirate". Stanton in Chicago was a fairly large store, stocking everything from N to G scale railroading, as well as rockets, RC, plastic models, flying balsa planes... pretty much anything one could ask for in hobbies. The last store was Hill's in Park Ridge, which seemed to have a product range similar to Stanton but smaller selection.

The Sutler's Wagon was different. No trains. A couple of rockets. Some model cars. Everything else was military. Books. Models. Boardgames. Paint. Brushes. Miniatures. Miniatures in 15mm, 25mm, 54mm, 1/35th, 1/72nd, 1/144th, 1/700th, 1/1200th  and 1/2400th scale. The 15's, 25's, 1/144th, and 1/2400th were generally diecast metal; the others were generally plastic. He had rules for the games- these miniatures were not just for looking pretty- they were meant for action!

And this has what to do with anything?

I've mentioned this game before. In a very real way, Fletcher Pratt's Naval Wargame taught me long division. While playing the game is fun. I think I had almost as much fun making the ship (data) cards. In the game, each ship has a point value. The point value is based on a number of data which need to be factored into an equation. That's the easy part. From the total point value, one must then create a damage table which shows how a ship's performance is degraded as she takes damage. So, say a ship has a point value of 10,000  and a max speed of 20 knots. The ship takes 1000 points of damage. 10,000/20= 500, so for every 500 points of damage, the ship would lose 1 knot of maximum speed.  It's not perfect, and is designed to work best with WWI and WWII vessels, but it does work rather well. There are other factors that can make the game quite involved, but overall it is great fun.

From this point, I started looking at music. On the same block as The Sutler's Wagon was a music store called RPM records. Along with Rolling Stone Records in Chicago, I would scour their bargain "cutout" bins. I discovered a lot of great music in these bargain bins, and by the time I got to college, I had a few hundred albums.  My earliest music databases were handwritten on loose leaf paper- I counted an album as being played when I had listened to at least one side.- this worked for both LPs and cassettes. I did have some 8-tracks, but I did not track these.

Flash forward many years. On a normal day I'm on a PC or two at work, and three PCs (simultaneously) at home. In all honesty, a lot of what I do currently is online gaming, but I also write this blog, do a bit of research, maintain a few spreadsheets, and work on my database, Sometimes, I even take a bit of time off to play a game or two on Steam (user= drehenthalerhof).

Here's a lesson for all of you budding data analysts, data scientists, statisticians and anyone else who uses spreadsheets and databases: if the datum doesn't look right, don't assume it is. Check it- even if you are the one that entered it!

So there I was Saturday night, updating the Excel blog readership spreadsheet. This is not a particularly complicated spreadsheet- fifty rows and thirteen columns. As I've repeatedly mentioned, Google's free analytics aren't too deep- they're pretty much stats, and they only show the Top 10. And- because they're real time, if I miss a day or two, I can potentially miss some data. So I've had extrapolate to get a more accurate picture. At some point, though, I accidentally overwrote a formula- and my beautiful table of extrapolated data suddenly was being calculated by a constant instead of a variable! I figured out the error quickly enough, but one can't count on an easy fix every time. When I suspect a problem, or when I need to do a QC on an Excel worksheet, CTRL+~ is my goto.

As always, I am hochspeyer, blogging data analysis and management so you don't have to.

No comments:

Post a Comment