programming

Optimizing a Result Set Pager

Sat, 2014-01-04 11:15 -- admin

It's ubiquitous on data driven web sites: the result set pager.  We've all used them whether we built them from scratch or used one provided by the framework.




Pagers are by nature performance suckers because we're asking the database to re-run the same query for each "page", slicing off just one set of contiguous rows for each page. If your result set is 10,000 rows long but you're only paging through them 10 rows at a time, that's potentially 1,000 database requests to view the entire set.

But it's worse than that because in order to provide those nifty pager controls, like those in the image above, the software has to know how many rows are in the larger result set so it can do the math to populate the navigation for those page numbers.  In other words, using the above example the software needs to know that there is a Page 14 to jump to.

A little background first.  Internally, garden variety pagers are pretty much the same.  They request a fixed number of rows to display per page, like 10.  That becomes your LIMIT filter in the database query:

SELECT * FROM people LIMIT 10;

To create the page navigation you need to do some math to generate the OFFSET.  For instance, using a page size of 10, the query for a Page 3 display would look like this:

SELECT * FROM people OFFSET 20 LIMIT 10;

Remote SSH Filesystems on OSX

Tue, 2013-10-29 14:33 -- admin

Developers, particularly web developers, have a need to work on external computers, often not within their local networks.  Over the years I've employed everything from FTP to SFTP/SCP to Samba to NFS to VPNs to cranky Novell networks.  All have their downsides, particularly with regard to security.

I have a MacPro and originally ran NFS to connect to machines on my LAN.  But as Apple released new versions of OSX it became more hostile to NFS, to the point where it because unusable with my Ubuntu-hosted web server.  I retreated back to Samba but it was always a PITA because every time I rebooted the Mac I had to manually remount those network shares.   Half the time they wouldn't appear in Finder so I'd have to do it again.

When I got my new MacBook I decided to spend some extracurricular time sorting out this problem.  My research led me to OSXFUSE.   OSXFUSE is a library that allows foreign filesystems to integrate with OSX's own.  One of those is SSHFS, a GitHub project that allows foreign filesystems to be mounted over a secure socket layer.  This sounded exactly like what I wanted.  There was virtually no setup required on the host other than a functioning SSH account.   While I doubted that it would be a particularly fast filesystem, I'm not streaming media with it, mostly just pushing files through my programming editor which unfortunately lacks SFTP support of its own.

Database Meets Revision Control

Thu, 2011-12-01 14:27 -- admin

Any developer who has worked with HIPAA compliancy knows that the law is murky at best and the fed doesn't publish a programmers' guide to make your life any easier.  However, one of the cardinal rules is the requirement to keep track of who sees HIPAA data, who modifies it and when this was done.  Another is that if you delete/update patient data you need to log what was deleted/updated in order to provide an audit trail, if only for the lawyers.  Failure to do so can subject a company to some pretty draconian penalties.

This creates a challenge on the database side because SQL UPDATE obliterates a record's history.  There are a few potential solutions, such as maintaining a changelog which such updates are written based on table triggers.   I've done this but the log of atomic changes can grow immense.  It's also difficult to reconstruct a large record based on potentially dozens or even hundreds of changes to records which must be retained for up to six years. That's how a traditional RCS would handle rollbacks but it's not practical inside the confines of a database.

Nevertheless, a resource control system (RCS) approach is what's needed, where a SQL UPDATE would maintain a copy of the pre-updated record and freeze it from further changes.  RCS does its work by storing just the changes, or diffs, made to a document.  While it would be technically possible to do this with a database record -- for instance, using a BLOB in a sibling table -- there's a simpler and more practical method that also maintains relational integrity.

Finding duplicate records in a database: the SQL HAVING clause

Sat, 2011-10-01 15:55 -- admin

One issue I run across occasionally is a table with duplicate entries such as two entries for the same company in an accounts payable system.  This can create embarrassing problems with billing if ACME Inc #1 is 90 days overdue because someone posted a payment, and now a credit, to ACME Inc #2.

Using Google Maps with the Garmin Zumo 660

Sun, 2011-05-22 18:36 -- admin

One of the best things about motorcycling is spending a nice day exploring twisty back roads far from home.  One of the worst things is getting lost while running low on fuel and being clueless not only about where you are but where to find gas.  Fortunately, technology comes to the rescue again with a variety of water-resistant GPS units built especially for motorcycles.  Last week, I became the owner of a Garmin Zumo 660 GPS.  I also purchased a bluetooth headset kit for my helmet so I could hear the unit.

After a rocky start, which included trying to charge the Sena SMH10 bluetooth device through the MP3 port (doh!), I spent a couple of rainy nights familiarizing myself with the Garmin.  It's an impressive device, as it should be with a retail price over $700.  But it had one very disappointing handicap: it can only import route sheets from Garmin's MapSource software.  Not only do Google, MapQuest and Bing Maps make MapSource look like a relic from the Windows 3 era, MapSource is only available for Windows, not my Mac.

Yes, the Garmin will let you import destination points from the more popular online map vendors but the Garmin will construct the route for you.  That's fine if you want to get to Point B as quickly as possible but I usually don't want the fastest route.  I usually want one of the least efficient paths -- one which will take me through small villages and across covered bridges with a stop at my favorite hot dog stand which may 20 miles off the optimum route the Garmin creates for me.  (For instance, my favorite route from Washington, DC to NYC is through West Virginia's Monongahela Forest).

Garmin recognizes this so it allows you to construct a route based on waypoints, or mini-destinations within the larger route.   You can do this in MapSource or in the Garmin itself.  But you can't do this using Google Maps because it doesn't export the GPX-formatted file that the Garmin wants to construct a route based on waypoints.

Ironically, or perhaps justifiably, Google gave me the solution to fix this: a handy browser plug-in for Firefox called GMapToGPX.  What this does is read your current Google Maps route and create a GPX list of ordered waypoints in GPX format which can be imported into MapSource, from which it can then be exported to your Garmin.   Here's how it's done.

Returning latest/highest record from a set of records

Sun, 2011-04-10 14:09 -- admin

A common reporting requirement in database applications is selecting the latest purchase, or latest job, or oldest chlld for each person in a table.  Any beginning student of SQL knows how to return the most recent date in a table:

SELECT MAX(rec_date) FROM sales;

              max
-------------------------------
 2011-03-15 18:04:45.178057-04
(1 row)                                                           

But that doesn't tell us which person owns that record.  We can do this using a sub, or nested, query on the same table:

SELECT cust_name FROM sales WHERE rec_date = (SELECT MAX(rec_date) FROM sales);

  cust_name |          rec_date
------------+-------------------------------
 Dwight Schrute       | 2011-03-15 18:04:45.178057-04
(1 row)                                                 

This could return multiple records with the same rec_date.  But let's say we're not interested in who owns the most recent record.  Instead, we want to know the latest record for everyone in the table, such as the last sales date for every customer. Here's one of the possible solutions for that:

Subscribe to RSS - programming