Here's some work on getting images out of MDL Reflections for the MDL+Flickr project. MDL Reflections is a CONTENTdm system, and it does not provide a simple URL for pulling out an image at a given resolution. The goal of this work is a simple method for retrieving images from CONTENTdm.
Last year I worked up a few examples showing that this retrieval was possible, in theory. But I need something much more general that could, eventually, be part of a batch process.
I like working in PHP since I can use the browser for quick-turnaround testing, and eventually I can implement the resulting methods as either command-line scripts or browser-based utilities. I found a toolkit for pulling metadata out of JPEGs called the PHP Metadata Toolkit. It actually focuses on getting metadata out of a few image formats, JPEG included. The good news is that, because it can read image properties such as width and height straight from the file, it eliminates the need for any real image processing software in this workflow.
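The PHP Metadata Toolkit's own API isn't reproduced here. As a rough illustration of why no image-processing library is needed, here is a minimal Python sketch (a swapped-in technique, not the toolkit) that reads width and height directly from a JPEG's start-of-frame (SOF) header by walking the marker segments. The function name `jpeg_dimensions` is hypothetical, and the sketch assumes a well-formed JPEG whose SOF segment appears before the image data.

```python
import struct

# Start-of-frame markers carry the image size. 0xC4 (DHT), 0xC8 (JPG) and
# 0xCC (DAC) fall in the same range but are not frame headers.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}
# Markers that stand alone, with no length field after them (TEM, RSTn, SOI, EOI).
STANDALONE = {0x01, *range(0xD0, 0xDA)}


def jpeg_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from a JPEG's SOF segment."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker; skip it
            pos += 1
            continue
        if marker in STANDALONE:
            pos += 2
            continue
        # The 2-byte length counts itself but not the 0xFF/marker pair.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            # SOF payload: precision (1 byte), height (2), width (2), ...
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return width, height
        pos += 2 + length
    raise ValueError("no SOF segment found")
```

Only the first few hundred bytes of a JPEG are usually needed for this, which is what makes a metadata-only approach cheap.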
First demonstration
This is my first demonstration of how such retrieval can work.
I want the image to have a maximum dimension of 1024 pixels. But CONTENTdm (CDM) won't let me ask for an image by dimension; I can only ask for the image to be scaled to a given percentage of its full size. To calculate the scale at which to ask Reflections for the image, I have to go through a two-stage process. First I pull the image at a known scale (10%). Then I look at the resulting image dimensions (width and height), find the longer one, and work out what scale would turn that longest dimension into 1024. The formula:
( target size * initial scale ) / longest initial dimension = desired scale
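A minimal Python sketch of the formula, assuming scales are expressed as percentages (10 for 10%) and that the stage-one width and height have already been read from the image. The name `desired_scale` is hypothetical, and whether CONTENTdm accepts a fractional scale such as 34.13 is an assumption too.

```python
def desired_scale(target_size: int, initial_scale: float,
                  initial_width: int, initial_height: int) -> float:
    """Scale (percent) meant to bring the longest side to target_size.

    (target size * initial scale) / longest initial dimension = desired scale
    """
    longest = max(initial_width, initial_height)
    if longest <= 0:
        raise ValueError("image dimensions must be positive")
    return target_size * initial_scale / longest
```

For example, if the 10% image comes back at 300 × 240 pixels, the formula gives 1024 × 10 / 300 ≈ 34.13, so the second request would ask for roughly a 34% scale.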
This almost works. The stage-one image is reduced in size, and pixel dimensions are whole numbers, so its width and height were already rounded when it was scaled. The formula amplifies that rounding error, so the final image the system returns in the second step will be close to, but not exactly, the target size: it may be slightly bigger or smaller. You will see some of that variation if you play with this first demonstration.
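To see where the off-by-a-pixel results come from, the following Python sketch simulates both stages. It assumes the server rounds each scaled dimension to the nearest whole pixel (halves rounded up); CONTENTdm's actual rounding rule is not known here, and the full-size dimensions used below are made up. The helper names are hypothetical.

```python
import math


def server_scaled(full_size: int, scale_percent: float) -> int:
    # ASSUMPTION: the server rounds to the nearest whole pixel, halves up.
    return math.floor(full_size * scale_percent / 100 + 0.5)


def two_stage_longest_side(full_w: int, full_h: int,
                           target: int = 1024, initial_scale: float = 10) -> int:
    """Simulate both requests and return the final image's longest side."""
    # Stage 1: only the rounded, reduced dimensions are visible to us.
    w1 = server_scaled(full_w, initial_scale)
    h1 = server_scaled(full_h, initial_scale)
    # Stage 2: apply the formula to those rounded dimensions.
    scale = target * initial_scale / max(w1, h1)
    return max(server_scaled(full_w, scale), server_scaled(full_h, scale))
```

With a hypothetical 3004 × 2000 original, stage one reports 300 px, the formula asks for about 34.13%, and the final longest side comes out at 1025 px; a 2996 px original ends up at 1023 px, while an exact 3000 px original lands on 1024.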
Second demonstration
This time I am actually storing a copy of the image on the server, as a step toward batch processing. Note that the URL of the image presented is not a Reflections URL (you can prove this to yourself by right-clicking the image and asking your browser to open it in its own window). I have also combined this with my OAI retrieval and XML parsing work so that the item's metadata is shown as well.
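The OAI side can be sketched in Python with the standard library's ElementTree. This assumes an OAI-PMH GetRecord response in the `oai_dc` (simple Dublin Core) format; the function name `parse_dc_record` is hypothetical. It collects every Dublin Core element into a dictionary of lists, since fields such as subject can repeat, and raises if the response carries an OAI `error` element instead of a record.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def parse_dc_record(xml_text: str) -> dict[str, list[str]]:
    """Collect Dublin Core fields from an OAI-PMH GetRecord response."""
    root = ET.fromstring(xml_text)
    # OAI-PMH reports failures (e.g. idDoesNotExist) as an <error> element.
    error = root.find(OAI + "error")
    if error is not None:
        raise LookupError(f"{error.get('code')}: {(error.text or '').strip()}")
    fields: dict[str, list[str]] = {}
    metadata = root.find(f".//{OAI}metadata")
    if metadata is None:
        return fields
    for el in metadata.iter():
        if el.tag.startswith(DC):
            name = el.tag[len(DC):]
            # Repeatable elements (subject, creator, ...) keep every value.
            fields.setdefault(name, []).append((el.text or "").strip())
    return fields
```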
Third demonstration
Here we go! All the pieces are together for the first time. This version will actually add a record to Flickr. It uses a test account I set up (check with me if you want the user ID and password). If you already have a Flickr account, send me your Flickr user ID so I can add you as a "friend" of the mdltests account; that way you can see how this stuff looks at Flickr. Give it a whirl!
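One way the OAI metadata might be mapped onto a Flickr upload is sketched below. `title`, `description`, and `tags` are arguments of the Flickr upload API, and Flickr treats tags as a space-separated list with multi-word tags wrapped in double quotes. The mapping itself, the helper name `flickr_upload_fields`, and the assumption that CONTENTdm subject fields hold semicolon-separated terms are all illustrative, not a description of how this demonstration works. Authentication and the actual upload request are left out.

```python
def flickr_upload_fields(dc: dict[str, list[str]], item_url: str) -> dict[str, str]:
    """Hypothetical mapping from Dublin Core fields to Flickr upload arguments."""
    title = (dc.get("title") or ["Untitled"])[0]
    # Keep any descriptions and point back to the source record.
    description = "\n\n".join(dc.get("description", []) + [f"Source: {item_url}"])
    tags = []
    for subject in dc.get("subject", []):
        # ASSUMPTION: CONTENTdm subject fields hold semicolon-separated terms.
        for term in subject.split(";"):
            term = term.strip().replace('"', "")
            if term:
                # Flickr expects multi-word tags wrapped in double quotes.
                tags.append(f'"{term}"' if " " in term else term)
    return {"title": title, "description": description, "tags": " ".join(tags)}
```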
Fourth demonstration
You can now drag this JavaScript bookmark to your bookmarks bar and go to MDL Reflections. Just find any item record, then click the "MDL+Flickr" bookmark in your browser.
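The bookmark has to tell the MDL+Flickr script which item you are looking at. Assuming it simply passes along the address of the current page, and assuming that address carries CONTENTdm's `CISOROOT` (collection) and `CISOPTR` (item pointer) query parameters, the server side could pull them out as in this Python sketch. The helper name `item_from_url` and the example URL are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def item_from_url(page_url: str) -> tuple[str, int]:
    """Return (collection alias, item pointer) from a CONTENTdm item URL.

    ASSUMPTION: the URL carries CISOROOT and CISOPTR query parameters.
    """
    query = parse_qs(urlparse(page_url).query)
    try:
        return query["CISOROOT"][0], int(query["CISOPTR"][0])
    except (KeyError, ValueError):
        raise ValueError(f"not a CONTENTdm item URL: {page_url}") from None
```

For a page such as `http://example.org/cdm4/item_viewer.php?CISOROOT=/sample&CISOPTR=42`, this returns `("/sample", 42)`, which is enough to request the image and the OAI record.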
Testing on hosted system
Test, for example, with this item record.
Next steps
I'd like to refine the one-off workflow in the fourth demonstration above. It would be cool, and maybe even the right long-term way to go, even though it is not a batch process.
For batch processing I still have to loop through a collection, retrieving images and storing them to disk until I run out of images. How will I tell the difference between retrieval errors and the end of the range? Maybe do an OAI lookup first, since OAI gives an explicit error report for items that are not found? And how should I report errors?
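One possible answer to the end-of-range question, sketched in Python: walk the item pointers in order, treat an explicit "not found" report (for example OAI-PMH's `idDoesNotExist` error) differently from any other failure, stop after a run of consecutive not-found reports, and collect every other failure into a list for an error report. The `lookup` callable, the `NotFound` exception, and both thresholds are hypothetical; pointer ranges may contain gaps, so the miss threshold would need tuning per collection.

```python
from typing import Callable


class NotFound(Exception):
    """Raised by a lookup when the repository positively reports no such item."""


def harvest(lookup: Callable[[int], dict], start: int = 0,
            max_misses: int = 25, max_errors: int = 10):
    """Walk pointers until max_misses consecutive NotFound reports
    (or max_errors consecutive other failures); return (records, errors)."""
    records, errors = [], []
    misses = consecutive_errors = 0
    pointer = start
    while misses < max_misses and consecutive_errors < max_errors:
        try:
            record = lookup(pointer)
        except NotFound:
            misses += 1  # a positive "no such item": counts toward end of range
        except Exception as exc:
            consecutive_errors += 1  # anything else is a retrieval error
            errors.append((pointer, repr(exc)))
        else:
            records.append((pointer, record))
            misses = consecutive_errors = 0
        pointer += 1
    return records, errors
```

The `errors` list, with the pointer and exception for each failure, could then be written out as the batch's error report.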
Comments
Talk back! Let me know what you think. ...Eric
Jason Roy / 16 March 2009 / 08:02
Very nifty. I am impressed by the math ingenuity you used to force a desired size request. Is this all done dynamically upon hitting the "show me", or are you caching the entire collection somewhere locally on your machine?
efc / 17 March 2009 / 09:55
@Jason Roy: This is all dynamic. Nothing is being saved on the workflow machine ahead of time. The second demonstration is leaving files behind on the server, but only because I don't clean up after myself. A final script would not leave anything behind either.
Keith / 17 March 2009 / 10:17
Very clever and simple, Eric. The one-off method would provide a lot of flexibility in the future, especially as people add to existing Flickr collections. The batch method, I suspect, would serve initial creation of collections in Flickr. The error problem you note in the batch is a conundrum.