dmBridge and MDL

Just a few notes about dmBridge, a potential successor to the hacks Eric devised for comments and Flickr.

Next steps

  • get searches working (they are still reporting errors)
  • try some template building
  • do a trial transfer of MDLSSR comments to dmBridge

Importing comments

I got some help from Alex in writing a script to import comments to dmBridge.

Some examples for discussion:

Still have to resolve an OCLC timeout issue that killed the full upload of comments before it was finished.

January progress to date

Following the holidays I grabbed a fresh copy of dmBridge (release 2179) and reinstalled using a process documented in the readme.txt file on our site. All the basics seem to be working now, and I added a number of issue reports to Alex’s tracking list.

If you want to poke at MDL’s dmBridge, you can give these links a try: our control panel, an XML list of all our collections, an individual record seen through dmBridge, and the dmBridge search form. Note that this last one, the search form, takes a very long time to load because dmBridge builds a list of all the collections several times over. Reflections clearly needs a simpler search form to improve performance.

Started to experiment with templates. Created the “ericone” template and attached it to the MHS collection. This mhs/502 record will display it.

December progress report

Following a very productive call with OCLC on 12/17 I got a fresh installation of dmBridge up and running in our hosted environment. OCLC had added the necessary PHP components and taken steps to improve PHP performance on the hosted system. They gave us access to a “FastCGI” enabled version of Minnesota Reflections at http://dmbridgedev.contentdm.oclc.org/ and soon after the call added support for SQLite as well. This allowed me to really try out all the features of dmBridge and give Alex some feedback.

November progress report

Got an initial instance of dmBridge running in our CONTENTdm hosted environment. While it more or less worked, there were some missing PHP components that made the Control Panel inoperable and performance was overall terrible. I contacted OCLC about these issues and a call was set for December, after OCLC completed the CDM 5.2 rollout in the hosted environment.

OCLC CONTENTdm Support

Early testing notes

Hosted CONTENTdm seems to be missing some PHP libraries that would improve the dmBridge experience, in particular cURL (http://us2.php.net/curl) and PDO (http://us.php.net/pdo). This can be seen by visiting this test page: http://reflections.mndigital.org/custom/efc/info.php.

Hosted CONTENTdm is also very slow, possibly because PHP runs as CGI. Compare a request for a UNLV (Nevada) record with one for a hosted MDL record:

[Dreamliner] ~ % date;curl -o dmb.html "http://localhost/~efc/dmb/dmtemplates/basic/?r=hughes/87";date
Fri Nov  6 14:16:56 CST 2009
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  8987    0  8987    0     0   2233      0 --:--:--  0:00:04 --:--:--  2234
Fri Nov  6 14:17:00 CST 2009
[Dreamliner] ~ % date;curl -o dmb.html "http://localhost/~efc/dmbmdl/dmtemplates/basic/?r=jhs/87";date
Fri Nov  6 14:17:27 CST 2009
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6809  100  6809    0     0     64      0  0:01:46  0:01:46 --:--:--  1660
Fri Nov  6 14:19:13 CST 2009

That’s 4 seconds at Nevada, 106 seconds hosted (over 26 times slower)!
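The arithmetic behind those figures can be checked with a short Python sketch that subtracts the `date` stamps printed around each curl call. The function names are made up for illustration, and the time-zone token is simply dropped on the assumption that both stamps in a pair share the same zone. Because `date` only resolves whole seconds, each elapsed time could be off by almost a second, which matters more for the 4-second figure than for the 106-second one.

```python
from datetime import datetime


def parse_date_stamp(stamp: str) -> datetime:
    """Parse output of the Unix `date` command, ignoring the time-zone token."""
    # "Fri Nov  6 14:16:56 CST 2009" splits into six whitespace-separated tokens.
    _weekday, month, day, clock, _zone, year = stamp.split()
    return datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two `date` stamps taken before and after a command."""
    return (parse_date_stamp(end) - parse_date_stamp(start)).total_seconds()


# Stamps copied from the two transcripts above.
unlv = elapsed_seconds("Fri Nov  6 14:16:56 CST 2009", "Fri Nov  6 14:17:00 CST 2009")
hosted = elapsed_seconds("Fri Nov  6 14:17:27 CST 2009", "Fri Nov  6 14:19:13 CST 2009")

print(unlv, hosted, hosted / unlv)  # 4.0 106.0 26.5
```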

UPDATE: on 12/17/09 I learned that OCLC was experimenting with FastCGI in its hosted environment. They kindly provided a link to a version of MDL Reflections running on a FastCGI-enabled server and the results are very encouraging.

[Dreamliner] ~ % date;curl -o dmb.html "http://localhost/~efc/dmbmdl/dmtemplates/basic/?r=jhs/87";date
Thu Dec 17 14:27:09 CST 2009
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6812  100  6812    0     0   1479      0  0:00:04  0:00:04 --:--:--  1891
Thu Dec 17 14:27:14 CST 2009

At about 4 seconds, the FastCGI-enabled OCLC environment now appears to match UNLV.

Another example, hitting the API directly…

[Dreamliner] ~ % date;curl -o dmb-nv.xml "http://digital.library.unlv.edu/api/1/objects/hughes/87.xml";date
Fri Nov  6 14:25:51 CST 2009
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4509  100  4509    0     0   7632      0 --:--:-- --:--:-- --:--:--  9675
Fri Nov  6 14:25:52 CST 2009
[Dreamliner] ~ % date;curl -o dmb-mn.xml "http://reflections.mndigital.org/dmapi/?r=objects/jhs/71";date
Fri Nov  6 14:27:54 CST 2009
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3825    0  3825    0     0    637      0 --:--:--  0:00:06 --:--:--     0
Fri Nov  6 14:28:00 CST 2009

That’s under a second for Nevada, about 6 seconds for hosted.

UPDATE: on 12/17/09 I also tested this with OCLC’s experimental FastCGI-enabled server…

[Dreamliner] ~ % date;curl -o dmb-mn.xml "http://dmbridgedev.contentdm.oclc.org/dmapi/?r=objects/jhs/71";date
Thu Dec 17 14:29:03 CST 2009
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3835  100  3835    0     0  10525      0 --:--:-- --:--:-- --:--:-- 14363
Thu Dec 17 14:29:03 CST 2009

Again, very encouraging results, in line with UNLV.
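Each of the timings above comes from a single request, so one slow or cached response can skew the comparison. A minimal Python sketch of a repeatable version follows; it fetches a URL several times and reports the minimum, median, and maximum wall-clock time. The function names, the default of five runs, and the 300-second timeout are assumptions chosen for illustration, not anything dmBridge or OCLC provides.

```python
import statistics
import time
from urllib.request import urlopen


def fetch(url: str) -> bytes:
    """Download the full response body for `url`."""
    with urlopen(url, timeout=300) as response:
        return response.read()


def time_requests(url: str, runs: int = 5, fetcher=fetch) -> dict:
    """Fetch `url` `runs` times and summarise wall-clock seconds per request."""
    samples = []
    for _ in range(runs):
        started = time.perf_counter()
        fetcher(url)  # the body is discarded; only the elapsed time is kept
        samples.append(time.perf_counter() - started)
    return {
        "runs": runs,
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


# Example call (needs network access), using a URL from the transcripts above:
# print(time_requests("http://dmbridgedev.contentdm.oclc.org/dmapi/?r=objects/jhs/71"))
```

The `fetcher` argument is there so the timing loop can be exercised without touching the network, for example by passing a function that just sleeps.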

Comments

Alex Dolski / 06 November 2009 / 23:16

I was looking into PHP on IIS (which I know nothing about) and it looks like there are two ways to run it: CGI mode or ISAPI mode, which I guess is kind of like mod_php for Apache. But ISAPI mode is gone as of PHP 5.3, so it’s not a long-term solution.

I consider the HTTP API to be slow even on our server. I have some more interesting performance data for you.

??? / Windows / IIS / PHP 5.2 / CGI / CONTENTdm 5.1
http://reflections.mndigital.org/dmapi/?r=1/collections/army
<querySeconds>2.2529</querySeconds>

UltraSPARC 1.5GHz / Solaris 10 / Apache 2 / PHP 5.2 / mod_php / CONTENTdm 4.3
http://digital.library.unlv.edu/api/?r=1/collections/uw
<querySeconds>0.16856</querySeconds>

Athlon 64 4400+ / Solaris 10 / Apache 2 / PHP 5.2 / mod_php / dmulator 0.1
http://soldev/dmapi/?r=1/collections/test
<querySeconds>0.0382</querySeconds>

Core 2 Duo 2.16GHz / OS X 10.6 / Apache 2 / PHP 5.3 / mod_php / dmulator 0.1
http://localhost/dmapi/?r=collections/test
<querySeconds>0.02027</querySeconds>

The last two were achieved with dmulator, my CONTENTdm PHP API emulator (http://code.google.com/p/dmulator/) that loads dummy data from XML files instead of from CONTENTdm. So somewhere in the hardware/software stack of the OCLC web server is the explanation as to why it is 110 times slower than my 3-year-old iMac.

efc / 19 December 2009 / 14:26

I just ran this test again with the new FastCGI enabled test machine at OCLC…

??? / Windows / IIS / PHP 5.2 / FastCGI / CONTENTdm 5.2
http://dmbridgedev.contentdm.oclc.org/dmb/api/?r=1/collections/army
<querySeconds>0.17616</querySeconds>

So the hosted server now looks to be less than 10 times slower than the old Mac. And remember the old Mac is not really running CDM, but rather a CDM emulator. This looks like good news.
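The "110 times" and "under 10 times" comparisons can be reproduced from the `<querySeconds>` values quoted in this thread. The Python sketch below is only an illustration: the `<response>` wrapper in the sample is hypothetical, since only the `<querySeconds>` element itself appears above, and the labels in the dictionary are shorthand for the five configurations listed.

```python
import xml.etree.ElementTree as ET


def query_seconds(xml_text: str) -> float:
    """Return the value of the first <querySeconds> element in a response."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Compare the local tag name so a namespaced response still matches.
        if element.tag.rsplit("}", 1)[-1] == "querySeconds":
            return float(element.text)
    raise ValueError("no <querySeconds> element found")


# Hypothetical wrapper element; only <querySeconds> is taken from the thread.
sample = "<response><querySeconds>2.2529</querySeconds></response>"

# Values copied from the comments above.
reported = {
    "hosted, CGI": query_seconds(sample),
    "hosted, FastCGI": 0.17616,
    "UNLV": 0.16856,
    "dmulator on Solaris": 0.0382,
    "dmulator on iMac": 0.02027,
}

baseline = reported["dmulator on iMac"]
for label, seconds in reported.items():
    print(f"{label}: {seconds:.5f} s, {seconds / baseline:.1f}x the iMac")
```

With these numbers the CGI-hosted server comes out a little over 110 times the iMac figure and the FastCGI server a little under 9 times, which agrees with both comments.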