Monthly Archives

June 2009

Metadata, Photography and Workflow for the Web

GeoBlog, How To, Photography, Wisdom

Metadata (n, pl): data about data. Any questions?

Recently, there has been a lot of discussion in photography circles about metadata: what it is, how to manage it, what it is good for, etc. Some of the photographers I follow in the blogosphere, and more recently on Twitter, have interesting things to say on the matter (look to the right for links to some of these guys). I decided to offer some comments about how I use metadata, in the hope that they might be useful to other photographers. Who the hell am I, and why do my comments matter, you wonder? Good question. I do not have much of a profile among photographers, which is somewhat intentional, but I do have a website that does well with the one search engine that really matters. By way of introduction, here is a short bio about me and about how my website has developed over the last 11 years. During that time I have learned how to leverage photographic metadata on a photography website (at least search engines seem to like my site), and I am willing to share some of what I have learned. As an aside, other than maintaining a website I do no marketing whatsoever, nor do I send out submissions anymore. All of my licensing activity comes either from clients who contact me via my website or through a couple of old-fashioned photographer-representative-type agencies I am with. Revenue from my website outpaces agency revenue by about 8:1, and I attribute this to the effective use of metadata on my website.

If your goal is to develop a stock photography website that shows up in search engine results, metadata about your photographs is crucial. Text, in particular the metadata accompanying photos, is all that search engines are able to grab and hold on to as they spider and index a website. If your site displays beautiful images with little metadata to accompany them, it stands a good chance of not appearing in meaningful search engine results. Except for specialized search engines that index image data directly (e.g., TinEye), search engines rely on the textual information on your site when evaluating it. This goes for images too: search engines consider the text associated with an image when trying to categorize it. If you have organized that text well, and made sure it includes meaningful metadata about the image(s) displayed on the page, the image or page at least has the potential to show up well in search results.
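As a rough illustration of the point about text, here is a minimal Python sketch that renders an image together with its title, caption and keywords as ordinary HTML text a crawler can read. The field names (`title`, `caption`, `keywords`) and the markup are hypothetical choices for this example, not the markup of any particular site.

```python
from html import escape


def image_figure_html(src: str, meta: dict) -> str:
    """Render an image plus its metadata as crawlable HTML text."""
    title = escape(meta.get("title", ""))  # escape() also escapes quotes by default
    caption = escape(meta.get("caption", ""))
    keywords = ", ".join(escape(k) for k in meta.get("keywords", []))
    return (
        "<figure>\n"
        f'  <img src="{escape(src)}" alt="{title}" title="{title}">\n'
        f"  <figcaption>{caption}</figcaption>\n"
        f"  <p>Keywords: {keywords}</p>\n"
        "</figure>"
    )


if __name__ == "__main__":
    # Hypothetical example record
    print(image_figure_html("clouds-la-jolla.jpg", {
        "title": "Clouds and sunlight, La Jolla",
        "caption": "Clouds & sunlight over La Jolla, California.",
        "keywords": ["clouds", "sunset", "La Jolla"],
    }))
```

The point of the design is simply that the caption and keywords end up as visible page text next to the image, rather than existing only inside the image file.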

In my workflow there are three types of metadata that I am concerned with:

  • EXIF: shooting parameters, recorded by the camera
  • GEO: geographic data, if I am geocoding the images
  • IPTC: user-supplied information, describing characteristics and business matters related to the image or me.

Following is a description of my photography workflow, from the time the images are downloaded to a computer until my website is updated to include the most recent images. The percentages are the relative time it takes for each step, not including the selections, editing and Photoshop work which take place at the very beginning and which are independent of the metadata side of things.

Step 1: EXIF, The Default Image Metadata (5%)

First I edit the shoot down to keepers. Typically, each keeper is a pair of files: one raw and one “master”. The raw file automatically contains EXIF data about the shooting parameters, copyright information, etc. The master file, usually a 16-bit TIFF or high-quality JPEG derived from the raw file by processing it in a raw converter and/or Photoshop, contains the EXIF data as well. At this point nothing special has been done about metadata. The EXIF metadata already in the images was placed there by my camera, requires no work on my part, and is what I consider “default metadata”.

I back up my RAW keepers at this point. They have not been touched by any digital asset management or geocoding software; they are right out of the camera. These go on a hard disk and on DVDs, and are set aside for safekeeping in case a RAW file is somehow corrupted later in my workflow. It has not happened to me yet, knock on wood, but one never knows…
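One way to notice such corruption, offered here as an assumption rather than as part of the workflow described above, is to record a checksum for each backed-up raw file and compare against it later. A minimal Python sketch using SHA-256 from the standard library; the folder layout and file pattern are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large raw files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path, pattern: str = "*") -> dict[str, str]:
    """Checksum every matching file at backup time."""
    return {p.name: sha256_of(p) for p in sorted(folder.glob(pattern)) if p.is_file()}


def changed_files(folder: Path, manifest: dict[str, str]) -> list[str]:
    """Names of files that are now missing or no longer match their recorded checksum."""
    return [name for name, digest in manifest.items()
            if not (folder / name).is_file() or sha256_of(folder / name) != digest]
```

The manifest could be saved (for example as JSON) on the same DVD as the raw files, so that a later check does not depend on the working copy.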

Step 2: Geographic Metadata, Geocoding (optional) (5%)

If I have geographic location data, it is added now. I often geocode my images, which is the process of associating GPS information, e.g., latitude, longitude and altitude, with the image. I use a small handheld GPS to record the locations as I shoot, and these locations are added to the images by a geocoding program. Conceptually, geocoding gives the image some additional value, since it is now associated with a particular place at a particular time. Sometimes the accuracy of this geocoding is as tight as 20′ (6m). It usually just takes a few minutes to launch the geocoding application, point it to the images and the GPS data, and have it do its thing.
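Geocoding programs generally work by matching each photo's capture time against the timestamps in the GPS track log. The Python sketch below shows that idea under simplifying assumptions: the track is a time-sorted list of `(time, lat, lon, alt)` tuples, a hypothetical `camera_offset` corrects the camera clock to GPS time, and positions between two nearby fixes are linearly interpolated (reasonable over short distances, ignoring the antimeridian). It is not the code of any particular geocoding program.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def locate(shot_time: datetime, track: list, camera_offset: timedelta = timedelta(0),
           max_gap: timedelta = timedelta(minutes=2)):
    """Return (lat, lon, alt) for a photo, or None if the GPS log does not cover it.

    track: time-sorted list of (time, lat, lon, alt) tuples from the GPS log.
    camera_offset: correction added to the camera clock to match GPS time.
    """
    t = shot_time + camera_offset
    times = [p[0] for p in track]
    i = bisect_left(times, t)
    if i < len(track) and times[i] == t:
        return tuple(track[i][1:])
    if 0 < i < len(track):
        (t0, *a), (t1, *b) = track[i - 1], track[i]
        if t1 - t0 <= max_gap:  # two close fixes bracket the shot: interpolate
            f = (t - t0) / (t1 - t0)
            return tuple(x + f * (y - x) for x, y in zip(a, b))
    # Otherwise fall back to the nearest fix, if it is close enough in time
    neighbours = [track[j] for j in (i - 1, i) if 0 <= j < len(track)]
    if not neighbours:
        return None
    best = min(neighbours, key=lambda p: abs(p[0] - t))
    return tuple(best[1:]) if abs(best[0] - t) <= max_gap else None
```

The `max_gap` limit keeps a photo from being placed at a stale fix when the GPS lost its signal or was switched off.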

Having GEO data in the image, and later in the database that drives my website, allows me to do some interesting things with my images and blog posts, such as presenting them with Google Earth at the location where they were shot. For example, this photo of the Wave in the North Coyote Buttes is geocoded, and can be viewed in Google Earth by clicking the little blue globe icon. The same goes for most of the blog posts I have: they can be viewed in Google Earth at the right place on the planet. Here is another example. If you have Google Earth installed on your computer, you should be able to click on both of the next two links, which will open into Google Earth. One will display a track and the other will overlay photos, both from a recent aerial shoot around San Diego:

Yes, somewhat crude, but we are in the early days of geocoding, and there will be more interesting things we can do in the future.

I’ve written a fairly lengthy post describing how I geocode images: How To Geocode Your Photos. At present, I use a free application named “GPicSync” to add GEO data into each image. This application will update the EXIF information in my RAW and master images to include latitude, longitude and altitude.

A bit of opinion: I believe that having GEO data associated with your images, on your website, is almost certainly a good thing. Even if no person ever looks at it, new technologies are coming online constantly that look for, index, spider, collate and retrieve images and web pages based on their GEO data. Images and web pages that lack GEO data will not see any of the advantages these technologies offer. I admit I am no expert on this, and the entire geocoding world, along with the entities out there that index geocoded web pages, is all rather new to me. However, I am certain that some visitors will arrive at my site as a result of the GEO data present alongside my images and blog posts, and probably many already have. Having the GEO data embedded in the metadata of the photograph is the first step in this process.

Step 3: Import Images into Digital Asset Management Software (5%)

I import the keeper images, both RAW and master, into Expression Media, the software I use for “digital asset management” (whee, yet another acronym buzzword: DAM). I’m no fan of Microsoft, but I do like Expression Media and am used to it (I formerly used its predecessor, iView). In particular, Expression Media allows programs (scripts) to be written in Visual Basic. The scripting feature alone is worth its weight in gold, as I will point out in the last step of my workflow, and it is what makes my processing of images so automated now. I’ve written a dozen or so scripts. It’s quite easy: I have had no training, and have never read any manual for the software. I just based my scripts on examples I found on the internet from other Expression Media users, modifying them to meet my own workflow needs. They carry out mundane tasks and really speed the process up, for example:

  • Set baseline IPTC metadata, including copyright notice, name, address, email, website.
  • Set baseline “quality”, based on the camera model information in the EXIF. In this way I can rank certain images higher on the website if they were shot on a better camera, other factors being equal. I normally don’t want images shot with a point-and-shoot to appear before those shot with a 1DsIII. I’ve come up with a baseline ranking scheme that differentiates the following image sources relative to one another in terms of typical quality (not in this order, however): Canon 1DsIII, 1DsII, 1DIIn, 5D, 50D, 30D, Nikon D100, Panasonic Lumix LX3, LX2, Nikon Coolscan LS5000, LS4000, and various drum scans. I can easily fine-tune this later for individual images, increasing or decreasing the “quality” of each image so that certain images appear first when a user views a selection of photos.
  • Determine the aspect ratio (3:2, 4:3, 16:9, custom) and orientation (horizontal, vertical, square, panorama) of the master image, which may differ from that of the raw image(s) from which it is sourced. This is important for cropped images and for panoramas and/or HDR images assembled from multiple raw files. The script recognizes the multiple raw files that are used to generate a single master file.
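The scripts themselves are Visual Basic inside Expression Media; the Python sketch below only mirrors the logic of the three tasks above on a plain dictionary per image. The contact details, the `Model` strings and the quality scores are hypothetical placeholders (the actual ranking order is deliberately not given above), and the 2% tolerance and 2:1 panorama threshold are arbitrary choices.

```python
# Hypothetical baseline contact/copyright fields; substitute your own.
BASELINE_IPTC = {
    "Creator": "Your Name",
    "CopyrightNotice": "© Your Name. All rights reserved.",
    "ContactEmail": "you@example.com",
    "ContactURL": "https://www.example.com",
}

# Hypothetical quality scores keyed by EXIF Model string (check what your cameras write).
BASELINE_QUALITY = {
    "Canon EOS-1Ds Mark III": 90,
    "Canon EOS-1Ds Mark II": 85,
    "Canon EOS 5D": 80,
}
DEFAULT_QUALITY = 50

NAMED_RATIOS = {"3:2": 3 / 2, "4:3": 4 / 3, "16:9": 16 / 9}


def aspect_and_orientation(width: int, height: int, tolerance: float = 0.02,
                           pano_ratio: float = 2.0) -> tuple[str, str]:
    """Classify the pixel dimensions of the master image."""
    ratio = max(width, height) / min(width, height)  # always >= 1
    aspect = next((name for name, r in NAMED_RATIOS.items()
                   if abs(ratio - r) / r <= tolerance), "custom")
    if ratio >= pano_ratio:
        orientation = "panorama"
    elif ratio - 1 <= tolerance:
        orientation = "square"
    else:
        orientation = "horizontal" if width > height else "vertical"
    return aspect, orientation


def apply_baseline(record: dict) -> dict:
    """Fill in baseline fields without overwriting anything already set by hand."""
    for key, value in BASELINE_IPTC.items():
        record.setdefault(key, value)
    record.setdefault("Quality", BASELINE_QUALITY.get(record.get("Model", ""), DEFAULT_QUALITY))
    record["AspectRatio"], record["Orientation"] = aspect_and_orientation(
        record["Width"], record["Height"])
    return record
```

Using `setdefault` for the baseline fields means the script can be re-run on older images without clobbering a quality score or notice that was later adjusted by hand.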

At this point my images have EXIF metadata, perhaps containing GEO data if a geocoding step was performed, and basic IPTC metadata that identify the image as mine, how to reach me, etc. So far all I have done is run some applications and scripts. I really haven’t done any “manual” keywording or captioning yet. If necessary, the images are now ready to place on the web, since they have a minimal set of metadata in them that at least establishes them as mine (DMCA anyone?). However, the most important step is to come.

Step 4: Keywording and Captioning (80%)

It’s time to add captions, titles, keywords, categories, etc. to the image. With my new images already imported in Expression Media, and already containing full EXIF metadata and baseline IPTC metadata, I am ready to begin.

  • Captions. There is no shortcut for this. Each image needs a decent caption. It is common to group images and assign the same caption to all of them, then fine-tune captions on individual images as needed. The notion of a “template” can be used too, and lots of different DAM applications support this. Whatever application you use to caption your images, there is no alternative but to get your hands dirty and learn what approach works best for you. A key concept is to caption well the first time, so you don’t feel a need to return in the future and add more.
  • Keywords (open vocabulary descriptors). In general, the same notion as captioning applies here. However, DAM applications often have special support for keywords, allowing you to draw keywords from a huge database of alternatives and facilitating the use of synonyms, concepts, etc. Expression Media allows the use of custom “vocabularies”. A vocabulary is basically a dictionary. For animal images, I developed a custom vocabulary/dictionary of 26,000 species, including most bird and mammal species, with complete hierarchical taxonomic detail. So, when keywording, I simply type in the Latin (scientific) name for a group of images (all of the same species), and up pops a taxonomic record from the vocabulary, showing kingdom, phylum, family, genus, species, etc. and a bunch of important scientific gobbledygook for the species. Hit return and bingo, all the highlighted images are keyworded with the appropriate taxonomic metadata. Similar ideas work for locations. I do not do much keywording for “concepts” (e.g., love, strength, relationships, childhood) since I do not pursue that sort of thematic stock; there is enough of that in the RF and microstock industries already. Here is a list of keywords I currently have among my images.
  • Categories (closed vocabulary descriptors). This is the third area of captioning that I find important. Images in my stock files are typically assigned one or more “categories”, and these categories are stored in the metadata of the image alongside captions and keywords. Some examples are: Location > Protected Threatened And Significant Places > National Parks > Olympic National Park (Washington) > Sol Duc Falls and Subject > Technique > Aerial Photo > Blue Whale Aerial. Here is a stocklist of categories I currently have among my images.
  • Custom Fields for the website. I have a few other metadata fields that are seen by website visitors that I set via Expression Media scripts. For example, once the captions are created, a script can be used to create “titles” for a group of images, which are really just excerpts of the full captions and can be used for HTML titles, headers, etc. For the most part, these additional metadata fields are secondary in importance to the captions, keywords and categories.
  • Custom Fields for Business Purposes. In addition, I use some metadata fields for recording characteristics of the image that I need to track for business reasons. These include licensing restrictions, past uses that affect exclusivity, etc. These metadata are embedded in the image so they are sure to travel with the image as it moves to a client, but they are not presented to the public on the web site.
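The vocabulary lookup described under Keywords can be pictured as a dictionary keyed by scientific name, whose record expands into a full set of taxonomic keywords for every selected image. A minimal Python sketch, with a hypothetical two-entry vocabulary (the two cardinals from the bird posts on this page) standing in for the 26,000-species one; this is not how Expression Media stores vocabularies internally.

```python
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus")

# Higher taxonomy shared by both species in this hypothetical two-entry vocabulary.
_CARDINALIS = ("Animalia", "Chordata", "Aves", "Passeriformes", "Cardinalidae", "Cardinalis")

TAXONOMY = {
    "Cardinalis cardinalis": {"common": "Northern cardinal", **dict(zip(RANKS, _CARDINALIS))},
    "Cardinalis sinuatus": {"common": "Pyrrhuloxia", **dict(zip(RANKS, _CARDINALIS))},
}


def taxonomic_keywords(scientific_name: str) -> list[str]:
    """Expand one scientific name into higher taxa, the species name and its common name."""
    try:
        record = TAXONOMY[scientific_name]
    except KeyError:
        raise KeyError(f"{scientific_name!r} is not in the vocabulary") from None
    return [record[rank] for rank in RANKS] + [scientific_name, record["common"]]


def keyword_selection(images: list[dict], scientific_name: str) -> None:
    """Apply the same taxonomic keywords to every selected image, skipping duplicates."""
    new = taxonomic_keywords(scientific_name)
    for image in images:
        existing = image.setdefault("keywords", [])
        existing.extend([k for k in new if k not in existing])
```

Typing one name and getting a dozen consistent keywords is the whole payoff: the spelling of every higher taxon comes from the vocabulary, not from memory.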

Note that I consider keywords to be an “open vocabulary”, in the sense that any keyword can be used with an image. In other words, I don’t hesitate to add keywords that I have not used before; it’s an open set and grows as needed. This is especially true of synonyms, but one doesn’t want to get too carried away with synonyms, or it can dilute the search results that a web visitor sees. I often add keywords at a later date to images that are already in my stock files. However, I treat categories as a “closed vocabulary”, in that I have a relatively fixed set of hierarchical categories. I will introduce a new category when it makes sense, but usually only when there is a sufficiently large group of images to which it applies and there is not already a similar category in use.
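The open/closed distinction can be sketched in a few lines of Python: new keywords are accepted freely, while a category must already exist in the fixed set. The two category paths are the examples given above; the function names and the `ancestors` helper (which lets an image filed under Sol Duc Falls also appear under National Parks) are hypothetical illustrations.

```python
SEP = " > "

# Hypothetical closed vocabulary built from the two example paths above.
SOL_DUC = SEP.join([
    "Location", "Protected Threatened And Significant Places", "National Parks",
    "Olympic National Park (Washington)", "Sol Duc Falls",
])
BLUE_WHALE_AERIAL = SEP.join(["Subject", "Technique", "Aerial Photo", "Blue Whale Aerial"])
CATEGORIES = frozenset({SOL_DUC, BLUE_WHALE_AERIAL})


def ancestors(path: str) -> list[str]:
    """Every prefix of a category path, so an image also shows up under broader headings."""
    parts = path.split(SEP)
    return [SEP.join(parts[:i]) for i in range(1, len(parts) + 1)]


def annotate(image: dict, keywords=(), categories=()) -> dict:
    """Keywords are an open set; categories must already exist in the closed set."""
    unknown = [c for c in categories if c not in CATEGORIES]
    if unknown:
        raise ValueError(f"not in the category vocabulary: {unknown}")
    kw = image.setdefault("keywords", [])
    kw.extend([k for k in dict.fromkeys(keywords) if k not in kw])
    cats = image.setdefault("categories", [])
    cats.extend([c for c in dict.fromkeys(categories) if c not in cats])
    return image
```

Rejecting unknown categories outright, rather than silently creating them, is what keeps the hierarchy from sprouting near-duplicate branches.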

Once all the metadata for the keepers in my latest shoot are defined in Expression Media, they need to be written out to the images themselves. In other words, Expression Media is aware of these things, but if one were to open one of the images (RAW or master) in Photoshop the new metadata would not be there. This last step in Expression Media is referred to as “syncing” the annotations. (“Annotations” is Expression Media’s word for metadata. I guess “metadata” is scary to people.) I highlight all the files for which I have been adding metadata, then Action -> Sync Annotations -> Export Annotations To Original Files and click “OK”. All the metadata is now stored in the images themselves, and will flow into any derivative images that are created, such as the thumbnails and watermarked JPGs that go onto my web site. (Think DMCA!).
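Expression Media handles this export itself. To show what such embedded metadata can look like, here is a Python sketch that builds an XMP packet holding a title, caption, copyright, creator and keywords using the standard Dublin Core properties (`dc:title`, `dc:description`, `dc:rights`, `dc:creator`, `dc:subject`). This is an assumption about representation only: it produces the XML text, and embedding it in a JPEG or saving it as an `.xmp` sidecar is left to other tools.

```python
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix: str, tag: str) -> str:
    return f"{{{NS[prefix]}}}{tag}"


def _container(desc, prop: str, kind: str, items, lang: bool = False) -> None:
    """Add dc:<prop> holding an rdf:<kind> (Alt, Seq or Bag) of rdf:li items."""
    box = ET.SubElement(ET.SubElement(desc, q("dc", prop)), q("rdf", kind))
    for item in items:
        li = ET.SubElement(box, q("rdf", "li"), {XML_LANG: "x-default"} if lang else {})
        li.text = item


def xmp_packet(title: str, caption: str, rights: str, creator: str, keywords) -> str:
    meta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(meta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})
    _container(desc, "title", "Alt", [title], lang=True)  # language alternatives
    _container(desc, "description", "Alt", [caption], lang=True)
    _container(desc, "rights", "Alt", [rights], lang=True)
    _container(desc, "creator", "Seq", [creator])  # ordered list of authors
    _container(desc, "subject", "Bag", keywords)  # unordered keywords
    return ET.tostring(meta, encoding="unicode")
```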

Step 5: Downstream, or, “Go Forth My Minions” (5-10%)

Once I have defined the metadata, there is no need to do it ever again. The metadata, which is now contained in the DAM application but also in the header of each image, “flows downstream” with no further effort. For my purposes, “downstream” can mean a submission of selects sent to a client, a submission of images to an agency, or an update of my website.

Downstream to Clients

There is not much to say here. Best practices in delivering images to clients include using metadata properly. If you are sending out images to clients, or to stock agencies (the old-fashioned kind that actually represent their photographers) or to, for shame for shame, stock portals (RF, micro, they are all evil), then you should have rich, accurate metadata embedded in your image. It is the only way to ensure that the information travels with the image. I’ve received submission requests from potential clients who simply wanted JPGs submitted as email attachments, with the proviso that if a JPG did not have caption and credit embedded in the metadata it would be immediately discarded without consideration.
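A pre-delivery check for that kind of requirement is easy to sketch. The Python below assumes each file's metadata has already been read into a dict (how it is read is out of scope), and the required field names follow the caption-and-credit example above; any other fields would be the client's own requirements.

```python
# Fields this hypothetical client insists on; adjust to the client's actual requirements.
REQUIRED = ("caption", "credit")


def missing_fields(meta: dict, required=REQUIRED) -> list[str]:
    """Names of required fields that are absent or blank."""
    return [f for f in required if not str(meta.get(f) or "").strip()]


def check_batch(batch: dict[str, dict]) -> dict[str, list[str]]:
    """Map each file name to its missing fields; an empty result means the batch is ready."""
    report = {}
    for name, meta in batch.items():
        missing = missing_fields(meta)
        if missing:
            report[name] = missing
    return report
```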

Downstream to the Web

For many photographers, the final step in processing a new shoot is to update one’s website. In other words, get the new images along with all their metadata (captions, keywords, GEO locations, categories, etc.) onto the web so that they can be seen by the entire world.

For photographers who use a “gallery” of some kind to host their website (such as SmugMug, Flickr, PBase, or any of the freely available installable gallery packages), simply uploading the images into a new (or existing) gallery is usually all that is necessary. Provided you have managed your metadata properly in step 4, the metadata will be present in the headers of your new images. As these images are uploaded, the gallery software peeks into the header of each image for metadata and, if it finds any, extracts it and prepares it for display alongside the image. The details of which metadata are used (caption, keywords, location, GEO, name, copyright, restrictions, EXIF, etc.) differ somewhat from one gallery provider to another, but the general idea is the same.

However, see the final notes at the end of this post for a few caveats about how gallery software may alter your metadata as it processes your image.

My situation is conceptually the same. My website software is essentially a “gallery” with a pretty extensive search feature. However, I wrote the software by hand, and it does not extract metadata from image files automatically like the big-boy galleries do. (Perhaps someday I’ll figure out how to do that.) As I described a few days ago, my website evolved to be written entirely in PHP and MySQL. Underneath the website there is a database that contains information about all 25,000 images in my collection. Basically, this database **is** the metadata for my images, or a summary of it. The database has one record per image, and each record stores the metadata for that image: caption, keywords, image name, location, GEO data, categories, orientation, etc. That said, the issue for me is: how do I create this database? The gallery software in the previous paragraph does this automatically, but my home-brewed web software does not.

The beauty of using Expression Media for DAM in my workflow is that, with a single click, Expression Media can create this database for me. (Although I have not used other DAM applications, I am sure they are similar.) Expression Media has a few ways of doing this. I could use its built-in export functions (Make -> Text Data File or Make -> XML Data File). But after doing this for a while I decided to write a BASIC script within Expression Media that creates the database while doing some fine-tuning and error checking on the metadata fields. Either way, whether I use my own script or Expression Media’s built-in export features, the database is easily created. Then it is simply a matter of uploading the database along with the images when it is time for a website update.
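The site runs on PHP and MySQL and the export is a Visual Basic script, but the shape of the task (one validated record per image, written to a table) can be sketched in Python with the standard library's `sqlite3` module standing in for MySQL. The table layout, field names and checks below are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS images (
    image_id    TEXT PRIMARY KEY,
    caption     TEXT NOT NULL,
    keywords    TEXT NOT NULL,   -- semicolon-separated
    categories  TEXT NOT NULL,   -- newline-separated "A > B > C" paths
    location    TEXT,
    latitude    REAL,
    longitude   REAL,
    orientation TEXT,
    quality     INTEGER NOT NULL DEFAULT 0
)
"""
COLUMNS = ("image_id", "caption", "keywords", "categories", "location",
           "latitude", "longitude", "orientation", "quality")


def clean(rec: dict) -> dict:
    """Light error checking and tidying before a record is written."""
    image_id = str(rec.get("image_id", "")).strip()
    caption = " ".join(str(rec.get("caption", "")).split())  # collapse stray whitespace
    if not image_id or not caption:
        raise ValueError(f"record {image_id or '?'}: missing image ID or caption")
    lat, lon = rec.get("latitude"), rec.get("longitude")
    if (lat is None) != (lon is None):
        raise ValueError(f"record {image_id}: latitude and longitude must come together")
    if lat is not None and not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"record {image_id}: coordinates out of range")
    keywords = [k.strip() for k in rec.get("keywords", []) if k.strip()]
    return {
        "image_id": image_id,
        "caption": caption,
        "keywords": ";".join(dict.fromkeys(keywords)),  # de-duplicate, keep order
        "categories": "\n".join(rec.get("categories", [])),
        "location": rec.get("location"),
        "latitude": lat,
        "longitude": lon,
        "orientation": rec.get("orientation"),
        "quality": int(rec.get("quality", 0)),
    }


def export(records, conn: sqlite3.Connection) -> int:
    """Create the table if needed, then insert or replace one row per image."""
    conn.execute(SCHEMA)
    rows = [clean(r) for r in records]  # fail before writing anything if a record is bad
    placeholders = ", ".join(f":{c}" for c in COLUMNS)
    conn.executemany(
        f"INSERT OR REPLACE INTO images ({', '.join(COLUMNS)}) VALUES ({placeholders})", rows
    )
    conn.commit()
    return len(rows)
```

Validating every record before the first insert means a single bad caption stops the export instead of leaving a half-updated table behind.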

The point here is that once the work is done in the DAM application, it should be a very quick process to upload the images and metadata to the web and get the images out there for the world to see. Then, if all goes well, the phone rings.


After all that work defining the metadata for your images, and ensuring that it is embedded properly in each image, you would think you are home free, right? Well, there are a few provisos you should know.

Metadata Can Be Stripped By Gallery Software

Some stock portals, gallery hosting services, or install-yourself gallery software (usually written in PHP) will strip metadata from an image. That’s right, they will strip it right out of your image! Why? They claim the reason is to shrink the JPGs that are displayed on the web, in an effort to reduce bandwidth. While this is true, it is a big mistake in my opinion, and is one of the principal reasons I am not involved in any of the stock portal sites or popular photo hosting services. I want my metadata to stay with the image wherever it goes, to all derivative versions of the image. The few extra bytes of storage required for this are trivial compared to the importance of this data being preserved. Think DMCA! Think Orphan Works!

Metadata Can Be Stripped By A Thief

When a thief, or some unwitting schoolkid, makes a copy of your image off the web, the chances are quite good the metadata will be stripped. If the image is taken via a screen shot, the metadata will disappear. If the thief/kid uses “right-click and Save As”, the metadata should remain in the image. But in the end, if the thief/kid alters the image in Photoshop and uses “Save For Web” to save a new copy, the metadata will probably be stripped out. (Yes, Save For Web can optionally preserve metadata, but it is easy to configure Photoshop so that it strips metadata from the image in “Save For Web”, and older versions of Photoshop do not offer the option to override this.)
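To check whether a given JPEG (your own derivative, or a copy found elsewhere) still carries metadata, you can walk its header segments: EXIF lives in an APP1 segment starting with `Exif\0\0`, XMP in an APP1 segment starting with Adobe's XMP namespace URI, and IPTC data is normally carried inside the Photoshop APP13 segment. A minimal Python sketch; treating APP13 as "IPTC present" is a heuristic, since that segment can also hold other Photoshop resources.

```python
import struct


def metadata_segments(data: bytes) -> set[str]:
    """Report which metadata blocks a JPEG byte string carries in its header segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    found = set()
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"bad marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # padding byte before a marker
            pos += 1
            continue
        if marker in (0xD9, 0xDA):  # end of image or start of scan: headers are over
            break
        length = struct.unpack(">H", data[pos + 2:pos + 4])[0]  # includes the 2 length bytes
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            found.add("EXIF")
        elif marker == 0xE1 and payload.startswith(b"http://ns.adobe.com/xap/1.0/\x00"):
            found.add("XMP")
        elif marker == 0xED and payload.startswith(b"Photoshop 3.0\x00"):
            found.add("IPTC")
        pos += 2 + length
    return found
```

For a file on disk, pass it the bytes read in binary mode, e.g. from `open(path, "rb").read()`.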

Too Much Metadata Can Be Displayed

Photo hosting sites often display the EXIF fields (shooting data) of your photo’s metadata. This may or may not be what you want. Among hobbyists there is little concern about making the date, time of day, and technique (ISO, shutter speed, aperture) known. Indeed, it is one of the ways we learn, by understanding what others have done. But pros often have good reason to keep this information to themselves. So, the caveat here is: if you are using a photo hosting service and you don’t want the EXIF data in your image available on the web, you may need to take steps to prevent it.
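One possible step, sketched here as an assumption rather than a recommendation for any particular service, is to remove only the EXIF segment from the JPEG before upload while leaving IPTC and XMP (caption, keywords, copyright) in place. A minimal Python sketch working on raw JPEG bytes; note that GEO data written by geocoding software usually lives in EXIF too, so it is removed along with the shooting data.

```python
import struct


def strip_exif(data: bytes) -> bytes:
    """Return a copy of a JPEG with EXIF APP1 segments removed; other segments are untouched."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    out = bytearray(data[:2])
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"bad marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:  # start of scan: copy the compressed image data verbatim
            break
        length = struct.unpack(">H", data[pos + 2:pos + 4])[0]
        segment = data[pos:pos + 2 + length]
        if not (marker == 0xE1 and segment[4:10] == b"Exif\x00\x00"):
            out += segment  # keep everything except EXIF (XMP also uses APP1, so check the ID)
        pos += 2 + length
    out += data[pos:]
    return bytes(out)
```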

The Evolution of

General is a natural history stock photography website that first appeared in 1998 as an exercise to learn what the world wide web and websites were, learn to write the HTML to bring a site into being, get it hosted and see if the world thought anything of it. Considerable thanks is owed to Mike Johnson, a good friend and skilled photographer with sublime images of pelagic animals and blue whales, who offered much early advice about the entire process. For the first few years, the only photos on that were worth looking at were blue whales (and even the descriptive “worth” is questionable). The pages were static and created either by hand or with primitive tools such as NetObjects Fusion.

As inbound links to began to accumulate and the resultant traffic (mostly from AltaVista and later Google) built, more images were added to the site and publishers began to contact me to license them, usually for use in editorial books, magazines and news publications. I realized that had become a de facto stock photography enterprise, and was actually one of the first of its kind for marine and natural history photographs on the web. I was represented by a couple of small agencies but had to learn how to field requests and license images properly on my own. Sometime around the turn of the century, armed with about 1000 images and a need to search by keywords (open vocabulary) and hierarchical relationship (closed vocabulary), I decided to learn PHP and MySQL in an effort to create what has now become a powerful, well-indexed and comprehensive online image search program. The result is so effective, in fact, that many of the subjects I cover now appear quite high in Google rankings. For example, Google “kelp forest photo”, “Guadalupe Island”, “blue whale photos” or “Carcharodon carcharias photos”; as of January 2005 (and October 2007, and June 2009), these all show up in the top 3 or 4 Google results, some of them via, a companion site of mine that is driven by the same self-authored PHP/MySQL/search code. Alas, it is inevitable that as better photographers than I shoot these same subjects, my pages will lose traction in the Google ranks. But at the same time my setup allows new subjects to quickly gain traction and show up in Google, e.g., Mobius Arch, The Wedge, Silver Salmon Creek Lodge. While there are exceptions, in general most of the animal and plant subjects I cover will appear on the first page of Google results when searched by their Latin (scientific) names, e.g., Zalophus photos, Corynactis photos, and often by their common names as well.

The last 6 years or so have seen an acceleration in the process of making photos, getting them on the web and in front of photo researchers and publishers, and licensing them. I am adding about 4000 new images to the library each year, using Canon digital cameras (Canon EOS-1Ds Mark III & II) with lenses like the 500 f/4, 400 f/5.6, 300 f/2.8, 70-200 f/2.8, 24-70 f/2.8, 16-35 f/2.8 II and 15mm fisheye (all killer lenses).

The image search, keywording and categorization aspects of the photo library are now highly automated and need little further work, so that as new images are added to the stock files they appear online with rich metadata within a few days, and can then be indexed and appear in Google search results quickly. The addition of textual (non-image) content naturally requires more time. Some photographers hand-build individual pages for their subjects. I just don’t have the patience for this, so instead I use weblogging software to add new text content to the website. Currently, I use WordPress, which I have customized in a number of ways. There are 650+ posts so far, as of June 2009.

At present, has a Google PageRank of 6 (update: looks like it just changed to 5, huh?) and receives about 5000 unique visitors (omitting robots and crawlers) each day. Sure, there are other measures of a web site’s traffic and relevance. However, I think Google’s opinion of my website is more important than anyone else’s, and counting the unique visitors to a site is a no-brainer. These numbers are quite good for an individual photographer’s web site, and I think they are attributable primarily to smart use of metadata, longevity, inbound links from people who have found my site worthy, and simple HTML design. Note that I have never placed any advertising on my site, and probably never will. All the traffic is organic; I have never resorted to link exchanges or any of those get-ranked-quick gimmicks. By the way, I found a tool that can help one fine-tune a website for SEO and web presence, and described it in a post entitled Post Up … Shoot … Score.

Fluid Carpet, Abstract Photo


OK, I admit it, my youngest daughter shot this image. I let her loose with our uber-mikro-digi-kamera one day while we were crashed at a hotel. I set the camera up for long exposures, and this is one of the images she came up with. Kids don’t know any rules when it comes to photography. Most of them don’t even remember film. They just know that it costs nothing to snap an image, so they snap and snap and snap. They don’t care about the junk shots, only about the one image out of the many that has something going on and captures the viewer’s interest. Kids try things with a camera that we (insert: real photographers, old people, you, me, GenXers, boomers, the ancients) might never think of. Sarah held the camera just a few inches above the carpet as she walked down the hallway to get this smeared image. I liked her shot so much I stole her idea and got a keeper of my own. Today’s abstract photo, #13 of 15.


Patterns in carpet blurred into abstract by time exposure.
Image ID: 20570

Layers, Abstract Photo

Abstract, La Jolla

Another abstract cloud photo. I like using a medium telephoto lens to isolate landscape elements, and patterns in clouds are no exception. This was probably shot with a 70-200 on Velvia film, vintage. Moments after the green flash, orange skies over La Jolla. Today’s abstract photo, #12 of 15.

Clouds and sunlight, La Jolla, California

Clouds and sunlight.
Image ID: 04818
Location: La Jolla, California, USA

Sunset Booby, Abstract Photo

Abstract, Galapagos Diaries

Our days at Darwin Island in the Galapagos Islands have been fantastic. On each of our trips we spent several days, sometimes almost a week, at this usually spectacular, remote and wild place. The diving can be, of course, unsurpassed, which is one reason that virtually all visitors to Darwin Island are divers. Too bad, since the place is insanely dense with bird life. Birders would love this place, but I doubt many ever see it since the island has no approved land visits (that I know of). We spend lots of time between dives during the day, and while sipping margaritas on the rooftop deck at sunset, watching the hordes of birds come and go. Upon waking each morning one naturally steps out on deck to see how the day is shaping up. Towering columns of birds lit by the sunrise, soaring on the warming updrafts and moving out to sea by the thousands, rise above the sheer sides of the island. The cacophony of bird sounds is impressive. Throughout the day frigatebirds and boobies perform their never-ending parts, with boobies diving for food offshore and frigates trying to spook them into disgorging their catch as they fly back to land. This bird, likely either a blue-footed booby (Sula nebouxii) or Nazca booby (Sula granti), is blurred as it is seen against the pastel hues of sunset. Today’s abstract photo, #11 of 15.

Booby in flight, motion blur, Darwin Island

Booby in flight, motion blur.
Image ID: 16686
Location: Darwin Island, Galapagos Islands, Ecuador

Clouds on Fire, Abstract Photo

Abstract, Hawaii

Resuming the series of abstracts (before it was so rudely interrupted with bird pics): today’s abstract photo is an image of clouds on fire, taken from the lanai of Skip’s seaside surf pad in Napili, looking out over Lanai and Molokai. Wow, did we ever have some epic sunsets in those years we were doing whale research on Maui! #10 of 15:

Clouds and sunlight, Maui

Clouds and sunlight.
Image ID: 05640
Location: Maui, Hawaii, USA

Northern Cardinal Photo

Arizona, Birds

Northern cardinal (Cardinalis cardinalis). This was the other small songbird that I hoped to see in Arizona. It is very similar to the Pyrrhuloxia (see yesterday’s post). In fact the female cardinal looks a lot like the male Pyrrhuloxia at first glance, although the shape of the beak (among other things) is diagnostic.

Northern cardinal, male, Cardinalis cardinalis, Amado, Arizona

Northern cardinal, male.
Image ID: 22891
Species: Northern cardinal, Cardinalis cardinalis
Location: Amado, Arizona, USA

Northern cardinal, female, Cardinalis cardinalis, Amado, Arizona

Northern cardinal, female.
Image ID: 22929
Species: Northern cardinal, Cardinalis cardinalis
Location: Amado, Arizona, USA

Shot at Bill Forbes’ Pond at Elephant Head, which I visited and described recently.

Pyrrhuloxia Photo

Arizona, Birds

Pyrrhuloxia (Cardinalis sinuatus). This was one of the two birds I was hoping to see in Arizona. In general, we do not see small colorful birds like this in Southern California (except for escaped exotics like parrots). At first I thought Pyrrhuloxia was the Latin (scientific) name for this bird, but then I learned the Latin name is Cardinalis sinuatus. So I guess the Pyrrhuloxia is closely related to the Cardinal (see tomorrow’s post). Regardless, it’s a pretty little bird.

Pyrrhuloxia, male, Cardinalis sinuatus, Amado, Arizona

Pyrrhuloxia, male.
Image ID: 22894
Species: Pyrrhuloxia, Cardinalis sinuatus
Location: Amado, Arizona, USA

Shot at Bill Forbes’ Pond at Elephant Head, which I visited and described recently.