PASS Workshop at Harvard
by Jen Golbeck
The PASS (Provenance-Aware Storage Systems) Workshop, held at Harvard on May 31, covered issues related to provenance and file systems.
We began with Margo Seltzer presenting an introduction to the PASS system (http://www.eecs.harvard.edu/syrah/pass/) and how it stores the full provenance of files. This includes all of the operations performed, the libraries opened while creating the file, the time, the user, kernel information, and so on.
Yong Zhao from the University of Chicago presented the provenance system used in the grid work conducted there and at Argonne National Laboratory (http://vds.uchicago.edu/twiki/bin/view/VDSWeb/WebMain).
Kiran-Kumar Muniswamy-Reddy from Harvard gave more detail about the Berkeley DB (BDB) database used by PASS and explained the tradeoffs involved in choosing it to store provenance. He pointed out that provenance is more complicated than ordinary file metadata: it persists after a file is deleted, and it can be referenced when a different file is being examined.
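To make the metadata-versus-provenance distinction concrete, here is a minimal sketch, assuming a toy in-memory store rather than anything PASS or Berkeley DB actually does. The names ProvenanceRecord, ProvenanceStore, create, delete_file, and ancestry are hypothetical. Deleting a file removes it from the set of live files but leaves its provenance record in place, so a later query about a derived file can still reach the deleted ancestor.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ProvenanceRecord:
    # Hypothetical record; a real system would also keep operations,
    # libraries, time, user, kernel information, etc.
    object_id: int
    name: str
    user: str
    inputs: List[int] = field(default_factory=list)  # ids of objects this one was derived from


class ProvenanceStore:
    """Toy store that keeps provenance separately from the files it describes."""

    def __init__(self) -> None:
        self.records: Dict[int, ProvenanceRecord] = {}
        self.live_files: Set[int] = set()

    def create(self, rec: ProvenanceRecord) -> None:
        self.records[rec.object_id] = rec
        self.live_files.add(rec.object_id)

    def delete_file(self, object_id: int) -> None:
        # Only the file disappears; its provenance record is retained.
        self.live_files.discard(object_id)

    def ancestry(self, object_id: int) -> List[int]:
        # Depth-first walk over input edges, including deleted ancestors.
        seen: Set[int] = set()
        stack = list(self.records[object_id].inputs)
        out: List[int] = []
        while stack:
            oid = stack.pop()
            if oid in seen:
                continue
            seen.add(oid)
            out.append(oid)
            stack.extend(self.records[oid].inputs)
        return out
```

For example, if result.dat was derived from raw.dat and raw.dat is later deleted, asking for the ancestry of result.dat still returns the id of raw.dat. Keeping records in an index keyed by object id, rather than attaching them to the file itself, is what lets them outlive the file in this sketch.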
Later in the afternoon, I gave a talk on how trust can be used with provenance, and Uri Braun from Harvard discussed security issues. The idea behind my talk was to show how social features could be used to grant or limit access to provenance information when security is not critical but privacy is still a concern. I got a lot of good questions, and one of the more interesting points that came up was using trust in the reverse of how I described it in the talk: instead of using trust and social networks to determine who can see information, we can use them to determine whether we trust the information.
I think both are good ways to use trust with provenance. If I made a document, I may want only some people to see it. If I receive a document, even if I can’t see who worked on it, I may be able to see that only people I trust at 7 or higher worked on it, and that is also useful. Uri’s talk was also about access control, but it was more role-based and more secure than the social network approach I proposed.
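Both directions of using trust can be sketched in a few lines. This is an illustration under assumptions, not the system from the talk: trust ratings are assumed to be integers on a 1 to 10 scale stored in a plain dictionary, anyone without a rating is treated as untrusted, and the function names can_view and trusted_document are hypothetical. The threshold of 7 comes from the example above.

```python
from typing import Dict, Iterable


def can_view(owner_trust: Dict[str, int], viewer: str, threshold: int = 7) -> bool:
    """Access-control direction: may this viewer see the owner's provenance?

    Viewers the owner has not rated are treated as untrusted (an assumption).
    """
    return owner_trust.get(viewer, 0) >= threshold


def trusted_document(contributors: Iterable[str], my_trust: Dict[str, int],
                     threshold: int = 7) -> bool:
    """Reverse direction: did only people I trust at `threshold` or higher work on it?

    Returns just a yes/no answer, so contributor identities need not be revealed.
    """
    contributors = list(contributors)
    if not contributors:
        return False  # no provenance at all, so nothing to base trust on
    return all(my_trust.get(c, 0) >= threshold for c in contributors)
```

Returning only a boolean from trusted_document mirrors the case described above, where the recipient learns that everyone who worked on the document clears the threshold without learning who those people were.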
The final discussion, led by Luc Moreau, was on standardization efforts for provenance and was similar to the discussions we had at IPAW (the International Provenance and Annotation Workshop) earlier this month. He also presented a provenance challenge with provenance-aware Java. More details are available here: http://twiki.grimoires.org/bin/view/Challenge/FirstProvenanceChallenge
Overall, the workshop was very interesting and useful. Most workshops are organized as small conferences, but this one was very interactive and was held in the context of an ongoing project at Harvard, which gave us a concrete setting in which to frame our arguments. I am convinced that many provenance issues will benefit from the semantic web. Particularly as the semantic web and grid computing come together, these issues of provenance will become more pressing.
