Main Page
This is an experimental Wikibase instance where I am exploring a new method for handling the data that the Environmental Data and Governance Initiative's (EDGI) Environmental Enforcement Watch (EEW) project works to make sense of. Those data mostly come from the U.S. Environmental Protection Agency's (EPA) Enforcement and Compliance History Online (ECHO) system.
EEW works to make sense of the data the EPA puts online, developing regular analytical reports that translate ECHO's records on facilities' compliance with environmental regulations into an accessible form. The team also develops code notebooks that help other groups, such as investigative journalists, understand and work with these data.
I've been interested in how we might regularize and institutionalize this same concept for all kinds of government data. Especially now, under various open government and open data policies, massive amounts of potentially useful government data are flowing online. Just because those data are "open" doesn't mean they are accessible and usable. We often have all kinds of obscure codes and disconnected bits of data that make perfect (maybe) sense in the context of producing and working with them within their source agencies but make no sense to anyone else without a whole lot of deciphering.
I'm exploring what that deciphering might look like if we take more of a knowledge organization approach to the whole problem rather than a data integration approach. What if we try to fit as much of that massive amount of public data as possible into the global knowledge commons being pursued by the Wikimedia Foundation and its raft of contributors, especially through Wikidata? I'm particularly excited about the latest developments with Wikibase instances in the cloud, which are essentially being designed to be adjacent knowledge graphs focused on a particular domain and context.
Having experimented for the last couple of years with Wikidata itself, I can say with certainty that it's really quite hard to work legitimately within that whole system. It's a nontrivial process to be careful about semantic and structural alignment of concepts and not just throw things into a giant database. You have to figure out what other people mean by the properties available for use and by the items used to classify other items. If that's not clear in their definitions, you have to dig into the history behind them to understand whether they align with your own intent. If not, the responsible thing to do is to jump into the conversation and try to influence things in a direction that reflects community consensus. That takes a lot of time and energy!
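To make that alignment work a little more concrete, here is a minimal sketch (my own illustration, not part of any EEW codebase) of checking how a candidate Wikidata property is actually defined before reusing it, via the public Wikidata Query Service. The property used here, P793 ("significant event"), is just an example of something one might consider for modeling enforcement actions against a facility, not a settled modeling choice.

```python
# Minimal sketch: look up the English label and description of a Wikidata
# property before deciding whether its intended meaning matches our own.
# Assumes network access to the public Wikidata Query Service.
import requests

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def describe_property(pid: str) -> dict:
    """Return the English label and description of a Wikidata property."""
    query = f"""
    SELECT ?label ?description WHERE {{
      wd:{pid} rdfs:label ?label .
      wd:{pid} schema:description ?description .
      FILTER(LANG(?label) = "en" && LANG(?description) = "en")
    }}
    """
    response = requests.get(
        WDQS_ENDPOINT,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "eew-wikibase-exploration/0.1 (experimental)"},
    )
    response.raise_for_status()
    rows = response.json()["results"]["bindings"]
    return {key: value["value"] for key, value in rows[0].items()} if rows else {}

if __name__ == "__main__":
    # P793 ("significant event") is one candidate for attaching enforcement
    # actions to a facility item; checking its definition is step one.
    print(describe_property("P793"))
```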
As an alternative, or more of a stepping stone, we can start with a clean instance of the same knowledge organization technology and work things out within a specific context. The responsible way to go about that still involves examining both the very messy global knowledge commons (Wikidata and related things) as well as other ontologies and sources of explicit semantics. As we build our own thing, we should establish linkages to those other things, complete with notes on what the relationships mean and how they might be exploited in the future. I tend to think we'll end up with a global knowledge commons that is more about pop-up knowledge graph indexes, perhaps using developing tech like Weaviate and others, that reach out and exploit these relationships to develop efficient point-in-time renderings optimized for use.
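As a rough illustration of what exploiting such a linkage might look like, the sketch below runs a federated SPARQL query against a local Wikibase Cloud query service and follows a hypothetical "exact match" property out to Wikidata for labels. Everything specific here is an assumption for illustration: the endpoint URL, the local property P2 (imagined as a URL-datatype property holding a Wikidata entity URI), and the premise that the local query service permits federation with Wikidata.

```python
# Sketch: follow hypothetical local "exact match" links out to Wikidata.
# The endpoint and property ID below are placeholders, not identifiers
# that actually exist in this instance.
import requests

LOCAL_ENDPOINT = "https://example.wikibase.cloud/query/sparql"  # placeholder

QUERY = """
PREFIX lwdt: <https://example.wikibase.cloud/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?localItem ?wikidataItem ?wikidataLabel WHERE {
  ?localItem lwdt:P2 ?wikidataItem .              # local "exact match" link
  SERVICE <https://query.wikidata.org/sparql> {   # hop out to Wikidata
    ?wikidataItem rdfs:label ?wikidataLabel .
    FILTER(LANG(?wikidataLabel) = "en")
  }
}
LIMIT 25
"""

response = requests.get(
    LOCAL_ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "eew-wikibase-exploration/0.1 (experimental)"},
)
response.raise_for_status()
for row in response.json()["results"]["bindings"]:
    print(row["localItem"]["value"], "->", row["wikidataLabel"]["value"])
```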
Part of the reason I'm interested in this dynamic for the EEW project is that they are like a lot of groups that really don't have the capacity to engineer and operate a bunch of big data tech. What if, instead, there were some tech that a small non-profit like this could push things to, focusing more on the code to do the work (which they have to write anyway) and less on the foundational infrastructure for where the data goes? If that data infrastructure is also fully in the public domain and part of a well-established global organization dedicated to building the global knowledge commons, then we have a pretty good chance of developing something truly lasting.
Disclaimer: I'm a sometime volunteer with EDGI's EEW project, but I spun this Wikibase instance up on my own initiative and time. If it proves interesting to carry forward in some more official capacity with the organization/project, we'll recast it at that time.