Archive for the 'OWL' Category

Ontology Management on the Gartner Hype Cycle!

September 28, 2016

The Gartner hype cycle is an analytical construct (of sorts) which tries to capture the relation between a technology and the expectations we have for that technology. It’s based on the pretty reasonable observation that esp with new technology, there’s a tendency for expectations to outrun the current or even potential benefits. Everyone wants to use the new glittery magic, so vendors and specialising consultants do very well for a while. But it turns out that the new technology isn’t magic, so people find that they’ve spent a bunch of money and time and energy and they still have the problems the tech was supposed to magically solve. This leads to a crash in expectations and a backlash against the tech. But lots of new tech is actually useful, used appropriately, so some of the new tech, its shiny worn off, finds a place in our toolkit and tech landscape. The Gartner hype cycle is a pretty iconic graph with fun-ish labels:

(The y-axis gets different labels over time.)

And people try to operationalise it:

[Image: Hype-Cycle-General.png]

But I’m skeptical that much of this has been rigorously evaluated.

Of course, sometimes a tech takes off and doesn’t really stop. It goes pretty straight from trigger to productivity. The iPhone and iPhone-style phones come to mind. It Just Grew. It may level off as it hits saturation, but that’s a completely different phenomenon.

This is all pretty banal stuff, but Gartner takes it very seriously (they’ve branded it!).

ANYWAY, this year’s hype cycle, excitingly, includes ontology management for the first time! WE’RE ON THE MAP!

  • 16 new technologies included in the Hype Cycle for the first time this year. These technologies include 4D Printing, Blockchain, General-Purpose Machine Intelligence, 802.11ax, Context Brokering, Neuromorphic Hardware, Data Broker PaaS (dbrPaaS), Personal Analytics, Smart Workspace, Smart Data Discovery, Commercial UAVs (Drones), Connected Home, Machine Learning, Nanotube Electronics, Software-Defined Anything (SDx), and Enterprise Taxonomy and Ontology Management.

Alas, if you look at the graph, we’re on the downslope into the Trough of Disillusionment:

And it has a “more than 10 years” to mainstream adoption label.

Ouch!

This is discouraging and perhaps hopeful. Remember that the hype cycle doesn’t tell you much about the quality, maturity, or utility of the technology, only the perception and influence of perception on the market. (To the degree you believe it at all.) 10 years to mainstream adoption is not 10 years from being a boon for your business or a viable business itself. It means you will often have a hard sell, because people are skeptical.

Update: Oh WordPress. Picture management please.

Recent Keynotes

January 7, 2016

It’s not often that I can have a post about my doing any sort of keynote…depending on how you count, I’ve only done one or maybe two before 2015. Then there were two in 2015!

The first was at OWLED. This year I went heavily (again) for HTML based slide decks, so you can browse my reveal.js slides (with an embedded timeline!). This was, obviously, about the Web Ontology Language past and possible future.

The second was at SWAT4LS and it was a call to take on the challenge of representing all medical knowledge.

Both were very well received. I hope to do a video of the SWAT4LS one as I had several requests to replicate the full experience.

There’s a fair bit of overlap between the two because I am currently fairly obsessed with the problem of representing and sharing evidence.

Both of these, OF COURSE, involved all nighters. The writer’s block thing is tough.

But, I give a pretty good talk! I would do more keynotes (hint hint).

So many keynotes are just horrific. I got to the point where at many conferences, like ISWC, I just don’t go because they are so insultingly terrible I lose it. (Ron Brachman gave an epically awesome one at DL. Marvin Minsky gave an epically awful one at AAAI.)

Keynotes can be great because they give space to do something different. You don’t have to report on a paper. You can explore a wacky idea. Or synthesise some history. This is actually a cool part of dissertations: You have the room and the right to put things down that don’t fit anywhere else. Keynotes should be like that. I don’t want a fluff piece or giant pile of ego boo. I want something interstitial…viscera which pull the research organs into place. It doesn’t have to be profound. It doesn’t have to be controversial. (Though that can be good.) But it should be distinctive and preferably fun!

(See this earlier post about my speaking anxiety.)

More New OWL Syntax

July 30, 2010

Thus far, I like my New OWL Syntax (though it needs a name). I think the key refinement though is in the counting quantifiers. The slashes just didn’t work, so we have curly brackets with the numbers paired with the right or left one depending on whether you want max, min, or both.

So, a rough grammar

Sub ::= =>
Equiv ::= =
AxiomConnective ::= Sub | Equiv
Quantifier ::= Existential | Universal | Counting
Existential ::= <Role>
Universal ::= [Role]
Counting ::= {number Role}
      | {Role number}
      | {number Role number}
Restriction ::= Quantifier Concept
ConjOrDisj ::= Concept (& | v) Concept (parens if needed)
Negation ::=  ~Concept
Concept ::= Restriction | ConjOrDisj | Negation | name
TBoxAxiom ::= Concept AxiomConnective Concept.
ABoxAxiom ::= name:Concept | <name, name>:Role
Role ::= name.

So this gets us ALCQ. An example (some axioms ripped from Koala, I don’t have nominals yet):

Parent = Animal & {1 hasChildren}Thing.
DryEucalyptForest =>Forest.
Koala => Marsupials & <hasHabitat>DryEucalyptForest.
Marsupials => ~Person.
Animal => {1 hasHabitat}Thing & {1 hasGender 1}Thing.
StudentWith3Daughters = Student 
               & ([hasChildren]Female 
                                  & {3 hasChildren 3}Thing).
fluffy:Koala.
<fluffy, sandy>:hasChildren.
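To make the three counting forms concrete, here is a minimal Python sketch that reads a single counting quantifier and reports its bounds. It assumes, going by the Koala axioms above, that a number on the left is a minimum and a number on the right is a maximum; the function name and the regular expression are my own illustration, not part of any existing tool.

```python
import re

# Matches the three counting forms from the rough grammar:
#   {n Role}    at least n
#   {Role n}    at most n
#   {n Role m}  at least n and at most m (exactly n when n == m)
# The left = min, right = max reading is an assumption taken from the examples.
COUNTING = re.compile(r"\{\s*(?:(\d+)\s+)?([A-Za-z_]\w*)(?:\s+(\d+))?\s*\}")


def parse_counting(text):
    """Return (role, minimum, maximum); a missing bound is None."""
    match = COUNTING.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a counting quantifier: {text!r}")
    low, role, high = match.groups()
    if low is None and high is None:
        raise ValueError(f"no number given in {text!r}")
    return role, (int(low) if low else None), (int(high) if high else None)


if __name__ == "__main__":
    for example in ("{1 hasChildren}", "{hasChildren 3}", "{1 hasGender 1}"):
        print(example, "->", parse_counting(example))
```

A translator to Manchester Syntax or OWL/XML would map the pair of bounds onto min, max, or exact cardinality restrictions.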

I’m not thrilled by my stealing of the nominal constructor ({}). I thought about reusing <> and [] (just adding numbers). This works really well for the existential (since a min N really is N somes, plus a little), but max isn’t that close to the universal for most people. Another problem is that using & and angle brackets means using the corresponding entities in HTML or XML which is a common typing place for me.

Binary and and or can be annoying as well.

New OWL Syntax

July 28, 2010

I’ve been obsessing a bit about artificially generated satisfiability problems (KSAT). The work in propositional logic really is outstanding and, upon revisiting, the work in modal and description logics is pretty interesting (after some mis-steps). Pavel and I have been doing similar experiments for probabilistic SAT (both propositional and SHIQ based). People are often annoyingly dismissive of such experiments. (Grumble about one review that killed our last paper.) But that’s not what I’m after today.

As I wish to do experiments, I need generators (there’s analytic work as well, of course). So I’m writing some. Or starting to. Since I want to use OWL reasoners, it makes sense to target OWL. This is a bit tricky as OWL doesn’t directly accommodate concepts by themselves (you need axioms). Now, of course, we can use the expression syntax of OWL/XML or even of OWL/RDF (shudder), but these won’t be legal OWL files. One can, of course, hack around this using class assertions or the right subsumption axioms (and I plan to do all that). But it’s awfully verbose and not very descriptive (e.g., no tag says “clause”). So, I thought about designing a new XML format which I could then write translators over.

It’s verbose. Nastily nastily verbose. If you look at the DIMACS SAT problem format, you get a sense of what a mess this all is. Of course, DIMACS just uses numbers for variable names, But Still.
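For a sense of what the propositional side looks like, here is a small Python sketch of a generator that writes DIMACS CNF. It assumes the plain fixed clause length random model (k distinct variables per clause, each negated with probability 1/2); the function names and parameters are illustrative only, and nothing here is specific to the modal or description logic generators.

```python
import random


def random_ksat(num_vars, num_clauses, k=3, seed=None):
    """Generate a random k-SAT instance as a list of clauses.

    Each clause picks k distinct variables and negates each with
    probability 1/2. Variables are numbered from 1, as DIMACS expects.
    """
    rng = random.Random(seed)
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), k)
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses


def to_dimacs(num_vars, clauses):
    """Serialise clauses as DIMACS CNF: a 'p cnf' header, then 0-terminated clauses."""
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    lines += [" ".join(map(str, clause)) + " 0" for clause in clauses]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_dimacs(5, random_ksat(5, 4, k=3, seed=1)), end="")
```

Everything in the output is a number, which is compact but says nothing about what a clause is meant to be.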

I could use Manchester Syntax, but it’s a bit awkward on a number of fronts. I want something typeable and which looks reasonably like so-called “German” DL syntax. So, here’s a sketch, first restricting to ALC. An example:

Person = Parent v ~Parent.
Parent = <hasChild>Person.
[hasChild]Happy & Parent => HappyParent.
Parent != Childless.
Childless = ~Parent.

So this is a simple TBox. The sentential operators are the usual v, &, and ~ (for or, and, and not, resp.). The quantifiers are modeled after the modal possibility (diamond, existential) and necessity (box, universal) operators with their “grading” (property) inside. The conditionals, I’m less sure of. I think I prefer “=” for equivalence to “” or “”. It reads naturally. But then implication feels a touch hacky. Perhaps “->” would be better. “!=” is sugar for disjointness and I want that for the n-ary case.

Quantifiers bind tightly so if you want HappyGrandparent, you need parentheses:

[hasChild](Happy & Parent) => HappyGrandParent.
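The tight binding of quantifiers is easy to see in a parser. Below is a rough recursive descent sketch in Python for the ALC concept syntax only (no axioms, no counting). Two things in it are my assumptions rather than part of the design above: that & binds tighter than v, and the tuple shape used for the parse tree.

```python
import re

NAME = r"[A-Za-z_]\w*"
TOKEN = re.compile(r"\s*(" + NAME + r"|[<>\[\]()&~])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_concept(text):
    """Parse a concept expression into nested tuples."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {token!r}")
        pos += 1
        return token

    def name():
        token = take()
        if token == "v" or not re.fullmatch(NAME, token):
            raise ValueError(f"expected a name, got {token!r}")
        return token

    def disjunction():  # loosest: Concept v Concept (assumed precedence)
        left = conjunction()
        while peek() == "v":
            take("v")
            left = ("or", left, conjunction())
        return left

    def conjunction():  # Concept & Concept
        left = unary()
        while peek() == "&":
            take("&")
            left = ("and", left, unary())
        return left

    def unary():  # tightest: ~C, <r>C, [r]C, (C), or a name
        token = peek()
        if token == "~":
            take()
            return ("not", unary())
        if token in ("<", "["):
            take()
            role = name()
            take(">" if token == "<" else "]")
            return ("some" if token == "<" else "all", role, unary())
        if token == "(":
            take()
            inner = disjunction()
            take(")")
            return inner
        return name()

    result = disjunction()
    if peek() is not None:
        raise ValueError(f"trailing input starting at {peek()!r}")
    return result
```

With this sketch, `[hasChild]Happy & Parent` parses as a conjunction whose first conjunct is the universal, while `[hasChild](Happy & Parent)` parses as a universal over the conjunction.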

For the ABox, I’m torn between classic so-called German style and first order functional style:

maria:Parent.
<maria,bijan>:hasChild.

Parent(maria).
hasChild(maria,bijan).

German style has a minor readability advantage for class assertions (maria isa Parent). I guess we could mix them.

Things get interesting with the counting quantifiers. I want to keep the quantifier style consistent, which has led to the following design using the delimiters /../ for at least, \..\ for at most, and |..| for exactly:

/2 hasChild/Hyper => FrazzledParent.
|1 hasChild|Person = ParentOfOnlyChild.
BelowReplacementFamily = \2 hasChild\Person.

Yeah. The exactly seems fine. The intuition I’d give for min and max is that if the slant is rising that indicates a fan out of successors, whereas if it is descending, it indicates a cap.

I suppose I could use the pipe + “min” and “max” (or even > and < but then we need equals and it gets messy or confusing).

Explanation of “Progress?”

December 22, 2009

So, my post Progress? was snark and a bit insidery at that. Given that I’ve already had one person give me the puzzled comment about it, it seems right to give in and just dissect the damn frog already.

In ontology circles, esp. of the description logic variety, “explanation” is a key feature of knowledge based systems. Sometimes you hear the story of how the early expert system, Mycin, would diagnose with high reliability but doctors wouldn’t accept those diagnoses. Then they added an explanation facility and suddenly doctors would accept it. Thus, explanation is necessary to trust the system. (There’s actually some reasonable empirical evidence to show this for even modern expert systems, e.g., for financial analysis.)

Since early forms of explanation were essentially “traces” of the system reasoning, it’s an easy step to say that explanations are “proofs”. This also hooks up with Hempel’s deductive nomological model of scientific explanation (i.e., a deductive argument with at least one scientific law as a premise).

However, when we talk ontology engineering, and esp. various debugging tasks, the most successful form of explanation has been so-called (and so-called by me, alas) justifications, that is, minimal subsets of the original theory which suffice for an entailment to hold.
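To pin down “minimal subset that suffices”, here is a toy Python sketch that finds one justification by deleting axioms one at a time and keeping a deletion only if the entailment survives. The “reasoner” is a stand-in I made up for illustration: axioms are just told atomic subsumptions, written as (sub, super) pairs, and entailment is reachability. A real implementation would call an OWL reasoner at that point.

```python
def entails(axioms, sub, sup):
    """Toy entailment check: is sup reachable from sub via told subsumptions?"""
    seen, frontier = {sub}, [sub]
    while frontier:
        current = frontier.pop()
        if current == sup:
            return True
        for left, right in axioms:
            if left == current and right not in seen:
                seen.add(right)
                frontier.append(right)
    return False


def one_justification(axioms, sub, sup):
    """Shrink the axiom list to one minimal subset that still entails sub => sup.

    Returns None if the entailment does not hold in the first place.
    """
    if not entails(axioms, sub, sup):
        return None
    kept = list(axioms)
    for axiom in list(axioms):
        trial = [a for a in kept if a != axiom]
        if entails(trial, sub, sup):
            kept = trial  # the entailment survives without this axiom
    return kept


if __name__ == "__main__":
    # Hypothetical axioms, loosely in the spirit of the Koala ontology.
    ontology = [
        ("Koala", "Marsupial"),
        ("Marsupial", "Animal"),
        ("Koala", "Cute"),
        ("Animal", "Thing"),
    ]
    print(one_justification(ontology, "Koala", "Animal"))
```

Note that the result is just a set of premises. Nothing in it says how the conclusion follows from them.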

Stefan Schlobach (and friends) almost certainly gets priority in the 2000s, with loads of antecedents in and out of the DL community. Aditya, Evren, and I sorta reinvented them when trying to produce something from Pellet for the InferenceWeb system, and never looked back.

Borgida (the fellow I’m pretty sure I was quoting) had some work on explanation that was more proof oriented. This rather recent paper picks that up. At a DL workshop (where the first, more categorical paper was given) he got some push back (I’m pretty sure from me). The later paper is less categorical.

In the end, I really hope that people get away from the “Proofs are explanations, the best explanations, and the only explanations” mindset. I don’t think it’s fruitful.

On Facebook, Enrico again tried to frame everything in terms of proofs in some (perhaps varying) system or partial proof in a given system (the latter is super weak given we have decision procedures…every entailment has a finitely findable proof, so any string of lemmas is trivially convertible into a partial proof, in some sense).

But this is really analytically bogus. It’s unclear what benefit comes from reducing everything to proofs, partial or otherwise. (And what about model based explanations, esp. for non-entailments? Or analogical explanations such as with case based reasoning? And justifications really really are “just the premises”.) Does this clarify anything? Don’t we just reinvent the distinctions we want inside the framework (e.g., with partial proofs)? I certainly do not see how it would help Borgida et al, after all, there is no dialectical point in stressing that explanations are proofs when bolstering a fairly specific kind of proof as explanation if the first occurrence of proof provides almost no constraints.

In the end, I’ve no doubt that proofs can serve, even in the ontology engineering setting, some explanatory purposes. Heck, I’m researching them for these purposes! I also don’t much care if other people want to be blinkered, or want to bash justifications, or what have you, at least, I don’t see such bashing (directly or slyly) as substantive criticism. I do think it’s worth sitting down and trying to understand what such statements (or beliefs) actually buy us.

My current thought is “Not much, and they are probably somewhat harmful.” I’m willing to be convinced otherwise if someone has the goods.

Progress?

November 27, 2009

From, Explanation in DL-Lite presented May, 2008:

As far as explanations are concerned, it is almost universally accepted that they are formal proofs, constructed from premises using rules of inference.

From, Explanation in the DL-Lite Family of Description Logics presented November, 2008:

It is widely accepted that an explanation corresponds to a formal proof. A formal proof is constructed from premises using rules of inference.

From Progress? published November, 2009:

Proofs? Schmoofs!

What are the units to test in an ontology?

November 27, 2009

Every now and again someone suggests that we need or should have “unit tests” for ontologies. This isn’t surprising since ontology engineering generally looks to software engineering for inspiration and unit testing is still reasonably hot. We don’t know a lot about testing ontologies, so picking a hot testing methodology is an easy way to go.

(Of course, ontologies have some testing built in, e.g., checks for consistency and class satisfiability. So it’s not all that dire.)

It is, however, not clear that unit tests, per se, make sense for ontologies, and even less clear that the “test first” methodology associated with them is appropriate. The main issue that cropped up for me is what is the unit we should test? The Art of Unit Testing says that a unit is a function or method. The Wikipedia article says that a unit is “the smallest testable part of an application.” These definitions seem to line up. A class or module is too big for focused tests and a line of code is too small to test at all, but a function or method is just right. Or right enough.

What’s the smallest testable part of an ontology? I don’t know! Perhaps an entailment? But entailments seem more like program outputs or behaviors. A term? (In OWLspeak, an entity?) I don’t know how testable terms are. We do test classes for satisfiability systematically. That’s good. Is there anything else to test about them?

I suppose we could test for key entailments, e.g., whether they are subsumed by something else. Of course, classification already “tests” for this. But really classification merely determines whether an atomic subsumption holds, for all possible atomic subsumptions. It doesn’t check for non-atomic subsumptions, nor does it throw a warning if a desired subsumption fails to hold or a desired non-subsumption falters.

So, perhaps we should write our desired subsumptions (and non-subsumptions, though we’ll want a short hand for that) separately, and match the results of classification with the “should subsume” and “should not subsume” lists?

If this is true, then coverage is going to fly out the window, I think. It’s not just that we won’t, as a matter of course, write enough tests, but that if we did, we’d wreck the ontology development process and its key advantages. Essentially, we’d destroy discoverability. We’d have to specify up front exactly all the things we generally hope the reasoner will tell us!

This isn’t to say that supporting unit tests of key entailments for key classes isn’t worth automating. After all, we do sanity check key subsumptions, just not in a very nice way (i.e., by remembering to inspect, or redoing queries in the DL Query tab in Protege 4). Having a format for tests and a nice test harness would allow us to repeat our testing more easily and to share them with other people.

The simplest format I can think of is just another OWL ontology where each test entailment is written as an axiom in that ontology with an annotation “should hold” or “should not hold”. The test runner simply loops over those axioms and checks to see whether they follow from the tested ontology.
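As a rough sketch of that loop in Python: each test is a (sub, super, annotation) triple, and the runner reports the tests whose outcome disagrees with the annotation. Everything here is hypothetical scaffolding. In particular `entailed` is a toy stand-in that only follows told atomic subsumptions; a real harness would load both ontologies and ask an OWL reasoner instead.

```python
def entailed(ontology, sub, sup):
    """Toy stand-in for a reasoner call: follow told subsumptions from sub."""
    seen, frontier = {sub}, [sub]
    while frontier:
        current = frontier.pop()
        for left, right in ontology:
            if left == current and right not in seen:
                seen.add(right)
                frontier.append(right)
    return sup in seen


def run_tests(ontology, tests):
    """Check each (sub, sup, annotation) test and return the failing ones."""
    failures = []
    for sub, sup, annotation in tests:
        if annotation not in ("should hold", "should not hold"):
            raise ValueError(f"unknown annotation: {annotation!r}")
        expected = annotation == "should hold"
        if entailed(ontology, sub, sup) != expected:
            failures.append((sub, sup, annotation))
    return failures


if __name__ == "__main__":
    # Hypothetical ontology under test and test ontology.
    ontology = [("Koala", "Marsupial"), ("Marsupial", "Animal")]
    tests = [
        ("Koala", "Animal", "should hold"),
        ("Koala", "Person", "should not hold"),
        ("Animal", "Koala", "should hold"),  # deliberately wrong
    ]
    for failure in run_tests(ontology, tests):
        print("FAILED:", failure)
```

The “should not hold” case is the one classification alone never flags, which is much of the point of having a harness.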

Eventually, we may want to write more sophisticated tests using, e.g., SPARQL/OWL.

OWLED? No, *Show*!

October 23, 2009

OWLED 2009 has started and, for the first time in its history, I’m not there. Which is sad, because I really enjoy it. But, Zoe is releasing her new CD, Bonfires* (on which I appear! again!), and I’m attending the release concerts. Which are a total blast. We had the first last night and it rocked. There’s another tomorrow night in Takoma Park. It’s great to see Zoe with a backing band.

* Buy a copy and not only will you have an excellent and beautiful album, but I’ll be your friend!

OWL 2: A Medical Informatics Perspective

October 4, 2009

This is my second “publicity” article about OWL 2. The first didn’t really get into OWL 2 per se so much as OWL altogether. I’m trying to stay within 1000-1500 words. The first focused on the Semantic Web angle whereas this one is focused on the use of OWL and ontology for bio-health applications.


Medicine has a huge and rapidly growing vocabulary even before you get to the complex scientific names. Is your pain shooting, or stabbing, or throbbing? Is your cough dry, wet, brassy, or barking? There is a huge difference between a “broken leg” which is a “greenstick fracture of the femur” and a “displaced transverse fracture of the patella”. There are dozens of brand names for plain aspirin, including, in some countries, the term “aspirin” itself. (To avoid registered trademarks entirely, you must say “acetylsalicylic acid”.)

Furthermore, there’s lots to know about each term and what you need to know varies by context. For example, acetylsalicylic acid is an analgesic (pain fighter), and an antipyretic (fever fighter), and an NSAID (that is, a non-steroidal anti-inflammatory drug, a kind of inflammation fighter).

There are a lot of concepts in medicine (analgesic, acetylsalicylic acid, pain) and even more terms (that is, “acetaminophen”, “paracetamol”, and “para-acetylaminophen” all are names for the same thing, Tylenol) corresponding to those concepts. When different care providers use different terms for the same thing there is the possibility for miscommunication. The more variation in terminology in health care records, the harder it is to analyze those records (for example, to monitor potential medical errors or to find candidates for clinical trials).

A core challenge in medical informatics is managing the huge, evolving terminologies that permeate all aspects of medicine. Most of these terminologies have a complex hierarchic structure, e.g., breast cancer is a kind of cancer is a kind of disease is a kind of pathology. The problem is that there are hundreds of thousands of terms in any reasonable terminology with a rat’s nest of connections between them. And these terminologies grow very fast. For example, the NCI Cancer Thesaurus grew from around 20,000 terms in 2004 to over 50,000 terms today. Each term corresponds to a potentially complex concept of specialized medical knowledge which is related to many more concepts in a variety of ways. Various problems emerge with manual curation of such terminologies: sometimes there are wrong connections between terms, or missing connections, or the text defining the term is out of date, confused, or just garbled. None of these errors are detectable from natural language or from simple, graph based representations of the terminology. These are semantic errors, that is, gaps between what the curators wrote down and what is true.

One way to improve terminology development is to write down the meanings of terms in a language that a program can understand. That way, we can run a program (an automated reasoner) to sanity check what we wrote and to find new connections that a person would recognize if they read all the definitions and didn’t get tired. Such a language is an ontology language, and a representation of a terminology with the definitions written so a program can reason with them is called an ontology. Ontologies and ontology languages have a rich history in computer science, artificial intelligence, and bio-medical informatics. A popular family of ontology languages is built on so-called description logics, which allow people to reasonably express the definitions of their concepts while still being amenable to state of the art automated reasoning techniques. Description logics form the basis of the standardized ontology language, OWL, and its latest version, OWL 2.

The first version of OWL (the “Web Ontology Language”) was standardized by the W3C in 2004 and proved a rousing success in providing a common default language for ontology development. Key bio-medical ontologies, such as SNOMED-CT and the NCI thesaurus, migrated to OWL and to an OWL based toolchain, allowing them to move away from proprietary languages and their vendor-locked toolchains.

In 2009, the W3C announced the finalization of the next generation of OWL, OWL 2. OWL 2 is based on continuing research from the Universities of Manchester and Oxford into all aspects of ontology engineering. Professor Ian Horrocks, whose early work in reasoning with description logics at Manchester made them a feasible technology, co-chaired the OWL 2 working group. [Ugh. it’s at this point I suck out 🙂 I hate writing people pimpage, even if true!]

OWL 2 addresses key expressive and computational limitations of OWL 1. By adding new constructs to the language, OWL 2 more directly supports medical applications. For example, so-called “role chains” allow ontologists to express the connection between spatial relations and part-whole relations, e.g., that if a fracture is located on a bone which is part of a leg, that fracture is a fracture of that leg. General reasoning with such constructs in the presence of other OWL features was an open problem solved by Ian Horrocks and Prof. Uli Sattler (of Manchester).
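For the technically curious, the fracture example can be mimicked with a few lines of Python. This is only a toy illustration of what a role chain licenses, working over plain facts about individuals; it is not how an OWL 2 reasoner works, and the property names (locatedOn, partOf) and individuals are made up for the example. In OWL 2 functional syntax the corresponding axiom would be written with SubObjectPropertyOf and ObjectPropertyChain.

```python
def apply_role_chain(facts, first, second, implied):
    """Saturate facts under the chain: first(x, y) and second(y, z) imply implied(x, z).

    Facts are (property, subject, object) triples over named individuals.
    """
    facts = set(facts)
    while True:
        new = {
            (implied, x, z)
            for (r1, x, y1) in facts if r1 == first
            for (r2, y2, z) in facts if r2 == second and y1 == y2
        } - facts
        if not new:
            return facts
        facts |= new


if __name__ == "__main__":
    # Hypothetical patient data: a fracture on a femur that is part of a leg.
    facts = {
        ("locatedOn", "fracture1", "femur1"),
        ("partOf", "femur1", "leg1"),
    }
    # Chain: locatedOn followed by partOf implies locatedOn.
    for fact in sorted(apply_role_chain(facts, "locatedOn", "partOf", "locatedOn")):
        print(fact)
```

The loop stops once no new facts appear, which always happens here because there are only finitely many individuals to pair up.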

OWL 2: A Semantic Web Pitch

October 4, 2009

OWL 2 is about to go to Recommendation. This is a PR moment and I want Manchester to take advantage of it. To that end, I’m trying to produce material that will be helpful for promoting OWL and Manchester. This is a general piece that tries to sell OWL 2 (and Manchester’s contribution) from a Semantic Web perspective. It’s a bit didactic, thus gets to OWL 2 fairly late in the game. Feedback is more than welcome, esp. if you are a lay reader. (It’s 1100 words.) I know it needs linkification, but I’m trying to pound out text that might go into press release type things, so linkification is delayed. Volunteers welcomed!


The World Wide Web had humble beginnings: It was intended as a simple way to share fairly simple information (notes, room schedules, phone directories) within an organization (CERN) and perhaps between like minded organizations. It quickly outgrew its humble beginnings to encompass the sharing of all sorts of information between all sorts of people and much more, besides. Today, people do much which feels far different than reading and publishing documents. They chat, play games, shop, or use programs such as word processors or spreadsheets.

The technology underlying the Web is recognizably descended from that of its salad days: The Hypertext Markup Language (HTML) is still the lingua franca of the Web; Cascading Style Sheets (CSS) allow designers to style HTML to make beautiful sites; and JavaScript has come into its own as the key client side programming language for complex browser based applications. Aside from being (ever more) capable technical foundations for the dazzling Web sites and applications we see everywhere, the Web trio are both standardized (by the W3C) and, unlike competing technologies such as Flash and Silverlight, not controlled by a single vendor.

As amazing as the Web is, it remains, at heart, a Web primarily for end user consumption. HTML allows authors to describe the structure of a page in limited, mostly document oriented ways. HTML, of course, allows for hyperlinks and we need to look no further than Google to see how this human generated, human intended information can be exploited to the benefit of human readers: By interpreting links to a Web page as a vote for that page’s importance, the search engine can prioritize its results far better than any prior attempt.

The idea of the Web as a collection of documents has served us very well. But there are places where that idea creaks. Not everything published on the Web is published for humans first, last, and always, and some things published for humans first also have a useful life as pure data. For example, while online maps are typically provided by one Web site (e.g., Google Maps), people want to mix them with data from another. Whether one is plotting locations from Craigslist’s real estate ads or finding where a photo from your last vacation was taken, the ability to get at the data of a Web page in a form amenable to programmatic manipulation is critical to making such “mash-ups” robust, reliable, and easy to produce. The alternative, to wit, “scraping” data out of human oriented HTML, is rather difficult and fraught with pitfalls. Essentially, the consuming program has to filter out the irrelevant parts of the page (ads, or narrative text) and interpret the HTML as the sort of data in question, typically by reverse engineering the generating program. Since both the irrelevant parts of a page and the presentational structure of the data on the page tend to change a lot between pages and on the same page over time, the consuming program is faced with a Sisyphean task. Fortunately, there are several popular, well-supported formats for data exchange and web site publishers are increasingly socialized to provide “data” views of their websites.

Thus, we now have data “on” the Web, but this move does not fundamentally move us away from the Web as a collection of documents. Each piece of data is like a little document, and thus suffers the problems of data in HTML on the Web: The data aren’t hyperlinked (so the data are on the Web, not “of” the Web) and a programmer has to interpret the data in order to write a program that uses it sensibly.

There is an alternative conceptualization of the Web that aims to overcome these problems; that is, to make a true Web for programs that is on a par with the Web for people. This conceptualization is known as the Semantic Web. Like the Web, the Semantic Web has enabling technologies addressing the key goals of linking and meaning: The Resource Description Framework (RDF) is a “Web native” data model that incorporates hyperlinking deeply. The Web Ontology Language (OWL) is a “Web native” ontology language that extends RDF with the ability to write logic based descriptions of things so that an automated reasoning tool can draw conclusions about data incorporating these descriptions.

The W3C has just standardized a new version of OWL, OWL 2. OWL is based on a family of logics, so-called “description logics” which have played a prominent role in the field of knowledge representation, esp. bio-medical informatics, for over 30 years. The University of Manchester has played a key role in the development of these logics from the theory, to the implementation, to the application and in their standardization in the form of OWL and OWL 2. Prof. Uli Sattler, in collaboration with Prof. Ian Horrocks (while he was at the University of Manchester; he is now at Oxford University) designed the logic and reasoning techniques for the description logics underlying both OWL (the logic “SHOIQ”) and OWL 2 (the logic “SROIQ”). The difficulty in logic engineering for ontologies is allowing sufficient expressivity to be useful (e.g., so that modellers can say things like a foot has 5 toes which are part of it and it, in turn, is part of a leg) but where the reasoning procedure is computationally reasonable (so that we can write reasoners that can figure out that each toe is also part of a leg, and do so before the heat death of the universe). FaCT++, an OWL 2 reasoner from Manchester, is the practical realization of their design and can handle such enormous and complex ontologies as the Systematized Nomenclature of Medicine — Clinical Terms (SNOMED-CT), a key component of many national clinical information management systems.

Trying to build something on the scale and nature of the Web, especially when the Web already exists, is a grand, perhaps grandiose, project. The Web, of course, grew more than was built and perforce so will the Semantic Web. Like an economy, the Web is the result of millions of people performing millions of interactions focused on their specific interests and needs. Similarly, languages like OWL 2 will succeed if they meet specific needs and do so well. As the bio-medical community standardizes on OWL 2 and pushes its boundaries, we can see in that microcosm what the Semantic Web might one day be like. And, unlike utopian fairy-tales, that microcosm can make a valuable direct contribution to human welfare.