Jason Hunter On XQuery

  1. Jason Hunter On XQuery (32 messages)

    IDN spoke with TSS presenter Jason Hunter to find out how XQuery's ability to provide smarter search functions may offer devs new career options, and even help publishers take eyeballs back from Google. IDN also previews Hunter's talk, where he will share technology insights, use cases and the career potential for developers who learn XQuery.
    “Java is great for crunching against relational databases, but with XQuery we’re not crunching against relational data: We are searching against an XML content store of articles, books and information,” Hunter said. “So, that’s a different problem, but with so many web sites out there that provide access to content stores – not just relational data – it’s a fun problem. And, there’s a lot of opportunity out there.”

    ...

    Hunter’s XQuery is all about getting at valuable content in a smart, simple, question-asking way: “Wouldn’t it be cool to just ask a question about something that is in your library and be able to get all the answers – right from the pages of those books?”

    Embracing that vision – rather than the traditional SQL-to-XML enterprise developer vision of XQuery – is the key to helping devs unlock XQuery’s powerful possibilities. Hunter suggests devs simply ask themselves how valuable a “smart search” feature could be in their own lives.

    “Imagine that you could search all the books on your wall for examples of how to use the ‘shift’ function in Perl,” Hunter suggests. “With Google you can get results that say ‘Here are some results with the word shift, and the word Perl is near it.’ Is that good enough? Well, frankly, that isn’t good enough, so you want some way to get more intelligent results. And that’s what XQuery can do.”

    "XQuery is more declarative than Java. Because XQuery was born of the XML world, you don’t deal with objects (like DOM and JDOM), and you don’t deal with events (like SAX), and it’s not like Strings.” The way to think about XQuery, Hunter suggests is simply to think about being able to do queries against a repository, which in this case is a content repository – not a relational database.

    During his session at TheServerSide Symposium, Hunter will walk Java developers through the basics of XQuery and highlight some use cases. Notably, Hunter will provide a great on-ramp for helping Java devs think about XQuery as a useful server-side technology.

    Threaded Messages (32)

  2. XQuery is great[ Go to top ]

    But just wait for the whining about 'it's better to do everything in Java' and 'ooh, language paradigm shifts hurt my head' :-)
  3. Whoa![ Go to top ]

    Interesting thoughts! I agree wholeheartedly with what he's saying.

    Except replace XQuery with SPARQL.

    Oh, and replace XML with RDF.

    But other than that I agree with what he's saying! Definitely. Interesting times ahead.
  4. Whoa![ Go to top ]

    Interesting thoughts! I agree wholeheartedly with what he's saying.
    Except replace XQuery with SPARQL.
    Oh, and replace XML with RDF.
    But other than that I agree with what he's saying! Definitely. Interesting times ahead.

    Couldn't resist making a comment about RDF and SPARQL. RDF is a joke and basically pointless. SPARQL is an ugly hack of RuleML. Even though the RuleML group and many users recommended/asked for a LISP/CLIPS-style non-monotonic rule language, what did W3C produce? A huge pile of fertilizer. I don't like XQuery, but compared to RDFQuery and SPARQL, XQuery looks great. I'm totally biased, but I just don't get the point of XQuery. Look at what has been built around RDF to date, and most of it is of poor quality, including JENA2 and various RDF rule engines that blow.

    let the flames begin.

    peter
  5. Yes!!![ Go to top ]

    RDF is a joke and basically pointless. SPARQL is an ugly hack of RuleML.
    I agree!!! RDF is terribly funny! Hilarious even. I mean, look at something like this:
    http://simile.mit.edu/solvent/screencasts/solvent_screencast.swf

    .. and tell me you don't laugh yer buns off! I mean, what a frickin' waste of time, being able to just clickety-click on a page, get RDF data from that, and then (get this junk!) visualize it using Google Maps. And then, to think, that one would be able to take that data and put it into some damn repository that no one uses and create SPARQL queries that relate it to god-knows-what, probably something entirely insignificant like Amazon or eBay or whatever. I mean, WTF were they thinking!??!?
  6. Yes!!![ Go to top ]

    RDF is a joke and basically pointless. SPARQL is an ugly hack of RuleML.
    I agree!!! RDF is terribly funny! Hilarious even. I mean, look at something like this: http://simile.mit.edu/solvent/screencasts/solvent_screencast.swf .. and tell me you don't laugh yer buns off! I mean, what a frickin' waste of time, being able to just clickety-click on a page, get RDF data from that, and then (get this junk!) visualize it using Google Maps. And then, to think, that one would be able to take that data and put it into some damn repository that no one uses and create SPARQL queries that relate it to god-knows-what, probably something entirely insignificant like Amazon or eBay or whatever.

    I mean, WTF were they thinking!??!?

    http://www.w3.org/TR/rdf-sparql-query/
    http://www.w3.org/TR/xquery/

    Having read both the XQuery and RDF specs in the past, I find XQuery makes much more sense. My impression is the RDF semantic web people are doing lots of hand waving, but ultimately someone else will figure out a way to link meta-data with data so that it's easily searched and used. Oh wait, Google has already done that.

    I'm sure someone out there will disagree and say the RDF spec makes more sense and the whole RDF graph thing is great. That is, until they see the RDF graph is just recycled functional dependency grammar ideas wrapped in XML.

    peter
  7. Cool![ Go to top ]

    My impression is the RDF semantic web people are doing lots of hand waving, but ultimately someone else will figure out a way to link meta-data with data so that it's easily searched and used. Oh wait, Google has already done that.
    Cool! Where can I read about how Google has linked meta-data with data so that it's easily searched and used? I only know of the free-text search they have. If they have a semantic search engine thing as well, that would be excellent.
  8. Cool![ Go to top ]

    My impression is the RDF semantic web people are doing lots of hand waving, but ultimately someone else will figure out a way to link meta-data with data so that it's easily searched and used. Oh wait, Google has already done that.

    Cool! Where can I read about how Google has linked meta-data with data so that it's easily searched and used? I only know of the free-text search they have. If they have a semantic search engine thing as well, that would be excellent.

    I seriously doubt Google would ever open their Metadata stuff to the public. Some semantic web researchers have demonstrated it's feasible to use Google as a black box semantic database. I'm too lazy to look up the link, but I'm sure Google will find it.

    peter
  9. Cool![ Go to top ]

    I seriously doubt Google would ever open their Metadata stuff to the public. Some semantic web researchers have demonstrated it's feasible to use Google as a black box semantic database. I'm too lazy to look up the link, but I'm sure Google will find it. – peter
    What I would be fascinated by is understanding where Google got all their semantic information from. I mean, if all they do is spider the web, then how on earth are they going to be able to make sense of that without humans tagging what information means what!? That would be some serious magic!

    From your answer it doesn't seem like they have done so, even though your initial response hinted at it. Oh well.
  10. Cool![ Go to top ]

    I seriously doubt Google would ever open their Metadata stuff to the public. Some semantic web researchers have demonstrated it's feasible to use Google as a black box semantic database. I'm too lazy to look up the link, but I'm sure Google will find it. – peter

    What I would be fascinated by is understanding where Google got all their semantic information from. I mean, if all they do is spider the web, then how on earth are they going to be able to make sense of that without humans tagging what information means what!? That would be some serious magic! From your answer it doesn't seem like they have done so, even though your initial response hinted at it. Oh well.

    Links can be considered a form of metadata. Clearly Google has a semantic system built into their search architecture; if they didn't, a search for the phrase "apple orchard" would return poor results. Look at the RDF specification and its first example:

    Subject (Resource) = http://www.w3.org/Home/Lassila
    Predicate (Property) = Creator
    Object (literal) = "Ora Lassila"

    Say I query for all pages written by "Ora Lassila" using XQuery; chances are it will return any page with that keyword. In the RDF model, the pages written by Ora Lassila would be tagged with the appropriate metadata. That approach is fine, but what if a person posts to forums, mailing lists and other places? I would argue an approach that requires explicit tagging is of limited value. If I want to find all public writings by a given individual, I want results that include all forms. Chances are I've already found the person's books, blogs, etc., so what I want to find is content by the person that is not tagged as such.
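    For concreteness, here is that first example built with Jena 2 (which comes up elsewhere in this thread). The Dublin Core creator property is my own assumption; the spec just calls the predicate "Creator":

    // A minimal sketch: one RDF statement, then two serializations of the
    // same graph (the graph itself is abstract; XML is just one encoding).
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.rdf.model.Resource;
    import com.hp.hpl.jena.vocabulary.DC;

    public class LassilaTriple {
        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel();
            // Subject (resource) -- predicate (property) -- object (literal)
            Resource page = model.createResource("http://www.w3.org/Home/Lassila");
            page.addProperty(DC.creator, "Ora Lassila");
            model.write(System.out, "RDF/XML");
            model.write(System.out, "N-TRIPLE");
        }
    }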

    XQuery keeps its focus and is useful, even if I don't like the syntax. RDF is overly broad and rather unfocused. It also suffers from some form of pathology, which doctors haven't been able to identify.

    I genuinely like the idea of having a metadata infrastructure, but the explicit approach proposed by W3C is flawed. The idea of a "closed world" is nice as an academic research project, but totally impractical outside a neat research environment.

    peter
  11. Cool![ Go to top ]

    I genuinely like the idea of having metadata infrastructure, but the explicit approach proposed by W3C is flawed. The idea of a "closed world" is nice as an academic research project, but totally impractical outside neat research environment.

    RDF is not founded on the closed-world assumption
  12. Cool![ Go to top ]

    Oops, the link didn't come through. See here.
  13. It's just lip service from W3C[ Go to top ]

    Oops, the link didn't come through. See here.

    If memory serves me correctly, the RDF spec people made that change after a couple of years of people badgering W3C over the original closed world assumption. The problem is this: the spec states people shouldn't assume a closed world, but the design of RDF is still the same. So just because W3C pays lip service to major criticism from the public doesn't mean that RDF is built on open world assumptions. I could be totally wrong, since I stopped following the progress back in early 2005. A lot could have changed since then, but as far back as 2002, W3C was adamant on the closed world assumption.

    peter
  14. Perhaps your memory isn't serving you correctly[ Go to top ]

    I've followed RDF and related stuff for the last 4 years or so, and I have never heard anything but that RDF assumes an open world. Perhaps it was not once the case (I don't know), but it certainly has been for a long time, and is fundamental to all of the W3C RDF-type activities.

    (The first version of the document I referenced (August 2002) is here, and it has the exact same wording.)
  15. Have you tried CWM[ Go to top ]

    Perhaps your memory isn't serving you correctly. I've followed RDF and related stuff for the last 4 years or so, and I have never heard anything but that RDF assumes an open world. Perhaps it was not once the case (I don't know), but it certainly has been for a long time, and is fundamental to all of the W3C RDF-type activities.

    (The first version of that document (August, 2002) I referenced is here, and it has the exact same wording.)

    I started following it in 2000 and gave up in 2003, and one of the original assumptions was based on CWM (closed world machine). Many of the RDF engines refer to CWM, including Pychinko and Jena. I have known some of the people involved in the effort since 2000, and that is not my impression. If anything, my impression of the W3C RDF team is that they don't want to listen to others. The founders of RuleML offered to donate their work to RDF, but RDF rejected it in favor of RDF rules. Most of this is in the mailing list for people to see.

    I do hope a semantic web is built in the future; I just disagree that the RDF approach is right or even feasible. I also don't like the W3C definition of the semantic web. Any semantic web approach that requires users to explicitly tag everything is going to face huge hurdles.

    peter
  16. Have you tried CWM[ Go to top ]

    Yeah, I've tried CWM a little, but not for a while.

    Despite the name, it does not assume a closed world. It's probably a confusing name, but here is Tim Berners-Lee's explanation.
    one of the original assumptions was based on CWM (closed world machine). Many of the RDF engines refer to CWM, including Pychinko and Jena.

    You're confusing the "open world assumption", which is fundamental to RDF, OWL, and related recommendations, with CWM's name, which is just the name of one of the early reasoning engines (which despite the name, also assumes an open world). When other apps/frameworks like Jena refer to CWM, they're just referring to the app, and nothing else.

    I agree that forced manual tagging is doomed to failure. That is why it is so important to have the tools people use to create web resources do the 'tagging'.

    Anyway, we're way off-topic, so I'll leave it at that.
  17. Think again[ Go to top ]

    Yeah, I've tried CWM a little, but not for a while. Despite the name, it does not assume a closed world. It's probably a confusing name, but here is Tim Berners-Lee's explanation.
    one of the original assumptions was based on CWM (closed world machine). Many of the RDF engines refer to CWM, including Pychinko and Jena.

    You're confusing the "open world assumption", which is fundamental to RDF, OWL, and related recommendations, with CWM's name, which is just the name of one of the early reasoning engines (which despite the name, also assumes an open world). When other apps/frameworks like Jena refer to CWM, they're just referring to the app, and nothing else.

    I agree that forced manual tagging is doomed to failure. That is why it is so important to have the tools people use to create web resources do the 'tagging'.

    Anyway, we're way off-topic, so I'll leave it at that.

    I don't claim to understand RDF, but my understanding is based on the RDF specification. The RDF spec clearly states it is based on monotonic reasoning; therefore, in my mind, RDF is based on closed world assumptions. Here is a short paragraph on monotonic vs non-monotonic:
    Traditional logics are based on deduction, a method of exact inference with the advantage that its conclusions are exact - there is no possibility of mistake if the rules are followed exactly. Deduction requires that information be complete, precise, and consistent. By contrast, the real world requires common sense reasoning in the face of incomplete, inexact, and potentially inconsistent information.

    A logic is monotonic if the truth of a proposition does not change when new information (axioms) are added to the system. In contrast, a logic is non-monotonic if the truth of a proposition may change when new information (axioms) is added to or old information is deleted from the system.

    I pasted it from this page: http://cs.wwc.edu/~aabyan/Logic/Nonmonotonic.html. This is what I meant when I said W3C is paying lip service to open world assumptions. Perhaps your definition of open vs closed world is different. My understanding is based on logic and AI theory.

    peter
  18. Think again[ Go to top ]

    The RDF spec clearly states it is based on monotonic reasoning; therefore, in my mind, RDF is based on closed world assumptions.

    That may be true in your mind, but it doesn't follow. More importantly, it isn't what you originally said:
    ...but the explicit approach proposed by W3C is flawed. The idea of a "closed world" is nice as an academic research project, but totally impractical outside a neat research environment.

    A reasonable interpretation of your words: the W3C approach is based on the assumption of a "closed world", which is fine as an academic research project, but totally impractical in the real world.

    Like I said, the approach of W3C is not in any way based on the assumption of a closed world, as you said. That is all I was correcting you on originally, so that others wouldn't take it as truth. You've since redefined what you were talking about several times, including pointing to an application that happened to have "closed world" in its name as if that were evidence of something.

    And now you're saying you were really talking about monotonicity all along, although you just mentioned it for the first time.

    If that were the case, and your beef is that RDF/semweb isn't based on a non-monotonic logic, why didn't you say that? Instead you implied that the W3C semantic web activity is founded on something it's not, and that it is based on assumptions (CWA) that are academic and not suited to the real world.

    Here is a concise definition of "closed world assumption" from a reputable knowledge representation book (Knowledge Representation and Reasoning (The Morgan Kaufmann Series in Artificial Intelligence), by Ronald Brachman, Hector Levesque):
    Unless an atomic sentence is known to be true, it can be assumed to be false [p. 210].

    These guys are very well known in the "Logic and AI theory" community. If you have some alternate definition of "closed world assumption" from a reputable source that says it is merely a synonym for a monotonic logic system, I'll stand corrected.

    There is enough sem-web FUD out there already without having TSS people think that the semantic web vision of W3C is "anything not explicitly true by proof of already known facts is assumed false", which is what you said.
  19. Think again[ Go to top ]

    The RDF spec clearly states it is based on monotonic reasoning; therefore, in my mind, RDF is based on closed world assumptions.

    That may be true in your mind, but it doesn't follow. More importantly, it isn't what you originally said:
    ...but the explicit approach proposed by W3C is flawed. The idea of a "closed world" is nice as an academic research project, but totally impractical outside a neat research environment.

    A reasonable interpretation of your words: the W3C approach is based on the assumption of a "closed world", which is fine as an academic research project, but totally impractical in the real world.

    Like I said, the approach of W3C is not in any way based on the assumption of a closed world, as you said. That is all I was correcting you on originally, so that others wouldn't take it as truth. You've since redefined what you were talking about several times, including pointing to an application that happened to have "closed world" in its name as if that were evidence of something.

    And now you're saying you were really talking about monotonicity all along, although you just mentioned it for the first time.

    If that were the case, and your beef is that RDF/semweb isn't based on a non-monotonic logic, why didn't you say that? Instead you implied that the W3C semantic web activity is founded on something it's not, and that it is based on assumptions (CWA) that are academic and not suited to the real world.

    Here is a concise definition of "closed world assumption" from a reputable knowledge representation book (Knowledge Representation and Reasoning (The Morgan Kaufmann Series in Artificial Intelligence), by Ronald Brachman, Hector Levesque):
    Unless an atomic sentence is known to be true, it can be assumed to be false [p. 210].

    These guys are very well known in the "Logic and AI theory" community. If you have some alternate definition of "closed world assumption" from a reputable source that says it is merely a synonym for a monotonic logic system, I'll stand corrected.

    There is enough sem-web FUD out there already without having TSS people think that the semantic web vision of W3C is "anything not explicitly true by proof of already known facts is assumed false", which is what you said.

    My own fault for being unclear; I'm frequently guilty of that. I was thinking of monotonic reasoning the whole time. I've had these debates with RDF people over the last 4 years, so I stupidly assumed others would think of monotonic reasoning.

    The statement about a sentence being true is also commonly referred to as "negation as failure", which I believe RDF does not support. I know first hand that many people tried to convince the RDF lead to use non-monotonic reasoning, but they were unsuccessful. Numerous people have complained RDF does not support negation as failure. The CommonRules website has a good definition: http://www.alphaworks.ibm.com/tech/commonrules/faq.
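    For readers who haven't met the term, here is roughly what negation as failure looks like in a CLIPS-style engine such as JESS (which comes up later in this thread) – an illustrative sketch with invented templates, not anything from the RDF spec:

    // Negation as failure: the rule fires because no (broken ...) fact can be
    // derived, i.e. failure to prove "broken" is treated as "not broken".
    import jess.JessException;
    import jess.Rete;

    public class NafSketch {
        public static void main(String[] args) throws JessException {
            Rete engine = new Rete();
            engine.executeCommand("(deftemplate machine (slot name))");
            engine.executeCommand("(deftemplate broken (slot name))");
            engine.executeCommand(
                "(defrule assume-working " +
                "  (machine (name ?n)) " +
                "  (not (broken (name ?n))) " +  // NAF: no proof => assume false
                "  => (printout t ?n \" assumed working\" crlf))");
            engine.executeCommand("(assert (machine (name lathe)))");
            engine.run();  // prints: lathe assumed working
        }
    }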

    I'll gladly admit I'm wrong if you can show me where the spec explicitly addresses "negation as failure". For example, there was a working group for RDF query and rules which proposed negation as failure (http://www.w3.org/2001/11/13-RDF-Query-Rules/terms#what), but I don't believe it ever made it into the official spec. I tried searching the official RDF spec when it was released, but did not see it. I could have missed it, since I make mistakes frequently. I've probably read the RDF spec over 20 times and I still don't understand 70% of it.

    I doubt anyone outside of the RDF working group or RDF inner circle understands the entire thing. I'm completely biased, so my opinions are of little value to RDF supporters.

    peter
  20. Think again[ Go to top ]

    Now that I have a better idea what you don't like about rdf, I don't know that I definitely disagree with you (about whether rdf should have been non-monotonic). My gut instinct, though, is that nonmonotonic reasoning belongs somewhere higher on Tim Berners-Lee's famous "layer cake". Rdf is just one of the low-level building blocks. The fact that such reasoning can't be expressed in rdf doesn't mean that a higher-level language can't represent it, and it would probably still be expressed in rdf, just not accessible to the rdf processor (only to the processor for the higher level).

    By the way, if you haven't seen this collection of quotes and links about monotonic/nonmonotonic reasoning and the semantic web before, I think you'll find it pretty interesting: http://robustai.net/papers/Monotonic_Reasoning_on_the_Semantic_Web.html I came across it yesterday while doing some reading and have found some fascinating stuff in there.

    Regarding negation as failure, I think we're in agreement there. That's a standard nonmonotonic 'default'-type feature, and I wouldn't expect to see it in rdf, which is pretty strongly based on classical logic. I haven't ever looked at "rdf-query-rules" that you referenced, to tell you the truth, so I can't really comment on that.

    Anyway, thanks for the discussion.

    p.s. Definitely nobody but the inner circle understands all the rdf specs now (especially the semantics one http://www.w3.org/TR/rdf-mt/ ;)).
  21. recent presentation on RDF flaws[ Go to top ]

    In case you haven't seen this presentation.

    http://www.ruleml.org/events/ruleswsw3c/RDF_RuleML_Interoperability_Talk.ppt

    or Google's HTML version of the same PowerPoint

    http://72.14.203.104/search?q=cache:DLDCERXNB88J:www.ruleml.org/events/ruleswsw3c/RDF_RuleML_Interoperability_Talk.ppt+does+RDF+support+failure+as+negation&hl=en&gl=us&ct=clnk&cd=6

    I've never taken a CS course in my life, so I tend not to use formal terminology. In my mind, a robust semantic framework needs to handle both strong negation and negation as failure, and combinations of the two.

    I can see plenty of cases where one would want to scan the content and decide how reliable it is, and other cases where a system may prefer to rate the reliability of the content based on the author or URI. Then there are cases where a mixed mode is desirable. I definitely have no clue what the solution is, but from my limited understanding of RDF, it sure doesn't look like a feasible solution to me.

    peter
  22. recent presentation on RDF flaws[ Go to top ]

    I totally agree with you about a robust framework needing to handle both kinds of negation. I don't think rdf itself was ever intended as the full framework though. There is owl on top of rdf(s), which adds more expressiveness (some form of negation at least, but not negation as failure still), and there are nonmonotonic extensions that are being proposed as extensions to owl (http://www.mindswap.org/2005/OWLWorkshop/, and see the first link at http://www.google.com/search?q=nonmonotonic+owl). Maybe we'll get there yet. Sometimes it makes more sense to take many small steps rather than a few large steps, and the layered approach seems to be working in terms of adoption and buy-in to the lower levels (which would certainly have been less likely to happen if they were more complex).

    Thanks for the presentation link. I've been meaning to look at stuff like RuleML and SWRL for a while. My experience is more with rdf/owl as a rich data model rather than the rules and reasoning side of things, but I'd like to learn more.
  23. recent presentation on RDF flaws[ Go to top ]

    I agree with many of your points, but here is the thing that confuses and irritates me: many people debating RDF have different definitions of monotonic/non-monotonic logic within the RDF stack. I have a decent understanding of rule engines, pattern matching and inference engines, and I still can't make heads or tails of it.

    Thanks for pointing out the monotonic reasoning link. I have seen it before, but it just makes me think even those contributing to the spec have different perspectives. I read over the old page with Tim Berners-Lee's quotes about the semantic web and I'm left more confused. He says the internet is an open world and therefore the semantic web is an open world assumption by default. But the primary mode of reasoning is monotonic. My train of thought is this:

    1. if I can retract a fact or conclusion, that means I have to trust the initial deduction, or I have to have all the facts at the beginning.

    2. given that I can't have all the facts, chances are the result of the reasoning is wrong.

    3. if the system crawls/searches the web for a specific fact, how can I resolve inconsistencies, as Sir TBL states in one of his famous quotes?

    4. if a new fact counters the previous RDF Query results, the system cannot resolve the inconsistency. If that is the case, it is a closed system and inflexible.

    Depending on which papers one reads about RDF, the view turns out quite different. I agree that one could extend RDF Query and RDF rules so they support non-monotonic reasoning. My question is this: "why force everyone to extend RDF to wedge non-monotonic reasoning into a system and thereby create incompatibilities?"

    There are parts of RDF that are good. For example, I like the tagging part and having a URI/IRI to indicate the producer of the content, but beyond that, I don't like the RDF graph approach because it's too rigid. Plus, 20 years of research into grammar-based NLP shows it's too computationally intensive and very difficult to scale. In contrast, probabilistic approaches to NLP work better. My thought is that RDF should take a probabilistic approach to the semantic web, because although one can't trust the content to be truthful, one can "rate" the usefulness of the data and arrive at some useful result. Hopefully, one day, someone will produce a framework for the semantic web that really is useful and simple enough that a decent programmer can understand it without spending 5 years.

    Thanks for humoring my troll and debating RDF with me. It inspired me to re-read a lot of old stuff and parts of the spec.
    peter
  24. recent presentation on RDF flaws[ Go to top ]

    Hi Peter,

    I guess we took over this thread ;).

    Just a few quick points.
    I read over the old page with Tim Berners-Lee's quotes about the semantic web and I'm left more confused. He says the internet is an open world and therefore the semantic web is an open world assumption by default. But the primary mode of reasoning is monotonic.

    As I understand this, the reasoning (no pun intended) is as follows: the fact that it is monotonic and open-world by assumption means that you cannot always get an answer to your query. If you always got an answer, even when there wasn't sufficient information to know for sure (presumably using some kind of default or negation-as-failure mechanism), then it would be nonmonotonic, because stuff that you learn later could force a change to something you had earlier derived. Being monotonic means that you can't really derive something until it is totally, absolutely derivable, and so there is no way to change that later (unless you reverse an assertion, which would already be a contradiction unless you removed the prior contradictory assertion, in which case there wouldn't be a problem). Taking a standard RDBMS, for example: it is monotonic because it is a closed world, so it can assume that if something can't be proven true then it can be considered false. On the web, however, we can't assume that, because information is necessarily incomplete – the resource that has the link in the chain you need to establish something might be down for maintenance when you need it. So on the web, the choice is between giving up monotonicity, in which case you can always get an answer but it might be different tomorrow or the next day (and that has some big problems; I at least want to be able to specify that some reasoning has to assume an open world, and don't tell me false unless it's really, really false), or maintaining monotonicity, in which case you don't always get an answer but don't have to worry about the answers that you do get changing.
    My train of thought is this.

    1. if I can retract a fact or conclusion, that means I have to trust the initial deduction, or I have to have all the facts at the beginning.

    2. given that I can't have all the facts, chances are the result of the reasoning is wrong.

    3. if the system crawls/searches the web for a specific fact, how can I resolve inconsistencies, as Sir TBL states in one of his famous quotes?

    4. if a new fact counters the previous RDF Query results, the system cannot resolve the inconsistency. If that is the case, it is a closed system and inflexible.

    I'm not totally sure I understand your example, but wrt 1, I don't think you can retract a fact. Wouldn't retraction be for the nonmonotonic cases where you have made a conclusion by some default rule, or concluded something was false because you couldn't prove it true? If you mean remove a real assertion (rather than something changing due to new information that allows an actual derivation and not conclusion by default or NAF), then I'm not sure I see the problem.

    I think the TBL quote you're thinking of may be the one about inconsistent data: say one document states that <http://mydomainname#calvin> was born in one year, and doc a says the same individual was born in a different year (same calendar system, etc.). What can a nonmonotonic system do with that? Can it do anything other than say "I have inconsistent data and can't make any conclusions", which I assume the monotonic system would detect and report too? At least with a monotonic system, you don't constantly have to be rechecking your prior conclusions when new data is added, and when some new information comes in that allows you to conclude something new, you're actually concluding something new rather than reversing a prior conclusion. I would much rather not have an answer to a question than have a system tell me two different things on different days. There are cases where the other alternative is desirable, though, so it all depends. I guess if anything, the monotonic version just seems safer and more conservative, which seems like a good thing to me for a building-block technology (rather than a complete and ready-for-everything technology).
    Depending on which papers one reads about RDF, the view turns out quite different. I agree that one could extend RDF Query and RDF rules so they support non-monotonic reasoning. My question is this: "why force everyone to extend RDF to wedge non-monotonic reasoning into a system and thereby create incompatibilities?"

    Yeah, it does seem like there is a lot of confusion surrounding RDF, and there are so many specs (originally 2, now 6, plus ongoing stuff for rules). But anyway, to play devil's advocate for a moment, an alternate viewpoint might rephrase your question as "why force into the lower level of the stack complexities that aren't relevant for many use cases and that can be expressed at higher levels?" It seems like everybody agrees that nonmonotonic reasoning is required for some use cases; there is just a difference of opinion on how fundamental those use cases are. The production rules and rule engine people think it's fundamental, because it is for them, and the knowledge representation people (not sure what the best label is for that other side) think it's not fundamental, or is but can be handled adequately at a higher level, which keeps the lower levels simpler, which is good for everybody.
    There are parts of RDF that are good. For example, I like the tagging part and having a URI/IRI to indicate the producer of the content, but beyond that, I don't like the RDF graph approach because it's too rigid. Plus, 20 years of research into grammar-based NLP shows it's too computationally intensive and very difficult to scale. In contrast, probabilistic approaches to NLP work better. My thought is that RDF should take a probabilistic approach to the semantic web, because although one can't trust the content to be truthful, one can "rate" the usefulness of the data and arrive at some useful result. Hopefully, one day, someone will produce a framework for the semantic web that really is useful and simple enough that a decent programmer can understand it without spending 5 years.

    Do you have any references about the general approach not scaling? I ask because my understanding was that OWL-DL, which is the OWL dialect that gets the most action both from users and implementers, was chosen specifically because the computational costs were very well understood, and they had precise knowledge of the exact tradeoffs between various features of Description Logics (the DL in OWL-DL). That research goes back into the '80s, from my recent reading (in that book I referenced earlier), with the KL-ONE system in '85, and there has been tons of research since. My understanding is that there is a huge amount of knowledge now about the computational complexity of lots of different DL variants, and that DL expressiveness with guarantees about complexity, and thus scalability, was essentially a solved problem. Perhaps the types of use cases that you're interested in, though, are not the ones that description logics are best suited for, which sort of brings us back to the very beginning of our conversation.
    Thanks for humoring my troll and debating RDF with me. It inspired me to re-read a lot of old stuff and parts of the spec.


    Thanks for the discussion. I've gotten lots of food for thought and stuff to read and follow up on, and putting things into words has helped clarify my thinking a little, and also made me realize how unclear it all is to me.

    -calvin
  25. oops typo[ Go to top ]

    I just noticed a typo. I meant to say "if I can't retract a fact."

    From my very limited understanding, even within the knowledge base camp, the use of monotonic vs non-monotonic isn't clear. My impression is there's still a lot of debate. I can definitely see business cases where a closed world is the right choice. For searching the internet using a semantic web agent, on the other hand, I can see cases where one would want a result even if it's likely to change in the future. It's definitely not an easy problem to solve.

    I used to have some papers around comparing grammar-based parsing to statistics-based NLP. To my naive eyes, generating an RDF graph is identical to producing a dependency grammar graph. Both attempt to take some data and define the relationship between object and subject. Stanford has a statistics-based parser that outperforms traditional grammar-based parsers by a significant margin. Often my ramblings are rather incoherent, but I'll attempt to explain it as best as I can.

    My understanding is that knowledge is parsed and/or defined by an RDF graph, which is then used by RDF Query or RDF rules to derive some conclusion. Given that the knowledge is converted to a graph, I question the accuracy of the graph produced and the cost of producing it. For trivially simple examples, parsing knowledge into a graph is probably simple, but parsing actual web content seems to be much harder. I could be totally wrong, but it's the same thing as parsing a sentence to figure out what it means.

    If someone explicitly defines the semantic relationship as a graph using OWL or something else, then performance isn't an issue. In that case, though, who is going to do the work to tag all the content at a low level? So whether it is done by hand or automatically, it appears to hit a scalability limit pretty quickly. Assuming one uses OWL, and there's a sufficiently detailed OWL ontology, performance should be fine. I've looked at a few of the OWL examples, which look trivially simple. Handling arbitrary content from the internet is clearly much harder.

    Again, I have no answers, since my knowledge of these domains is far too shallow. I really do hope a semantic web becomes reality. If RDF manages to prove what isn't feasible, then it will have done a great service.

    peter
  26. oops typo[ Go to top ]

    My understanding is that knowledge is parsed and/or defined by an RDF graph, which is then used by RDF Query or RDF rules to derive some conclusion. Given that the knowledge is converted to a graph, I question the accuracy of the graph produced and the cost of producing it. For trivially simple examples, parsing knowledge into a graph is probably simple, but parsing actual web content seems to be much harder. I could be totally wrong, but it's the same thing as parsing a sentence to figure out what it means.

    Like I said, the semantic web brings rule-based programming to the web. Thus semweb implementations use rule engines. All information in an RDF graph (even schema) consists of triples (subject, property, object), and these triples are asserted into the rule engine as facts.

    Accessing facts and reasoning over facts depends on the implementation. JESS, for instance, reuses concepts from the DB domain and supports hashes and various trees to access facts quickly. An efficient pattern-matching (facts against rules) algorithm is also very important for scalability.

    P.S.: I don't contribute to JESS or anything; I've just had very positive experiences with this rule engine.

    Take a closer look at rule engines: try Drools (object-oriented and open source) and also JESS (trial version available). The Jess in Action book also contains some very nice introductions to the topic.

    cheers,
    Andreas
  27. oops typo[ Go to top ]

    My understanding is that knowledge is parsed and/or defined by an RDF graph, which is then used by RDF Query or RDF rules to derive some conclusion. Given that the knowledge is converted to a graph, I question the accuracy of the graph produced and the cost of producing it. For trivially simple examples, parsing knowledge into a graph is probably simple, but parsing actual web content seems to be much harder. I could be totally wrong, but it's the same thing as parsing a sentence to figure out what it means.

    Like I said, the semantic web brings rule-based programming to the web. Thus semweb implementations use rule engines. All information in an RDF graph (even schema) consists of triples (subject, property, object), and these triples are asserted into the rule engine as facts.

    Accessing facts and reasoning over facts depends on the implementation. JESS, for instance, reuses concepts from the DB domain and supports hashes and various trees to access facts quickly. An efficient pattern-matching (facts against rules) algorithm is also very important for scalability.

    P.S.: I don't contribute to JESS or anything; I've just had very positive experiences with this rule engine.

    Take a closer look at rule engines: try Drools (object-oriented and open source) and also JESS (trial version available). The Jess in Action book also contains some very nice introductions to the topic.

    cheers,
    Andreas

    My definition of the semantic web is a bit different than just adding rule-based programming to the web. This isn't the W3C definition, but it's what I would like to see. A semantic web should provide the facilities to reason over unstructured data on the web in a way that facilitates intelligent searches. By intelligent, I mean this: say I search for "recipes with chicken, peppers, tomatoes, basil and salt"; the agent needs to send requests to search engines and then scan the results to produce a subset of URLs that are most promising.

    This means that if a recipe uses pork, but chicken is a suitable substitute, I'd like the result included. Asserting a known fact into a rule engine is the easy part. The hard part is figuring out how to reason over the facts with rules and how those rules match on patterns. In the recipe search example, the agent needs to look at my search parameters and match them against the results.

    I've been using JESS since 2000, and the new Drools 3 core is based on a clean Rete implementation I donated last year. Jess in Action by Ernest is an excellent book, and I've communicated with Ernest in the past. There are RDF and OWL tools for JESS, so I'm aware of the various tools for the semantic web.

    When the facts are well defined, asserting the facts and arriving at results isn't all that hard. What the semantic web is attempting to do is much larger than just applying rule technology to the web. That's my biased perspective.

    peter
  28. oops typo[ Go to top ]

    My definition of the semantic web is a bit different than just adding rule-based programming to the web. This isn't the W3C definition, but it's what I would like to see.

    True, the W3C definition goes a step further. Having intelligent agents and the so-called "web of data" where everything is glued together like magic requires a fully functional semantic web stack (RDF -> OWL -> custom rules -> trust/validation).

    Agent computing, repositories and data integration are nothing new at all. What is new is the ability to exchange rules (semantics) over models in a readable, simple and applicable way.

    Today, if you give someone a piece of (classical) XML, the receiver has to implement all the semantics of the document himself. Take an Ant file, for instance (a more EE example :-) ), where you have dependencies between tasks. If you would like to list all the dependencies of a task, you will need to travel all the way through the dependency properties and save everything into a List.

    Using a semweb-enabled concept, you would not need to implement this algorithm yourself. The dependency would be specified as being transitive (using OWL). Querying the model with inferencing enabled, you would get what you want.
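    As a rough sketch of that idea with Jena 2 and its built-in OWL reasoner (the namespace and task names are invented for illustration):

    // Declare dependsOn transitive in OWL and let the reasoner close the
    // chain, instead of hand-writing the graph traversal.
    import com.hp.hpl.jena.rdf.model.*;
    import com.hp.hpl.jena.reasoner.ReasonerRegistry;
    import com.hp.hpl.jena.vocabulary.OWL;
    import com.hp.hpl.jena.vocabulary.RDF;

    public class AntDeps {
        public static void main(String[] args) {
            String ns = "http://example.org/build#";
            Model m = ModelFactory.createDefaultModel();
            Property dependsOn = m.createProperty(ns, "dependsOn");
            m.add(dependsOn, RDF.type, OWL.TransitiveProperty);

            Resource dist = m.createResource(ns + "dist");
            Resource jar = m.createResource(ns + "jar");
            Resource compile = m.createResource(ns + "compile");
            m.add(dist, dependsOn, jar);
            m.add(jar, dependsOn, compile);

            // With OWL inferencing enabled, the indirect dependency falls out
            // of a plain query; no hand-written traversal needed.
            InfModel inf = ModelFactory.createInfModel(
                    ReasonerRegistry.getOWLReasoner(), m);
            System.out.println(inf.contains(dist, dependsOn, compile)); // true
        }
    }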

    Anyway, there is a lot to do. RDF and OWL offer some useful semantics, but the crucial part is custom rules (via RuleML or SWRL).

    best regards,
    Andreas
  29. Rule interchange is new[ Go to top ]

    I agree that rule interchange is new, but I'm not sure about the meaning of this statement:
    Using a semweb-enabled concept, you would not need to implement this algorithm yourself. The dependency would be specified as being transitive (using OWL). Querying the model with inferencing enabled, you would get what you want.

    By algorithm, are you referring to the semantic relationship defined with OWL, or is it a combination of RDF Schema + OWL?

    Say I define the object model in RDF Schema for recipes. Then I define categories of cuisine, techniques and regions in OWL. With these two pieces of data, I should be able to "just query" for recipes with certain ingredients. Depending on the granularity of the RDF description, querying for all recipes with "chicken, tomatoes and basil" may not return any results. Say, for example, the recipes are tagged with just author and cuisine on one site, while on a different site the ingredient list is tagged along with the quantity.

    Now, say I write a rule that says, "if the recipe explicitly lists chicken, or contains phrases with "chicken" preceded by a quantity term (e.g. cup, pound, lb, ounces, kg), then add the recipe to the list."

    So in this case, having facilities to deal with both strong negation and negation as failure allows the software to return a larger set of results. I can then pass the results with low probability ratings through additional rules.
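    Roughly, in a CLIPS-style engine like JESS, that rule might look like the sketch below (template and slot names are made up, and the "quantity term preceding chicken" test is crudely approximated with a plain substring check; real web content would need NLP long before this point):

    import jess.JessException;
    import jess.Rete;

    public class RecipeRule {
        public static void main(String[] args) throws JessException {
            Rete engine = new Rete();
            engine.executeCommand(
                "(deftemplate recipe (slot url) (multislot ingredients) (slot text))");
            engine.executeCommand(
                "(defrule candidate-recipe " +
                "  (recipe (url ?u) (ingredients $?i) (text ?t)) " +
                "  (test (or (member$ chicken $?i) " +        // explicitly listed
                "            (str-index \"chicken\" ?t))) " + // fallback text match
                "  => (printout t \"candidate: \" ?u crlf))");
            engine.executeCommand(
                "(assert (recipe (url r1) (ingredients chicken basil) (text \"\")))");
            engine.executeCommand(
                "(assert (recipe (url r2) (ingredients pork) " +
                "  (text \"use 1 lb chicken as a substitute\")))");
            engine.run();  // both r1 and r2 come out as candidates
        }
    }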

    That's a pretty straightforward and simple case. In reality, sites like foodtv do not tag their content with RDF. The average user posting a recipe to a forum or mailing list isn't going to aggressively tag the posting, so chances are most of the data has to be parsed with some kind of NLP engine.

    OWL is a necessary part of the equation, but when one considers that the bulk of the content on the web is unstructured, RDF does very little to address those issues.

    peter
  30. Keep it simple[ Go to top ]

    Thanks for pointing out the monotonic reasoning link. I have seen it before, but it just makes me think even those contributing to the spec have different perspectives. I read over the old page with Tim Berners-Lee's quotes about the semantic web and I'm left more confused. He says the internet is an open world and therefore the semantic web is an open world assumption by default. But the primary mode of reasoning is monotonic. My train of thought is this:
    1. if I can retract a fact or conclusion, that means I have to trust the initial deduction, or I have to have all the facts at the beginning.
    2. given that I can't have all the facts, chances are the result of the reasoning is wrong.
    3. if the system crawls/searches the web for a specific fact, how can I resolve inconsistencies, as Sir TBL states in one of his famous quotes?
    4. if a new fact counters the previous RDF Query results, the system cannot resolve the inconsistency. If that is the case, it is a closed system and inflexible.

    The semantic web stack still has a long way to go, but IMHO arguing about monotonic vs. non-monotonic reasoning is senseless.

    Both the facts and the rules will be non-monolithic/distributed; this is what the semantic web is all about. Now, whether you face inconsistencies or not has absolutely nothing to do with theoretical discussions, but with implementations.

    Simply put, you don't need to think globally to see inconsistent use cases. Take the following (local) knowledge base:

    (a, isRelated, b)
    (b, isRelated, c)

    Now you apply the following rule to these facts:
    (X, isRelated , Y) and (Y, isRelated, Z) -> (X, isRelated, Z)

    and you get the following KB as a result:
    (a, isRelated, b)
    (b, isRelated, c)
    (a, isRelated, c) // inferred fact
    Now if you delete, let's say, fact #2 from this KB, you will have the following remaining:

    (a, isRelated, b)
    (a, isRelated, c)

    Is it "reasonable" to assume that a isRelated to c ? No, because the required Fact#2 is missing.

    So, how can we avoid such problems? Like I said, it is a question of implementation, or of the usage of existing rule engines:

    1) Save inferred facts in another place.
    2) If you add or delete facts (plain facts), then you must re-fire your rules (semantics).

    To 1) This is a crucial requirement. If you store inferred knowledge in another place, you always know which is your original knowledge. That is important if you would like to apply a different ruleset to your original knowledge; it would be a mess if it got mixed up with previous deductions that you don't need now.

    To 2) As shown in the example above, removing and even adding facts can lead to new (or fewer) assertions. Thus you must re-fire your rules to get state-of-the-art knowledge.

    People will ask: can this behavior scale? And the answer would be the same as before: take a close look at databases, and at DB views specifically.
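    For what it's worth, here is roughly how the example plays out with Jena 2's generic rule reasoner, which covers both points above: the inference model keeps deductions apart from the base facts, and rebinding re-fires the rules after a retraction (the namespace is invented):

    import com.hp.hpl.jena.rdf.model.*;
    import com.hp.hpl.jena.reasoner.rulesys.GenericRuleReasoner;
    import com.hp.hpl.jena.reasoner.rulesys.Rule;

    public class TransitiveKb {
        public static void main(String[] args) {
            String ns = "http://example.org/kb#";
            Model facts = ModelFactory.createDefaultModel();
            Property isRelated = facts.createProperty(ns, "isRelated");
            Resource a = facts.createResource(ns + "a");
            Resource b = facts.createResource(ns + "b");
            Resource c = facts.createResource(ns + "c");
            facts.add(a, isRelated, b);  // fact #1
            facts.add(b, isRelated, c);  // fact #2

            // (X, isRelated, Y) and (Y, isRelated, Z) -> (X, isRelated, Z)
            String rule = "[transitive: (?x <" + ns + "isRelated> ?y) " +
                          "(?y <" + ns + "isRelated> ?z) " +
                          "-> (?x <" + ns + "isRelated> ?z)]";
            InfModel inf = ModelFactory.createInfModel(
                    new GenericRuleReasoner(Rule.parseRules(rule)), facts);
            System.out.println(inf.contains(a, isRelated, c)); // true (inferred)

            // Retract fact #2 from the base model, then re-fire the rules:
            facts.remove(facts.createStatement(b, isRelated, c));
            inf.rebind();
            System.out.println(inf.contains(a, isRelated, c)); // false again
        }
    }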

    I don't like the RDF Graph approach because it's too rigid.

    Applying rule-based computing to the Internet = semantic web.
    With people linking sources, you always have a graph. You can even represent local DBs as a graph. Representation and implementation are two different things.

    best regards,
    Andreas
  31. RDF[ Go to top ]

    Whoever compares RDF with XML hasn't understood RDF at all.

    First of all, RDF has nothing to do with XML (directly). XML is only one possible serialization format for RDF.

    RDF is an abstract model, and its purpose is to provide an infrastructure for describing rules over a defined model. There are plenty of RDF (and semweb in general) tutorials/blueprints/papers out there, so I'm not gonna fight with you guys over whether RDF is helpful or not. Take a closer look, try to understand what RDF tries to solve, and judge for yourself. If you still think it is crap, then you are free to do so.

    Regarding current implementations, you are correct. The scalability and response times of current open source implementations are not enough, but it is safe to say they will improve in the near future. Commercial rule engines such as JESS are already very well optimized and successfully deployed in high-scale scenarios.

    cheers,
    Andreas
  32. RDF[ Go to top ]

    Whoever compares RDF with XML hasn't understood RDF at all.

    First of all, RDF has nothing to do with XML (directly). XML is only one possible serialization format for RDF.

    RDF is an abstract model, and its purpose is to provide an infrastructure for describing rules over a defined model. There are plenty of RDF (and semweb in general) tutorials/blueprints/papers out there, so I'm not gonna fight with you guys over whether RDF is helpful or not. Take a closer look, try to understand what RDF tries to solve, and judge for yourself. If you still think it is crap, then you are free to do so.

    Regarding current implementations, you are correct. The scalability and response times of current open source implementations are not enough, but it is safe to say they will improve in the near future. Commercial rule engines such as JESS are already very well optimized and successfully deployed in high-scale scenarios.

    cheers,
    Andreas

    Same old tired excuse for RDF. Yeah, RDF only uses XML for serialization – isn't that true of most XML-based languages? RDF isn't really an abstract model. RDF is broken up into several pieces:

    RDF Schema - sorta like XML Schema, but somehow we need another schema language.

    RDF graph - creating a graph of some data, which is "supposed" to tell a system what the data relates to. Basically tagging things with subject-predicate-object.

    RDF Query or SPARQL - a rule/query language which allows a system to query against the schema and graphs.

    I haven't got the slightest clue what RDF is about, but one shouldn't need to step into a reality distortion field to understand that RDF is a combination of knowledge base reasoning + functional dependency grammar. Though I doubt the W3C RDF people will ever change their tune and listen to the public.

    My apologies for hijacking the thread. To keep things slightly on topic: I think Oracle 10g and SQL Server 2005 might have some XQuery support. Oracle has had XML-like query support since 8i, and SQL Server 2000 has SQLXML.

    peter
  33. What's new???[ Go to top ]

    I don't get why he's comparing XQuery to Java. It's like comparing SQL and/or some stored procedure language to Java or any other general-purpose programming language. Of course XQuery is better at querying XML – it was specifically designed to do so. It's a domain-specific language, designed for a purpose.

    Ilya Sterin