Opened 4 months ago

Last modified 4 months ago

#1177 new Bug Report

Bad URI path resolution with internal ".." path segments

Reported by: kasei
Owned by: David Robillard
Priority: major
Component: Serd
Keywords:
Cc:

Description

While working through an implementation of JSON-LD using Serd's URI-related functions, I ran across a problem with URI resolution. If ".." path segments appear after other (non-dot) segments, they are not removed:

$ cat test.ttl 
@base <http://example.org/foo/bar/> .
<./../../../useless/../../../still-root> <> "not ok" .
$ serdi test.ttl 
<http://example.org/useless/../../../still-root> <http://example.org/foo/bar/> "not ok" .

As a sanity check, this acts differently than rapper:

$ rapper -q -i turtle test.ttl 
<http://example.org/still-root> <http://example.org/foo/bar/> "not ok" .

Jena also produces the same output as rapper, though with a spurious warning (bug report filed here: https://issues.apache.org/jira/browse/JENA-1713):

$ riot -q test.ttl 
12:42:09 WARN  riot                 :: [line: 2, col: 1 ] Bad IRI: <http://example.org/still-root> Code: 8/NON_INITIAL_DOT_SEGMENT in PATH: The path contains a segment /../ not at the beginning of a relative reference, or it contains a /./ These should be removed.
<http://example.org/still-root> <http://example.org/foo/bar/> "not ok" .

Change History (6)

comment:1 Changed 4 months ago by David Robillard

This path just seems broken to me. I would expect an error if anything, except maybe not in lax mode. Why do you think that's spurious? Does the spec actually say this should resolve in that way?

In any case, this would require an option to enable for serdi to allocate and normalize URIs, since URI ref resolution there is normally done in-place.

comment:2 Changed 4 months ago by David Robillard

P.S. I started a json-ld branch a while ago that gets at least part of the way through the test suite, though it's a mess. I plan to rebase this on serd1 and continue it there at some point; serd0 will be EOL'ed soon.

comment:3 in reply to:  1 Changed 4 months ago by kasei

Replying to David Robillard:

> This path just seems broken to me. I would expect an error if anything, except maybe not in lax mode. Why do you think that's spurious? Does the spec actually say this should resolve in that way?

My reading of the spec says it should resolve. And FWIW, the linked bug report for JENA has now been resolved to agree with that reading.

> In any case, this would require an option to enable for serdi to allocate and normalize URIs, since URI ref resolution there is normally done in-place.

Couldn't this be done in-place as well? There's already a function that's doing dot removal. My impression was that it might be possible to update this function to handle the above case as well.
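
For reference, here is a minimal sketch of the dot-segment removal procedure from RFC 3986, section 5.2.4, done in place. It is not Serd's code: `remove_dot_segments` is a hypothetical name, and the sketch assumes the merged path already sits in one contiguous, writable buffer, which may not hold for the streaming path in serdi.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Serd's API.
   Removes dot segments from a writable, NUL-terminated path in place,
   following RFC 3986 section 5.2.4. Returns the new length. */
size_t
remove_dot_segments(char* path)
{
    assert(path != NULL);

    const char* in  = path; /* start of the RFC's "input buffer" */
    char*       out = path; /* end of the RFC's "output buffer" */

    /* Invariant: out <= in, so writes never clobber unread input. */
    while (*in) {
        if (!strncmp(in, "../", 3)) {
            in += 3; /* 2A: drop a leading "../" */
        } else if (!strncmp(in, "./", 2)) {
            in += 2; /* 2A: drop a leading "./" */
        } else if (!strncmp(in, "/./", 3)) {
            in += 2; /* 2B: "/./" becomes "/" */
        } else if (!strcmp(in, "/.")) {
            *out++ = '/'; /* 2B: a trailing "/." becomes "/" */
            in += 2;
        } else if (!strncmp(in, "/../", 4) || !strcmp(in, "/..")) {
            /* 2C: drop the last output segment and its leading "/" */
            const int at_end = (in[3] == '\0');
            in += 3;
            while (out > path && *--out != '/') {}
            if (at_end) {
                *out++ = '/'; /* a trailing "/.." leaves a "/" behind */
            }
        } else if (!strcmp(in, ".") || !strcmp(in, "..")) {
            in += strlen(in); /* 2D: input is only "." or ".." */
        } else {
            /* 2E: move one segment, including any leading "/" */
            do {
                *out++ = *in++;
            } while (*in && *in != '/');
        }
    }

    *out = '\0';
    return (size_t)(out - path);
}
```

Run over the merged path from the report, `/foo/bar/./../../../useless/../../../still-root`, this leaves `/still-root`, which matches the rapper and Jena output.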

comment:4 Changed 4 months ago by David Robillard

Theoretically, I guess it probably could, but I think it would require a pretty drastic API change. Handling this case means there are any number of places where dots might need to be removed, so some sort of rope (linked list of slices) would be necessary.

It would probably be easier to just optionally pay the cost of normalizing it out of place somewhere. Particularly since 99.999999999% of cases are never going to care about "correctly" handling a case as nonsensical as this example (I can't see any real possibility of this resolving to something useful and intended). It would be useful for paths with several dot segments that actually make sense, though.
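
A rough sketch of what the out-of-place option could start from: merging the base path and the relative reference into one allocated buffer, per RFC 3986, section 5.2.3. The name `merge_paths` is hypothetical and not part of Serd. The sketch assumes the base URI has an authority and a non-empty path. A dot-segment removal pass (section 5.2.4) could then run over the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Serd's API.
   Merges a relative-path reference with a base path (RFC 3986, 5.2.3):
   everything after the last "/" of the base path is replaced by the
   reference. Returns a newly allocated string that the caller must
   free, or NULL if allocation fails. */
char*
merge_paths(const char* base_path, const char* ref_path)
{
    assert(base_path != NULL && ref_path != NULL);

    /* Number of base path bytes to keep, up to and including the last "/" */
    const char*  last_slash = strrchr(base_path, '/');
    const size_t keep = last_slash ? (size_t)(last_slash - base_path) + 1 : 0;
    const size_t ref_len = strlen(ref_path);

    char* merged = (char*)malloc(keep + ref_len + 1);
    if (merged) {
        memcpy(merged, base_path, keep);
        memcpy(merged + keep, ref_path, ref_len + 1); /* copies the NUL too */
    }
    return merged;
}
```

For the report's input, `merge_paths("/foo/bar/", "./../../../useless/../../../still-root")` yields `/foo/bar/./../../../useless/../../../still-root`. The cost is one allocation per resolved reference, which is why it would have to be optional.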

comment:5 Changed 4 months ago by David Robillard

Also, in what context are you actually encountering this? This would be relatively easy to fix in any of the functions that actually create URI nodes. The path used when streaming as in serdi just doesn't call those (it doesn't allocate at all, except on the built-in stack).

comment:6 in reply to:  5 Changed 4 months ago by kasei

Replying to David Robillard:

> Also, in what context are you actually encountering this? This would be relatively easy to fix in any of the functions that actually create URI nodes. The path used when streaming as in serdi just doesn't call those (it doesn't allocate at all, except on the built-in stack).

In this case, the URIs appear in the JSON-LD 1.1 expansion test suite. For example: https://github.com/w3c/json-ld-api/blob/5457f25ee2d5ad7484444a2761f0be0dfcfba0a2/tests/expand/0029-in.jsonld
