sed between two patterns where result contains a third pattern





I'm trying to filter log files for XML responses. Using sed as below, it is fairly easy to find all the XML messages:


sed -n '/<element/,/<\/element/p' file



Returns:


<element>
<id>12345</id>
...
</element>
<element>
<id>54321</id>
...
</element>



However, I have been unable to figure out how to apply a second filter so that only XML responses containing a certain pattern, such as an ID, are returned.



In the above example, how would I filter on ID to only return the first one?





Please add samples of input and sample of output in your post too.
– RavinderSingh13
Jul 2 at 13:34





What samples are needed?
– Andrew
Jul 2 at 13:34





Samples of your question input and output so that we will better understand your question then.
– RavinderSingh13
Jul 2 at 13:35





If that's really needed, then ok.
– Andrew
Jul 2 at 13:36




3 Answers



sed is for doing s/old/new/ THAT IS ALL. All of its wacky single-character runic language constructs became obsolete in the late 1970s when awk was invented.


$ cat tst.awk
/<element>/ { inElt = 1 }
inElt {
    elt = (elt == "" ? "" : elt ORS) $0
    if ( /<\/element>/ ) {
        if ( elt ~ /<id>12345<\/id>/ ) {
            print elt
        }
        elt = ""
        inElt = 0
    }
    next
}
{ print }

$ awk -f tst.awk file
<element>
<id>12345</id>
...
</element>



The main benefits of the above over the currently accepted sed solution are:





For example, let's say you wanted to print the first element in the file regardless of its ID rather than the one containing a specific ID. That'd be a trivial tweak of the above to:


$ cat tst.awk
/<element>/ { inElt = 1 }
inElt {
    elt = (elt == "" ? "" : elt ORS) $0
    if ( /<\/element>/ ) {
        if ( ++cnt == 1 ) {
            print elt
        }
        elt = ""
        inElt = 0
    }
    next
}
{ print }

$ awk -f tst.awk file
<element>
<id>12345</id>
...
</element>



If you want to print the 27th instead of the 1st element, just change ++cnt == 1 to ++cnt == 27. Try modifying the sed script for such a trivial requirements change and you can look forward to a complete rewrite and having to invoke additional tools. Want to print multiple elements and/or other parts of the file not within element tags? Also absolutely trivial with awk. Hopefully you get the point.







Only quarrel with this solution is the hard coding of the id being searched for. Replacing elt ~ /<id>12345</id>/ with elt ~ idQuery, and altering the execution to awk -v idQuery="id>12345</id" -f tst.awk file means that it is more appropriate for scripting.
– Andrew
Jul 3 at 10:38





@Andrew I couldn't tell from the question if a specific id value was wanted, or the first element containing any id tag, or just the first element no matter what its contents, so I just copied what the accepted sed script was doing so you could see the comparison.
– Ed Morton
Jul 3 at 11:56





Fair enough, thanks for the help.
– Andrew
Jul 3 at 12:30
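
The parameterised variant suggested in the comments above (passing the ID with awk -v instead of hard-coding it) might look like the following sketch. The wrapper name filter_by_id and the sample file are made up for illustration; idQuery is the variable name from the comment. Note that the value is used as a regular expression, so an ID containing regex metacharacters would need escaping.

```shell
# filter_by_id ID FILE: print only the <element> blocks whose text matches
# <id>ID</id>. The function name is hypothetical; idQuery is from the comment.
filter_by_id() {
    awk -v idQuery="<id>$1</id>" '
        /<element>/ { inElt = 1 }
        inElt {
            elt = (elt == "" ? "" : elt ORS) $0   # accumulate the element
            if ( /<\/element>/ ) {
                if ( elt ~ idQuery ) {            # dynamic regex passed via -v
                    print elt
                }
                elt = ""
                inElt = 0
            }
            next
        }
        { print }                                 # lines outside any element pass through
    ' "$2"
}

# Made-up sample input mirroring the two elements in the question.
awk_sample=$(mktemp)
printf '%s\n' '<element>' '<id>12345</id>' '</element>' \
              '<element>' '<id>54321</id>' '</element>' > "$awk_sample"

filter_by_id 12345 "$awk_sample"
```

Calling filter_by_id 54321 "$awk_sample" instead would select the second element without touching the script.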



You can group commands for your ranges:


sed -n '/<element/,/<\/element/{ /id/p }'



But, you should really consider using XML tools when dealing with XML, such as xmlstarlet.



In order to print the complete entry when searching for a specific ID, you need to accumulate the lines inside the <element> node using the hold space. Once you reach the end tag of an <element> node, copy the hold space into the pattern space, match against your ID and print on a match:




sed -n -e '
# append every line of the element to the hold space
/<element/,/<\/element/H
/<\/element/{
    # replace pattern space with hold space
    g
    # print if matching ID
    /<id>12345<\/id>/p
    # clear pattern space
    s/.*//
    # swap, which leaves the hold space empty
    x
    # start next cycle without further output
    b
}' input-file



You see, this gets messy really fast.





This solution doesn't work and produces an output equivalent to using grep based on the id against the output of the original query.
– Andrew
Jul 2 at 13:50





@Andrew I realized that you wanted to print out the whole entry shortly after I wrote the answer. Look at the second sed script which does what you want, I think.
– cbley
Jul 2 at 14:06





I understand it is messy, but that's what happens when you get told "you have to do X in Y way". The files are on a server and are quite cumbersome to deal with, so instead of changing things such that the xml messages are logged in a better fashion, this is the way we MUST do it :|
– Andrew
Jul 2 at 14:16



This might work for you (GNU sed):


sed -n '/<element>/{:a;/<\/element>/!{N;ba};/<id>12345<\/id>/p}' file



Use sed's grep-like nature via the -n option, which turns off automatic printing of every line. On encountering a line that contains <element>, gather up lines until the end tag </element> is reached. Then check the collection for <id>12345</id> and print it if it matches; otherwise the collection is passed over.
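
To see the collect-then-test approach on a log that mixes ordinary lines with XML, here is a small sketch. The log lines and the temporary file are invented for illustration, and the script is the same logic as the one-liner, laid out one command per line so that it does not depend on GNU-only handling of ; and } after labels.

```shell
# Made-up log: plain lines interleaved with two XML responses.
log=$(mktemp)
printf '%s\n' \
    'INFO request received' \
    '<element>' '<id>12345</id>' '</element>' \
    'INFO request received' \
    '<element>' '<id>54321</id>' '</element>' > "$log"

# Collect each element into the pattern space, then print it only if it
# contains the wanted ID. Everything else is suppressed by -n.
matches=$(sed -n '
/<element>/{
    :a
    /<\/element>/!{
        N
        ba
    }
    /<id>12345<\/id>/p
}' "$log")

printf '%s\n' "$matches"
```

Only the first element is printed; the plain log lines and the second element are dropped.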





If instead you want a specific element e.g. the second, use:


sed -n '/<element>/{:a;/<\/element>/!{N;ba};x;s/^/x/;/^x\{2\}$/{x;p;b};x}' file



This uses a counter held in the hold space which is incremented on each complete collection and checked for a specific number.
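
The hold-space counter can be wrapped so that the element number is a parameter. This is only a sketch: the function name nth_element and the sample file are hypothetical, and it assumes the start and end tags sit on their own lines as in the question.

```shell
# nth_element N FILE: print the Nth <element>...</element> block of FILE.
# Hypothetical helper; the hold space holds one "x" per element seen so far.
nth_element() {
    sed -n '
/<element>/{
    :a
    /<\/element>/!{
        N
        ba
    }
    x
    s/^/x/
    /^x\{'"$1"'\}$/{
        x
        p
        b
    }
    x
}' "$2"
}

# Made-up sample input.
sample=$(mktemp)
printf '%s\n' 'log line' '<element>' '<id>12345</id>' '</element>' \
              'log line' '<element>' '<id>54321</id>' '</element>' > "$sample"

nth_element 2 "$sample"
```

Each completed collection is swapped into the hold space, the counter gains one x, and the collection is swapped back and printed only when the counter has exactly N characters.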



N.B. The range operator , can be used as a flip-flop type command but in general the start address{:a;N;end address!ba; commands on collection} is more useful.







