HPR4404: Kevie nerd snipes Ken by grepping xml

Hacker Public Radio - A podcast by Hacker Public Radio

Try Bookbeat 60! days for free, click here

Enjoy a whole world of audiobooks and e-books, everything from new releases to the classics

Categories:

More Command line fun: downloading a podcast In the show hpr4398 :: Command line fun: downloading a podcast Kevie walked us through a command to download a podcast. He used some techniques here that I hadn't used before, and it's always great to see how other people approach the problem. Let's have a look at the script and walk through what it does, then we'll have a look at some "traps for young players" as the EEVBlog is fond of saying. Analysis of the Script wget `curl https://tuxjam.otherside.network/feed/podcast/ | grep -o 'https*://[^"]*ogg' | head -1` It chains four different commands together to "Save the latest file from a feed". Let's break it down so we can have checkpoints between each step. I often do this when writing a complex one liner - first do it as steps, and then combine it. The curl command gets https://tuxjam.otherside.network/feed/podcast/ . To do this ourselves we will call curl https://tuxjam.otherside.network/feed/podcast/ --output tuxjam.xml , as the default file name is index.html. This gives us a xml file, and we can confirm it's valid xml with the xmllint command. $ xmllint --format tuxjam.xml >/dev/null $ echo $? 0 Here the output of the command is ignored by redirecting it to /dev/null Then we check the error code the last command had. As it's 0 it completed sucessfully. Kevie then passes the output to the grep search command with the option -o and then looks for any string starting with https followed by anything then followed by two forward slashes, then -o, --only-matching Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line We can do the same with. I was not aware that grep defaulted to regex, as I tend to add the --perl-regexp to explicitly add it. grep --only-matching 'https*://[^"]*ogg' tuxjam.xml http matches the characters http literally (case sensitive) s* matches the character s literally (case sensitive) Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy] : matches the character : literally / matches the character / literally / matches the character / literally [^"]* match a single character not present in the list below Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy] " a single character in the list " literally (case sensitive) ogg matches the characters ogg literally (case sensitive) When we run this ourselves we get the following $ grep --only-matching 'https*://[^"]*ogg' tuxjam.xml https://archive.org/download/tuxjam-121/tuxjam_121.ogg https://archive.org/download/tuxjam-120/TuxJam_120.ogg https://archive.org/download/tux-jam-119/TuxJam_119.ogg https://archive.org/download/tuxjam_118/tuxjam_118.ogg https://archive.org/download/tux-jam-117-uncut/TuxJam_117.ogg https://tuxjam.otherside.network/tuxjam-115-ogg https://archive.org/download/tuxjam_116/tuxjam_116.ogg https://tuxjam.otherside.network/tuxjam-115-ogg https://tuxjam.otherside.network/tuxjam-115-ogg https://t

Visit the podcast's native language site