Extract URLs from a File (TubeBuddy Backup) Using the Command Line

https://www.tubebuddy.com/rickmakes (Affiliate Link)

Installing Windows Subsystem for Linux: https://youtu.be/KpBVUmMvue0

TubeBuddy Playlist: https://www.youtube.com/playlist?list=PLErU2HjQZ_ZNdQHnpuDW0memweWCc2eS4

View Backup File
less tubebuddy_backup.csv
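
If the backup is large, it can also help to glance at just the first few rows before going further (an optional quick check, not one of the original steps):
head -n 5 tubebuddy_backup.csv
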
Extract URLs
grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" tubebuddy_backup.csv

Source of regular expression: https://unix.stackexchange.com/questions/181254/how-to-use-grep-and-cut-in-script-to-obtain-website-urls-from-an-html-file
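
The alternation (http|https) can also be written as https?, which matches the same thing. Note that the character class only allows letters, digits and ./?=_-, so URLs containing characters such as & or % will be cut short at that point. A slightly shorter, equivalent variant:
grep -Eo "https?://[a-zA-Z0-9./?=_-]*" tubebuddy_backup.csv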

Sort URLs
grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" tubebuddy_backup.csv | sort
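
If the sorted list runs past one screen, piping it into less makes it easier to scroll (optional):
grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" tubebuddy_backup.csv | sort | less
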
Get Line Count
grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" tubebuddy_backup.csv | sort | wc -l
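
That figure counts every match, duplicates included; once duplicates are collapsed with uniq (covered next), the same wc -l gives the number of distinct URLs:
grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" tubebuddy_backup.csv | sort | uniq | wc -l
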
Find Unique URLs
grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" tubebuddy_backup.csv | sort | uniq
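
As a shorthand, sort -u collapses duplicates in one step and can stand in for sort | uniq:
grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" tubebuddy_backup.csv | sort -u
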
Find Unique URLs with Count per URL
grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" tubebuddy_backup.csv | sort | uniq -c
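
If you only want the URLs that appear more than once, the leading count from uniq -c can be filtered with awk (a small optional sketch; $1 > 1 keeps lines whose count is at least 2):
grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" tubebuddy_backup.csv | sort | uniq -c | awk '$1 > 1'
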
Find Unique URLs with Count per URL (Sorted)
grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" tubebuddy_backup.csv | sort | uniq -c | sort
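
The final sort above orders the lines as text; for a strictly numeric order with the most frequent URLs first, sort -nr should do the trick:
grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" tubebuddy_backup.csv | sort | uniq -c | sort -nr
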
Filter Out https
grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" tubebuddy_backup.csv | sort | uniq | grep -vi 'https'
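
The opposite filter, keeping only the https URLs, just drops the -v (anchoring the pattern with ^ avoids matching "https" in the middle of a URL):
grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" tubebuddy_backup.csv | sort | uniq | grep -i '^https'
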
Save Results to File
grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" tubebuddy_backup.csv | sort | uniq > file.txt
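
To watch the results scroll by and save them at the same time, tee can replace the redirect (optional):
grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" tubebuddy_backup.csv | sort | uniq | tee file.txt
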
Use awk to Make Links for Web Page
grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" tubebuddy_backup.csv | sort | uniq | awk 'BEGIN{print "<html>"}{printf "<a href=\"%s\" target=\"_blank\">%s</a>", $0, $0}{gsub("http:","https:",$0)}{printf " | <a href=\"%s\" target=\"_blank\">https</a><br>\n", $0}' > file.html
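
The same awk program, split across lines with comments, and with an END block added so the generated page is closed with </html> (a small optional refinement, not part of the original one-liner):
grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" tubebuddy_backup.csv | sort | uniq | awk '
BEGIN { print "<html>" }                                             # open the page
{
  printf "<a href=\"%s\" target=\"_blank\">%s</a>", $0, $0           # link using the URL as found
  gsub("http:", "https:", $0)                                        # rewrite the scheme to https
  printf " | <a href=\"%s\" target=\"_blank\">https</a><br>\n", $0   # second link forced to https
}
END { print "</html>" }                                              # close the page (added here)
' > file.html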
