After downloading the data with wget, I found that every file name still carried the query string from the URL:
$ ls
download?fn=%2FPCAWG%2Fclinical_and_histology%2Fpcawg_donor_clinical_August2016_v9.xlsx
download?fn=%2FPCAWG%2Fclinical_and_histology%2Fpcawg_donor_subtype_cohort_list.xlsx
download?fn=%2FPCAWG%2Fclinical_and_histology%2Fpcawg_specimen_histology_August2016_v9.xlsx
download?fn=%2FPCAWG%2Fconsensus_cnv%2Fconsensus.20170119.somatic.cna.annotated.tar.gz
download?fn=%2FPCAWG%2Fconsensus_cnv%2Fconsensus.20170119.somatic.cna.icgc.public.tar.gz
download?fn=%2FPCAWG%2Fconsensus_cnv%2Fconsensus.20170119.somatic.cna.tcga.public.tar.gz
download?fn=%2FPCAWG%2Fconsensus_cnv%2Fconsensus.20170217.purity.ploidy.txt.gz
download?fn=%2FPCAWG%2Fconsensus_snv_indel%2Ffinal_consensus_passonly.snv_mnv_indel.icgc.public.maf.gz
download?fn=%2FPCAWG%2Fconsensus_sv%2Ffinal_consensus_sv_bedpe_passonly.icgc.public.tgz
download?fn=%2FPCAWG%2Fconsensus_sv%2Ffinal_consensus_sv_bedpe_passonly.tcga.public.tgz
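So it is best to strip the leading part. sed can edit text by pattern matching and mv can rename files, so let's combine the two. First, test the substitution on a single file name:
$ echo 'download?fn=%2FPCAWG%2Fclinical_and_histology%2Fpcawg_specimen_histology_August2016_v9.xlsx' | sed -E 's/.*%2(.*)/\1/'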
Fpcawg_specimen_histology_August2016_v9.xlsx
Then check that every file in the directory can be handled the same way:
$ ls | sed -E 's/.*%2(.*)/\1/'
Fpcawg_donor_clinical_August2016_v9.xlsx
Fpcawg_donor_subtype_cohort_list.xlsx
Fpcawg_specimen_histology_August2016_v9.xlsx
Fconsensus.20170119.somatic.cna.annotated.tar.gz
Fconsensus.20170119.somatic.cna.icgc.public.tar.gz
Fconsensus.20170119.somatic.cna.tcga.public.tar.gz
Fconsensus.20170217.purity.ploidy.txt.gz
Ffinal_consensus_passonly.snv_mnv_indel.icgc.public.maf.gz
Ffinal_consensus_sv_bedpe_passonly.icgc.public.tgz
Ffinal_consensus_sv_bedpe_passonly.tcga.public.tgz
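Every name printed above still starts with F: the expression matches only %2, so the last character of the percent-encoded slash %2F is kept. A minimal sketch of the same substitution with %2F in the pattern follows. strip_prefix is a hypothetical helper name, and the sketch assumes the on-disk names use %2F as the separator, which the leftover F suggests.

```shell
# strip_prefix is a hypothetical helper name, not from the original post.
# It removes everything up to and including the last "%2F" in a name.
strip_prefix() {
  printf '%s\n' "$1" | sed -E 's/.*%2F(.*)/\1/'
}

# Prints: consensus.20170217.purity.ploidy.txt.gz
strip_prefix 'download?fn=%2FPCAWG%2Fconsensus_cnv%2Fconsensus.20170217.purity.ploidy.txt.gz'
```

Because .* is greedy, the match runs to the last %2F, so only the final path component survives.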
Actually renaming the files requires iterating over them with a for loop:
$ for f in `ls`; do echo `echo $f | sed -E 's/.*%2(.*)/\1/'`; done
Fpcawg_donor_clinical_August2016_v9.xlsx
Fpcawg_donor_subtype_cohort_list.xlsx
Fpcawg_specimen_histology_August2016_v9.xlsx
Fconsensus.20170119.somatic.cna.annotated.tar.gz
Fconsensus.20170119.somatic.cna.icgc.public.tar.gz
Fconsensus.20170119.somatic.cna.tcga.public.tar.gz
Fconsensus.20170217.purity.ploidy.txt.gz
Ffinal_consensus_passonly.snv_mnv_indel.icgc.public.maf.gz
Ffinal_consensus_sv_bedpe_passonly.icgc.public.tgz
Ffinal_consensus_sv_bedpe_passonly.tcga.public.tgz
The loop above is a dry run that confirms the operation is safe before mv is actually called. Then change it to perform the real rename:
$ for f in `ls`; do mv $f `echo $f | sed -E 's/.*%2(.*)/\1/'`; done
$ ls
Fconsensus.20170119.somatic.cna.annotated.tar.gz Ffinal_consensus_sv_bedpe_passonly.icgc.public.tgz
Fconsensus.20170119.somatic.cna.icgc.public.tar.gz Ffinal_consensus_sv_bedpe_passonly.tcga.public.tgz
Fconsensus.20170119.somatic.cna.tcga.public.tar.gz Fpcawg_donor_clinical_August2016_v9.xlsx
Fconsensus.20170217.purity.ploidy.txt.gz Fpcawg_donor_subtype_cohort_list.xlsx
Ffinal_consensus_passonly.snv_mnv_indel.icgc.public.maf.gz Fpcawg_specimen_histology_August2016_v9.xlsx
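Finally, the general pattern that can be abstracted from this post is:
for f in `ls`; do <cmd> `echo $f | sed -E <operation>`; done
This template can be reused for any task that first transforms a file name and then runs a command on the result.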
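One possible way to package the template as a reusable function is sketched below. each_renamed is a hypothetical name, and unlike the template it passes both the original and the transformed name to the command, which is what mv or cp expect.

```shell
# each_renamed is a hypothetical helper, not from the original post.
# Usage: each_renamed SED_EXPR CMD FILE...
# For every FILE it runs: CMD FILE NEW_NAME,
# where NEW_NAME is FILE passed through sed -E SED_EXPR.
each_renamed() (
  sed_expr=$1
  cmd=$2
  shift 2
  for f in "$@"; do
    new=$(printf '%s\n' "$f" | sed -E "$sed_expr")
    "$cmd" "$f" "$new"
  done
)

# Preview with echo as the command; prints "a.tgz a.tar.gz".
each_renamed 's/\.tgz$/.tar.gz/' echo 'a.tgz'
```

Using echo as the command gives the same dry run as in the post; swapping in mv applies the change.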