
How to remove everything after the 2nd underscore but keep the other columns? [Resolved]

My file.txt looks like this:

variant_id pval_nominal
1_752721_A_G_b37 2.23485e-05
1_900397_C_T_b37 3.04603e-05
1_928297_G_A_b37 2.12455e-05

I am trying to remove everything after the 2nd underscore in the first column so that it looks like this:

variant_id pval_nominal
1_752721 2.23485e-05
1_900397 3.04603e-05
1_928297 2.12455e-05

The reason I ask for everything after the 2nd underscore in the first column to be removed is that entries in the first column can also look like this: 1_1025672_GCA_G_b37

I was trying to use this command:

 awk -F _ '{print $1 (NF>1? FS $2 : "")}'  file.txt > file2.txt

but file2.txt looks like this:

variant_id pval
1_752721
1_900397
1_928297

How can I change this command so that the 2nd column is returned as well?

Thanks


Question Credit: anikaM
Asked July 18, 2019
Tags: command-line
Posted Under: Unix Linux
2 Answers

Try this:

sed 's/_[A-Z].* / /g' file

variant_id pval_nominal
1_752721 2.23485e-05
1_900397 3.04603e-05
1_928297 2.12455e-05

credit: msp9011
Answered July 18, 2019

Leave the main field separator as it is and use awk's split() function on the first field.

$ awk <data '{ split($1,f1,/_/) ; printf("%s_%s %s\n",f1[1],f1[2],$2) }'
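
The printf above prints only the first two columns. If the file might have more columns, a possible variation on the same split() idea (a sketch, not part of the original answer) is to rewrite $1 in place and print the whole record. It assumes single-space-separated input, since assigning to $1 makes awk rejoin the fields with the default output separator (one space). The names input, result, n and part are only for illustration.

```shell
# Sample rows copied from the question.
input='variant_id pval_nominal
1_752721_A_G_b37 2.23485e-05
1_900397_C_T_b37 3.04603e-05
1_928297_G_A_b37 2.12455e-05'

# split() returns the number of pieces; only fields with more than two
# pieces are shortened, so the header line passes through untouched.
result=$(printf '%s\n' "$input" |
  awk '{ n = split($1, part, "_"); if (n > 2) $1 = part[1] "_" part[2]; print }')
printf '%s\n' "$result"
```

On a real file the same program would be run as awk '...' file.txt > file2.txt.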

credit: Janka
Answered July 18, 2019