Skip to main content

awk for column replacement [Resolved]

I need to replace the column 1 of file 1, with column 1 of file 2, when column 1 of file 1 matches with col 5 of file 2.

file 1

ENSG00000237491    1     714150     745440      2       1  143562      
ENSG00000230092    1     736259     745541      2       1  143562    
ENSG00000269831    1     738532     739137      2       1  143562    
ENSG00000187634    1     860260     879955     26      20  130731 

file 2

AL669831.1      1       738532  739137  ENSG00000269831
SAMD11          1       860260  879955  ENSG00000187634
RP11-206L10.9   1       714150  745440  ENSG00000237491
RP11-206L10.8   1       736259  745541  ENSG00000230092

Output should be like

RP11-206L10.9    1     714150     745440      2       1  143562      
RP11-206L10.8    1     736259     745541      2       1  143562    
AL669831.1       1     738532     739137      2       1  143562    
SAMD11           1     860260     879955     26      20  130731 

I tried the codes below but it doesn't work, prints no output, stops immediately without any error.

 awk 'NR==FNR{a[$1]=$1;next} $5 in a {$1=a[$1];print}' FS='\t' OFS='\t' file1.txt file2.txt  > output.txt

Thank you!


Question Credit: cookiemonster
Question Reference
Asked September 19, 2019
Posted Under: Unix Linux
15 views
1 Answers

Using your file1.txt as above and an edited version of your file2.txt (so that there are some input lines in file2.txt where $5 is in file1.txt):

AL157931.1      13      23551994        23552136        ENSG00000237491
HMGA1P6 13      23708313        23708703        ENSG00000230092
RNY3P4  13      23726725        23726825        ENSG00000207157

Reading file2.txt before file1.txt, we can get the following output:

$ awk 'BEGIN {FS=OFS="\t"}
       NR==FNR {a[$5]=$1; next};
       $1 in a {$1=a[$1];print}' file2.txt file1.txt 
AL157931.1      1       714150  745440  2       1       143562
HMGA1P6 1       736259  745541  2       1       143562

The key to how this works is in realising that we want to build an associative array where the keys are $5 from file2 and the values are $1 from the same file.

Then, when we loop over file1, check if $1 is a key in the array. If so, replace $1 with the matching value (i.e. the corresponding $1 from file2) and print the line.


credit: cas
Answered September 19, 2019
Your Answer