Friday, December 19, 2014

Perl Compare two files search match fields

more file1
798355.2    3188172.8
800105.4    3188210.2
797867.9    3188222.7
798267.9    3188222.7
798667.3    3188222.6

more file2
798355.4    3188172.7   2000.00    1    10
800105.4    3188210.2   3000.00    2    20
797867.9    3188222.7   1684.05    3    30
798267.9    3188222.7   8000.00    4    40
798667.9    3188222.7   1700.46    5    50

output
798355.4    3188172.8    NA    NA    NA
800105.4    3188210.2    3000.00    2    20
797867.9    3188222.7    1684.05    3    30
798267.9    3188222.7    8000.00    4    40
798667.9    3188222.6    NA    NA    NA

#!/usr/bin/perl
my $file1 = 'file1.txt';
my @data1;
open( FILE1, $file1 ) or die "Can't open file '$file1': $!";
while( <FILE1> ) {
chomp;
my @row1 = split;
push @data1, \@row1;
}
close( FILE1 );

my $file2 = 'file2.txt';
my @data2;
open( FILE2, $file2 ) or die "Can't open file '$file2': $!";
while( <FILE2> ) {
chomp;
my @row2 = split;
push @data2, \@row2;
}
close( FILE2 );

for my $i (0..$#data1)   #no row slave
{
for my $j (0..$#data2)   #no row master
{
$dx[$i]=($data1[$i][0]-$data2[$j][0])*($data1[$i][0]-$data2[$j][0]);  #key1 match
$dy[$i]=($data1[$i][1]-$data2[$j][1])*($data1[$i][1]-$data2[$j][1]);  #key2 match
$dz[$i]=($dx[$i]+$dy[$i]); #distance
push @dza,$dz[$i];
}
push @dzt,$dza[minindex(\@dza)];
push @dzt2,$data2[minindex(\@dza)][2];  #get data col 3
push @dzt3,$data2[minindex(\@dza)][3];  #get data col 4
push @dzt4,$data2[minindex(\@dza)][4];  #get data col 5

@dza=();
}

for my $i (0..$#data1)
{
print "$data1[$i][0]","\t";
print "$data1[$i][1]","\t";
if($dzt[$i]==0)
   {
print "$dzt2[$i]","\t";
print "$dzt3[$i]","\t";
print "$dzt4[$i]","\n";
   } else
{
print "NA","\t";
print "NA","\t";
print "NA","\n";
}
}


sub minindex {
  my( $aref, $idx_min ) = ( shift, 0 );
  $aref->[$idx_min] < $aref->[$_] or $idx_min = $_ for 1 .. $#{$aref};
  return $idx_min;
}









Perl: membandingkan dua file

Perl compare two files, calculate the closest value between two fields and output field values from another file
more file1.txt
798355.5    3188172.8  
800105.5    3188210.3  
798267.9    3188222.8  

more file2.txt
798355.4    3188172.7   2000.00
800105.4    3188210.2   3000.00
797867.9    3188222.7   1684.05
798267.9    3188222.7   8000.00
798667.9    3188222.7   1700.46

output:
798355.5    3188172.8    2000.00
800105.5    3188210.3    3000.00
798267.9    3188222.8    8000.00


#!/usr/bin/perl
my $file1 = 'file1.txt';
my @data1;
open( FILE1, $file1 ) or die "Can't open file '$file1': $!";
while( <FILE1> ) {
chomp;
my @row1 = split;
push @data1, \@row1;
}
close( FILE1 );

my $file2 = 'file2.txt';
my @data2;
open( FILE2, $file2 ) or die "Can't open file '$file2': $!";
while( <FILE2> ) {
chomp;
my @row2 = split;
push @data2, \@row2;
}
close( FILE2 );

for my $i (0..$#data1)   #no row slave
{
for my $j (0..$#data2)   #no row master
{
$dx[$i]=($data1[$i][0]-$data2[$j][0])*($data1[$i][0]-$data2[$j][0]);  #key1 match
$dy[$i]=($data1[$i][1]-$data2[$j][1])*($data1[$i][1]-$data2[$j][1]);  #key2 match
$dz[$i]=sqrt($dx[$i]+$dy[$i]); #distance
push @dza,$dz[$i];
}
push @dzt,$data2[minindex(\@dza)][2];  #find index of closest distance
@dza=();
}

for my $i (0..$#data1)
{
   print "$data1[$i][0]","\t";
   print "$data1[$i][1]","\t";
   print "$dzt[$i]","\n";
}

sub minindex {
  my( $aref, $idx_min ) = ( shift, 0 );
  $aref->[$idx_min] < $aref->[$_] or $idx_min = $_ for 1 .. $#{$aref};
  return $idx_min;
}

Sunday, June 29, 2014

Automatic First Break Picker (Perl)

Kode Perl di bawah ini merupakan kode untuk melakukan First Break Picking secara otomatis. Onset First Break diprediksi berdasarkan nilai maksimum dari gradien Akaike Criterion (AIC). 

Berikut ini adalah persamaan Akaike. Dimana intinya mengukur derajat variansi data dari sample pertama sampai sample ke-k lalu membandingkannya dengan variasi dari sample k+1 sampai sample terakhir.

Pada kondisi ideal, onset first break akan ditunjukkan pada nilai global minimum dari AIC. Akan tetapi berdasarkan hasil percobaan yang saya lakukan terhadap data real, pada kondisi tertentu hal ini tidak terpenuhi. Lalu saya mengembangkannya dengan menghitung gradien dari AIC yang mana memberikan hasil yang lebih konsisten. Onset ditunjukkan dengan nilai maksimum dari gradien AIC.

Onset First Break diperdiksi dengan AIC dan Gradien AIC.

Pada gambar di atas terlihat bahwa onset first break tidak ditunjukkan pada nilai nimimum dari AIC. Akan tetapi  ditunjukkan oleh nilai maksimum dari gradien AIC.
Onset First Break  dengan S/N yang rendah diperdiksi dengan AIC dan Gradien AIC.

Gambar di atas menunjukkan stabilitas gradien AIC dalam memprediksi onset first break pada data dengan S/N cukup rendah.

Sebagaimana yang kita pahami, S/N akan bertambah kecil sejalan dengan bertambahnya offset, oleh karena itu untuk kode Perl di bawah ini saya menyertakan "interpretasi" berupa extrapolasi linear untuk far offset, setelah sebelumnya melakukan moving average dan meng-elimminasi outlier picks.



Hasil prediksi firstbreak untuk data real dengan menggunakan kode Perl di bawah ini.

Petunjuk penggunaan:
1.Copy parameter di bawah ini lalu simpan dengan nama param
moving_average 20
reject_bad_picks 12
gradient_window_leb 5
sampling_rate 4
searching_peak_trough 8
polarity -1
offsetmin 400
batasoff 1400
widthlinearinterp 20
window_finetune 8

2. Copy kode Perl di bawah ini lalu simpan dengan nama fbpick.pl
3. Jalankan perintah perl fbpick.pl file.sgy   (shot gather)

#!/usr/bin/perl
#load file containing parameters###################################################
my $file = 'param';
my @data;
open( FILE, $file ) or die "Can't open file '$file': $!";
while( <FILE> ) {
chomp;
my @row = split;
push @data, \@row;
}
close( FILE );
$moveave=$data[0][1];
$errorthres=$data[1][1];
$leb=$data[2][1];
$SR=$data[3][1];
$w1=$data[4][1];
$polarity=$data[5][1];
$offsetmin=$data[6][1];
$batasoff=$data[7][1];
$lebar=$data[8][1];
$w1x=$data[9][1];

##load seismic segy################################################################
my $num_args = $#ARGV;
die "usage perl fbpick.pl file.sgy  \n" if $#ARGV !=0;
$fname=$ARGV[0];
sysopen(IN,$fname,O_RDONLY) or die "Can't open $fname: $!";
$size= -s IN;
open my $fh, '<', $fname or die "Can't open ";
$bytewidth=4;
my $cacing;
read($fh,$cacing,3600,0);
$noofsamples = unpack("n",substr($cacing,3220));
$tracecount=($size-3600)/(240+$bytewidth*$noofsamples);
$tracecount = sprintf("%.0f", $tracecount);
print"Automatic First Break Picker \n";
print"Agus Abdullah,PhD 2014\n";
seek (IN,3600,1);
my $buffer;
my $i=1;

##read segy trace by trace ######################################################
while (sysread(IN, $buffer, 240+($noofsamples*$bytewidth),0))
{
######## unpack trace header ####################################################
$head1=unpack("N",substr($buffer,5-1)); #ident num
$head2=unpack("N",substr($buffer,17-1)); #source loc
$head3=unpack("N",substr($buffer,21-1)); #receiver loc
$head4=unpack("N",substr($buffer,1-1)); #trace num
$head5=unpack("B*",substr($buffer,37-1,4)); #absolute offset!
$off = conv_bit_string_2_ibm32float ($head5);
$offabs=abs($off);

######## unpack trace amplitudes ##################################################
for ($j=0; $j<=$noofsamples; $j++)
{   ##begin sample
$amp[$j]=unpack("B*",substr($buffer,240+$j*4));
$trace_value[$j] = conv_bit_string_2_ibm32float($amp[$j]);
push @amp_trace,$trace_value[$j];
}

####### Calculate Akaike Criterion ###############################################
for ($j=0; $j<=$noofsamples-3; $j++)
$AIC[$j]=$j*log10(variance(@amp_trace[0..$j+1]))+($noofsamples-$j-1)*(log10(variance(@amp_trace[$j+2..$noofsamples]))); 
push @AICarray,$AIC[$j];
}

######## calculate gradient of Akaike Criterion ###################################
for ($j=0; $j<=$noofsamples-3-$leb; $j++)
{
    $gradAIC[$j]=($AICarray[$j+$leb]-$AICarray[$j])/($leb*$SR);
push @gradAICarray,$gradAIC[$j];
}
my $fb = index_of_largest( @gradAICarray );
   $fb = $fb+$leb;
   
######## search peak or  through in a specified time window #######################
for ($k=-($w1/$SR); $k<=($w1/$SR); $k++)
{
push @amp_win2,$amp_trace[($fb+$k)];
push @amp_win3,-1*$amp_trace[($fb+$k)];
}
####### polarity option ##########################################################
if ($polarity==-1) {
$idk= index_of_largest(@amp_win3), "\n";
}
else {
$idk= index_of_largest( @amp_win2 );
}

###### store data into memory ####################################################
$fbfinal=$fb+$idk-($w1/$SR);
push @head_1,$head1;
push @head_2,$head2;
push @head_3,$head3;
push @head_4,$head4;
push @offset,$offabs;
push @offset1,$off;
push @fbtime,$fbfinal*$SR;
push @fborig,$fbfinal*$SR;

##### clear memory ###############################################################
@amp_trace=();
@AICarray=();
@gradAICarray=();
@amp_win2=();
@amp_win3=();
progress_bar( $i, $tracecount, 25, '=' );
$i++;
}

print "\n";
##### moving average #############################################################
for ($i=0; $i<=$tracecount; $i++)
for ($k=0; $k<=$moveave; $k++)
{
push @{$fbave[$i]},$fbtime[$i+$k];
}
$ave[$i]=mean(@{$fbave[$i]});
push @avea, $ave[$i];
}
push @xtra, $_ for 1..($moveave/2);
push @xtra,@avea;


###### reject error ###############################################################
 for ($i=0; $i<=$tracecount-1; $i++)
$differ[$i]=abs($xtra[$i]-$fbtime[$i]);
if ($offset[$i] > $offsetmin) 
   { 
if ($differ[$i] < $errorthres) 
   { 
   $fbtime[$i]=$fbtime[$i];
   } else
   {
   $fbtime[$i]=$xtra[$i];
   } 
   } else
   {
   $fbtime[$i]=$fbtime[$i];
   } 
   push @fbmix,$fbtime[$i];
}
  
##### calculate difference #########################################################
for ($i=0; $i<=$tracecount-1; $i++)
$index[$i]=$head_1[$i]-$head_1[$i+1];
push @ind,$index[$i];
}

###### find index of new ident num##################################################
my $m;
for ( @ind ) {
push @indicies, $m if $_ < 0;
$m++;
}
push @index2,-1,@indicies,$tracecount-1;

###################################################################################
for ($i=0; $i<=$#index2-1; $i++)
{
@pic2=@fbmix[$index2[$i]+1..$index2[$i+1]];
@off2=@offset1[$index2[$i]+1..$index2[$i+1]];

###left offset index###############################################################
my $m1;
for ( @off2 ) 
{
push @inde3, $m1 if $_ < 0;
$m1++;
}
push @index3, 0, @inde3;
@off3=@off2[@index3];
@pic3=@pic2[@index3];

my $m2;
for ( @off3 ) 
{
push @inde3a, $m2 if $_ > -$batasoff;
$m2++;
}
@off3a=@off3[@inde3a];
@pick3a=@pic3[@inde3a];
@seloff3=@off3a[0..$lebar];
@selpic3=@pick3a[0..$lebar];

###left offset linear regression###################################################
$numel=$lebar-1;
for ( $j = 0; $j < $lebar-1; $j++ )
{
  $sumx   += $seloff3[$j];
  $sumy   += $selpic3[$j];
  $sumxsq += $seloff3[$j] ** 2;
  $sumxy  += $seloff3[$j] * $selpic3[$j];

$kiri=( $numel * $sumxsq - $sumx ** 2 );
if ($kiri > 0)
   { 
   $m = ( $numel * $sumxy - $sumx * $sumy )
   / ( $numel * $sumxsq - $sumx ** 2 );
 $b = ( $sumxsq * $sumy - $sumxy * $sumx )
   / ( $numel * $sumxsq - $sumx ** 2 );
   } else
   {
   $m=-0.22;
   $b=250;
   } 
   
   
for ( $j = 0; $j < $#off3; $j++ )
{
$yi1[$j]=$m*$off3[$j]+$b;
push @yil1,$yi1[$j];
}

###right offset index###############################################################
my $m3;
for ( @off2 ) 
{
push @index4, $m3 if $_ > 0;
$m3++;
}
@off4=@off2[@index4];
@pic4=@pic2[@index4];
my $m4;
for ( @off4 ) 
{
push @index4a, $m4 if $_ > $batasoff;
$m4++;
}
@seloff4=@off4[$index4a[0]-$lebar..$index4a[0]];
@selpic4=@pic4[$index4a[0]-$lebar..$index4a[0]];
$numel1=$lebar-1;

###right offset linear regression###################################################
for ( $j = 0; $j < $lebar-1; $j++ )
{
  $sumx1   += $seloff4[$j];
  $sumy1   += $selpic4[$j];
  $sumxsq1 += $seloff4[$j] ** 2;
  $sumxy1  += $seloff4[$j] * $selpic4[$j];

$kanan=( $numel1 * $sumxsq1 - $sumx1 ** 2 );
if ($kanan > 0)
   { 
$m1 = ( $numel1 * $sumxy1 - $sumx1 * $sumy1 )
   / ( $numel1 * $sumxsq1 - $sumx1 ** 2 );
$b1 = ( $sumxsq1 * $sumy1 - $sumxy1 * $sumx1 )
   / ( $numel1 * $sumxsq1 - $sumx1 ** 2 );
    } else
   {
   $m1=0.22;
   $b1=250;
   } 

for ( $j = 0; $j < $#off4+1; $j++ )
{
$yir[$j]=$m1*$off4[$j]+$b1;
push @yir1,$yir[$j];
}
push @yyy,@yil1,@yir1;
for ( $k = 0; $k < $#off2+1; $k++ )
{
if ($off2[$k] < -$batasoff )
   { 
   $fbku[$k]=$yyy[$k]; 
   } elsif ($off2[$k] > $batasoff )
   {
   $fbku[$k]=$yyy[$k];
   } else
   {
   $fbku[$k]=$fbmix[$k];
   }
   push @fbkux, $fbku[$k];
}

############clear memory ###########################################################
@yil1=();
@yir1=();
@yyy=();
@inde3=();
@inde4=();
@index3=();
@index4=();
@off3=();
@off4=();
@pic2=();
@pic4=();
@off2=();
@inde3a=();
@index3a=();
@index4a=();
@off3a=();
@pick3a=();
@seloff3=();
@selpic3=();
@seloff4=();
@selpic4=();
$m1=();
$m2=();
$m3=();
$m4=();
$m=();
$b=();
$sumx=();
$sumy=();
$sumxsq=();
$sumxy =();
$numel=();
$m1=();
$b1=();
$sumx1=();
$sumy1=();
$sumxsq1=();
$sumxy1 =();
$numel1=();
}

######### load segy for fine tuning ###############################################
sysopen(IN,$fname,O_RDONLY) or die "Can't open $fname: $!";
$size= -s IN;
open my $fh, '<', $fname or die "Can't open ";
$bytewidth=4;
seek (IN,3600,1);
my $buffer;
my $i=0;
open OUT," > $fname.picks" or die "$!\n";
#print OUT "IDENT_NUM | NUM_SOURCE | NUM_DETECT | TRACE_NUM | PICKS_RAW | PICKS_MOVEAVE | PICKS MIX | PICKS_FINE  \n";
while (sysread(IN, $buffer, 240+($noofsamples*$bytewidth),0))
{
   print OUT $head_1[$i],"\t"; 
   print OUT $head_2[$i],"\t"; 
   print OUT $head_3[$i],"\t"; 
   print OUT $head_4[$i],"\t";
   print OUT $fborig[$i],"\t"; 
   print OUT int($xtra[$i]+0.5),"\t";  
   print OUT int($fbmix[$i]+0.5),"\t";  
     
######## unpack trace amplitudes ##################################################
for ($j=0; $j<=$noofsamples; $j++)
{   
$amp[$j]=unpack("B*",substr($buffer,240+$j*4));
$trace_value[$j] = conv_bit_string_2_ibm32float($amp[$j]);
push @amp_trace,$trace_value[$j];
}

$fbx=$fbkux[$i]/$SR;
$fb=int($fbx+0.5);
######## search peak or  through in a specified time window #######################
for ($k=-($w1x/$SR); $k<=($w1x/$SR); $k++)
{
push @amp_win2x,$amp_trace[($fb+$k)];
push @amp_win3x,-1*$amp_trace[($fb+$k)];
}
####### polarity option ###########################################################
if ($polarity==-1) {
$idk= index_of_largest(@amp_win3x);
}
else {
$idk= index_of_largest(@amp_win2x );
}
###### store data into memory #####################################################
$fbfinal=$fb+$idk-($w1x/$SR);
$fbku2=$fbfinal*$SR;
print OUT int($fbku2+0.5),"\n";  
@amp_trace=();
@amp_win2x=();
@amp_win3x=();
progress_bar1( $i, $tracecount, 25, '=' );
$i++;
}

print "\n";
#print "\e[?25h";
print "Process done!  \n";
print "Output file: \n";
print "$fname.picks \n";

###### functions ########################################
 sub progress_bar {
    my ( $got, $total, $width, $char ) = @_;
    $width ||= 25;
    $char  ||= '=';
    my $num_width = length $total;
    local $| = 1;
  #  print "\e[?25l";
    printf "|%-${width}s| Predict FB...%${num_width}s traces out of %s (%.2f%%)\r",
        $char x (($width-1)*$got/$total). '>', $got, $total, 100*$got/$total;
}

 sub progress_bar1 {
    my ( $got, $total, $width, $char ) = @_;
    $width ||= 25;
    $char  ||= '=';
    my $num_width = length $total;
    local $| = 1;
  #  print "\e[?25l";
    printf "|%-${width}s| Finetune FB...%${num_width}s traces out of %s (%.2f%%)\r",
        $char x (($width-1)*(1+$got)/$total). '>', 1+$got, $total, 100*(1+$got)/$total;
}


sub bin2dec {
return unpack("N", pack("B32", substr("0" x 32 . shift, -32)));
}
sub conv_bit_string_2_ibm32float {
$bit_string_ibm = shift;
$first_digit_ibm = substr($bit_string_ibm, 0, 1);
if ( $first_digit_ibm eq "0" ) {
$sign_bit_ibm = 1;
} elsif ( $first_digit_ibm eq "1" ) {
$sign_bit_ibm = -1;
}
$bin_exponent_ibm = substr($bit_string_ibm, 1, 7);
$exponent_ibm = bin2dec($bin_exponent_ibm);
$bin_fraction_ibm = substr($bit_string_ibm, 8, 31);  ###31
@bit_chars_ibm = unpack("A1" x length($bin_fraction_ibm), $bin_fraction_ibm);
$place_holder_ibm = -1;
$fraction_ibm = 0;
foreach $bit_ibm ( @bit_chars_ibm ) {
$fraction_ibm += $bit_ibm * (2 ** $place_holder_ibm);
$place_holder_ibm += -1;
}
$ibm_float_ibm = ($sign_bit_ibm ** 1) * (16 ** ($exponent_ibm - 64)) *($fraction_ibm);
return sprintf("%.4f", $ibm_float_ibm);
}
sub mean { # mean of values in an array
  my $sum = 0 ;
  foreach $x (@_) {
    $sum += $x ;
  }
  return $sum/@_ ;
}
sub sum_squares { # sum of square differences from the mean
  my $mean = mean (@_) ;
  my $sum_squares = 0 ;
  foreach $x (@_) {
    $sum_squares += ($x - $mean) ** 2 ;
  }
  return $sum_squares ;
}
sub variance { # variance of values in an array
  my $sum_squares = sum_squares (@_) ;
  my $deg_freedom = @_ - 1 ;
  return $sum_squares/$deg_freedom ;
}

sub median { # median of values in an array
  my @sorted = sort {$a <=> $b} (@_) ;
  if (1 == @sorted % 2) # Odd number of elements
    {return $sorted[(@sorted-1)/2]}
  else                   # Even number of elements
    {return ($sorted[@sorted/2-1]+$sorted[@sorted/2]) / 2}
}
sub histogram { # Counts of elements in an array
  my %histogram = () ;
  foreach $value (@_) {$histogram{$value}++}
  return (%histogram) ;
}
sub _index_of_largest
{
    my $X = shift;  my $L = shift;
    my $x = $#_;    my $l = pop;
    defined $X and $L > $l and ($x,$l) = ($X,$L);
    @_ or return $x;
    unshift @_, $x, $l;
    goto &_index_of_largest
}
sub index_of_largest
{
    @_ or die;
    _index_of_largest(undef,undef,@_);
}
sub log10 {
   my $n = shift;
    if ( $n eq "0" ) {
    } else {
   return log($n)/log(10);
}    
    }
exit;