Remove outliers until there are none left

Tatjana Mü - 2023-05-04T10:25:46+00:00
Question: Remove outliers until there are none left

Dear community,

I apologize that I can't offer a better first try. I have a double array and want to write a loop that removes outliers from every column. The idea is: the code tests for outliers, removes them, and repeats for as long as outliers are found. Once no more outliers are found, it should stop and give me back a double array without these outliers.

I tried this:

directory_name=uigetdir('','Ordner mit Messungen auswählen');
[nur_file_name,pfad]=uigetfile({'*.csv','csv-files (*.csv)';'*.*','all Files'},...
    'Die csv-Files der Proben oeffnen (probe_001.csv=',[directory_name '/'], 'Multiselect', 'on');
nur_file_name=cellstr(nur_file_name);
nur_file_name=sort(nur_file_name);
filename=strcat(pfad,nur_file_name);
anzahl_files=size(filename,2);
for xy=1:anzahl_files
    fid_in=fopen(char(filename(xy)),'r');

    filename_s = matlab.lang.makeValidName(nur_file_name);
    filename_s=string(filename_s);
    filename_s = erase(filename_s,"_csv");
    filename_s = erase(filename_s,"LiqQuant_");
    filename_c=cellstr(filename_s);
    for c=1:anzahl_files
        filename_f{c}=extractBefore(filename_c{c},11);
    end
    filename_s=string(filename_f);

    %----------------Import elements and intensity--------------------

    clear element_RL
    clear intens_RL

    tmpImport = importdata(filename{xy},',');
    element_RL = tmpImport.colheaders;
    element_RL(:,[1 6 8 10 12 14 16 17 19 21 23 26 27 29 30 32 33 36 38 43 45 48 57 59 61 64 67 69 94 97 99 102 106 223 298 303 304 305])=[];
    element_RL=string(element_RL);
    [anzahl_zeile,anzahl_elemente]=size(element_RL);

    intens_RL=tmpImport.data;
    intens_RL(:,[1 6 8 10 12 14 16 17 19 21 23 26 27 29 30 32 33 36 38 43 45 48 57 59 61 64 67 69 94 97 99 102 106 223 298 303 304 305])=[];
    [anzahl_runs,anzahl_elemente]=size(intens_RL);

    %---------------remove outliers----------------

    while intens_RL=ismember(NaN) %Wrong, because will run forever
        threshold = mean(intens_RL)+3*std(intens_RL);
        intens_RL(bsxfun(@(x, y) x > y, intens_RL, threshold)) = NaN; %outliers removing, set to NaN
    end

I'm sorry that my loop is so horrible, but I have never written a while loop before.

Expert Answer

Neeta Dsouza answered - 2025-11-20

I updated the end of your code.

The plots are only there so that I can see the differences before and after thresholding (i.e., whether the hot spots are indeed removed).
 
directory_name=uigetdir('','Ordner mit Messungen auswählen');
[nur_file_name,pfad]=uigetfile({'*.csv','csv-files (*.csv)';'*.*','all Files'},...
    'Die csv-Files der Proben oeffnen (probe_001.csv=',[directory_name '/'], 'Multiselect', 'on');
nur_file_name=cellstr(nur_file_name);
nur_file_name=sort(nur_file_name);
filename=strcat(pfad,nur_file_name);
anzahl_files=size(filename,2);
for xy=1:anzahl_files
    fid_in=fopen(char(filename(xy)),'r');
    
    filename_s = matlab.lang.makeValidName(nur_file_name);
    filename_s=string(filename_s);
    filename_s = erase(filename_s,"_csv");
    filename_s = erase(filename_s,"LiqQuant_");
    filename_c=cellstr(filename_s);
    for c=1:anzahl_files
        filename_f{c}=extractBefore(filename_c{c},11);
    end
    filename_s=string(filename_f);
    
    
    %----------------Import elements and intensity--------------------
    
    clear element_RL
    clear intens_RL
    
    tmpImport = importdata(filename{xy},',');
    element_RL = tmpImport.colheaders;
    element_RL(:,[1 6 8 10 12 14 16 17 19 21 23 26 27 29 30 32 33 36 38 43 45 48 57 59 61 64 67 69 94 97 99 102 106 223 298 303 304 305])=[];
    element_RL=string(element_RL);
    [anzahl_zeile,anzahl_elemente]=size(element_RL);
    
    intens_RL=tmpImport.data;
    intens_RL(:,[1 6 8 10 12 14 16 17 19 21 23 26 27 29 30 32 33 36 38 43 45 48 57 59 61 64 67 69 94 97 99 102 106 223 298 303 304 305])=[];
    [anzahl_runs,anzahl_elemente]=size(intens_RL);
    
        %---------------remove outliers----------------
        
        figure(1)
        clim = [-5 7];
        subplot(211),imagesc(log10(abs(intens_RL)),clim);colormap('jet');colorbar("vert")
        title('before thresholding');
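        c = 1; % initialize above 0 so the loop runs at least once
        
        while c>0
            % per-column threshold, ignoring values already set to NaN
            % (std takes the weight argument, here 0, before the dimension)
            threshold = mean(intens_RL,1,'omitnan')+3*std(intens_RL,0,1,'omitnan');
            ind = intens_RL>(ones(anzahl_runs,1)*threshold);
            % ind = intens_RL>threshold; % works too (implicit expansion, R2016b or newer)
            b = find(ind);
            c = numel(b)       % no semicolon: displays in the command window how many outliers are removed at each iteration
            intens_RL(ind) = NaN;
        end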
        subplot(212),imagesc(log10(abs(intens_RL)),clim);colormap('jet');colorbar("vert")
        title('after thresholding');
        
        
    end
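
The stopping rule in the loop above can be tried without any CSV files. The following is a minimal sketch on a made-up matrix, not part of the original answer: the variable names (data, removed_total) and all values are assumptions chosen for illustration. It shows why repeating the test matters: a very large value inflates the standard deviation and hides a smaller outlier until the large one has been set to NaN.

```matlab
% Made-up stand-in for intens_RL: 20 runs x 2 elements (illustrative values only)
data = [(1:20)', 10*ones(20,1)];
data(5,1)  = 1000;   % large outlier in column 1
data(12,1) = 200;    % smaller outlier, hidden by the large one on the first pass

removed_total = 0;   % hypothetical counter, not in the original code
while true
    mu     = mean(data, 1, 'omitnan');     % column means, NaN ignored
    sigma  = std(data, 0, 1, 'omitnan');   % column sample std, NaN ignored
    is_out = data > mu + 3*sigma;          % implicit expansion (R2016b or newer)
    if ~any(is_out(:))
        break                              % nothing above the threshold: stop
    end
    data(is_out)  = NaN;                   % mark this pass's outliers
    removed_total = removed_total + nnz(is_out);
end
disp(removed_total)
```

Exiting with break as soon as a pass finds nothing avoids the need to initialize a counter above zero. Like the loop above, this sketch only tests the upper side (mean + 3*std); values far below the mean are left untouched.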

 

