Working with the Kolmogorov test

hamidreza hamidi - 2022-04-13T12:36:37+00:00
Question: Working with the Kolmogorov test

Hi, I am trying to use the Kolmogorov test, which I am going to use in my article. I generated a data set A, then I randomly drew a sample from A. Then I wanted to compare these two sets with kstest, but it showed me that they don't have the same distribution. Here is my simple code:

    clc
    clear all
    close all
    n_s = 1000;
    mother_random_variable = lognrnd(0.3,0.5,[1,100000]); %data lognormal
    S = mother_random_variable(randi(numel(mother_random_variable),1,n_s)) %sample
    S_y = [S]'; %selected data
    S_mean=mean(S_y); %mean sample
    S_var=std(S_y); %variance sample
    test_cdf = [S_y,cdf('Lognormal',S_y,S_var,S_mean)]; %make cdf
    kstest(S_y,'CDF',test_cdf) %kstest
    plot(sort(S_y),logncdf(sort(S_y)),'r--')
    hold on
    cdfplot(S_y)

They have the same distribution, so this is a strange result. I found an even stranger result when I compared my data set with itself: the result shows me that they don't have the same distribution.

    clc
    clear all
    close all
    n_s = 1000;
    mother_random_variable = lognrnd(0.3,0.5,[1,100000]); %data
    S=mother_random_variable; % I named data with S for simpler code
    S_y = [S]'; %selected data
    S_mean=mean(S_y);
    S_var=std(S_y);
    test_cdf = [S_y,cdf('Lognormal',S_y,S_var,S_mean)];
    kstest(S_y,'CDF',test_cdf)
    plot(sort(S_y),logncdf(sort(S_y)),'r--')
    hold on
    cdfplot(S_y)

Do you have any idea?

  • Expert Answer

    Prashant Kumar answered . 2025-11-20

    Having only looked at your 2nd block of code, I have some comments and suggestions.
     
    1) The parameters of a lognormal distribution are mu and sigma, the mean and standard deviation of the logarithm of the data, in that order. In your code you pass them in reverse order when you call the cdf() function, which creates a completely different distribution from the one you intend.

    y = cdf('Lognormal', S_y, S_var, S_mean);    % your code, incorrect order
    y = cdf('Lognormal', S_y, S_mean, S_var);    % correct order
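
    As a minimal sketch of point 1, assuming the goal is to test the data against a lognormal whose parameters are estimated from the data: mu and sigma are computed on the log scale and passed in that order. The variable names x, mu_hat and sigma_hat are illustrative and not from the original code, and the fixed seed is only there for reproducibility. Because the parameters are estimated from the same data that is tested, the p-value tends to be conservative.

```matlab
rng(1);                                   % fixed seed, only for reproducibility (assumption)
x = lognrnd(0.3, 0.5, [1000, 1]);         % column vector of lognormal data

% mu and sigma describe log(x), so estimate them on the log scale
mu_hat    = mean(log(x));
sigma_hat = std(log(x));

% hypothesized CDF as a two-column matrix: [values, CDF at those values]
test_cdf = [x, cdf('Lognormal', x, mu_hat, sigma_hat)];

[h, p] = kstest(x, 'CDF', test_cdf);      % h = 0 means the null is not rejected
fprintf('h = %d, p = %.3f\n', h, p);
```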

    2) This is just a suggestion but it's a bit cleaner to use the makedist() function rather than entering the parameters manually into cdf().

    doc cdf
    
    pd = makedist('Lognormal', 'mu', S_mean, 'sigma', S_var); 
    y = cdf(pd, S_y);   % instead of cdf('Lognormal', S_y, S_mean, S_var)                  
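
    A related sketch, assuming the Statistics and Machine Learning Toolbox is available: fitdist() estimates mu and sigma by fitting the lognormal to the data, and the resulting distribution object can be passed to kstest() directly as the 'CDF' argument. The variable names and the seed are illustrative only.

```matlab
rng(2);                                   % fixed seed, only for reproducibility (assumption)
x  = lognrnd(0.3, 0.5, [1000, 1]);        % fitdist expects a column vector

pd = fitdist(x, 'Lognormal');             % fitted lognormal with estimated mu and sigma
[h, p] = kstest(x, 'CDF', pd);            % test x against the fitted distribution

fprintf('mu = %.3f, sigma = %.3f, h = %d, p = %.3f\n', pd.mu, pd.sigma, h, p);
```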

    3) "when I compare my data set with itself, Its result shows me they don't have same distribution." But you aren't comparing your data with itself. You're comparing your data with the values of the cumulative distribution function evaluated at your data. The plot produced by the code below shows the distribution of values from your data (top) and the distribution of values from the CDF (bottom). Clearly those distributions differ, and kstest() correctly rejects the null hypothesis.

    figure
    subplot(2,1,1)
    histogram(S_y)
    title('mother random variable')
    subplot(2,1,2)
    histogram(cdf('Lognormal', S_y, S_mean, S_var))
    title('CDF distribution')

    4) This may be irrelevant given the points above, but you are using different parameters to create "mother_random_variable" and the cdf() data. For the random variables you pass (0.3, 0.5) to lognrnd() as mu and sigma, which describe the logarithm of the data, but for the cdf you use the mean and std of the data themselves, which are ~(1.5, 0.8).
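
    The gap described in point 4 can be checked numerically. The sketch below uses lognstat() to convert mu and sigma into the mean and variance of the lognormal itself, which come out at roughly 1.53 and 0.82 (as a standard deviation), matching the values quoted above. As a further assumption about the original goal, it also compares a random subsample with the mother data directly using the two-sample test kstest2(); the names A and S and the seed are illustrative.

```matlab
mu = 0.3; sigma = 0.5;

% mean and variance of the lognormal distribution itself (not of its logarithm)
[m, v] = lognstat(mu, sigma);
fprintf('theoretical mean = %.3f, std = %.3f\n', m, sqrt(v));

rng(3);                                   % fixed seed, only for reproducibility (assumption)
A = lognrnd(mu, sigma, [100000, 1]);      % mother data
fprintf('sample mean = %.3f, std = %.3f\n', mean(A), std(A));

% two-sample Kolmogorov-Smirnov test: subsample versus mother data
S = A(randi(numel(A), 1000, 1));          % 1000 values drawn with replacement
[h2, p2] = kstest2(S, A);                 % h2 = 0 means the null is not rejected
fprintf('h2 = %d, p2 = %.3f\n', h2, p2);
```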

