Dataset statistics
Number of variables | 6 |
---|---|
Number of observations | 14075487 |
Missing cells | 16236709 |
Missing cells (%) | 19.2% |
Total size in memory | 644.3 MiB |
Average record size in memory | 48.0 B |
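The dataset-level figures can be cross-checked with a little arithmetic. The sketch below is not part of the report. It reuses only numbers printed in the report (the three non-zero per-column missing counts appear in the variable sections further down) and assumes that "Missing cells (%)" means missing cells divided by variables × observations, and that 1 MiB is 1024 × 1024 bytes. All variable names are illustrative.

```python
# Figures copied from the report; names are illustrative, not from any API.
n_variables = 6
n_observations = 14_075_487
missing_cells = 16_236_709

# Missing counts reported for the three text columns
# (the three numeric columns report 0 missing values).
missing_per_column = [109_249, 2_394_056, 13_733_404]

# Share of missing cells over the full variables x observations grid.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size, assuming 1 MiB = 1024 * 1024 bytes.
total_bytes = 644.3 * 1024 * 1024
avg_record_bytes = total_bytes / n_observations

print(sum(missing_per_column) == missing_cells)  # per-column counts add up
print(f"{missing_pct:.1f}%")
print(f"{avg_record_bytes:.1f} B")
```

The per-column missing counts sum exactly to the reported total, and both derived figures round to the values shown in the table (19.2% and 48.0 B).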
Variable types
Numeric | 3 |
---|---|
Text | 3 |
conts_type_509L has 2394056 (17.0%) missing values | Missing |
credacc_cards_status_52L has 13733404 (97.6%) missing values | Missing |
num_group1 has 2276307 (16.2%) zeros | Zeros |
num_group2 has 6525978 (46.4%) zeros | Zeros |
Reproduction
Analysis started | 2024-02-13 19:37:46.943271 |
---|---|
Analysis finished | 2024-02-13 19:38:08.967824 |
Duration | 22.02 seconds |
Software version | ydata-profiling v4.6.4 |
Download configuration | config.json |
case_id
Real number (ℝ)
Distinct | 1221522 |
---|---|
Distinct (%) | 8.7% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 1454198.017 |
Minimum | 2 |
---|---|
Maximum | 2703454 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 107.4 MiB |
Quantile statistics
Minimum | 2 |
---|---|
5-th percentile | 144108 |
Q1 | 1237440 |
median | 1575626 |
Q3 | 1861301 |
95-th percentile | 2657763 |
Maximum | 2703454 |
Range | 2703452 |
Interquartile range (IQR) | 623861 |
Descriptive statistics
Standard deviation | 787508.3638 |
---|---|
Coefficient of variation (CV) | 0.5415413545 |
Kurtosis | -0.7224045076 |
Mean | 1454198.017 |
Median Absolute Deviation (MAD) | 307009 |
Skewness | -0.3252931327 |
Sum | 2.046854528 × 10¹³ |
Variance | 6.201694231 × 10¹¹ |
Monotonicity | Increasing |
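Several of the descriptive statistics for case_id are derived from others, so they can be sanity-checked by hand. The following is a minimal sketch that uses only the rounded figures printed above, so the results agree with the report to roughly the printed precision. The variable names are illustrative.

```python
import math

# Figures copied from the case_id tables above.
n = 14_075_487
mean = 1454198.017
std = 787508.3638
q1, q3 = 1237440, 1861301
minimum, maximum = 2, 2703454

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
total = mean * n                 # sum recovered from the mean and row count

print(f"CV       = {cv:.6f}")
print(f"Variance = {variance:.6e}")
print(f"IQR      = {iqr}")
print(f"Range    = {value_range}")
print(f"Sum      = {total:.6e}")
```

The IQR and range match the table exactly (623861 and 2703452); the coefficient of variation, variance, and sum agree with the reported values up to rounding of the inputs.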
Value | Count | Frequency (%) |
1246034 | 79 | < 0.1% |
1390922 | 79 | < 0.1% |
1344039 | 78 | < 0.1% |
1585694 | 77 | < 0.1% |
2647430 | 77 | < 0.1% |
1495888 | 77 | < 0.1% |
1260644 | 76 | < 0.1% |
1416022 | 76 | < 0.1% |
158480 | 74 | < 0.1% |
1393522 | 74 | < 0.1% |
Other values (1221512) | 14074720 | > 99.9% |
Value | Count | Frequency (%) |
2 | 4 | < 0.1% |
3 | 3 | < 0.1% |
4 | 2 | < 0.1% |
5 | 1 | < 0.1% |
6 | 5 | < 0.1% |
Value | Count | Frequency (%) |
2703454 | 6 | < 0.1% |
2703453 | 24 | < 0.1% |
2703452 | 6 | < 0.1% |
2703451 | 17 | < 0.1% |
2703450 | 29 | < 0.1% |
Distinct | 9 |
---|---|
Distinct (%) | < 0.1% |
Missing | 109249 |
Missing (%) | 0.8% |
Memory size | 107.4 MiB |
Length
Max length | 11 |
---|---|
Median length | 8 |
Mean length | 8.001376677 |
Min length | 8 |
Characters and Unicode
Total characters | 111749131 |
---|---|
Distinct characters | 13 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | a55475b1 |
---|---|
2nd row | a55475b1 |
3rd row | a55475b1 |
4th row | a55475b1 |
5th row | a55475b1 |
Value | Count | Frequency (%) |
a55475b1 | 13958443 | 99.9% |
P33_145_161 | 3309 | < 0.1% |
P201_63_60 | 2960 | < 0.1% |
P19_60_110 | 1135 | < 0.1% |
P23_105_103 | 117 | < 0.1% |
P133_119_56 | 100 | < 0.1% |
P41_107_150 | 92 | < 0.1% |
P17_56_144 | 63 | < 0.1% |
P127_74_114 | 19 | < 0.1% |
Most occurring characters
Value | Count | Frequency (%) |
5 | 41879010 | 37.5% |
1 | 13975728 | 12.5% |
4 | 13962008 | 12.5% |
7 | 13958636 | 12.5% |
a | 13958443 | 12.5% |
b | 13958443 | 12.5% |
_ | 15590 | < 0.1% |
6 | 10527 | < 0.1% |
3 | 10012 | < 0.1% |
0 | 8608 | < 0.1% |
Other values (3) | 12126 | < 0.1% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 83808860 | 75.0% |
Lowercase Letter | 27916886 | 25.0% |
Connector Punctuation | 15590 | < 0.1% |
Uppercase Letter | 7795 | < 0.1% |
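The category names in this table are Unicode general categories. As a rough illustration of how such a breakdown can be produced, the sketch below classifies characters with Python's standard `unicodedata` module. The helper `category_counts` and the `CATEGORY_NAMES` mapping are hypothetical names introduced here, not taken from the profiling tool, and the two sample strings are values from the frequency table above (written with an uppercase P, as the Uppercase Letter row implies).

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in these samples.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Pc": "Connector Punctuation",
}

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

sample = ["a55475b1", "P33_145_161"]
counts = category_counts(sample)
for name, count in counts.most_common():
    print(f"{name}: {count}")
```

For these two strings the digits dominate, with the underscore counted as Connector Punctuation and the leading P as the only Uppercase Letter, which mirrors the ordering of the report's table.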
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
5 | 41879010 | 50.0% |
1 | 13975728 | 16.7% |
4 | 13962008 | 16.7% |
7 | 13958636 | 16.7% |
6 | 10527 | < 0.1% |
3 | 10012 | < 0.1% |
0 | 8608 | < 0.1% |
2 | 3096 | < 0.1% |
9 | 1235 | < 0.1% |
Lowercase Letter
Value | Count | Frequency (%) |
a | 13958443 | 50.0% |
b | 13958443 | 50.0% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 15590 | 100.0% |
Uppercase Letter
Value | Count | Frequency (%) |
P | 7795 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 83824450 | 75.0% |
Latin | 27924681 | 25.0% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
5 | 41879010 | 50.0% |
1 | 13975728 | 16.7% |
4 | 13962008 | 16.7% |
7 | 13958636 | 16.7% |
_ | 15590 | < 0.1% |
6 | 10527 | < 0.1% |
3 | 10012 | < 0.1% |
0 | 8608 | < 0.1% |
2 | 3096 | < 0.1% |
9 | 1235 | < 0.1% |
Latin
Value | Count | Frequency (%) |
a | 13958443 | 50.0% |
b | 13958443 | 50.0% |
P | 7795 | < 0.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 111749131 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
5 | 41879010 | 37.5% |
1 | 13975728 | 12.5% |
4 | 13962008 | 12.5% |
7 | 13958636 | 12.5% |
a | 13958443 | 12.5% |
b | 13958443 | 12.5% |
_ | 15590 | < 0.1% |
6 | 10527 | < 0.1% |
3 | 10012 | < 0.1% |
0 | 8608 | < 0.1% |
Other values (3) | 12126 | < 0.1% |
conts_type_509L
Text
MISSING
 
Distinct | 9 |
---|---|
Distinct (%) | < 0.1% |
Missing | 2394056 |
Missing (%) | 17.0% |
Memory size | 107.4 MiB |
Length
Max length | 17 |
---|---|
Median length | 14 |
Mean length | 12.71871049 |
Min length | 5 |
Characters and Unicode
Total characters | 148572739 |
---|---|
Distinct characters | 20 |
Distinct categories | 2 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | EMPLOYMENT_PHONE |
---|---|
2nd row | EMPLOYMENT_PHONE |
3rd row | PRIMARY_MOBILE |
4th row | PRIMARY_MOBILE |
5th row | PRIMARY_MOBILE |
Value | Count | Frequency (%) |
PRIMARY_MOBILE | 6294386 | 53.9% |
HOME_PHONE | 2472180 | 21.2% |
EMPLOYMENT_PHONE | 1679417 | 14.4% |
PHONE | 924958 | 7.9% |
PRIMARY_EMAIL | 257126 | 2.2% |
ALTERNATIVE_PHONE | 37833 | 0.3% |
SECONDARY_MOBILE | 15504 | 0.1% |
(value missing) | 25 | < 0.1% |
SKYPE | 2 | < 0.1% |
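The Frequency (%) column for a text variable can be recomputed from the counts: the printed percentages are consistent with dividing each count by the number of non-missing rows rather than by all rows. The sketch below shows this for the five most common conts_type_509L values. The helper `as_percent` is a hypothetical name and only approximates the report's formatting (one decimal place, with very small non-zero shares shown as "< 0.1%").

```python
# Figures copied from the report; names are illustrative.
n_observations = 14_075_487
n_missing = 2_394_056

counts = {
    "PRIMARY_MOBILE": 6_294_386,
    "HOME_PHONE": 2_472_180,
    "EMPLOYMENT_PHONE": 1_679_417,
    "PHONE": 924_958,
    "PRIMARY_EMAIL": 257_126,
}

# Frequencies are taken relative to the rows that actually have a value.
n_present = n_observations - n_missing

def as_percent(share):
    """Format a share in [0, 1] with one decimal, flagging tiny non-zero shares."""
    if 0 < share < 0.0005:
        return "< 0.1%"
    return f"{100 * share:.1f}%"

frequencies = {value: as_percent(count / n_present) for value, count in counts.items()}
for value, freq in frequencies.items():
    print(f"{value:<18}{freq}")
```

Dividing by all 14075487 rows instead would give 17.6% for HOME_PHONE rather than the reported 21.2%, which is why the non-missing denominator is assumed here.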
Most occurring characters
Value | Count | Frequency (%) |
M | 18949542 | 12.8% |
E | 17603590 | 11.8% |
O | 15591379 | 10.5% |
P | 13345369 | 9.0% |
I | 13156361 | 8.9% |
R | 13156361 | 8.9% |
_ | 10756446 | 7.2% |
L | 8284266 | 5.6% |
Y | 8246435 | 5.6% |
H | 7586593 | 5.1% |
Other values (10) | 21896397 | 14.7% |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 137816293 | 92.8% |
Connector Punctuation | 10756446 | 7.2% |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
M | 18949542 | 13.7% |
E | 17603590 | 12.8% |
O | 15591379 | 11.3% |
P | 13345369 | 9.7% |
I | 13156361 | 9.5% |
R | 13156361 | 9.5% |
L | 8284266 | 6.0% |
Y | 8246435 | 6.0% |
H | 7586593 | 5.5% |
A | 6899858 | 5.0% |
Other values (9) | 14996539 | 10.9% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 10756446 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 137816293 | 92.8% |
Common | 10756446 | 7.2% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
M | 18949542 | 13.7% |
E | 17603590 | 12.8% |
O | 15591379 | 11.3% |
P | 13345369 | 9.7% |
I | 13156361 | 9.5% |
R | 13156361 | 9.5% |
L | 8284266 | 6.0% |
Y | 8246435 | 6.0% |
H | 7586593 | 5.5% |
A | 6899858 | 5.0% |
Other values (9) | 14996539 | 10.9% |
Common
Value | Count | Frequency (%) |
_ | 10756446 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 148572739 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
M | 18949542 | 12.8% |
E | 17603590 | 11.8% |
O | 15591379 | 10.5% |
P | 13345369 | 9.0% |
I | 13156361 | 8.9% |
R | 13156361 | 8.9% |
_ | 10756446 | 7.2% |
L | 8284266 | 5.6% |
Y | 8246435 | 5.6% |
H | 7586593 | 5.1% |
Other values (10) | 21896397 | 14.7% |
credacc_cards_status_52L
Text
MISSING
 
Distinct | 6 |
---|---|
Distinct (%) | < 0.1% |
Missing | 13733404 |
Missing (%) | 97.6% |
Memory size | 107.4 MiB |
Length
Max length | 11 |
---|---|
Median length | 9 |
Mean length | 7.840579625 |
Min length | 6 |
Characters and Unicode
Total characters | 2682129 |
---|---|
Distinct characters | 17 |
Distinct categories | 1 |
Distinct scripts | 1 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | CANCELLED |
---|---|
2nd row | CANCELLED |
3rd row | CANCELLED |
4th row | CANCELLED |
5th row | INACTIVE |
Value | Count | Frequency (%) |
CANCELLED | 167031 | 48.8% |
ACTIVE | 109642 | 32.1% |
INACTIVE | 62968 | 18.4% |
BLOCKED | 2098 | 0.6% |
RENEWED | 304 | 0.1% |
UNCONFIRMED | 40 | < 0.1% |
Most occurring characters
Value | Count | Frequency (%) |
E | 509722 | 19.0% |
C | 508810 | 19.0% |
A | 339641 | 12.7% |
L | 336160 | 12.5% |
I | 235618 | 8.8% |
N | 230383 | 8.6% |
V | 172610 | 6.4% |
T | 172610 | 6.4% |
D | 169473 | 6.3% |
O | 2138 | 0.1% |
Other values (7) | 4964 | 0.2% |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 2682129 | 100.0% |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
E | 509722 | 19.0% |
C | 508810 | 19.0% |
A | 339641 | 12.7% |
L | 336160 | 12.5% |
I | 235618 | 8.8% |
N | 230383 | 8.6% |
V | 172610 | 6.4% |
T | 172610 | 6.4% |
D | 169473 | 6.3% |
O | 2138 | 0.1% |
Other values (7) | 4964 | 0.2% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 2682129 | 100.0% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
E | 509722 | 19.0% |
C | 508810 | 19.0% |
A | 339641 | 12.7% |
L | 336160 | 12.5% |
I | 235618 | 8.8% |
N | 230383 | 8.6% |
V | 172610 | 6.4% |
T | 172610 | 6.4% |
D | 169473 | 6.3% |
O | 2138 | 0.1% |
Other values (7) | 4964 | 0.2% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 2682129 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
E | 509722 | 19.0% |
C | 508810 | 19.0% |
A | 339641 | 12.7% |
L | 336160 | 12.5% |
I | 235618 | 8.8% |
N | 230383 | 8.6% |
V | 172610 | 6.4% |
T | 172610 | 6.4% |
D | 169473 | 6.3% |
O | 2138 | 0.1% |
Other values (7) | 4964 | 0.2% |
num_group1
Real number (ℝ)
ZEROS
 
Distinct | 20 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 4.551473281 |
Minimum | 0 |
---|---|
Maximum | 19 |
Zeros | 2276307 |
Zeros (%) | 16.2% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 107.4 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 1 |
median | 3 |
Q3 | 7 |
95-th percentile | 14 |
Maximum | 19 |
Range | 19 |
Interquartile range (IQR) | 6 |
Descriptive statistics
Standard deviation | 4.390422118 |
---|---|
Coefficient of variation (CV) | 0.9646155974 |
Kurtosis | 0.8367524554 |
Mean | 4.551473281 |
Median Absolute Deviation (MAD) | 2 |
Skewness | 1.183532817 |
Sum | 64064203 |
Variance | 19.27580637 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
0 | 2276307 | 16.2% |
1 | 1976391 | 14.0% |
2 | 1685107 | 12.0% |
3 | 1421703 | 10.1% |
4 | 1190551 | 8.5% |
5 | 990136 | 7.0% |
6 | 820840 | 5.8% |
7 | 680934 | 4.8% |
8 | 563599 | 4.0% |
9 | 467994 | 3.3% |
Other values (10) | 2001925 | 14.2% |
Value | Count | Frequency (%) |
0 | 2276307 | 16.2% |
1 | 1976391 | 14.0% |
2 | 1685107 | 12.0% |
3 | 1421703 | 10.1% |
4 | 1190551 | 8.5% |
Value | Count | Frequency (%) |
19 | 84591 | 0.6% |
18 | 98875 | 0.7% |
17 | 115979 | 0.8% |
16 | 136458 | 1.0% |
15 | 161700 | 1.1% |
num_group2
Real number (ℝ)
ZEROS
 
Distinct | 12 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 0.7403147756 |
Minimum | 0 |
---|---|
Maximum | 11 |
Zeros | 6525978 |
Zeros (%) | 46.4% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 107.4 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
median | 1 |
Q3 | 1 |
95-th percentile | 2 |
Maximum | 11 |
Range | 11 |
Interquartile range (IQR) | 1 |
Descriptive statistics
Standard deviation | 0.8025843052 |
---|---|
Coefficient of variation (CV) | 1.084112234 |
Kurtosis | -0.2042270868 |
Mean | 0.7403147756 |
Median Absolute Deviation (MAD) | 1 |
Skewness | 0.7684751112 |
Sum | 10420291 |
Variance | 0.6441415669 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
0 | 6525978 | 46.4% |
1 | 4976173 | 35.4% |
2 | 2286939 | 16.2% |
3 | 276223 | 2.0% |
4 | 9394 | 0.1% |
5 | 717 | < 0.1% |
6 | 46 | < 0.1% |
7 | 9 | < 0.1% |
8 | 4 | < 0.1% |
9 | 2 | < 0.1% |
Other values (2) | 2 | < 0.1% |
Value | Count | Frequency (%) |
0 | 6525978 | 46.4% |
1 | 4976173 | 35.4% |
2 | 2286939 | 16.2% |
3 | 276223 | 2.0% |
4 | 9394 | 0.1% |
Value | Count | Frequency (%) |
11 | 1 | < 0.1% |
10 | 1 | < 0.1% |
9 | 2 | < 0.1% |
8 | 4 | < 0.1% |
7 | 9 | < 0.1% |