Dataset statistics
| Number of variables | 6 |
|---|---|
| Number of observations | 14075487 |
| Missing cells | 16236709 |
| Missing cells (%) | 19.2% |
| Total size in memory | 644.3 MiB |
| Average record size in memory | 48.0 B |
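The percentage and memory figures in this table can be re-derived from the counts. The sketch below is only a sanity check on the reported numbers. It assumes every cell occupies 8 bytes (a 64-bit number or an object pointer), which the report does not state but which is consistent with its figures.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 6
n_observations = 14_075_487
missing_cells = 16_236_709

# Missing cells (%) is taken over all cells, i.e. rows x columns.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# ASSUMPTION: 8 bytes per cell (not stated in the report).
bytes_per_cell = 8
record_bytes = n_variables * bytes_per_cell
column_mib = n_observations * bytes_per_cell / 2**20
total_mib = n_variables * column_mib

print(f"missing cells: {missing_pct:.1f}%")
print(f"record size:   {record_bytes} B")
print(f"per column:    {column_mib:.1f} MiB, total: {total_mib:.1f} MiB")
```

Under that assumption the script reproduces 19.2%, 48 B, 107.4 MiB per variable and 644.3 MiB in total.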
Variable types
| Numeric | 3 |
|---|---|
| Text | 3 |
Alerts
| Alert | Type |
|---|---|
| conts_type_509L has 2394056 (17.0%) missing values | Missing |
| credacc_cards_status_52L has 13733404 (97.6%) missing values | Missing |
| num_group1 has 2276307 (16.2%) zeros | Zeros |
| num_group2 has 6525978 (46.4%) zeros | Zeros |
Reproduction
| Analysis started | 2024-02-13 19:37:46.943271 |
|---|---|
| Analysis finished | 2024-02-13 19:38:08.967824 |
| Duration | 22.02 seconds |
| Software version | ydata-profiling v4.6.4 |
| Download configuration | config.json |
case_id
Real number (ℝ)
| Distinct | 1221522 |
|---|---|
| Distinct (%) | 8.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1454198.017 |
| Minimum | 2 |
| Maximum | 2703454 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 107.4 MiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 144108 |
| Q1 | 1237440 |
| median | 1575626 |
| Q3 | 1861301 |
| 95-th percentile | 2657763 |
| Maximum | 2703454 |
| Range | 2703452 |
| Interquartile range (IQR) | 623861 |
Descriptive statistics
| Standard deviation | 787508.3638 |
|---|---|
| Coefficient of variation (CV) | 0.5415413545 |
| Kurtosis | -0.7224045076 |
| Mean | 1454198.017 |
| Median Absolute Deviation (MAD) | 307009 |
| Skewness | -0.3252931327 |
| Sum | 2.046854528 × 10¹³ |
| Variance | 6.201694231 × 10¹¹ |
| Monotonicity | Increasing |
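Several entries in the quantile and descriptive tables are derived from others. As a rough cross-check (not the profiler's own code), the following sketch recomputes the range, IQR, coefficient of variation and variance from the reported minimum, maximum, quartiles, mean and standard deviation.

```python
# Values copied from the case_id statistics above.
mean = 1454198.017
std = 787508.3638
q1, q3 = 1237440, 1861301
minimum, maximum = 2, 2703454

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance is the squared standard deviation

print(value_range, iqr)
print(f"CV = {cv:.6f}, variance = {variance:.6e}")
```

The results agree with the table: a range of 2703452, an IQR of 623861, a CV of about 0.5415 and a variance of about 6.2017 × 10¹¹.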
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 1246034 | 79 | < 0.1% |
| 1390922 | 79 | < 0.1% |
| 1344039 | 78 | < 0.1% |
| 1585694 | 77 | < 0.1% |
| 2647430 | 77 | < 0.1% |
| 1495888 | 77 | < 0.1% |
| 1260644 | 76 | < 0.1% |
| 1416022 | 76 | < 0.1% |
| 158480 | 74 | < 0.1% |
| 1393522 | 74 | < 0.1% |
| Other values (1221512) | 14074720 | > 99.9% |
Smallest values
| Value | Count | Frequency (%) |
| 2 | 4 | < 0.1% |
| 3 | 3 | < 0.1% |
| 4 | 2 | < 0.1% |
| 5 | 1 | < 0.1% |
| 6 | 5 | < 0.1% |
Largest values
| Value | Count | Frequency (%) |
| 2703454 | 6 | < 0.1% |
| 2703453 | 24 | < 0.1% |
| 2703452 | 6 | < 0.1% |
| 2703451 | 17 | < 0.1% |
| 2703450 | 29 | < 0.1% |
(variable name missing)
Text
| Distinct | 9 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 109249 |
| Missing (%) | 0.8% |
| Memory size | 107.4 MiB |
Length
| Max length | 11 |
|---|---|
| Median length | 8 |
| Mean length | 8.001376677 |
| Min length | 8 |
Characters and Unicode
| Total characters | 111749131 |
|---|---|
| Distinct characters | 13 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
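As an illustration of what the category counts mean, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below tallies categories for two values of this variable. Writing the second value with an upper-case `P` is an assumption, inferred from the "Uppercase Letter" count in the character tables, and the choice of samples is purely illustrative.

```python
import unicodedata
from collections import Counter

# Two sample values; the upper-case "P" is an ASSUMPTION (see lead-in).
samples = ["a55475b1", "P33_145_161"]

# Nd = Decimal Number, Ll = Lowercase Letter,
# Lu = Uppercase Letter, Pc = Connector Punctuation
categories = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)

for code, count in categories.most_common():
    print(code, count)
```

Four distinct categories appear, matching the "Distinct categories" entry for this variable. Script and block properties are not available in `unicodedata` and would need a third-party package.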
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | a55475b1 |
|---|---|
| 2nd row | a55475b1 |
| 3rd row | a55475b1 |
| 4th row | a55475b1 |
| 5th row | a55475b1 |
| Value | Count | Frequency (%) |
| a55475b1 | 13958443 | 99.9% |
| p33_145_161 | 3309 | < 0.1% |
| p201_63_60 | 2960 | < 0.1% |
| p19_60_110 | 1135 | < 0.1% |
| p23_105_103 | 117 | < 0.1% |
| p133_119_56 | 100 | < 0.1% |
| p41_107_150 | 92 | < 0.1% |
| p17_56_144 | 63 | < 0.1% |
| p127_74_114 | 19 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 5 | 41879010 | 37.5% |
| 1 | 13975728 | 12.5% |
| 4 | 13962008 | 12.5% |
| 7 | 13958636 | 12.5% |
| a | 13958443 | 12.5% |
| b | 13958443 | 12.5% |
| _ | 15590 | < 0.1% |
| 6 | 10527 | < 0.1% |
| 3 | 10012 | < 0.1% |
| 0 | 8608 | < 0.1% |
| Other values (3) | 12126 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 83808860 | 75.0% |
| Lowercase Letter | 27916886 | 25.0% |
| Connector Punctuation | 15590 | < 0.1% |
| Uppercase Letter | 7795 | < 0.1% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 5 | 41879010 | 50.0% |
| 1 | 13975728 | 16.7% |
| 4 | 13962008 | 16.7% |
| 7 | 13958636 | 16.7% |
| 6 | 10527 | < 0.1% |
| 3 | 10012 | < 0.1% |
| 0 | 8608 | < 0.1% |
| 2 | 3096 | < 0.1% |
| 9 | 1235 | < 0.1% |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 13958443 | 50.0% |
| b | 13958443 | 50.0% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 15590 | 100.0% |
Uppercase Letter
| Value | Count | Frequency (%) |
| P | 7795 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 83824450 | 75.0% |
| Latin | 27924681 | 25.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 5 | 41879010 | 50.0% |
| 1 | 13975728 | 16.7% |
| 4 | 13962008 | 16.7% |
| 7 | 13958636 | 16.7% |
| _ | 15590 | < 0.1% |
| 6 | 10527 | < 0.1% |
| 3 | 10012 | < 0.1% |
| 0 | 8608 | < 0.1% |
| 2 | 3096 | < 0.1% |
| 9 | 1235 | < 0.1% |
Latin
| Value | Count | Frequency (%) |
| a | 13958443 | 50.0% |
| b | 13958443 | 50.0% |
| P | 7795 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 111749131 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 5 | 41879010 | 37.5% |
| 1 | 13975728 | 12.5% |
| 4 | 13962008 | 12.5% |
| 7 | 13958636 | 12.5% |
| a | 13958443 | 12.5% |
| b | 13958443 | 12.5% |
| _ | 15590 | < 0.1% |
| 6 | 10527 | < 0.1% |
| 3 | 10012 | < 0.1% |
| 0 | 8608 | < 0.1% |
| Other values (3) | 12126 | < 0.1% |
conts_type_509L
Text
MISSING 
| Distinct | 9 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 2394056 |
| Missing (%) | 17.0% |
| Memory size | 107.4 MiB |
Length
| Max length | 17 |
|---|---|
| Median length | 14 |
| Mean length | 12.71871049 |
| Min length | 5 |
Characters and Unicode
| Total characters | 148572739 |
|---|---|
| Distinct characters | 20 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | EMPLOYMENT_PHONE |
|---|---|
| 2nd row | EMPLOYMENT_PHONE |
| 3rd row | PRIMARY_MOBILE |
| 4th row | PRIMARY_MOBILE |
| 5th row | PRIMARY_MOBILE |
| Value | Count | Frequency (%) |
| primary_mobile | 6294386 | 53.9% |
| home_phone | 2472180 | 21.2% |
| employment_phone | 1679417 | 14.4% |
| phone | 924958 | 7.9% |
| primary_email | 257126 | 2.2% |
| alternative_phone | 37833 | 0.3% |
| secondary_mobile | 15504 | 0.1% |
| | 25 | < 0.1% |
| skype | 2 | < 0.1% |
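The frequencies in this table are shares of the non-missing observations, not of all rows. A small sketch, using only counts printed in this report, makes the denominator explicit. The variable names are illustrative.

```python
# Counts copied from the report for conts_type_509L.
n_observations = 14_075_487
n_missing = 2_394_056
counts = {
    "home_phone": 2_472_180,
    "employment_phone": 1_679_417,
    "phone": 924_958,
}

# Frequencies are relative to rows where the variable is present.
n_present = n_observations - n_missing
shares = {
    value: round(100 * count / n_present, 1)
    for value, count in counts.items()
}

for value, share in shares.items():
    print(f"{value}: {share}%")
```

Dividing by all 14075487 rows instead would give about 17.6% for `home_phone` rather than the 21.2% shown.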
Most occurring characters
| Value | Count | Frequency (%) |
| M | 18949542 | 12.8% |
| E | 17603590 | 11.8% |
| O | 15591379 | 10.5% |
| P | 13345369 | 9.0% |
| I | 13156361 | 8.9% |
| R | 13156361 | 8.9% |
| _ | 10756446 | 7.2% |
| L | 8284266 | 5.6% |
| Y | 8246435 | 5.6% |
| H | 7586593 | 5.1% |
| Other values (10) | 21896397 | 14.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 137816293 | 92.8% |
| Connector Punctuation | 10756446 | 7.2% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 18949542 | 13.7% |
| E | 17603590 | 12.8% |
| O | 15591379 | 11.3% |
| P | 13345369 | 9.7% |
| I | 13156361 | 9.5% |
| R | 13156361 | 9.5% |
| L | 8284266 | 6.0% |
| Y | 8246435 | 6.0% |
| H | 7586593 | 5.5% |
| A | 6899858 | 5.0% |
| Other values (9) | 14996539 | 10.9% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 10756446 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 137816293 | 92.8% |
| Common | 10756446 | 7.2% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| M | 18949542 | 13.7% |
| E | 17603590 | 12.8% |
| O | 15591379 | 11.3% |
| P | 13345369 | 9.7% |
| I | 13156361 | 9.5% |
| R | 13156361 | 9.5% |
| L | 8284266 | 6.0% |
| Y | 8246435 | 6.0% |
| H | 7586593 | 5.5% |
| A | 6899858 | 5.0% |
| Other values (9) | 14996539 | 10.9% |
Common
| Value | Count | Frequency (%) |
| _ | 10756446 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 148572739 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| M | 18949542 | 12.8% |
| E | 17603590 | 11.8% |
| O | 15591379 | 10.5% |
| P | 13345369 | 9.0% |
| I | 13156361 | 8.9% |
| R | 13156361 | 8.9% |
| _ | 10756446 | 7.2% |
| L | 8284266 | 5.6% |
| Y | 8246435 | 5.6% |
| H | 7586593 | 5.1% |
| Other values (10) | 21896397 | 14.7% |
credacc_cards_status_52L
Text
MISSING
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 13733404 |
| Missing (%) | 97.6% |
| Memory size | 107.4 MiB |
Length
| Max length | 11 |
|---|---|
| Median length | 9 |
| Mean length | 7.840579625 |
| Min length | 6 |
Characters and Unicode
| Total characters | 2682129 |
|---|---|
| Distinct characters | 17 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | CANCELLED |
|---|---|
| 2nd row | CANCELLED |
| 3rd row | CANCELLED |
| 4th row | CANCELLED |
| 5th row | INACTIVE |
| Value | Count | Frequency (%) |
| cancelled | 167031 | 48.8% |
| active | 109642 | 32.1% |
| inactive | 62968 | 18.4% |
| blocked | 2098 | 0.6% |
| renewed | 304 | 0.1% |
| unconfirmed | 40 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| E | 509722 | 19.0% |
| C | 508810 | 19.0% |
| A | 339641 | 12.7% |
| L | 336160 | 12.5% |
| I | 235618 | 8.8% |
| N | 230383 | 8.6% |
| V | 172610 | 6.4% |
| T | 172610 | 6.4% |
| D | 169473 | 6.3% |
| O | 2138 | 0.1% |
| Other values (7) | 4964 | 0.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 2682129 | 100.0% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| E | 509722 | 19.0% |
| C | 508810 | 19.0% |
| A | 339641 | 12.7% |
| L | 336160 | 12.5% |
| I | 235618 | 8.8% |
| N | 230383 | 8.6% |
| V | 172610 | 6.4% |
| T | 172610 | 6.4% |
| D | 169473 | 6.3% |
| O | 2138 | 0.1% |
| Other values (7) | 4964 | 0.2% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 2682129 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| E | 509722 | 19.0% |
| C | 508810 | 19.0% |
| A | 339641 | 12.7% |
| L | 336160 | 12.5% |
| I | 235618 | 8.8% |
| N | 230383 | 8.6% |
| V | 172610 | 6.4% |
| T | 172610 | 6.4% |
| D | 169473 | 6.3% |
| O | 2138 | 0.1% |
| Other values (7) | 4964 | 0.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2682129 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| E | 509722 | 19.0% |
| C | 508810 | 19.0% |
| A | 339641 | 12.7% |
| L | 336160 | 12.5% |
| I | 235618 | 8.8% |
| N | 230383 | 8.6% |
| V | 172610 | 6.4% |
| T | 172610 | 6.4% |
| D | 169473 | 6.3% |
| O | 2138 | 0.1% |
| Other values (7) | 4964 | 0.2% |
num_group1
Real number (ℝ)
ZEROS 
| Distinct | 20 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.551473281 |
| Minimum | 0 |
| Maximum | 19 |
| Zeros | 2276307 |
| Zeros (%) | 16.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 107.4 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 3 |
| Q3 | 7 |
| 95-th percentile | 14 |
| Maximum | 19 |
| Range | 19 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 4.390422118 |
|---|---|
| Coefficient of variation (CV) | 0.9646155974 |
| Kurtosis | 0.8367524554 |
| Mean | 4.551473281 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 1.183532817 |
| Sum | 64064203 |
| Variance | 19.27580637 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=20)
Common values
| Value | Count | Frequency (%) |
| 0 | 2276307 | 16.2% |
| 1 | 1976391 | 14.0% |
| 2 | 1685107 | 12.0% |
| 3 | 1421703 | 10.1% |
| 4 | 1190551 | 8.5% |
| 5 | 990136 | 7.0% |
| 6 | 820840 | 5.8% |
| 7 | 680934 | 4.8% |
| 8 | 563599 | 4.0% |
| 9 | 467994 | 3.3% |
| Other values (10) | 2001925 | 14.2% |
Smallest values
| Value | Count | Frequency (%) |
| 0 | 2276307 | 16.2% |
| 1 | 1976391 | 14.0% |
| 2 | 1685107 | 12.0% |
| 3 | 1421703 | 10.1% |
| 4 | 1190551 | 8.5% |
Largest values
| Value | Count | Frequency (%) |
| 19 | 84591 | 0.6% |
| 18 | 98875 | 0.7% |
| 17 | 115979 | 0.8% |
| 16 | 136458 | 1.0% |
| 15 | 161700 | 1.1% |
num_group2
Real number (ℝ)
ZEROS 
| Distinct | 12 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.7403147756 |
| Minimum | 0 |
| Maximum | 11 |
| Zeros | 6525978 |
| Zeros (%) | 46.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 107.4 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 11 |
| Range | 11 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.8025843052 |
|---|---|
| Coefficient of variation (CV) | 1.084112234 |
| Kurtosis | -0.2042270868 |
| Mean | 0.7403147756 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.7684751112 |
| Sum | 10420291 |
| Variance | 0.6441415669 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=12)
Common values
| Value | Count | Frequency (%) |
| 0 | 6525978 | 46.4% |
| 1 | 4976173 | 35.4% |
| 2 | 2286939 | 16.2% |
| 3 | 276223 | 2.0% |
| 4 | 9394 | 0.1% |
| 5 | 717 | < 0.1% |
| 6 | 46 | < 0.1% |
| 7 | 9 | < 0.1% |
| 8 | 4 | < 0.1% |
| 9 | 2 | < 0.1% |
| Other values (2) | 2 | < 0.1% |
Smallest values
| Value | Count | Frequency (%) |
| 0 | 6525978 | 46.4% |
| 1 | 4976173 | 35.4% |
| 2 | 2286939 | 16.2% |
| 3 | 276223 | 2.0% |
| 4 | 9394 | 0.1% |
Largest values
| Value | Count | Frequency (%) |
| 11 | 1 | < 0.1% |
| 10 | 1 | < 0.1% |
| 9 | 2 | < 0.1% |
| 8 | 4 | < 0.1% |
| 7 | 9 | < 0.1% |