Dataset statistics
| Number of variables | 11 |
|---|---|
| Number of observations | 1643410 |
| Missing cells | 4828073 |
| Missing cells (%) | 26.7% |
| Total size in memory | 137.9 MiB |
| Average record size in memory | 88.0 B |
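The missing-cell percentage and the average record size can be re-derived from the other figures in this table. The Python sketch below does so under two assumptions that the report itself does not state: that "Missing cells (%)" is the missing-cell count divided by variables × observations, and that 1 MiB is 1,048,576 bytes. The variable names are illustrative.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 11
n_observations = 1_643_410
missing_cells = 4_828_073
total_size_mib = 137.9

# Assumption: every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")            # reported: 26.7%
print(f"Average record size: {avg_record_bytes:.1f} B")  # reported: 88.0 B
```

Both derived values round to the figures shown in the table, which suggests the assumptions match how the report was computed.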
Variable types
| Numeric | 3 |
|---|---|
| Text | 8 |
Alerts
| addres_role_871L has 1575736 (95.9%) missing values | Missing |
|---|---|
| empls_employedfrom_796D has 1637653 (99.6%) missing values | Missing |
| relatedpersons_role_762T has 1614684 (98.3%) missing values | Missing |
| num_group1 has 1463928 (89.1%) zeros | Zeros |
| num_group2 has 1561280 (95.0%) zeros | Zeros |
Reproduction
| Analysis started | 2024-02-13 19:54:21.492391 |
|---|---|
| Analysis finished | 2024-02-13 19:54:26.576244 |
| Duration | 5.08 seconds |
| Software version | ydata-profiling v4.6.4 |
| Download configuration | config.json |
case_id
Real number (ℝ)
| Distinct | 1435105 |
|---|---|
| Distinct (%) | 87.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1264005.015 |
| Minimum | 5 |
|---|---|
| Maximum | 2703454 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 12.5 MiB |
Quantile statistics
| Minimum | 5 |
|---|---|
| 5-th percentile | 125222.45 |
| Q1 | 761958.25 |
| median | 1323515.5 |
| Q3 | 1695936.75 |
| 95-th percentile | 2622667.55 |
| Maximum | 2703454 |
| Range | 2703449 |
| Interquartile range (IQR) | 933978.5 |
Descriptive statistics
| Standard deviation | 699545.4755 |
|---|---|
| Coefficient of variation (CV) | 0.5534356803 |
| Kurtosis | -0.458723444 |
| Mean | 1264005.015 |
| Median Absolute Deviation (MAD) | 474319 |
| Skewness | 0.2388533482 |
| Sum | 2.077278482 × 10¹² |
| Variance | 4.893638723 × 10¹¹ |
| Monotonicity | Increasing |
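Several of the case_id statistics are functions of one another. The Python sketch below recomputes the coefficient of variation, variance, sum, interquartile range and range from the reported mean, standard deviation, quartiles and extremes. The formulas are the standard textbook definitions, assumed here rather than taken from the profiler's source, and the variable names are illustrative.

```python
import math

# Inputs copied from the case_id tables in this report.
n = 1_643_410                      # number of observations
mean = 1264005.015
std = 699545.4755
q1, q3 = 761958.25, 1695936.75
minimum, maximum = 5, 2703454

cv = std / mean                    # reported: 0.5534356803
variance = std ** 2                # reported: 4.893638723e11
total = mean * n                   # reported: 2.077278482e12
iqr = q3 - q1                      # reported: 933978.5
value_range = maximum - minimum    # reported: 2703449

# Compare with the reported figures, allowing for rounding of the inputs.
print(math.isclose(cv, 0.5534356803, rel_tol=1e-6))
print(math.isclose(variance, 4.893638723e11, rel_tol=1e-6))
print(math.isclose(total, 2.077278482e12, rel_tol=1e-6))
print(iqr, value_range)
```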
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 140528 | 34 | < 0.1% |
| 1336868 | 34 | < 0.1% |
| 203731 | 32 | < 0.1% |
| 169916 | 31 | < 0.1% |
| 259366 | 31 | < 0.1% |
| 2631616 | 28 | < 0.1% |
| 254957 | 26 | < 0.1% |
| 648623 | 24 | < 0.1% |
| 1427304 | 23 | < 0.1% |
| 971833 | 23 | < 0.1% |
| Other values (1435095) | 1643124 |
Minimum 5 values
| Value | Count | Frequency (%) |
| 5 | 1 | < 0.1% |
| 6 | 8 | |
| 7 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
| 2703454 | 1 | |
| 2703453 | 1 | |
| 2703452 | 1 | |
| 2703451 | 1 | |
| 2703450 | 1 |
| Distinct | 508 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 12.5 MiB |
Length
| Max length | 12 |
|---|---|
| Median length | 8 |
| Mean length | 8.092886133 |
| Min length | 7 |
Characters and Unicode
| Total characters | 13299930 |
|---|---|
| Distinct characters | 20 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
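As an illustration of the character properties mentioned above, the following Python sketch tallies Unicode general categories with the standard `unicodedata` module. The five input strings are the sample rows listed for this variable. The helper name `category_counts` and the code-to-name mapping are illustrative assumptions, not the profiler's own code.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Pc": "Connector Punctuation",
    "Pd": "Dash Punctuation",
}

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw two-letter code for unmapped categories.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# Sample rows shown for this variable.
sample = ["a55475b1", "P55_110_32", "P55_110_32", "P204_92_178", "P191_109_75"]
counts = category_counts(sample)
for name, count in counts.most_common():
    print(f"{name}: {count}")
```

On these five strings the ordering of categories matches the full-column table: decimal digits dominate, followed by connector punctuation and letters.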
Unique
| Unique | 56 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | a55475b1 |
|---|---|
| 2nd row | P55_110_32 |
| 3rd row | P55_110_32 |
| 4th row | P204_92_178 |
| 5th row | P191_109_75 |
| Value | Count | Frequency (%) |
| a55475b1 | 1582872 | |
| p125_48_164 | 9669 | 0.6% |
| p155_139_77 | 4093 | 0.2% |
| p114_74_190 | 2552 | 0.2% |
| p111_2_12 | 2468 | 0.2% |
| p215_163_136 | 1764 | 0.1% |
| p88_3_41 | 1537 | 0.1% |
| p37_84_33 | 1249 | 0.1% |
| p55_110_32 | 1163 | 0.1% |
| p107_131_181 | 1058 | 0.1% |
| Other values (498) | 34985 | 2.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 5 | 4789528 | |
| 1 | 1717341 | 12.9% |
| 4 | 1632374 | 12.3% |
| 7 | 1621337 | 12.2% |
| a | 1582872 | 11.9% |
| b | 1582872 | 11.9% |
| _ | 121076 | 0.9% |
| P | 60534 | 0.5% |
| 8 | 39380 | 0.3% |
| 2 | 39202 | 0.3% |
| Other values (10) | 113414 | 0.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 9952558 | |
| Lowercase Letter | 3165758 | 23.8% |
| Connector Punctuation | 121076 | 0.9% |
| Uppercase Letter | 60538 | 0.5% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 5 | 4789528 | |
| 1 | 1717341 | 17.3% |
| 4 | 1632374 | 16.4% |
| 7 | 1621337 | 16.3% |
| 8 | 39380 | 0.4% |
| 2 | 39202 | 0.4% |
| 6 | 34765 | 0.3% |
| 3 | 31907 | 0.3% |
| 9 | 24574 | 0.2% |
| 0 | 22150 | 0.2% |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 1582872 | |
| b | 1582872 | |
| e | 6 | < 0.1% |
| k | 2 | < 0.1% |
| p | 2 | < 0.1% |
| t | 2 | < 0.1% |
| h | 2 | < 0.1% |
Uppercase Letter
| Value | Count | Frequency (%) |
| P | 60534 | |
| Q | 4 | < 0.1% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 121076 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 10073634 | |
| Latin | 3226296 | 24.3% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 5 | 4789528 | |
| 1 | 1717341 | 17.0% |
| 4 | 1632374 | 16.2% |
| 7 | 1621337 | 16.1% |
| _ | 121076 | 1.2% |
| 8 | 39380 | 0.4% |
| 2 | 39202 | 0.4% |
| 6 | 34765 | 0.3% |
| 3 | 31907 | 0.3% |
| 9 | 24574 | 0.2% |
Latin
| Value | Count | Frequency (%) |
| a | 1582872 | |
| b | 1582872 | |
| P | 60534 | 1.9% |
| e | 6 | < 0.1% |
| Q | 4 | < 0.1% |
| k | 2 | < 0.1% |
| p | 2 | < 0.1% |
| t | 2 | < 0.1% |
| h | 2 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 13299930 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 5 | 4789528 | |
| 1 | 1717341 | 12.9% |
| 4 | 1632374 | 12.3% |
| 7 | 1621337 | 12.2% |
| a | 1582872 | 11.9% |
| b | 1582872 | 11.9% |
| _ | 121076 | 0.9% |
| P | 60534 | 0.5% |
| 8 | 39380 | 0.3% |
| 2 | 39202 | 0.3% |
| Other values (10) | 113414 | 0.9% |
addres_role_871L
Text
MISSING 
| Distinct | 8 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1575736 |
| Missing (%) | 95.9% |
| Memory size | 12.5 MiB |
Length
| Max length | 21 |
|---|---|
| Median length | 9 |
| Mean length | 8.374678606 |
| Min length | 7 |
Characters and Unicode
| Total characters | 566748 |
|---|---|
| Distinct characters | 20 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | CONTACT |
|---|---|
| 2nd row | PERMANENT |
| 3rd row | CONTACT |
| 4th row | CONTACT |
| 5th row | CONTACT |
| Value | Count | Frequency (%) |
| permanent | 37338 | |
| contact | 21918 | |
| temporary | 7193 | 10.6% |
| registered | 1187 | 1.8% |
| migrated_registration | 19 | < 0.1% |
| migrated_living | 13 | < 0.1% |
| migrated_work | 5 | < 0.1% |
| migrated_other | 1 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| N | 96626 | |
| T | 89631 | |
| E | 85488 | |
| A | 66506 | |
| R | 54180 | |
| M | 44569 | |
| P | 44531 | |
| C | 43836 | |
| O | 29136 | 5.1% |
| Y | 7193 | 1.3% |
| Other values (10) | 5052 | 0.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 566710 | |
| Connector Punctuation | 38 | < 0.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| N | 96626 | |
| T | 89631 | |
| E | 85488 | |
| A | 66506 | |
| R | 54180 | |
| M | 44569 | |
| P | 44531 | |
| C | 43836 | |
| O | 29136 | 5.1% |
| Y | 7193 | 1.3% |
| Other values (9) | 5014 | 0.9% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 38 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 566710 | |
| Common | 38 | < 0.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| N | 96626 | |
| T | 89631 | |
| E | 85488 | |
| A | 66506 | |
| R | 54180 | |
| M | 44569 | |
| P | 44531 | |
| C | 43836 | |
| O | 29136 | 5.1% |
| Y | 7193 | 1.3% |
| Other values (9) | 5014 | 0.9% |
Common
| Value | Count | Frequency (%) |
| _ | 38 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 566748 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| N | 96626 | |
| T | 89631 | |
| E | 85488 | |
| A | 66506 | |
| R | 54180 | |
| M | 44569 | |
| P | 44531 | |
| C | 43836 | |
| O | 29136 | 5.1% |
| Y | 7193 | 1.3% |
| Other values (10) | 5052 | 0.9% |
addres_zip_823M
Text
| Distinct | 2027 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 12.5 MiB |
Length
| Max length | 13 |
|---|---|
| Median length | 8 |
| Mean length | 8.106648371 |
| Min length | 7 |
Characters and Unicode
| Total characters | 13322547 |
|---|---|
| Distinct characters | 28 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 137 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | a55475b1 |
|---|---|
| 2nd row | P10_68_40 |
| 3rd row | P10_68_40 |
| 4th row | P65_136_169 |
| 5th row | P10_68_40 |
| Value | Count | Frequency (%) |
| a55475b1 | 1576370 | |
| p161_14_174 | 5968 | 0.4% |
| p144_138_111 | 3405 | 0.2% |
| p46_103_143 | 3296 | 0.2% |
| p85_138_173 | 2371 | 0.1% |
| p118_161_181 | 2132 | 0.1% |
| p11_15_81 | 2124 | 0.1% |
| p212_16_169 | 1980 | 0.1% |
| p133_34_165 | 1612 | 0.1% |
| p157_35_170 | 1536 | 0.1% |
| Other values (2017) | 42616 | 2.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| 5 | 4764495 | |
| 1 | 1746591 | 13.1% |
| 4 | 1635283 | 12.3% |
| 7 | 1616195 | 12.1% |
| a | 1576377 | 11.8% |
| b | 1576370 | 11.8% |
| _ | 134080 | 1.0% |
| P | 67013 | 0.5% |
| 6 | 44505 | 0.3% |
| 8 | 44256 | 0.3% |
| Other values (18) | 117382 | 0.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 9968574 | |
| Lowercase Letter | 3152853 | 23.7% |
| Connector Punctuation | 134080 | 1.0% |
| Uppercase Letter | 67040 | 0.5% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 1576377 | |
| b | 1576370 | |
| e | 30 | < 0.1% |
| t | 14 | < 0.1% |
| r | 14 | < 0.1% |
| o | 11 | < 0.1% |
| m | 9 | < 0.1% |
| l | 5 | < 0.1% |
| i | 5 | < 0.1% |
| z | 5 | < 0.1% |
| Other values (5) | 13 | < 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 5 | 4764495 | |
| 1 | 1746591 | 17.5% |
| 4 | 1635283 | 16.4% |
| 7 | 1616195 | 16.2% |
| 6 | 44505 | 0.4% |
| 8 | 44256 | 0.4% |
| 3 | 42240 | 0.4% |
| 2 | 27427 | 0.3% |
| 9 | 24024 | 0.2% |
| 0 | 23558 | 0.2% |
Uppercase Letter
| Value | Count | Frequency (%) |
| P | 67013 | |
| Q | 27 | < 0.1% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 134080 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 10102654 | |
| Latin | 3219893 | 24.2% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 1576377 | |
| b | 1576370 | |
| P | 67013 | 2.1% |
| e | 30 | < 0.1% |
| Q | 27 | < 0.1% |
| t | 14 | < 0.1% |
| r | 14 | < 0.1% |
| o | 11 | < 0.1% |
| m | 9 | < 0.1% |
| l | 5 | < 0.1% |
| Other values (7) | 23 | < 0.1% |
Common
| Value | Count | Frequency (%) |
| 5 | 4764495 | |
| 1 | 1746591 | 17.3% |
| 4 | 1635283 | 16.2% |
| 7 | 1616195 | 16.0% |
| _ | 134080 | 1.3% |
| 6 | 44505 | 0.4% |
| 8 | 44256 | 0.4% |
| 3 | 42240 | 0.4% |
| 2 | 27427 | 0.3% |
| 9 | 24024 | 0.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 13322547 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 5 | 4764495 | |
| 1 | 1746591 | 13.1% |
| 4 | 1635283 | 12.3% |
| 7 | 1616195 | 12.1% |
| a | 1576377 | 11.8% |
| b | 1576370 | 11.8% |
| _ | 134080 | 1.0% |
| P | 67013 | 0.5% |
| 6 | 44505 | 0.3% |
| 8 | 44256 | 0.3% |
| Other values (18) | 117382 | 0.9% |
conts_role_79M
Text
| Distinct | 11 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 12.5 MiB |
Length
| Max length | 12 |
|---|---|
| Median length | 8 |
| Mean length | 8.077852149 |
| Min length | 8 |
Characters and Unicode
| Total characters | 13275223 |
|---|---|
| Distinct characters | 14 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | a55475b1 |
|---|---|
| 2nd row | P38_92_157 |
| 3rd row | a55475b1 |
| 4th row | P38_92_157 |
| 5th row | P7_147_157 |
| Value | Count | Frequency (%) |
| a55475b1 | 1587829 | |
| p38_92_157 | 29333 | 1.8% |
| p177_137_98 | 9179 | 0.6% |
| p7_147_157 | 9120 | 0.6% |
| p125_105_50 | 4962 | 0.3% |
| p115_147_77 | 1231 | 0.1% |
| p125_14_176 | 1088 | 0.1% |
| p58_79_51 | 307 | < 0.1% |
| p124_137_181 | 271 | < 0.1% |
| p206_38_166 | 86 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 5 | 4819759 | |
| 7 | 1677418 | 12.6% |
| 1 | 1672126 | 12.6% |
| 4 | 1599547 | 12.0% |
| a | 1587829 | 12.0% |
| b | 1587829 | 12.0% |
| _ | 111162 | 0.8% |
| P | 55581 | 0.4% |
| 8 | 39176 | 0.3% |
| 3 | 38873 | 0.3% |
| Other values (4) | 85923 | 0.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 9932822 | |
| Lowercase Letter | 3175658 | 23.9% |
| Connector Punctuation | 111162 | 0.8% |
| Uppercase Letter | 55581 | 0.4% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 5 | 4819759 | |
| 7 | 1677418 | 16.9% |
| 1 | 1672126 | 16.8% |
| 4 | 1599547 | 16.1% |
| 8 | 39176 | 0.4% |
| 3 | 38873 | 0.4% |
| 9 | 38823 | 0.4% |
| 2 | 35744 | 0.4% |
| 0 | 10010 | 0.1% |
| 6 | 1346 | < 0.1% |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 1587829 | |
| b | 1587829 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 111162 |
Uppercase Letter
| Value | Count | Frequency (%) |
| P | 55581 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 10043984 | |
| Latin | 3231239 | 24.3% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 5 | 4819759 | |
| 7 | 1677418 | 16.7% |
| 1 | 1672126 | 16.6% |
| 4 | 1599547 | 15.9% |
| _ | 111162 | 1.1% |
| 8 | 39176 | 0.4% |
| 3 | 38873 | 0.4% |
| 9 | 38823 | 0.4% |
| 2 | 35744 | 0.4% |
| 0 | 10010 | 0.1% |
Latin
| Value | Count | Frequency (%) |
| a | 1587829 | |
| b | 1587829 | |
| P | 55581 | 1.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 13275223 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 5 | 4819759 | |
| 7 | 1677418 | 12.6% |
| 1 | 1672126 | 12.6% |
| 4 | 1599547 | 12.0% |
| a | 1587829 | 12.0% |
| b | 1587829 | 12.0% |
| _ | 111162 | 0.8% |
| P | 55581 | 0.4% |
| 8 | 39176 | 0.3% |
| 3 | 38873 | 0.3% |
| Other values (4) | 85923 | 0.6% |
| Distinct | 10 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 12.5 MiB |
Length
| Max length | 11 |
|---|---|
| Median length | 8 |
| Mean length | 8.041208828 |
| Min length | 8 |
Characters and Unicode
| Total characters | 13215003 |
|---|---|
| Distinct characters | 14 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | a55475b1 |
|---|---|
| 2nd row | P164_110_33 |
| 3rd row | a55475b1 |
| 4th row | P164_110_33 |
| 5th row | a55475b1 |
| Value | Count | Frequency (%) |
| a55475b1 | 1618686 | |
| p22_131_138 | 9310 | 0.6% |
| p164_110_33 | 6527 | 0.4% |
| p28_32_178 | 6291 | 0.4% |
| p148_57_109 | 1871 | 0.1% |
| p112_86_147 | 468 | < 0.1% |
| p191_80_124 | 114 | < 0.1% |
| p7_47_145 | 79 | < 0.1% |
| p164_122_65 | 61 | < 0.1% |
| p82_144_169 | 3 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 5 | 4858069 | |
| 1 | 1678183 | 12.7% |
| 4 | 1627891 | 12.3% |
| 7 | 1627474 | 12.3% |
| a | 1618686 | 12.2% |
| b | 1618686 | 12.2% |
| _ | 49448 | 0.4% |
| 3 | 37965 | 0.3% |
| 2 | 31909 | 0.2% |
| P | 24724 | 0.2% |
| Other values (4) | 41968 | 0.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 9903459 | |
| Lowercase Letter | 3237372 | 24.5% |
| Connector Punctuation | 49448 | 0.4% |
| Uppercase Letter | 24724 | 0.2% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 5 | 4858069 | |
| 1 | 1678183 | 16.9% |
| 4 | 1627891 | 16.4% |
| 7 | 1627474 | 16.4% |
| 3 | 37965 | 0.4% |
| 2 | 31909 | 0.3% |
| 8 | 24348 | 0.2% |
| 0 | 8512 | 0.1% |
| 6 | 7120 | 0.1% |
| 9 | 1988 | < 0.1% |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 1618686 | |
| b | 1618686 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 49448 |
Uppercase Letter
| Value | Count | Frequency (%) |
| P | 24724 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 9952907 | |
| Latin | 3262096 | 24.7% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 5 | 4858069 | |
| 1 | 1678183 | 16.9% |
| 4 | 1627891 | 16.4% |
| 7 | 1627474 | 16.4% |
| _ | 49448 | 0.5% |
| 3 | 37965 | 0.4% |
| 2 | 31909 | 0.3% |
| 8 | 24348 | 0.2% |
| 0 | 8512 | 0.1% |
| 6 | 7120 | 0.1% |
Latin
| Value | Count | Frequency (%) |
| a | 1618686 | |
| b | 1618686 | |
| P | 24724 | 0.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 13215003 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 5 | 4858069 | |
| 1 | 1678183 | 12.7% |
| 4 | 1627891 | 12.3% |
| 7 | 1627474 | 12.3% |
| a | 1618686 | 12.2% |
| b | 1618686 | 12.2% |
| _ | 49448 | 0.4% |
| 3 | 37965 | 0.3% |
| 2 | 31909 | 0.2% |
| P | 24724 | 0.2% |
| Other values (4) | 41968 | 0.3% |
empls_employedfrom_796D
Text
MISSING
| Distinct | 801 |
|---|---|
| Distinct (%) | 13.9% |
| Missing | 1637653 |
| Missing (%) | 99.6% |
| Memory size | 12.5 MiB |
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
Characters and Unicode
| Total characters | 57570 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 453 |
|---|---|
| Unique (%) | 7.9% |
Sample
| 1st row | 2018-06-15 |
|---|---|
| 2nd row | 2011-08-15 |
| 3rd row | 1994-05-15 |
| 4th row | 2013-01-15 |
| 5th row | 2014-09-15 |
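The sampled values are ISO 8601 dates stored as fixed-length text, which is consistent with the minimum, median and maximum length all being 10. The Python sketch below parses the five sample rows with the standard library and inspects the day of month. It covers only the sample shown here, not the full column, and the variable names are illustrative.

```python
from datetime import date

# The five sample rows listed above.
sample = ["2018-06-15", "2011-08-15", "1994-05-15", "2013-01-15", "2014-09-15"]

# date.fromisoformat accepts the YYYY-MM-DD form used in this column.
parsed = [date.fromisoformat(s) for s in sample]

lengths = {len(s) for s in sample}   # distinct string lengths in the sample
days = {d.day for d in parsed}       # distinct days of month in the sample
earliest = min(parsed)

print(lengths, days, earliest.isoformat())
```

Every sampled date falls on the 15th of its month, which hints that the day component may be a placeholder rather than a recorded value. That reading is an inference from the sample, not something the report states.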
| Value | Count | Frequency (%) |
| 2017-01-15 | 228 | 4.0% |
| 2016-01-15 | 196 | 3.4% |
| 2015-01-15 | 181 | 3.1% |
| 2018-01-15 | 125 | 2.2% |
| 2013-01-15 | 113 | 2.0% |
| 2014-01-15 | 106 | 1.8% |
| 2012-01-15 | 102 | 1.8% |
| 2007-09-15 | 71 | 1.2% |
| 2010-01-15 | 69 | 1.2% |
| 2007-01-15 | 56 | 1.0% |
| Other values (791) | 4510 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 13459 | |
| 1 | 12287 | |
| - | 11514 | |
| 2 | 6876 | |
| 5 | 6417 | |
| 9 | 1699 | 3.0% |
| 6 | 1289 | 2.2% |
| 7 | 1244 | 2.2% |
| 8 | 1163 | 2.0% |
| 4 | 866 | 1.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 46056 | |
| Dash Punctuation | 11514 | 20.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 13459 | |
| 1 | 12287 | |
| 2 | 6876 | |
| 5 | 6417 | |
| 9 | 1699 | 3.7% |
| 6 | 1289 | 2.8% |
| 7 | 1244 | 2.7% |
| 8 | 1163 | 2.5% |
| 4 | 866 | 1.9% |
| 3 | 756 | 1.6% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 11514 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 57570 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 13459 | |
| 1 | 12287 | |
| - | 11514 | |
| 2 | 6876 | |
| 5 | 6417 | |
| 9 | 1699 | 3.0% |
| 6 | 1289 | 2.2% |
| 7 | 1244 | 2.2% |
| 8 | 1163 | 2.0% |
| 4 | 866 | 1.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 57570 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 13459 | |
| 1 | 12287 | |
| - | 11514 | |
| 2 | 6876 | |
| 5 | 6417 | |
| 9 | 1699 | 3.0% |
| 6 | 1289 | 2.2% |
| 7 | 1244 | 2.2% |
| 8 | 1163 | 2.0% |
| 4 | 866 | 1.5% |
| Distinct | 7153 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 12.5 MiB |
Length
| Max length | 16 |
|---|---|
| Median length | 8 |
| Mean length | 8.011236393 |
| Min length | 7 |
Characters and Unicode
| Total characters | 13165746 |
|---|---|
| Distinct characters | 34 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 6991 |
|---|---|
| Unique (%) | 0.4% |
Sample
| 1st row | a55475b1 |
|---|---|
| 2nd row | a55475b1 |
| 3rd row | a55475b1 |
| 4th row | a55475b1 |
| 5th row | a55475b1 |
| Value | Count | Frequency (%) |
| a55475b1 | 1636009 | |
| p114_118_163 | 17 | < 0.1% |
| p179_55_175 | 7 | < 0.1% |
| p26_112_122 | 6 | < 0.1% |
| p133_138_183 | 6 | < 0.1% |
| p9_69_94 | 6 | < 0.1% |
| p74_31_177 | 6 | < 0.1% |
| p38_11_59 | 6 | < 0.1% |
| p149_35_169 | 5 | < 0.1% |
| p204_145_180 | 5 | < 0.1% |
| Other values (7143) | 7337 | 0.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| 5 | 4912435 | |
| 1 | 1651792 | 12.5% |
| 7 | 1641239 | 12.5% |
| 4 | 1640352 | 12.5% |
| a | 1636016 | 12.4% |
| b | 1636010 | 12.4% |
| _ | 14802 | 0.1% |
| P | 7366 | 0.1% |
| 2 | 4892 | < 0.1% |
| 6 | 4858 | < 0.1% |
| Other values (24) | 15984 | 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 9871374 | |
| Lowercase Letter | 3272169 | 24.9% |
| Connector Punctuation | 14802 | 0.1% |
| Uppercase Letter | 7401 | 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 1636016 | |
| b | 1636010 | |
| e | 20 | < 0.1% |
| o | 15 | < 0.1% |
| t | 12 | < 0.1% |
| i | 10 | < 0.1% |
| h | 10 | < 0.1% |
| s | 9 | < 0.1% |
| u | 9 | < 0.1% |
| n | 9 | < 0.1% |
| Other values (11) | 49 | < 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 5 | 4912435 | |
| 1 | 1651792 | 16.7% |
| 7 | 1641239 | 16.6% |
| 4 | 1640352 | 16.6% |
| 2 | 4892 | < 0.1% |
| 6 | 4858 | < 0.1% |
| 3 | 4191 | < 0.1% |
| 8 | 4114 | < 0.1% |
| 0 | 3765 | < 0.1% |
| 9 | 3736 | < 0.1% |
Uppercase Letter
| Value | Count | Frequency (%) |
| P | 7366 | |
| Q | 35 | 0.5% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 14802 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 9886176 | |
| Latin | 3279570 | 24.9% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 1636016 | |
| b | 1636010 | |
| P | 7366 | 0.2% |
| Q | 35 | < 0.1% |
| e | 20 | < 0.1% |
| o | 15 | < 0.1% |
| t | 12 | < 0.1% |
| i | 10 | < 0.1% |
| h | 10 | < 0.1% |
| s | 9 | < 0.1% |
| Other values (13) | 67 | < 0.1% |
Common
| Value | Count | Frequency (%) |
| 5 | 4912435 | |
| 1 | 1651792 | 16.7% |
| 7 | 1641239 | 16.6% |
| 4 | 1640352 | 16.6% |
| _ | 14802 | 0.1% |
| 2 | 4892 | < 0.1% |
| 6 | 4858 | < 0.1% |
| 3 | 4191 | < 0.1% |
| 8 | 4114 | < 0.1% |
| 0 | 3765 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 13165746 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 5 | 4912435 | |
| 1 | 1651792 | 12.5% |
| 7 | 1641239 | 12.5% |
| 4 | 1640352 | 12.5% |
| a | 1636016 | 12.4% |
| b | 1636010 | 12.4% |
| _ | 14802 | 0.1% |
| P | 7366 | 0.1% |
| 2 | 4892 | < 0.1% |
| 6 | 4858 | < 0.1% |
| Other values (24) | 15984 | 0.1% |
num_group1
Real number (ℝ)
ZEROS 
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.1115424635 |
| Minimum | 0 |
|---|---|
| Maximum | 4 |
| Zeros | 1463928 |
| Zeros (%) | 89.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 12.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 1 |
| Maximum | 4 |
| Range | 4 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.3224078582 |
|---|---|
| Coefficient of variation (CV) | 2.890449502 |
| Kurtosis | 6.316380546 |
| Mean | 0.1115424635 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.699905629 |
| Sum | 183310 |
| Variance | 0.103946827 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=5)
Common values
| Value | Count | Frequency (%) |
| 0 | 1463928 | |
| 1 | 175805 | 10.7% |
| 2 | 3529 | 0.2% |
| 3 | 145 | < 0.1% |
| 4 | 3 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
| 0 | 1463928 | |
| 1 | 175805 | 10.7% |
| 2 | 3529 | 0.2% |
| 3 | 145 | < 0.1% |
| 4 | 3 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
| 4 | 3 | < 0.1% |
| 3 | 145 | < 0.1% |
| 2 | 3529 | 0.2% |
| 1 | 175805 | 10.7% |
| 0 | 1463928 |
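Because num_group1 takes only five distinct values, its summary statistics can be rebuilt from the value counts alone. The Python sketch below recomputes the observation count, sum, mean, zero percentage, variance and standard deviation from the table above. It assumes the reported variance is the sample variance (divisor n - 1), and the variable names are illustrative.

```python
import math

# Value -> count, copied from the num_group1 table above.
value_counts = {0: 1_463_928, 1: 175_805, 2: 3_529, 3: 145, 4: 3}

n = sum(value_counts.values())                           # reported: 1643410
total = sum(v * c for v, c in value_counts.items())      # reported sum: 183310
mean = total / n                                         # reported: 0.1115424635
zeros_pct = 100 * value_counts[0] / n                    # reported: 89.1%

# Assumption: sample variance with divisor n - 1.
sum_sq_dev = sum(c * (v - mean) ** 2 for v, c in value_counts.items())
variance = sum_sq_dev / (n - 1)                          # reported: 0.103946827
std = math.sqrt(variance)                                # reported: 0.3224078582

print(n, total, f"{mean:.10f}", f"{zeros_pct:.1f}%")
print(f"{variance:.9f}", f"{std:.10f}")
```

The counts add up to the dataset's number of observations, so the table accounts for every row of this variable.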
num_group2
Real number (ℝ)
ZEROS 
| Distinct | 32 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.1237013283 |
| Minimum | 0 |
|---|---|
| Maximum | 31 |
| Zeros | 1561280 |
| Zeros (%) | 95.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 12.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 31 |
| Range | 31 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.7612453743 |
|---|---|
| Coefficient of variation (CV) | 6.15389814 |
| Kurtosis | 156.705602 |
| Mean | 0.1237013283 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 10.32953806 |
| Sum | 203292 |
| Variance | 0.57949452 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=32)
Common values
| Value | Count | Frequency (%) |
| 0 | 1561280 | |
| 1 | 44507 | 2.7% |
| 2 | 11110 | 0.7% |
| 3 | 8411 | 0.5% |
| 4 | 5356 | 0.3% |
| 5 | 4213 | 0.3% |
| 6 | 2831 | 0.2% |
| 7 | 1979 | 0.1% |
| 8 | 1232 | 0.1% |
| 9 | 860 | 0.1% |
| Other values (22) | 1631 | 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
| 0 | 1561280 | |
| 1 | 44507 | 2.7% |
| 2 | 11110 | 0.7% |
| 3 | 8411 | 0.5% |
| 4 | 5356 | 0.3% |
Maximum 5 values
| Value | Count | Frequency (%) |
| 31 | 2 | < 0.1% |
| 30 | 3 | |
| 29 | 5 | |
| 28 | 5 | |
| 27 | 5 |
relatedpersons_role_762T
Text
MISSING
| Distinct | 10 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1614684 |
| Missing (%) | 98.3% |
| Memory size | 12.5 MiB |
Length
| Max length | 14 |
|---|---|
| Median length | 12 |
| Mean length | 8.217259625 |
| Min length | 5 |
Characters and Unicode
| Total characters | 236049 |
|---|---|
| Distinct characters | 19 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | OTHER_RELATIVE |
|---|---|
| 2nd row | OTHER_RELATIVE |
| 3rd row | PARENT |
| 4th row | PARENT |
| 5th row | COLLEAGUE |
| Value | Count | Frequency (%) |
| other_relative | 6211 | |
| sibling | 5406 | |
| friend | 5293 | |
| colleague | 3160 | |
| parent | 2098 | 7.3% |
| other | 2080 | 7.2% |
| child | 1955 | 6.8% |
| spouse | 1476 | 5.1% |
| neighbor | 782 | 2.7% |
| grand_parent | 265 | 0.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| E | 36947 | |
| I | 25053 | |
| R | 23205 | |
| L | 19892 | 8.4% |
| T | 16865 | 7.1% |
| N | 14109 | 6.0% |
| O | 13709 | 5.8% |
| A | 11999 | 5.1% |
| H | 11028 | 4.7% |
| G | 9613 | 4.1% |
| Other values (9) | 53629 |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 229573 | |
| Connector Punctuation | 6476 | 2.7% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| E | 36947 | |
| I | 25053 | |
| R | 23205 | |
| L | 19892 | |
| T | 16865 | 7.3% |
| N | 14109 | 6.1% |
| O | 13709 | 6.0% |
| A | 11999 | 5.2% |
| H | 11028 | 4.8% |
| G | 9613 | 4.2% |
| Other values (8) | 47153 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 6476 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 229573 | |
| Common | 6476 | 2.7% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| E | 36947 | |
| I | 25053 | |
| R | 23205 | |
| L | 19892 | |
| T | 16865 | 7.3% |
| N | 14109 | 6.1% |
| O | 13709 | 6.0% |
| A | 11999 | 5.2% |
| H | 11028 | 4.8% |
| G | 9613 | 4.2% |
| Other values (8) | 47153 |
Common
| Value | Count | Frequency (%) |
| _ | 6476 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 236049 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| E | 36947 | |
| I | 25053 | |
| R | 23205 | |
| L | 19892 | 8.4% |
| T | 16865 | 7.1% |
| N | 14109 | 6.0% |
| O | 13709 | 5.8% |
| A | 11999 | 5.1% |
| H | 11028 | 4.7% |
| G | 9613 | 4.1% |
| Other values (9) | 53629 |