Dataset statistics
Number of variables | 11 |
---|---|
Number of observations | 1643410 |
Missing cells | 4828073 |
Missing cells (%) | 26.7% |
Total size in memory | 137.9 MiB |
Average record size in memory | 88.0 B |
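The two derived figures in this table can be recomputed from the raw counts. The sketch below is a minimal check, assuming the missing-cell percentage is missing cells divided by variables times observations, and that 1 MiB is 2**20 bytes; the variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics table.
n_variables = 11
n_observations = 1_643_410
missing_cells = 4_828_073
total_size_mib = 137.9

# Assumption: share of missing cells = missing cells / (variables * observations).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 2**20 bytes; average record size = total bytes / observations.
avg_record_bytes = total_size_mib * 2**20 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both values round to the figures reported above (26.7% and 88.0 B).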
Variable types
Numeric | 3 |
---|---|
Text | 8 |
addres_role_871L has 1575736 (95.9%) missing values | Missing |
empls_employedfrom_796D has 1637653 (99.6%) missing values | Missing |
relatedpersons_role_762T has 1614684 (98.3%) missing values | Missing |
num_group1 has 1463928 (89.1%) zeros | Zeros |
num_group2 has 1561280 (95.0%) zeros | Zeros |
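Each alert percentage is the flagged count divided by the number of observations. The following sketch recomputes them from the counts listed above; the dictionary layout is an illustrative choice, not the profiler's internal representation.

```python
n_observations = 1_643_410

# Counts copied from the alerts above: column name -> (count, alert kind).
alerts = {
    "addres_role_871L": (1_575_736, "Missing"),
    "empls_employedfrom_796D": (1_637_653, "Missing"),
    "relatedpersons_role_762T": (1_614_684, "Missing"),
    "num_group1": (1_463_928, "Zeros"),
    "num_group2": (1_561_280, "Zeros"),
}

# Share of rows affected, in percent, per column.
shares = {name: 100 * count / n_observations for name, (count, _) in alerts.items()}
for name, (count, kind) in alerts.items():
    print(f"{name}: {shares[name]:.1f}% {kind.lower()}")

# Sum of the missing counts over the three columns flagged as Missing.
total_missing = sum(count for count, kind in alerts.values() if kind == "Missing")
print(total_missing)
```

The three missing counts add up to 4828073, the dataset-wide "Missing cells" figure, so these three columns account for every missing cell.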
Reproduction
Analysis started | 2024-02-13 19:54:21.492391 |
---|---|
Analysis finished | 2024-02-13 19:54:26.576244 |
Duration | 5.08 seconds |
Software version | ydata-profiling v4.6.4 |
Download configuration | config.json |
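The reported duration is the difference between the two timestamps. A minimal check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the reproduction table.
started = datetime.fromisoformat("2024-02-13 19:54:21.492391")
finished = datetime.fromisoformat("2024-02-13 19:54:26.576244")

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```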
case_id
Real number (ℝ)
Distinct | 1435105 |
---|---|
Distinct (%) | 87.3% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 1264005.015 |
Minimum | 5 |
---|---|
Maximum | 2703454 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 12.5 MiB |
Quantile statistics
Minimum | 5 |
---|---|
5-th percentile | 125222.45 |
Q1 | 761958.25 |
median | 1323515.5 |
Q3 | 1695936.75 |
95-th percentile | 2622667.55 |
Maximum | 2703454 |
Range | 2703449 |
Interquartile range (IQR) | 933978.5 |
Descriptive statistics
Standard deviation | 699545.4755 |
---|---|
Coefficient of variation (CV) | 0.5534356803 |
Kurtosis | -0.458723444 |
Mean | 1264005.015 |
Median Absolute Deviation (MAD) | 474319 |
Skewness | 0.2388533482 |
Sum | 2.077278482 × 10^12 |
Variance | 4.893638723 × 10^11 |
Monotonicity | Increasing |
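Several of the case_id statistics are simple functions of the others, which gives a quick consistency check. The sketch below recomputes them from the reported values, which are rounded, so small differences in the last digits are expected; the variable names are illustrative.

```python
# Values copied from the case_id tables.
n = 1_643_410
minimum, maximum = 5, 2_703_454
q1, q3 = 761_958.25, 1_695_936.75
mean = 1_264_005.015
std = 699_545.4755

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance
total = mean * n                  # Sum

print(value_range, iqr)
print(f"{cv:.6f} {variance:.6e} {total:.6e}")
```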
Value | Count | Frequency (%) |
140528 | 34 | < 0.1% |
1336868 | 34 | < 0.1% |
203731 | 32 | < 0.1% |
169916 | 31 | < 0.1% |
259366 | 31 | < 0.1% |
2631616 | 28 | < 0.1% |
254957 | 26 | < 0.1% |
648623 | 24 | < 0.1% |
1427304 | 23 | < 0.1% |
971833 | 23 | < 0.1% |
Other values (1435095) | 1643124 | > 99.9% |
Value | Count | Frequency (%) |
5 | 1 | < 0.1% |
6 | 8 | < 0.1% |
7 | 1 | < 0.1% |
8 | 1 | < 0.1% |
9 | 1 | < 0.1% |
Value | Count | Frequency (%) |
2703454 | 1 | < 0.1% |
2703453 | 1 | < 0.1% |
2703452 | 1 | < 0.1% |
2703451 | 1 | < 0.1% |
2703450 | 1 | < 0.1% |
Distinct | 508 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 12.5 MiB |
Length
Max length | 12 |
---|---|
Median length | 8 |
Mean length | 8.092886133 |
Min length | 7 |
Characters and Unicode
Total characters | 13299930 |
---|---|
Distinct characters | 20 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 56 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | a55475b1 |
---|---|
2nd row | P55_110_32 |
3rd row | P55_110_32 |
4th row | P204_92_178 |
5th row | P191_109_75 |
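The mean length follows from the total character count, since this column has no missing values. The sketch below recomputes it and also measures the five sample rows; it assumes length is counted in characters, which for ASCII-only data equals the byte count.

```python
# Figures copied from the tables above.
n_rows = 1_643_410            # observations; no missing values in this column
total_characters = 13_299_930

mean_length = total_characters / n_rows
print(f"{mean_length:.6f}")

# Lengths of the five sample rows.
sample = ["a55475b1", "P55_110_32", "P55_110_32", "P204_92_178", "P191_109_75"]
lengths = [len(value) for value in sample]
print(lengths)
```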
Value | Count | Frequency (%) |
a55475b1 | 1582872 | 96.3% |
p125_48_164 | 9669 | 0.6% |
p155_139_77 | 4093 | 0.2% |
p114_74_190 | 2552 | 0.2% |
p111_2_12 | 2468 | 0.2% |
p215_163_136 | 1764 | 0.1% |
p88_3_41 | 1537 | 0.1% |
p37_84_33 | 1249 | 0.1% |
p55_110_32 | 1163 | 0.1% |
p107_131_181 | 1058 | 0.1% |
Other values (498) | 34985 | 2.1% |
Most occurring characters
Value | Count | Frequency (%) |
5 | 4789528 | |
1 | 1717341 | 12.9% |
4 | 1632374 | 12.3% |
7 | 1621337 | 12.2% |
a | 1582872 | 11.9% |
b | 1582872 | 11.9% |
_ | 121076 | 0.9% |
P | 60534 | 0.5% |
8 | 39380 | 0.3% |
2 | 39202 | 0.3% |
Other values (10) | 113414 | 0.9% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 9952558 | 74.8% |
Lowercase Letter | 3165758 | 23.8% |
Connector Punctuation | 121076 | 0.9% |
Uppercase Letter | 60538 | 0.5% |
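The category names in this table match Unicode general categories. As a rough illustration, the sketch below counts categories with Python's standard `unicodedata` module; this is an assumption about how such counts can be reproduced, not a description of the profiler's own implementation, and `category_counts` is a hypothetical helper.

```python
import unicodedata
from collections import Counter

# Two-letter Unicode general-category codes mapped to the names used in the report.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Pc": "Connector Punctuation",
    "Pd": "Dash Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for char in value:
            code = unicodedata.category(char)
            # Fall back to the raw code for categories not listed above.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# The five sample rows shown for this column.
sample = ["a55475b1", "P55_110_32", "P55_110_32", "P204_92_178", "P191_109_75"]
counts = category_counts(sample)
print(counts)
```

Applied to the full column rather than the five samples, the same loop would yield the per-category totals listed above.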
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
5 | 4789528 | |
1 | 1717341 | 17.3% |
4 | 1632374 | 16.4% |
7 | 1621337 | 16.3% |
8 | 39380 | 0.4% |
2 | 39202 | 0.4% |
6 | 34765 | 0.3% |
3 | 31907 | 0.3% |
9 | 24574 | 0.2% |
0 | 22150 | 0.2% |
Lowercase Letter
Value | Count | Frequency (%) |
a | 1582872 | 50.0% |
b | 1582872 | 50.0% |
e | 6 | < 0.1% |
k | 2 | < 0.1% |
p | 2 | < 0.1% |
t | 2 | < 0.1% |
h | 2 | < 0.1% |
Uppercase Letter
Value | Count | Frequency (%) |
P | 60534 | > 99.9% |
Q | 4 | < 0.1% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 121076 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 10073634 | 75.7% |
Latin | 3226296 | 24.3% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
5 | 4789528 | |
1 | 1717341 | 17.0% |
4 | 1632374 | 16.2% |
7 | 1621337 | 16.1% |
_ | 121076 | 1.2% |
8 | 39380 | 0.4% |
2 | 39202 | 0.4% |
6 | 34765 | 0.3% |
3 | 31907 | 0.3% |
9 | 24574 | 0.2% |
Latin
Value | Count | Frequency (%) |
a | 1582872 | |
b | 1582872 | |
P | 60534 | 1.9% |
e | 6 | < 0.1% |
Q | 4 | < 0.1% |
k | 2 | < 0.1% |
p | 2 | < 0.1% |
t | 2 | < 0.1% |
h | 2 | < 0.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 13299930 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
5 | 4789528 | |
1 | 1717341 | 12.9% |
4 | 1632374 | 12.3% |
7 | 1621337 | 12.2% |
a | 1582872 | 11.9% |
b | 1582872 | 11.9% |
_ | 121076 | 0.9% |
P | 60534 | 0.5% |
8 | 39380 | 0.3% |
2 | 39202 | 0.3% |
Other values (10) | 113414 | 0.9% |
addres_role_871L
Text
MISSING
 
Distinct | 8 |
---|---|
Distinct (%) | < 0.1% |
Missing | 1575736 |
Missing (%) | 95.9% |
Memory size | 12.5 MiB |
Value | Count | Frequency (%) |
permanent | 37338 | 55.2% |
contact | 21918 | 32.4% |
temporary | 7193 | 10.6% |
registered | 1187 | 1.8% |
migrated_registration | 19 | < 0.1% |
migrated_living | 13 | < 0.1% |
migrated_work | 5 | < 0.1% |
migrated_other | 1 | < 0.1% |
Most occurring characters
Value | Count | Frequency (%) |
N | 96626 | |
T | 89631 | |
E | 85488 | |
A | 66506 | |
R | 54180 | |
M | 44569 | |
P | 44531 | |
C | 43836 | |
O | 29136 | 5.1% |
Y | 7193 | 1.3% |
Other values (10) | 5052 | 0.9% |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 566710 | > 99.9% |
Connector Punctuation | 38 | < 0.1% |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
N | 96626 | |
T | 89631 | |
E | 85488 | |
A | 66506 | |
R | 54180 | |
M | 44569 | |
P | 44531 | |
C | 43836 | |
O | 29136 | 5.1% |
Y | 7193 | 1.3% |
Other values (9) | 5014 | 0.9% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 38 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 566710 | > 99.9% |
Common | 38 | < 0.1% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
N | 96626 | |
T | 89631 | |
E | 85488 | |
A | 66506 | |
R | 54180 | |
M | 44569 | |
P | 44531 | |
C | 43836 | |
O | 29136 | 5.1% |
Y | 7193 | 1.3% |
Other values (9) | 5014 | 0.9% |
Common
Value | Count | Frequency (%) |
_ | 38 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 566748 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
N | 96626 | |
T | 89631 | |
E | 85488 | |
A | 66506 | |
R | 54180 | |
M | 44569 | |
P | 44531 | |
C | 43836 | |
O | 29136 | 5.1% |
Y | 7193 | 1.3% |
Other values (10) | 5052 | 0.9% |
addres_zip_823M
Text
Distinct | 2027 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 12.5 MiB |
Length
Max length | 13 |
---|---|
Median length | 8 |
Mean length | 8.106648371 |
Min length | 7 |
Characters and Unicode
Total characters | 13322547 |
---|---|
Distinct characters | 28 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 137 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | a55475b1 |
---|---|
2nd row | P10_68_40 |
3rd row | P10_68_40 |
4th row | P65_136_169 |
5th row | P10_68_40 |
Value | Count | Frequency (%) |
a55475b1 | 1576370 | 95.9% |
p161_14_174 | 5968 | 0.4% |
p144_138_111 | 3405 | 0.2% |
p46_103_143 | 3296 | 0.2% |
p85_138_173 | 2371 | 0.1% |
p118_161_181 | 2132 | 0.1% |
p11_15_81 | 2124 | 0.1% |
p212_16_169 | 1980 | 0.1% |
p133_34_165 | 1612 | 0.1% |
p157_35_170 | 1536 | 0.1% |
Other values (2017) | 42616 | 2.6% |
Most occurring characters
Value | Count | Frequency (%) |
5 | 4764495 | |
1 | 1746591 | 13.1% |
4 | 1635283 | 12.3% |
7 | 1616195 | 12.1% |
a | 1576377 | 11.8% |
b | 1576370 | 11.8% |
_ | 134080 | 1.0% |
P | 67013 | 0.5% |
6 | 44505 | 0.3% |
8 | 44256 | 0.3% |
Other values (18) | 117382 | 0.9% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 9968574 | 74.8% |
Lowercase Letter | 3152853 | 23.7% |
Connector Punctuation | 134080 | 1.0% |
Uppercase Letter | 67040 | 0.5% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
a | 1576377 | 50.0% |
b | 1576370 | 50.0% |
e | 30 | < 0.1% |
t | 14 | < 0.1% |
r | 14 | < 0.1% |
o | 11 | < 0.1% |
m | 9 | < 0.1% |
l | 5 | < 0.1% |
i | 5 | < 0.1% |
z | 5 | < 0.1% |
Other values (5) | 13 | < 0.1% |
Decimal Number
Value | Count | Frequency (%) |
5 | 4764495 | |
1 | 1746591 | 17.5% |
4 | 1635283 | 16.4% |
7 | 1616195 | 16.2% |
6 | 44505 | 0.4% |
8 | 44256 | 0.4% |
3 | 42240 | 0.4% |
2 | 27427 | 0.3% |
9 | 24024 | 0.2% |
0 | 23558 | 0.2% |
Uppercase Letter
Value | Count | Frequency (%) |
P | 67013 | > 99.9% |
Q | 27 | < 0.1% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 134080 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 10102654 | 75.8% |
Latin | 3219893 | 24.2% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
a | 1576377 | |
b | 1576370 | |
P | 67013 | 2.1% |
e | 30 | < 0.1% |
Q | 27 | < 0.1% |
t | 14 | < 0.1% |
r | 14 | < 0.1% |
o | 11 | < 0.1% |
m | 9 | < 0.1% |
l | 5 | < 0.1% |
Other values (7) | 23 | < 0.1% |
Common
Value | Count | Frequency (%) |
5 | 4764495 | |
1 | 1746591 | 17.3% |
4 | 1635283 | 16.2% |
7 | 1616195 | 16.0% |
_ | 134080 | 1.3% |
6 | 44505 | 0.4% |
8 | 44256 | 0.4% |
3 | 42240 | 0.4% |
2 | 27427 | 0.3% |
9 | 24024 | 0.2% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 13322547 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
5 | 4764495 | |
1 | 1746591 | 13.1% |
4 | 1635283 | 12.3% |
7 | 1616195 | 12.1% |
a | 1576377 | 11.8% |
b | 1576370 | 11.8% |
_ | 134080 | 1.0% |
P | 67013 | 0.5% |
6 | 44505 | 0.3% |
8 | 44256 | 0.3% |
Other values (18) | 117382 | 0.9% |
conts_role_79M
Text
Distinct | 11 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 12.5 MiB |
Length
Max length | 12 |
---|---|
Median length | 8 |
Mean length | 8.077852149 |
Min length | 8 |
Characters and Unicode
Total characters | 13275223 |
---|---|
Distinct characters | 14 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | a55475b1 |
---|---|
2nd row | P38_92_157 |
3rd row | a55475b1 |
4th row | P38_92_157 |
5th row | P7_147_157 |
Value | Count | Frequency (%) |
a55475b1 | 1587829 | 96.6% |
p38_92_157 | 29333 | 1.8% |
p177_137_98 | 9179 | 0.6% |
p7_147_157 | 9120 | 0.6% |
p125_105_50 | 4962 | 0.3% |
p115_147_77 | 1231 | 0.1% |
p125_14_176 | 1088 | 0.1% |
p58_79_51 | 307 | < 0.1% |
p124_137_181 | 271 | < 0.1% |
p206_38_166 | 86 | < 0.1% |
Most occurring characters
Value | Count | Frequency (%) |
5 | 4819759 | |
7 | 1677418 | 12.6% |
1 | 1672126 | 12.6% |
4 | 1599547 | 12.0% |
a | 1587829 | 12.0% |
b | 1587829 | 12.0% |
_ | 111162 | 0.8% |
P | 55581 | 0.4% |
8 | 39176 | 0.3% |
3 | 38873 | 0.3% |
Other values (4) | 85923 | 0.6% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 9932822 | 74.8% |
Lowercase Letter | 3175658 | 23.9% |
Connector Punctuation | 111162 | 0.8% |
Uppercase Letter | 55581 | 0.4% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
5 | 4819759 | |
7 | 1677418 | 16.9% |
1 | 1672126 | 16.8% |
4 | 1599547 | 16.1% |
8 | 39176 | 0.4% |
3 | 38873 | 0.4% |
9 | 38823 | 0.4% |
2 | 35744 | 0.4% |
0 | 10010 | 0.1% |
6 | 1346 | < 0.1% |
Lowercase Letter
Value | Count | Frequency (%) |
a | 1587829 | 50.0% |
b | 1587829 | 50.0% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 111162 | 100.0% |
Uppercase Letter
Value | Count | Frequency (%) |
P | 55581 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 10043984 | 75.7% |
Latin | 3231239 | 24.3% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
5 | 4819759 | |
7 | 1677418 | 16.7% |
1 | 1672126 | 16.6% |
4 | 1599547 | 15.9% |
_ | 111162 | 1.1% |
8 | 39176 | 0.4% |
3 | 38873 | 0.4% |
9 | 38823 | 0.4% |
2 | 35744 | 0.4% |
0 | 10010 | 0.1% |
Latin
Value | Count | Frequency (%) |
a | 1587829 | |
b | 1587829 | |
P | 55581 | 1.7% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 13275223 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
5 | 4819759 | |
7 | 1677418 | 12.6% |
1 | 1672126 | 12.6% |
4 | 1599547 | 12.0% |
a | 1587829 | 12.0% |
b | 1587829 | 12.0% |
_ | 111162 | 0.8% |
P | 55581 | 0.4% |
8 | 39176 | 0.3% |
3 | 38873 | 0.3% |
Other values (4) | 85923 | 0.6% |
Distinct | 10 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 12.5 MiB |
Length
Max length | 11 |
---|---|
Median length | 8 |
Mean length | 8.041208828 |
Min length | 8 |
Characters and Unicode
Total characters | 13215003 |
---|---|
Distinct characters | 14 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | a55475b1 |
---|---|
2nd row | P164_110_33 |
3rd row | a55475b1 |
4th row | P164_110_33 |
5th row | a55475b1 |
Value | Count | Frequency (%) |
a55475b1 | 1618686 | 98.5% |
p22_131_138 | 9310 | 0.6% |
p164_110_33 | 6527 | 0.4% |
p28_32_178 | 6291 | 0.4% |
p148_57_109 | 1871 | 0.1% |
p112_86_147 | 468 | < 0.1% |
p191_80_124 | 114 | < 0.1% |
p7_47_145 | 79 | < 0.1% |
p164_122_65 | 61 | < 0.1% |
p82_144_169 | 3 | < 0.1% |
Most occurring characters
Value | Count | Frequency (%) |
5 | 4858069 | |
1 | 1678183 | 12.7% |
4 | 1627891 | 12.3% |
7 | 1627474 | 12.3% |
a | 1618686 | 12.2% |
b | 1618686 | 12.2% |
_ | 49448 | 0.4% |
3 | 37965 | 0.3% |
2 | 31909 | 0.2% |
P | 24724 | 0.2% |
Other values (4) | 41968 | 0.3% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 9903459 | 74.9% |
Lowercase Letter | 3237372 | 24.5% |
Connector Punctuation | 49448 | 0.4% |
Uppercase Letter | 24724 | 0.2% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
5 | 4858069 | |
1 | 1678183 | 16.9% |
4 | 1627891 | 16.4% |
7 | 1627474 | 16.4% |
3 | 37965 | 0.4% |
2 | 31909 | 0.3% |
8 | 24348 | 0.2% |
0 | 8512 | 0.1% |
6 | 7120 | 0.1% |
9 | 1988 | < 0.1% |
Lowercase Letter
Value | Count | Frequency (%) |
a | 1618686 | 50.0% |
b | 1618686 | 50.0% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 49448 | 100.0% |
Uppercase Letter
Value | Count | Frequency (%) |
P | 24724 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 9952907 | 75.3% |
Latin | 3262096 | 24.7% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
5 | 4858069 | |
1 | 1678183 | 16.9% |
4 | 1627891 | 16.4% |
7 | 1627474 | 16.4% |
_ | 49448 | 0.5% |
3 | 37965 | 0.4% |
2 | 31909 | 0.3% |
8 | 24348 | 0.2% |
0 | 8512 | 0.1% |
6 | 7120 | 0.1% |
Latin
Value | Count | Frequency (%) |
a | 1618686 | |
b | 1618686 | |
P | 24724 | 0.8% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 13215003 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
5 | 4858069 | |
1 | 1678183 | 12.7% |
4 | 1627891 | 12.3% |
7 | 1627474 | 12.3% |
a | 1618686 | 12.2% |
b | 1618686 | 12.2% |
_ | 49448 | 0.4% |
3 | 37965 | 0.3% |
2 | 31909 | 0.2% |
P | 24724 | 0.2% |
Other values (4) | 41968 | 0.3% |
empls_employedfrom_796D
Text
MISSING
 
Distinct | 801 |
---|---|
Distinct (%) | 13.9% |
Missing | 1637653 |
Missing (%) | 99.6% |
Memory size | 12.5 MiB |
Length
Max length | 10 |
---|---|
Median length | 10 |
Mean length | 10 |
Min length | 10 |
Characters and Unicode
Total characters | 57570 |
---|---|
Distinct characters | 11 |
Distinct categories | 2 |
Distinct scripts | 1 |
Distinct blocks | 1 |
Unique
Unique | 453 |
---|---|
Unique (%) | 7.9% |
Sample
1st row | 2018-06-15 |
---|---|
2nd row | 2011-08-15 |
3rd row | 1994-05-15 |
4th row | 2013-01-15 |
5th row | 2014-09-15 |
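Although this column is profiled as text, the sample values look like ISO 8601 dates, which is consistent with every value having length 10. The sketch below parses the five sample rows with the standard library. It assumes all values follow the `YYYY-MM-DD` pattern, which the report itself does not confirm.

```python
from datetime import date

# The five sample rows shown above.
sample = ["2018-06-15", "2011-08-15", "1994-05-15", "2013-01-15", "2014-09-15"]

# Assumption: every value is a valid ISO 8601 calendar date.
parsed = [date.fromisoformat(value) for value in sample]

days = {d.day for d in parsed}   # distinct day-of-month values in the sample
earliest = min(parsed)
latest = max(parsed)
print(days, earliest, latest)
```

All five samples fall on the 15th of the month, as do the ten most frequent values listed below, which may indicate that only month-level precision was recorded.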
Value | Count | Frequency (%) |
2017-01-15 | 228 | 4.0% |
2016-01-15 | 196 | 3.4% |
2015-01-15 | 181 | 3.1% |
2018-01-15 | 125 | 2.2% |
2013-01-15 | 113 | 2.0% |
2014-01-15 | 106 | 1.8% |
2012-01-15 | 102 | 1.8% |
2007-09-15 | 71 | 1.2% |
2010-01-15 | 69 | 1.2% |
2007-01-15 | 56 | 1.0% |
Other values (791) | 4510 | 78.3% |
Most occurring characters
Value | Count | Frequency (%) |
0 | 13459 | |
1 | 12287 | |
- | 11514 | |
2 | 6876 | |
5 | 6417 | |
9 | 1699 | 3.0% |
6 | 1289 | 2.2% |
7 | 1244 | 2.2% |
8 | 1163 | 2.0% |
4 | 866 | 1.5% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 46056 | 80.0% |
Dash Punctuation | 11514 | 20.0% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
0 | 13459 | |
1 | 12287 | |
2 | 6876 | |
5 | 6417 | |
9 | 1699 | 3.7% |
6 | 1289 | 2.8% |
7 | 1244 | 2.7% |
8 | 1163 | 2.5% |
4 | 866 | 1.9% |
3 | 756 | 1.6% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 11514 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 57570 | 100.0% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
0 | 13459 | |
1 | 12287 | |
- | 11514 | |
2 | 6876 | |
5 | 6417 | |
9 | 1699 | 3.0% |
6 | 1289 | 2.2% |
7 | 1244 | 2.2% |
8 | 1163 | 2.0% |
4 | 866 | 1.5% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 57570 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 13459 | |
1 | 12287 | |
- | 11514 | |
2 | 6876 | |
5 | 6417 | |
9 | 1699 | 3.0% |
6 | 1289 | 2.2% |
7 | 1244 | 2.2% |
8 | 1163 | 2.0% |
4 | 866 | 1.5% |
Distinct | 7153 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 12.5 MiB |
Length
Max length | 16 |
---|---|
Median length | 8 |
Mean length | 8.011236393 |
Min length | 7 |
Characters and Unicode
Total characters | 13165746 |
---|---|
Distinct characters | 34 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 6991 |
---|---|
Unique (%) | 0.4% |
Sample
1st row | a55475b1 |
---|---|
2nd row | a55475b1 |
3rd row | a55475b1 |
4th row | a55475b1 |
5th row | a55475b1 |
Value | Count | Frequency (%) |
a55475b1 | 1636009 | 99.5% |
p114_118_163 | 17 | < 0.1% |
p179_55_175 | 7 | < 0.1% |
p26_112_122 | 6 | < 0.1% |
p133_138_183 | 6 | < 0.1% |
p9_69_94 | 6 | < 0.1% |
p74_31_177 | 6 | < 0.1% |
p38_11_59 | 6 | < 0.1% |
p149_35_169 | 5 | < 0.1% |
p204_145_180 | 5 | < 0.1% |
Other values (7143) | 7337 | 0.4% |
Most occurring characters
Value | Count | Frequency (%) |
5 | 4912435 | |
1 | 1651792 | 12.5% |
7 | 1641239 | 12.5% |
4 | 1640352 | 12.5% |
a | 1636016 | 12.4% |
b | 1636010 | 12.4% |
_ | 14802 | 0.1% |
P | 7366 | 0.1% |
2 | 4892 | < 0.1% |
6 | 4858 | < 0.1% |
Other values (24) | 15984 | 0.1% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 9871374 | 75.0% |
Lowercase Letter | 3272169 | 24.9% |
Connector Punctuation | 14802 | 0.1% |
Uppercase Letter | 7401 | 0.1% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
a | 1636016 | 50.0% |
b | 1636010 | 50.0% |
e | 20 | < 0.1% |
o | 15 | < 0.1% |
t | 12 | < 0.1% |
i | 10 | < 0.1% |
h | 10 | < 0.1% |
s | 9 | < 0.1% |
u | 9 | < 0.1% |
n | 9 | < 0.1% |
Other values (11) | 49 | < 0.1% |
Decimal Number
Value | Count | Frequency (%) |
5 | 4912435 | |
1 | 1651792 | 16.7% |
7 | 1641239 | 16.6% |
4 | 1640352 | 16.6% |
2 | 4892 | < 0.1% |
6 | 4858 | < 0.1% |
3 | 4191 | < 0.1% |
8 | 4114 | < 0.1% |
0 | 3765 | < 0.1% |
9 | 3736 | < 0.1% |
Uppercase Letter
Value | Count | Frequency (%) |
P | 7366 | 99.5% |
Q | 35 | 0.5% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 14802 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 9886176 | 75.1% |
Latin | 3279570 | 24.9% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
a | 1636016 | |
b | 1636010 | |
P | 7366 | 0.2% |
Q | 35 | < 0.1% |
e | 20 | < 0.1% |
o | 15 | < 0.1% |
t | 12 | < 0.1% |
i | 10 | < 0.1% |
h | 10 | < 0.1% |
s | 9 | < 0.1% |
Other values (13) | 67 | < 0.1% |
Common
Value | Count | Frequency (%) |
5 | 4912435 | |
1 | 1651792 | 16.7% |
7 | 1641239 | 16.6% |
4 | 1640352 | 16.6% |
_ | 14802 | 0.1% |
2 | 4892 | < 0.1% |
6 | 4858 | < 0.1% |
3 | 4191 | < 0.1% |
8 | 4114 | < 0.1% |
0 | 3765 | < 0.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 13165746 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
5 | 4912435 | |
1 | 1651792 | 12.5% |
7 | 1641239 | 12.5% |
4 | 1640352 | 12.5% |
a | 1636016 | 12.4% |
b | 1636010 | 12.4% |
_ | 14802 | 0.1% |
P | 7366 | 0.1% |
2 | 4892 | < 0.1% |
6 | 4858 | < 0.1% |
Other values (24) | 15984 | 0.1% |
num_group1
Real number (ℝ)
ZEROS
 
Distinct | 5 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 0.1115424635 |
Minimum | 0 |
---|---|
Maximum | 4 |
Zeros | 1463928 |
Zeros (%) | 89.1% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 12.5 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
median | 0 |
Q3 | 0 |
95-th percentile | 1 |
Maximum | 4 |
Range | 4 |
Interquartile range (IQR) | 0 |
Descriptive statistics
Standard deviation | 0.3224078582 |
---|---|
Coefficient of variation (CV) | 2.890449502 |
Kurtosis | 6.316380546 |
Mean | 0.1115424635 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 2.699905629 |
Sum | 183310 |
Variance | 0.103946827 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
0 | 1463928 | 89.1% |
1 | 175805 | 10.7% |
2 | 3529 | 0.2% |
3 | 145 | < 0.1% |
4 | 3 | < 0.1% |
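Because num_group1 takes only five distinct values, its summary statistics can be rebuilt from the frequency table alone. The sketch below does so; it assumes the reported variance is the sample variance (dividing by n - 1), which is the convention that matches the reported figure.

```python
import math

# Value -> count, copied from the frequency table above.
freq = {0: 1_463_928, 1: 175_805, 2: 3_529, 3: 145, 4: 3}

n = sum(freq.values())                                   # number of observations
total = sum(value * count for value, count in freq.items())  # Sum
mean = total / n                                         # Mean

# Assumption: sample variance (divides by n - 1).
squared_dev = sum(count * (value - mean) ** 2 for value, count in freq.items())
variance = squared_dev / (n - 1)
std = math.sqrt(variance)                                # Standard deviation

zeros_pct = 100 * freq[0] / n                            # Zeros (%)
print(n, total, f"{mean:.10f} {variance:.9f} {std:.10f} {zeros_pct:.1f}%")
```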
Value | Count | Frequency (%) |
0 | 1463928 | 89.1% |
1 | 175805 | 10.7% |
2 | 3529 | 0.2% |
3 | 145 | < 0.1% |
4 | 3 | < 0.1% |
Value | Count | Frequency (%) |
4 | 3 | < 0.1% |
3 | 145 | < 0.1% |
2 | 3529 | 0.2% |
1 | 175805 | 10.7% |
0 | 1463928 | 89.1% |
num_group2
Real number (ℝ)
ZEROS
 
Distinct | 32 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 0.1237013283 |
Minimum | 0 |
---|---|
Maximum | 31 |
Zeros | 1561280 |
Zeros (%) | 95.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 12.5 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
median | 0 |
Q3 | 0 |
95-th percentile | 0 |
Maximum | 31 |
Range | 31 |
Interquartile range (IQR) | 0 |
Descriptive statistics
Standard deviation | 0.7612453743 |
---|---|
Coefficient of variation (CV) | 6.15389814 |
Kurtosis | 156.705602 |
Mean | 0.1237013283 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 10.32953806 |
Sum | 203292 |
Variance | 0.57949452 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
0 | 1561280 | 95.0% |
1 | 44507 | 2.7% |
2 | 11110 | 0.7% |
3 | 8411 | 0.5% |
4 | 5356 | 0.3% |
5 | 4213 | 0.3% |
6 | 2831 | 0.2% |
7 | 1979 | 0.1% |
8 | 1232 | 0.1% |
9 | 860 | 0.1% |
Other values (22) | 1631 | 0.1% |
Value | Count | Frequency (%) |
0 | 1561280 | 95.0% |
1 | 44507 | 2.7% |
2 | 11110 | 0.7% |
3 | 8411 | 0.5% |
4 | 5356 | 0.3% |
Value | Count | Frequency (%) |
31 | 2 | < 0.1% |
30 | 3 | < 0.1% |
29 | 5 | < 0.1% |
28 | 5 | < 0.1% |
27 | 5 | < 0.1% |
relatedpersons_role_762T
Text
MISSING
 
Distinct | 10 |
---|---|
Distinct (%) | < 0.1% |
Missing | 1614684 |
Missing (%) | 98.3% |
Memory size | 12.5 MiB |
Length
Max length | 14 |
---|---|
Median length | 12 |
Mean length | 8.217259625 |
Min length | 5 |
Characters and Unicode
Total characters | 236049 |
---|---|
Distinct characters | 19 |
Distinct categories | 2 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | OTHER_RELATIVE |
---|---|
2nd row | OTHER_RELATIVE |
3rd row | PARENT |
4th row | PARENT |
5th row | COLLEAGUE |
Value | Count | Frequency (%) |
other_relative | 6211 | 21.6% |
sibling | 5406 | 18.8% |
friend | 5293 | 18.4% |
colleague | 3160 | 11.0% |
parent | 2098 | 7.3% |
other | 2080 | 7.2% |
child | 1955 | 6.8% |
spouse | 1476 | 5.1% |
neighbor | 782 | 2.7% |
grand_parent | 265 | 0.9% |
Most occurring characters
Value | Count | Frequency (%) |
E | 36947 | |
I | 25053 | |
R | 23205 | |
L | 19892 | 8.4% |
T | 16865 | 7.1% |
N | 14109 | 6.0% |
O | 13709 | 5.8% |
A | 11999 | 5.1% |
H | 11028 | 4.7% |
G | 9613 | 4.1% |
Other values (9) | 53629 |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 229573 | 97.3% |
Connector Punctuation | 6476 | 2.7% |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
E | 36947 | |
I | 25053 | |
R | 23205 | |
L | 19892 | |
T | 16865 | 7.3% |
N | 14109 | 6.1% |
O | 13709 | 6.0% |
A | 11999 | 5.2% |
H | 11028 | 4.8% |
G | 9613 | 4.2% |
Other values (8) | 47153 |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 6476 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 229573 | 97.3% |
Common | 6476 | 2.7% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
E | 36947 | |
I | 25053 | |
R | 23205 | |
L | 19892 | |
T | 16865 | 7.3% |
N | 14109 | 6.1% |
O | 13709 | 6.0% |
A | 11999 | 5.2% |
H | 11028 | 4.8% |
G | 9613 | 4.2% |
Other values (8) | 47153 |
Common
Value | Count | Frequency (%) |
_ | 6476 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 236049 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
E | 36947 | |
I | 25053 | |
R | 23205 | |
L | 19892 | 8.4% |
T | 16865 | 7.1% |
N | 14109 | 6.0% |
O | 13709 | 5.8% |
A | 11999 | 5.1% |
H | 11028 | 4.7% |
G | 9613 | 4.1% |
Other values (9) | 53629 |