Dataset statistics
Number of variables | 37 |
---|---|
Number of observations | 2973991 |
Missing cells | 51051536 |
Missing cells (%) | 46.4% |
Total size in memory | 839.5 MiB |
Average record size in memory | 296.0 B |
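The percentage and per-record figures above follow from the raw counts. A minimal sketch of that arithmetic, assuming the total cell count is simply variables times observations and that MiB means 1024 × 1024 bytes (the variable names are illustrative, not taken from the profiling tool):

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 37
n_observations = 2_973_991
missing_cells = 51_051_536
total_size_mib = 839.5

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both derived values round to the figures reported in the table (46.4% and 296.0 B).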
Variable types
Numeric | 7 |
---|---|
Text | 25 |
Boolean | 5 |
contaddr_matchlist_1032L has constant value "" | Constant |
remitter_829L has constant value "" | Constant |
role_993L has constant value "" | Constant |
contaddr_smempladdr_334L is highly imbalanced (95.8%) | Imbalance |
safeguarantyflag_411L is highly imbalanced (69.9%) | Imbalance |
birth_259D has 1447332 (48.7%) missing values | Missing |
birthdate_87D has 2949075 (99.2%) missing values | Missing |
childnum_185L has 2964084 (99.7%) missing values | Missing |
contaddr_matchlist_1032L has 1447773 (48.7%) missing values | Missing |
contaddr_smempladdr_334L has 1447773 (48.7%) missing values | Missing |
empl_employedfrom_271D has 2407290 (80.9%) missing values | Missing |
empl_employedtotal_800L has 2445676 (82.2%) missing values | Missing |
empl_industry_691L has 2451755 (82.4%) missing values | Missing |
familystate_447L has 2245378 (75.5%) missing values | Missing |
gender_992L has 2949075 (99.2%) missing values | Missing |
housetype_905L has 2873173 (96.6%) missing values | Missing |
housingtype_772L has 2964176 (99.7%) missing values | Missing |
incometype_1044T has 1447332 (48.7%) missing values | Missing |
isreference_387L has 2949075 (99.2%) missing values | Missing |
mainoccupationinc_384A has 1447332 (48.7%) missing values | Missing |
maritalst_703L has 2962646 (99.6%) missing values | Missing |
personindex_1023L has 642283 (21.6%) missing values | Missing |
persontype_792L has 642283 (21.6%) missing values | Missing |
relationshiptoclient_415T has 2168942 (72.9%) missing values | Missing |
relationshiptoclient_642T has 2168049 (72.9%) missing values | Missing |
remitter_829L has 2168942 (72.9%) missing values | Missing |
role_993L has 2949075 (99.2%) missing values | Missing |
safeguarantyflag_411L has 1447334 (48.7%) missing values | Missing |
sex_738L has 1447332 (48.7%) missing values | Missing |
num_group1 has 1526659 (51.3%) zeros | Zeros |
personindex_1023L has 1526659 (51.3%) zeros | Zeros |
Reproduction
Analysis started | 2024-02-13 19:53:39.005022 |
---|---|
Analysis finished | 2024-02-13 19:54:00.981048 |
Duration | 21.98 seconds |
Software version | ydata-profiling v4.6.4 |
Download configuration | config.json |
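The reported duration can be checked against the two timestamps. A small sketch using only the standard library; the format string is an assumption about how the timestamps are laid out, based on how they appear above:

```python
from datetime import datetime

# Assumed timestamp layout, matching the values shown in the table.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-02-13 19:53:39.005022", FMT)
finished = datetime.strptime("2024-02-13 19:54:00.981048", FMT)

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```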
case_id
Real number (ℝ)
Distinct | 1526659 |
---|---|
Distinct (%) | 51.3% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 1055195.612 |
Minimum | 0 |
---|---|
Maximum | 2703454 |
Zeros | 4 |
Zeros (%) | < 0.1% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 22.7 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 43181 |
Q1 | 637353.5 |
median | 890817 |
Q3 | 1568333.5 |
95-th percentile | 2597849.5 |
Maximum | 2703454 |
Range | 2703454 |
Interquartile range (IQR) | 930980 |
Descriptive statistics
Standard deviation | 724571.3851 |
---|---|
Coefficient of variation (CV) | 0.6866702033 |
Kurtosis | -0.3569768256 |
Mean | 1055195.612 |
Median Absolute Deviation (MAD) | 580360 |
Skewness | 0.5677730015 |
Sum | 3.138142252 × 10¹² |
Variance | 5.250036921 × 10¹¹ |
Monotonicity | Increasing |
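Several of the case_id statistics are derived from others in the same tables. The sketch below recomputes them from the reported (rounded) inputs, so small differences in the last digits are expected; the variable names are illustrative:

```python
# Inputs copied from the case_id quantile and descriptive tables.
minimum, maximum = 0, 2_703_454
q1, q3 = 637_353.5, 1_568_333.5
mean = 1_055_195.612
std = 724_571.3851

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance

print(f"Range: {value_range}")
print(f"IQR: {iqr}")
print(f"CV: {cv:.6f}")
print(f"Variance: {variance:.4e}")
```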
Value | Count | Frequency (%) |
147982 | 10 | < 0.1% |
706273 | 10 | < 0.1% |
141673 | 8 | < 0.1% |
25018 | 8 | < 0.1% |
1757817 | 8 | < 0.1% |
124748 | 8 | < 0.1% |
608607 | 8 | < 0.1% |
693470 | 8 | < 0.1% |
611761 | 7 | < 0.1% |
200877 | 7 | < 0.1% |
Other values (1526649) | 2973909 |
Value | Count | Frequency (%) |
0 | 4 | |
1 | 5 | |
2 | 5 | |
3 | 3 | |
4 | 4 |
Value | Count | Frequency (%) |
2703454 | 1 | |
2703453 | 2 | |
2703452 | 1 | |
2703451 | 2 | |
2703450 | 1 |
birth_259D
Text
MISSING
 
Distinct | 680 |
---|---|
Distinct (%) | < 0.1% |
Missing | 1447332 |
Missing (%) | 48.7% |
Memory size | 22.7 MiB |
Length
Max length | 10 |
---|---|
Median length | 10 |
Mean length | 10 |
Min length | 10 |
Characters and Unicode
Total characters | 15266590 |
---|---|
Distinct characters | 11 |
Distinct categories | 2 |
Distinct scripts | 1 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 1986-07-01 |
---|---|
2nd row | 1957-08-01 |
3rd row | 1974-12-01 |
4th row | 1993-08-01 |
5th row | 1994-01-01 |
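Although the column is typed as Text, the sample rows look like ISO dates. A hedged sketch of parsing them, assuming every non-missing value follows the YYYY-MM-DD layout (the fixed length of 10 and the 11 distinct characters are consistent with that, but the report does not confirm it):

```python
from datetime import date

# The five sample rows shown above.
samples = ["1986-07-01", "1957-08-01", "1974-12-01", "1993-08-01", "1994-01-01"]

# date.fromisoformat raises ValueError on anything that is not YYYY-MM-DD.
parsed = [date.fromisoformat(s) for s in samples]

# In this sample every date falls on the first day of a month.
all_first_of_month = all(d.day == 1 for d in parsed)
print(parsed[0], all_first_of_month)
```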
Value | Count | Frequency (%) |
1988-07-01 | 3713 | 0.2% |
1986-07-01 | 3655 | 0.2% |
1987-05-01 | 3655 | 0.2% |
1986-08-01 | 3626 | 0.2% |
1988-08-01 | 3584 | 0.2% |
1987-06-01 | 3581 | 0.2% |
1987-08-01 | 3569 | 0.2% |
1987-07-01 | 3554 | 0.2% |
1989-03-01 | 3547 | 0.2% |
1988-05-01 | 3545 | 0.2% |
Other values (670) | 1490630 |
Most occurring characters
Value | Count | Frequency (%) |
1 | 3827240 | |
- | 3053318 | |
0 | 2961792 | |
9 | 2059991 | |
8 | 672485 | 4.4% |
7 | 615279 | 4.0% |
6 | 536355 | 3.5% |
5 | 526876 | 3.5% |
2 | 393048 | 2.6% |
4 | 331903 | 2.2% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 12213272 | |
Dash Punctuation | 3053318 | 20.0% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
1 | 3827240 | |
0 | 2961792 | |
9 | 2059991 | |
8 | 672485 | 5.5% |
7 | 615279 | 5.0% |
6 | 536355 | 4.4% |
5 | 526876 | 4.3% |
2 | 393048 | 3.2% |
4 | 331903 | 2.7% |
3 | 288303 | 2.4% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 3053318 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 15266590 |
Most frequent character per script
Common
Value | Count | Frequency (%) |
1 | 3827240 | |
- | 3053318 | |
0 | 2961792 | |
9 | 2059991 | |
8 | 672485 | 4.4% |
7 | 615279 | 4.0% |
6 | 536355 | 3.5% |
5 | 526876 | 3.5% |
2 | 393048 | 2.6% |
4 | 331903 | 2.2% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 15266590 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
1 | 3827240 | |
- | 3053318 | |
0 | 2961792 | |
9 | 2059991 | |
8 | 672485 | 4.4% |
7 | 615279 | 4.0% |
6 | 536355 | 3.5% |
5 | 526876 | 3.5% |
2 | 393048 | 2.6% |
4 | 331903 | 2.2% |
birthdate_87D
Text
MISSING
 
Distinct | 659 |
---|---|
Distinct (%) | 2.6% |
Missing | 2949075 |
Missing (%) | 99.2% |
Memory size | 22.7 MiB |
Length
Max length | 10 |
---|---|
Median length | 10 |
Mean length | 10 |
Min length | 10 |
Characters and Unicode
Total characters | 249160 |
---|---|
Distinct characters | 11 |
Distinct categories | 2 |
Distinct scripts | 1 |
Distinct blocks | 1 |
Unique
Unique | 22 |
---|---|
Unique (%) | 0.1% |
Sample
1st row | 1991-01-01 |
---|---|
2nd row | 1991-01-01 |
3rd row | 1988-09-01 |
4th row | 1994-09-01 |
5th row | 1969-09-01 |
Value | Count | Frequency (%) |
1982-04-01 | 197 | 0.8% |
1981-04-01 | 196 | 0.8% |
1984-04-01 | 183 | 0.7% |
1983-04-01 | 144 | 0.6% |
1985-05-01 | 126 | 0.5% |
1988-06-01 | 124 | 0.5% |
1986-06-01 | 120 | 0.5% |
1983-12-01 | 116 | 0.5% |
1989-03-01 | 114 | 0.5% |
1990-06-01 | 108 | 0.4% |
Other values (649) | 23488 |
Most occurring characters
Value | Count | Frequency (%) |
1 | 62349 | |
- | 49832 | |
0 | 48226 | |
9 | 35346 | |
8 | 14151 | 5.7% |
7 | 9082 | 3.6% |
6 | 7221 | 2.9% |
2 | 6491 | 2.6% |
5 | 6313 | 2.5% |
4 | 5337 | 2.1% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 199328 | |
Dash Punctuation | 49832 | 20.0% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
1 | 62349 | |
0 | 48226 | |
9 | 35346 | |
8 | 14151 | 7.1% |
7 | 9082 | 4.6% |
6 | 7221 | 3.6% |
2 | 6491 | 3.3% |
5 | 6313 | 3.2% |
4 | 5337 | 2.7% |
3 | 4812 | 2.4% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 49832 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 249160 |
Most frequent character per script
Common
Value | Count | Frequency (%) |
1 | 62349 | |
- | 49832 | |
0 | 48226 | |
9 | 35346 | |
8 | 14151 | 5.7% |
7 | 9082 | 3.6% |
6 | 7221 | 2.9% |
2 | 6491 | 2.6% |
5 | 6313 | 2.5% |
4 | 5337 | 2.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 249160 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
1 | 62349 | |
- | 49832 | |
0 | 48226 | |
9 | 35346 | |
8 | 14151 | 5.7% |
7 | 9082 | 3.6% |
6 | 7221 | 2.9% |
2 | 6491 | 2.6% |
5 | 6313 | 2.5% |
4 | 5337 | 2.1% |
childnum_185L
Real number (ℝ)
MISSING
 
Distinct | 11 |
---|---|
Distinct (%) | 0.1% |
Missing | 2964084 |
Missing (%) | 99.7% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 0.6160290704 |
Minimum | 0 |
---|---|
Maximum | 11 |
Zeros | 6043 |
Zeros (%) | 0.2% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 22.7 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
median | 0 |
Q3 | 1 |
95-th percentile | 2 |
Maximum | 11 |
Range | 11 |
Interquartile range (IQR) | 1 |
Descriptive statistics
Standard deviation | 0.9660800059 |
---|---|
Coefficient of variation (CV) | 1.568237689 |
Kurtosis | 7.396626101 |
Mean | 0.6160290704 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 2.180601546 |
Sum | 6103 |
Variance | 0.9333105778 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
0 | 6043 | 0.2% |
1 | 2371 | 0.1% |
2 | 1021 | < 0.1% |
3 | 313 | < 0.1% |
4 | 88 | < 0.1% |
5 | 43 | < 0.1% |
6 | 20 | < 0.1% |
7 | 5 | < 0.1% |
11 | 1 | < 0.1% |
8 | 1 | < 0.1% |
(Missing) | 2964084 |
Value | Count | Frequency (%) |
0 | 6043 | |
1 | 2371 | 0.1% |
2 | 1021 | < 0.1% |
3 | 313 | < 0.1% |
4 | 88 | < 0.1% |
Value | Count | Frequency (%) |
11 | 1 | < 0.1% |
10 | 1 | < 0.1% |
8 | 1 | < 0.1% |
7 | 5 | < 0.1% |
6 | 20 |
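The childnum_185L mean and sum can be reconstructed from the value counts. The sketch below combines the common-values table with the single observation of 10 that only appears in the largest-values table; treating these eleven values as the complete distribution is an assumption, supported by the reported distinct count of 11:

```python
n_observations = 2_973_991
missing = 2_964_084

# value -> count, taken from the frequency tables above.
counts = {0: 6043, 1: 2371, 2: 1021, 3: 313, 4: 88,
          5: 43, 6: 20, 7: 5, 8: 1, 10: 1, 11: 1}

non_missing = sum(counts.values())
total = sum(value * count for value, count in counts.items())
mean = total / non_missing

print(f"Non-missing rows: {non_missing}")
print(f"Sum: {total}")
print(f"Mean: {mean:.10f}")
```

The non-missing count equals observations minus missing values, and the sum matches the reported 6103.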
Distinct | 975 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 22.7 MiB |
Length
Max length | 12 |
---|---|
Median length | 11 |
Mean length | 9.385123223 |
Min length | 8 |
Characters and Unicode
Total characters | 27911272 |
---|---|
Distinct characters | 18 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 154 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | P88_18_84 |
---|---|
2nd row | a55475b1 |
3rd row | a55475b1 |
4th row | a55475b1 |
5th row | P103_93_94 |
Value | Count | Frequency (%) |
a55475b1 | 1449674 | |
p131_33_167 | 80070 | 2.7% |
p197_47_166 | 77354 | 2.6% |
p123_6_84 | 55868 | 1.9% |
p98_137_111 | 42799 | 1.4% |
p159_143_123 | 35814 | 1.2% |
p62_144_102 | 35641 | 1.2% |
p204_99_158 | 34659 | 1.2% |
p19_11_176 | 31987 | 1.1% |
p178_112_160 | 30076 | 1.0% |
Other values (965) | 1100049 |
Most occurring characters
Value | Count | Frequency (%) |
1 | 5171032 | |
5 | 5168667 | |
_ | 3048634 | |
7 | 2652943 | |
4 | 2376505 | |
P | 1524315 | 5.5% |
a | 1449674 | 5.2% |
b | 1449674 | 5.2% |
6 | 1032051 | 3.7% |
3 | 972874 | 3.5% |
Other values (8) | 3064903 |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 20438967 | |
Connector Punctuation | 3048634 | 10.9% |
Lowercase Letter | 2899354 | 10.4% |
Uppercase Letter | 1524317 | 5.5% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
1 | 5171032 | |
5 | 5168667 | |
7 | 2652943 | |
4 | 2376505 | |
6 | 1032051 | 5.0% |
3 | 972874 | 4.8% |
2 | 852697 | 4.2% |
8 | 814902 | 4.0% |
9 | 814404 | 4.0% |
0 | 582892 | 2.9% |
Lowercase Letter
Value | Count | Frequency (%) |
a | 1449674 | |
b | 1449674 | |
t | 2 | < 0.1% |
h | 2 | < 0.1% |
e | 2 | < 0.1% |
Uppercase Letter
Value | Count | Frequency (%) |
P | 1524315 | |
Q | 2 | < 0.1% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 3048634 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 23487601 | |
Latin | 4423671 | 15.8% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
1 | 5171032 | |
5 | 5168667 | |
_ | 3048634 | |
7 | 2652943 | |
4 | 2376505 | |
6 | 1032051 | 4.4% |
3 | 972874 | 4.1% |
2 | 852697 | 3.6% |
8 | 814902 | 3.5% |
9 | 814404 | 3.5% |
Latin
Value | Count | Frequency (%) |
P | 1524315 | |
a | 1449674 | |
b | 1449674 | |
Q | 2 | < 0.1% |
t | 2 | < 0.1% |
h | 2 | < 0.1% |
e | 2 | < 0.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 27911272 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
1 | 5171032 | |
5 | 5168667 | |
_ | 3048634 | |
7 | 2652943 | |
4 | 2376505 | |
P | 1524315 | 5.5% |
a | 1449674 | 5.2% |
b | 1449674 | 5.2% |
6 | 1032051 | 3.7% |
3 | 972874 | 3.5% |
Other values (8) | 3064903 |
contaddr_matchlist_1032L
Boolean
CONSTANT
  MISSING
 
Distinct | 1 |
---|---|
Distinct (%) | < 0.1% |
Missing | 1447773 |
Missing (%) | 48.7% |
Memory size | 22.7 MiB |
False | |
---|---|
(Missing) |
Value | Count | Frequency (%) |
False | 1526218 | |
(Missing) | 1447773 |
contaddr_smempladdr_334L
Boolean
IMBALANCE
  MISSING
 
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 1447773 |
Missing (%) | 48.7% |
Memory size | 22.7 MiB |
False | |
---|---|
True | 6932 |
(Missing) |
Value | Count | Frequency (%) |
False | 1519286 | |
True | 6932 | 0.2% |
(Missing) | 1447773 |
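The report does not say how the 95.8% imbalance figure for contaddr_smempladdr_334L is computed. One reading that reproduces it, offered as an assumption rather than the tool's documented definition, is one minus the Shannon entropy of the non-missing value counts, normalized by the maximum entropy for that number of classes:

```python
import math

# Non-missing value counts from the table above.
counts = {"False": 1_519_286, "True": 6_932}
total = sum(counts.values())

# Shannon entropy in bits over the observed classes.
entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())

# Assumed score: 0 for a perfectly balanced column, 1 for a single class.
imbalance = 1 - entropy / math.log2(len(counts))
print(f"Imbalance: {imbalance * 100:.1f}%")
```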
Distinct | 3530 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 22.7 MiB |
Length
Max length | 13 |
---|---|
Median length | 8 |
Mean length | 9.297594714 |
Min length | 7 |
Characters and Unicode
Total characters | 27650963 |
---|---|
Distinct characters | 33 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 68 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | P167_100_165 |
---|---|
2nd row | a55475b1 |
3rd row | a55475b1 |
4th row | a55475b1 |
5th row | P176_37_166 |
Value | Count | Frequency (%) |
a55475b1 | 1474844 | |
p161_14_174 | 97924 | 3.3% |
p144_138_111 | 90612 | 3.0% |
p46_103_143 | 55529 | 1.9% |
p91_47_168 | 53628 | 1.8% |
p62_116_179 | 34396 | 1.2% |
p212_16_169 | 32118 | 1.1% |
p157_35_170 | 30610 | 1.0% |
p11_15_81 | 30256 | 1.0% |
p131_154_48 | 28360 | 1.0% |
Other values (3520) | 1045714 |
Most occurring characters
Value | Count | Frequency (%) |
5 | 5225088 | |
1 | 5153745 | |
_ | 2998294 | |
4 | 2755481 | |
7 | 2409867 | |
P | 1497360 | 5.4% |
a | 1475205 | 5.3% |
b | 1474844 | 5.3% |
8 | 1012664 | 3.7% |
6 | 987117 | 3.6% |
Other values (23) | 2661298 |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 20197415 | |
Connector Punctuation | 2998294 | 10.8% |
Lowercase Letter | 2956107 | 10.7% |
Uppercase Letter | 1499147 | 5.4% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
a | 1475205 | |
b | 1474844 | |
e | 1837 | 0.1% |
t | 771 | < 0.1% |
r | 682 | < 0.1% |
m | 610 | < 0.1% |
o | 593 | < 0.1% |
l | 287 | < 0.1% |
u | 212 | < 0.1% |
g | 170 | < 0.1% |
Other values (10) | 896 | < 0.1% |
Decimal Number
Value | Count | Frequency (%) |
5 | 5225088 | |
1 | 5153745 | |
4 | 2755481 | |
7 | 2409867 | |
8 | 1012664 | 5.0% |
6 | 987117 | 4.9% |
3 | 858532 | 4.3% |
2 | 663914 | 3.3% |
9 | 595785 | 2.9% |
0 | 535222 | 2.6% |
Uppercase Letter
Value | Count | Frequency (%) |
P | 1497360 | |
Q | 1787 | 0.1% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 2998294 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 23195709 | |
Latin | 4455254 | 16.1% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
P | 1497360 | |
a | 1475205 | |
b | 1474844 | |
e | 1837 | < 0.1% |
Q | 1787 | < 0.1% |
t | 771 | < 0.1% |
r | 682 | < 0.1% |
m | 610 | < 0.1% |
o | 593 | < 0.1% |
l | 287 | < 0.1% |
Other values (12) | 1278 | < 0.1% |
Common
Value | Count | Frequency (%) |
5 | 5225088 | |
1 | 5153745 | |
_ | 2998294 | |
4 | 2755481 | |
7 | 2409867 | |
8 | 1012664 | 4.4% |
6 | 987117 | 4.3% |
3 | 858532 | 3.7% |
2 | 663914 | 2.9% |
9 | 595785 | 2.6% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 27650963 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
5 | 5225088 | |
1 | 5153745 | |
_ | 2998294 | |
4 | 2755481 | |
7 | 2409867 | |
P | 1497360 | 5.4% |
a | 1475205 | 5.3% |
b | 1474844 | 5.3% |
8 | 1012664 | 3.7% |
6 | 987117 | 3.6% |
Other values (23) | 2661298 |
education_927M
Text
Distinct | 6 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 22.7 MiB |
Length
Max length | 11 |
---|---|
Median length | 8 |
Mean length | 8.604439287 |
Min length | 8 |
Characters and Unicode
Total characters | 25589525 |
---|---|
Distinct characters | 14 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | P97_36_170 |
---|---|
2nd row | a55475b1 |
3rd row | a55475b1 |
4th row | a55475b1 |
5th row | P97_36_170 |
Value | Count | Frequency (%) |
a55475b1 | 2234573 | |
p97_36_170 | 415087 | 14.0% |
p33_146_175 | 263189 | 8.8% |
p106_81_188 | 54931 | 1.8% |
p17_36_170 | 5570 | 0.2% |
p157_18_172 | 641 | < 0.1% |
Most occurring characters
Value | Count | Frequency (%) |
5 | 6967549 | |
1 | 3353894 | |
7 | 3340358 | |
4 | 2497762 | 9.8% |
a | 2234573 | 8.7% |
b | 2234573 | 8.7% |
_ | 1478836 | 5.8% |
3 | 947035 | 3.7% |
P | 739418 | 2.9% |
6 | 738777 | 2.9% |
Other values (4) | 1056750 | 4.1% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 18902125 | |
Lowercase Letter | 4469146 | 17.5% |
Connector Punctuation | 1478836 | 5.8% |
Uppercase Letter | 739418 | 2.9% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
5 | 6967549 | |
1 | 3353894 | |
7 | 3340358 | |
4 | 2497762 | 13.2% |
3 | 947035 | 5.0% |
6 | 738777 | 3.9% |
0 | 475588 | 2.5% |
9 | 415087 | 2.2% |
8 | 165434 | 0.9% |
2 | 641 | < 0.1% |
Lowercase Letter
Value | Count | Frequency (%) |
a | 2234573 | |
b | 2234573 |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 1478836 |
Uppercase Letter
Value | Count | Frequency (%) |
P | 739418 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 20380961 | |
Latin | 5208564 | 20.4% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
5 | 6967549 | |
1 | 3353894 | |
7 | 3340358 | |
4 | 2497762 | 12.3% |
_ | 1478836 | 7.3% |
3 | 947035 | 4.6% |
6 | 738777 | 3.6% |
0 | 475588 | 2.3% |
9 | 415087 | 2.0% |
8 | 165434 | 0.8% |
Latin
Value | Count | Frequency (%) |
a | 2234573 | |
b | 2234573 | |
P | 739418 | 14.2% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 25589525 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
5 | 6967549 | |
1 | 3353894 | |
7 | 3340358 | |
4 | 2497762 | 9.8% |
a | 2234573 | 8.7% |
b | 2234573 | 8.7% |
_ | 1478836 | 5.8% |
3 | 947035 | 3.7% |
P | 739418 | 2.9% |
6 | 738777 | 2.9% |
Other values (4) | 1056750 | 4.1% |
empl_employedfrom_271D
Text
MISSING
 
Distinct | 8075 |
---|---|
Distinct (%) | 1.4% |
Missing | 2407290 |
Missing (%) | 80.9% |
Memory size | 22.7 MiB |
Length
Max length | 10 |
---|---|
Median length | 10 |
Mean length | 10 |
Min length | 10 |
Characters and Unicode
Total characters | 5667010 |
---|---|
Distinct characters | 11 |
Distinct categories | 2 |
Distinct scripts | 1 |
Distinct blocks | 1 |
Unique
Unique | 2129 |
---|---|
Unique (%) | 0.4% |
Sample
1st row | 2017-09-15 |
---|---|
2nd row | 2008-10-29 |
3rd row | 2010-02-15 |
4th row | 2018-05-15 |
5th row | 2014-12-15 |
Value | Count | Frequency (%) |
2018-01-15 | 48503 | 8.6% |
2017-01-15 | 40965 | 7.2% |
2019-01-15 | 29346 | 5.2% |
2016-01-15 | 28452 | 5.0% |
2015-01-15 | 24818 | 4.4% |
2014-01-15 | 20590 | 3.6% |
2013-01-15 | 15362 | 2.7% |
2012-01-15 | 12392 | 2.2% |
2010-01-15 | 11556 | 2.0% |
2011-01-15 | 9139 | 1.6% |
Other values (8065) | 325578 |
Most occurring characters
Value | Count | Frequency (%) |
1 | 1390947 | |
0 | 1223499 | |
- | 1133402 | |
2 | 667656 | |
5 | 579830 | |
9 | 178466 | 3.1% |
8 | 143289 | 2.5% |
7 | 108883 | 1.9% |
6 | 100485 | 1.8% |
4 | 75533 | 1.3% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 4533608 | |
Dash Punctuation | 1133402 | 20.0% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
1 | 1390947 | |
0 | 1223499 | |
2 | 667656 | |
5 | 579830 | |
9 | 178466 | 3.9% |
8 | 143289 | 3.2% |
7 | 108883 | 2.4% |
6 | 100485 | 2.2% |
4 | 75533 | 1.7% |
3 | 65020 | 1.4% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 1133402 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 5667010 |
Most frequent character per script
Common
Value | Count | Frequency (%) |
1 | 1390947 | |
0 | 1223499 | |
- | 1133402 | |
2 | 667656 | |
5 | 579830 | |
9 | 178466 | 3.1% |
8 | 143289 | 2.5% |
7 | 108883 | 1.9% |
6 | 100485 | 1.8% |
4 | 75533 | 1.3% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 5667010 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
1 | 1390947 | |
0 | 1223499 | |
- | 1133402 | |
2 | 667656 | |
5 | 579830 | |
9 | 178466 | 3.1% |
8 | 143289 | 2.5% |
7 | 108883 | 1.9% |
6 | 100485 | 1.8% |
4 | 75533 | 1.3% |
empl_employedtotal_800L
Text
MISSING
 
Distinct | 3 |
---|---|
Distinct (%) | < 0.1% |
Missing | 2445676 |
Missing (%) | 82.2% |
Memory size | 22.7 MiB |
Length
Max length | 9 |
---|---|
Median length | 9 |
Mean length | 8.702840162 |
Min length | 8 |
Characters and Unicode
Total characters | 4597841 |
---|---|
Distinct characters | 11 |
Distinct categories | 2 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | MORE_FIVE |
---|---|
2nd row | MORE_FIVE |
3rd row | MORE_FIVE |
4th row | MORE_FIVE |
5th row | MORE_FIVE |
Value | Count | Frequency (%) |
more_five | 371321 | |
more_one | 126527 | 23.9% |
less_one | 30467 | 5.8% |
Most occurring characters
Value | Count | Frequency (%) |
E | 1056630 | |
O | 654842 | |
_ | 528315 | |
M | 497848 | |
R | 497848 | |
F | 371321 | 8.1% |
I | 371321 | 8.1% |
V | 371321 | 8.1% |
N | 156994 | 3.4% |
S | 60934 | 1.3% |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 4069526 | |
Connector Punctuation | 528315 | 11.5% |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
E | 1056630 | |
O | 654842 | |
M | 497848 | |
R | 497848 | |
F | 371321 | 9.1% |
I | 371321 | 9.1% |
V | 371321 | 9.1% |
N | 156994 | 3.9% |
S | 60934 | 1.5% |
L | 30467 | 0.7% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 528315 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 4069526 | |
Common | 528315 | 11.5% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
E | 1056630 | |
O | 654842 | |
M | 497848 | |
R | 497848 | |
F | 371321 | 9.1% |
I | 371321 | 9.1% |
V | 371321 | 9.1% |
N | 156994 | 3.9% |
S | 60934 | 1.5% |
L | 30467 | 0.7% |
Common
Value | Count | Frequency (%) |
_ | 528315 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 4597841 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
E | 1056630 | |
O | 654842 | |
_ | 528315 | |
M | 497848 | |
R | 497848 | |
F | 371321 | 8.1% |
I | 371321 | 8.1% |
V | 371321 | 8.1% |
N | 156994 | 3.4% |
S | 60934 | 1.3% |
empl_industry_691L
Text
MISSING
 
Distinct | 24 |
---|---|
Distinct (%) | < 0.1% |
Missing | 2451755 |
Missing (%) | 82.4% |
Memory size | 22.7 MiB |
Value | Count | Frequency (%) |
other | 386837 | |
government | 35440 | 6.8% |
education | 30346 | 5.8% |
trade | 20696 | 4.0% |
health | 13026 | 2.5% |
manufacturing | 9035 | 1.7% |
agriculture | 5288 | 1.0% |
transportation | 4318 | 0.8% |
real_estate | 3680 | 0.7% |
mining | 3582 | 0.7% |
Other values (14) | 9988 | 1.9% |
Most occurring characters
Value | Count | Frequency (%) |
E | 547739 | |
T | 527185 | |
R | 480420 | |
O | 463035 | |
H | 412914 | |
N | 146199 | 4.7% |
A | 111725 | 3.6% |
I | 64572 | 2.1% |
U | 59832 | 1.9% |
G | 57309 | 1.8% |
Other values (12) | 245705 |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 3112059 | |
Connector Punctuation | 4576 | 0.1% |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
E | 547739 | |
T | 527185 | |
R | 480420 | |
O | 463035 | |
H | 412914 | |
N | 146199 | 4.7% |
A | 111725 | 3.6% |
I | 64572 | 2.1% |
U | 59832 | 1.9% |
G | 57309 | 1.8% |
Other values (11) | 241129 |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 4576 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 3112059 | |
Common | 4576 | 0.1% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
E | 547739 | |
T | 527185 | |
R | 480420 | |
O | 463035 | |
H | 412914 | |
N | 146199 | 4.7% |
A | 111725 | 3.6% |
I | 64572 | 2.1% |
U | 59832 | 1.9% |
G | 57309 | 1.8% |
Other values (11) | 241129 |
Common
Value | Count | Frequency (%) |
_ | 4576 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 3116635 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
E | 547739 | |
T | 527185 | |
R | 480420 | |
O | 463035 | |
H | 412914 | |
N | 146199 | 4.7% |
A | 111725 | 3.6% |
I | 64572 | 2.1% |
U | 59832 | 1.9% |
G | 57309 | 1.8% |
Other values (12) | 245705 |
Distinct | 223 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 22.7 MiB |
Length
Max length | 12 |
---|---|
Median length | 8 |
Mean length | 8.493035453 |
Min length | 8 |
Characters and Unicode
Total characters | 25258211 |
---|---|
Distinct characters | 14 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | P142_57_166 |
---|---|
2nd row | a55475b1 |
3rd row | a55475b1 |
4th row | a55475b1 |
5th row | P49_46_174 |
Value | Count | Frequency (%) |
a55475b1 | 2448480 | |
p197_47_166 | 57521 | 1.9% |
p131_33_167 | 41960 | 1.4% |
p62_144_102 | 21172 | 0.7% |
p98_137_111 | 20595 | 0.7% |
p123_6_84 | 19021 | 0.6% |
p159_143_123 | 15720 | 0.5% |
p19_11_176 | 13883 | 0.5% |
p109_162_152 | 12941 | 0.4% |
p112_89_137 | 10583 | 0.4% |
Other values (213) | 312115 | 10.5% |
Most occurring characters
Value | Count | Frequency (%) |
5 | 7581626 | |
1 | 3772799 | |
7 | 2892790 | 11.5% |
4 | 2778339 | 11.0% |
a | 2448480 | 9.7% |
b | 2448480 | 9.7% |
_ | 1051022 | 4.2% |
P | 525511 | 2.1% |
6 | 399949 | 1.6% |
3 | 350910 | 1.4% |
Other values (4) | 1008305 | 4.0% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 18784718 | |
Lowercase Letter | 4896960 | 19.4% |
Connector Punctuation | 1051022 | 4.2% |
Uppercase Letter | 525511 | 2.1% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
5 | 7581626 | |
1 | 3772799 | |
7 | 2892790 | 15.4% |
4 | 2778339 | 14.8% |
6 | 399949 | 2.1% |
3 | 350910 | 1.9% |
9 | 293310 | 1.6% |
2 | 287439 | 1.5% |
8 | 244274 | 1.3% |
0 | 183282 | 1.0% |
Lowercase Letter
Value | Count | Frequency (%) |
a | 2448480 | |
b | 2448480 |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 1051022 |
Uppercase Letter
Value | Count | Frequency (%) |
P | 525511 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 19835740 | |
Latin | 5422471 | 21.5% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
5 | 7581626 | |
1 | 3772799 | |
7 | 2892790 | 14.6% |
4 | 2778339 | 14.0% |
_ | 1051022 | 5.3% |
6 | 399949 | 2.0% |
3 | 350910 | 1.8% |
9 | 293310 | 1.5% |
2 | 287439 | 1.4% |
8 | 244274 | 1.2% |
Latin
Value | Count | Frequency (%) |
a | 2448480 | |
b | 2448480 | |
P | 525511 | 9.7% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 25258211 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
5 | 7581626 | |
1 | 3772799 | |
7 | 2892790 | 11.5% |
4 | 2778339 | 11.0% |
a | 2448480 | 9.7% |
b | 2448480 | 9.7% |
_ | 1051022 | 4.2% |
P | 525511 | 2.1% |
6 | 399949 | 1.6% |
3 | 350910 | 1.4% |
Other values (4) | 1008305 | 4.0% |
Distinct | 3339 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 22.7 MiB |
Length
Max length | 13 |
---|---|
Median length | 8 |
Mean length | 8.461853113 |
Min length | 7 |
Characters and Unicode
Total characters | 25165475 |
---|---|
Distinct characters | 33 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 179 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | P167_100_165 |
---|---|
2nd row | a55475b1 |
3rd row | a55475b1 |
4th row | a55475b1 |
5th row | P160_59_140 |
Value | Count | Frequency (%) |
a55475b1 | 2448349 | |
p144_138_111 | 61514 | 2.1% |
p161_14_174 | 44960 | 1.5% |
p91_47_168 | 22301 | 0.7% |
p8_88_79 | 19321 | 0.6% |
p46_103_143 | 18945 | 0.6% |
p62_116_179 | 15640 | 0.5% |
p11_15_81 | 13644 | 0.5% |
p45_25_38 | 12915 | 0.4% |
p118_161_181 | 10472 | 0.4% |
Other values (3329) | 305930 | 10.3% |
Most occurring characters
Value | Count | Frequency (%) |
5 | 7594625 | |
1 | 3828238 | |
4 | 2955454 | 11.7% |
7 | 2753246 | 10.9% |
a | 2448420 | 9.7% |
b | 2448349 | 9.7% |
_ | 1051284 | 4.2% |
P | 525336 | 2.1% |
8 | 396794 | 1.6% |
6 | 318412 | 1.3% |
Other values (23) | 845317 | 3.4% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 18690771 | |
Lowercase Letter | 4897778 | 19.5% |
Connector Punctuation | 1051284 | 4.2% |
Uppercase Letter | 525642 | 2.1% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
a | 2448420 | |
b | 2448349 | |
e | 261 | < 0.1% |
o | 124 | < 0.1% |
r | 113 | < 0.1% |
t | 101 | < 0.1% |
m | 74 | < 0.1% |
l | 54 | < 0.1% |
g | 51 | < 0.1% |
u | 50 | < 0.1% |
Other values (10) | 181 | < 0.1% |
Decimal Number
Value | Count | Frequency (%) |
5 | 7594625 | |
1 | 3828238 | |
4 | 2955454 | 15.8% |
7 | 2753246 | 14.7% |
8 | 396794 | 2.1% |
6 | 318412 | 1.7% |
3 | 307112 | 1.6% |
2 | 192096 | 1.0% |
9 | 190232 | 1.0% |
0 | 154562 | 0.8% |
Uppercase Letter
Value | Count | Frequency (%) |
P | 525336 | |
Q | 306 | 0.1% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 1051284 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 19742055 | |
Latin | 5423420 | 21.6% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
a | 2448420 | |
b | 2448349 | |
P | 525336 | 9.7% |
Q | 306 | < 0.1% |
e | 261 | < 0.1% |
o | 124 | < 0.1% |
r | 113 | < 0.1% |
t | 101 | < 0.1% |
m | 74 | < 0.1% |
l | 54 | < 0.1% |
Other values (12) | 282 | < 0.1% |
Common
Value | Count | Frequency (%) |
5 | 7594625 | |
1 | 3828238 | |
4 | 2955454 | 15.0% |
7 | 2753246 | 13.9% |
_ | 1051284 | 5.3% |
8 | 396794 | 2.0% |
6 | 318412 | 1.6% |
3 | 307112 | 1.6% |
2 | 192096 | 1.0% |
9 | 190232 | 1.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 25165475 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
5 | 7594625 | |
1 | 3828238 | |
4 | 2955454 | 11.7% |
7 | 2753246 | 10.9% |
a | 2448420 | 9.7% |
b | 2448349 | 9.7% |
_ | 1051284 | 4.2% |
P | 525336 | 2.1% |
8 | 396794 | 1.6% |
6 | 318412 | 1.3% |
Other values (23) | 845317 | 3.4% |
familystate_447L
Text
MISSING
 
Distinct | 5 |
---|---|
Distinct (%) | < 0.1% |
Missing | 2245378 |
Missing (%) | 75.5% |
Memory size | 22.7 MiB |
Value | Count | Frequency (%) |
married | 484846 | |
single | 183334 | 25.2% |
widowed | 32995 | 4.5% |
divorced | 19296 | 2.6% |
living_with_partner | 8142 | 1.1% |
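The percentages in this table appear to be shares of the non-missing rows rather than of all observations. A short sketch that recomputes them under that assumption, using the labels as printed in the table (the underlying values may be cased differently):

```python
n_observations = 2_973_991
missing = 2_245_378

# Counts copied from the familystate_447L value table.
counts = {
    "married": 484_846,
    "single": 183_334,
    "widowed": 32_995,
    "divorced": 19_296,
    "living_with_partner": 8_142,
}

non_missing = n_observations - missing

# Share of each value among non-missing rows, in percent.
shares = {label: round(100 * c / non_missing, 1) for label, c in counts.items()}
for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

The counts sum exactly to the non-missing total, and the recomputed shares agree with the printed ones for every row that shows a percentage.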
Most occurring characters
Value | Count | Frequency (%) |
R | 1005272 | |
I | 744897 | |
E | 728613 | |
D | 589428 | |
A | 492988 | |
M | 484846 | |
N | 199618 | 4.0% |
G | 191476 | 3.8% |
L | 191476 | 3.8% |
S | 183334 | 3.6% |
Other values (8) | 222009 | 4.4% |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 5017673 | |
Connector Punctuation | 16284 | 0.3% |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
R | 1005272 | |
I | 744897 | |
E | 728613 | |
D | 589428 | |
A | 492988 | |
M | 484846 | |
N | 199618 | 4.0% |
G | 191476 | 3.8% |
L | 191476 | 3.8% |
S | 183334 | 3.7% |
Other values (7) | 205725 | 4.1% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 16284 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 5017673 | |
Common | 16284 | 0.3% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
R | 1005272 | |
I | 744897 | |
E | 728613 | |
D | 589428 | |
A | 492988 | |
M | 484846 | |
N | 199618 | 4.0% |
G | 191476 | 3.8% |
L | 191476 | 3.8% |
S | 183334 | 3.7% |
Other values (7) | 205725 | 4.1% |
Common
Value | Count | Frequency (%) |
_ | 16284 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 5033957 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
R | 1005272 | |
I | 744897 | |
E | 728613 | |
D | 589428 | |
A | 492988 | |
M | 484846 | |
N | 199618 | 4.0% |
G | 191476 | 3.8% |
L | 191476 | 3.8% |
S | 183334 | 3.6% |
Other values (8) | 222009 | 4.4% |
gender_992L
Text
MISSING
 
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 2949075 |
Missing (%) | 99.2% |
Memory size | 22.7 MiB |
Value | Count | Frequency (%) |
F | 20981 | 84.2% |
M | 3935 | 15.8% |
Most occurring characters
Value | Count | Frequency (%) |
F | 20981 | |
M | 3935 | 15.8% |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 24916 |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
F | 20981 | |
M | 3935 | 15.8% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 24916 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
F | 20981 | |
M | 3935 | 15.8% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 24916 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
F | 20981 | |
M | 3935 | 15.8% |
housetype_905L
Text
MISSING
 
Distinct | 6 |
---|---|
Distinct (%) | < 0.1% |
Missing | 2873173 |
Missing (%) | 96.6% |
Memory size | 22.7 MiB |
Value | Count | Frequency (%) |
OWNED | 94076 | 93.3% |
PARENTAL | 4212 | 4.2% |
FLAT | 1501 | 1.5% |
COMPANY_FLAT | 641 | 0.6% |
COOP_FLAT | 222 | 0.2% |
STATE_FLAT | 166 | 0.2% |
Most occurring characters
Value | Count | Frequency (%) |
N | 98929 | |
E | 98454 | |
O | 95161 | |
W | 94076 | |
D | 94076 | |
A | 11761 | 2.3% |
T | 7074 | 1.4% |
L | 6742 | 1.3% |
P | 5075 | 1.0% |
R | 4212 | 0.8% |
Other values (6) | 5870 | 1.1% |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 520401 | |
Connector Punctuation | 1029 | 0.2% |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
N | 98929 | |
E | 98454 | |
O | 95161 | |
W | 94076 | |
D | 94076 | |
A | 11761 | 2.3% |
T | 7074 | 1.4% |
L | 6742 | 1.3% |
P | 5075 | 1.0% |
R | 4212 | 0.8% |
Other values (5) | 4841 | 0.9% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 1029 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 520401 | |
Common | 1029 | 0.2% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
N | 98929 | |
E | 98454 | |
O | 95161 | |
W | 94076 | |
D | 94076 | |
A | 11761 | 2.3% |
T | 7074 | 1.4% |
L | 6742 | 1.3% |
P | 5075 | 1.0% |
R | 4212 | 0.8% |
Other values (5) | 4841 | 0.9% |
Common
Value | Count | Frequency (%) |
_ | 1029 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 521430 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
N | 98929 | |
E | 98454 | |
O | 95161 | |
W | 94076 | |
D | 94076 | |
A | 11761 | 2.3% |
T | 7074 | 1.4% |
L | 6742 | 1.3% |
P | 5075 | 1.0% |
R | 4212 | 0.8% |
Other values (6) | 5870 | 1.1% |
housingtype_772L
Text
MISSING
 
Distinct | 6 |
---|---|
Distinct (%) | 0.1% |
Missing | 2964176 |
Missing (%) | 99.7% |
Memory size | 22.7 MiB |
Value | Count | Frequency (%) |
OWNED | 7850 | 80.0% |
PARENTAL | 1436 | 14.6% |
FLAT | 376 | 3.8% |
COMPANY_FLAT | 95 | 1.0% |
STATE_FLAT | 38 | 0.4% |
COOP_FLAT | 20 | 0.2% |
Most occurring characters
Value | Count | Frequency (%) |
N | 9381 | |
E | 9324 | |
O | 7985 | |
W | 7850 | |
D | 7850 | |
A | 3534 | 6.6% |
T | 2041 | 3.8% |
L | 1965 | 3.6% |
P | 1551 | 2.9% |
R | 1436 | 2.7% |
Other values (6) | 1025 | 1.9% |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 53789 | |
Connector Punctuation | 153 | 0.3% |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
N | 9381 | |
E | 9324 | |
O | 7985 | |
W | 7850 | |
D | 7850 | |
A | 3534 | 6.6% |
T | 2041 | 3.8% |
L | 1965 | 3.7% |
P | 1551 | 2.9% |
R | 1436 | 2.7% |
Other values (5) | 872 | 1.6% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 153 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 53789 | |
Common | 153 | 0.3% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
N | 9381 | |
E | 9324 | |
O | 7985 | |
W | 7850 | |
D | 7850 | |
A | 3534 | 6.6% |
T | 2041 | 3.8% |
L | 1965 | 3.7% |
P | 1551 | 2.9% |
R | 1436 | 2.7% |
Other values (5) | 872 | 1.6% |
Common
Value | Count | Frequency (%) |
_ | 153 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 53942 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
N | 9381 | |
E | 9324 | |
O | 7985 | |
W | 7850 | |
D | 7850 | |
A | 3534 | 6.6% |
T | 2041 | 3.8% |
L | 1965 | 3.6% |
P | 1551 | 2.9% |
R | 1436 | 2.7% |
Other values (6) | 1025 | 1.9% |
incometype_1044T
Text
MISSING
 
Distinct | 9 |
---|---|
Distinct (%) | < 0.1% |
Missing | 1447332 |
Missing (%) | 48.7% |
Memory size | 22.7 MiB |
Length
Max length | 23 |
---|---|
Median length | 17 |
Mean length | 15.97266973 |
Min length | 5 |
Characters and Unicode
Total characters | 24384820 |
---|---|
Distinct characters | 21 |
Distinct categories | 3 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 1 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | SALARIED_GOVT |
---|---|
2nd row | SALARIED_GOVT |
3rd row | EMPLOYED |
4th row | EMPLOYED |
5th row | EMPLOYED |
Value | Count | Frequency (%) |
PRIVATE_SECTOR_EMPLOYEE | 490562 | 32.1% |
SALARIED_GOVT | 373646 | 24.5% |
RETIRED_PENSIONER | 311028 | 20.4% |
EMPLOYED | 298158 | 19.5% |
SELFEMPLOYED | 29199 | 1.9% |
OTHER | 11436 | 0.7% |
HANDICAPPED_2 | 7371 | 0.5% |
HANDICAPPED_3 | 5258 | 0.3% |
HANDICAPPED | 1 | < 0.1% |
Most occurring characters
Value | Count | Frequency (%) |
E | 4778547 | |
R | 2299290 | |
O | 2004591 | 8.2% |
_ | 1678427 | 6.9% |
T | 1677234 | 6.9% |
P | 1644769 | 6.7% |
I | 1498894 | 6.1% |
A | 1263114 | 5.2% |
L | 1220764 | 5.0% |
S | 1204435 | 4.9% |
Other values (11) | 5114755 |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 22693764 | |
Connector Punctuation | 1678427 | 6.9% |
Decimal Number | 12629 | 0.1% |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
E | 4778547 | |
R | 2299290 | |
O | 2004591 | |
T | 1677234 | 7.4% |
P | 1644769 | 7.2% |
I | 1498894 | 6.6% |
A | 1263114 | 5.6% |
L | 1220764 | 5.4% |
S | 1204435 | 5.3% |
D | 1037291 | 4.6% |
Other values (8) | 4064835 |
Decimal Number
Value | Count | Frequency (%) |
2 | 7371 | |
3 | 5258 |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 1678427 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 22693764 | |
Common | 1691056 | 6.9% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
E | 4778547 | |
R | 2299290 | |
O | 2004591 | |
T | 1677234 | 7.4% |
P | 1644769 | 7.2% |
I | 1498894 | 6.6% |
A | 1263114 | 5.6% |
L | 1220764 | 5.4% |
S | 1204435 | 5.3% |
D | 1037291 | 4.6% |
Other values (8) | 4064835 |
Common
Value | Count | Frequency (%) |
_ | 1678427 | |
2 | 7371 | 0.4% |
3 | 5258 | 0.3% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 24384820 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
E | 4778547 | |
R | 2299290 | |
O | 2004591 | 8.2% |
_ | 1678427 | 6.9% |
T | 1677234 | 6.9% |
P | 1644769 | 6.7% |
I | 1498894 | 6.1% |
A | 1263114 | 5.2% |
L | 1220764 | 5.0% |
S | 1204435 | 4.9% |
Other values (11) | 5114755 |
isreference_387L
Boolean
MISSING
 
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 2949075 |
Missing (%) | 99.2% |
Memory size | 22.7 MiB |
Value | Count | Frequency (%) |
True | 12458 | 0.4% |
False | 12458 | 0.4% |
(Missing) | 2949075 | 99.2% |
language1_981M
Text
Distinct | 3 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 22.7 MiB |
Length
Max length | 12 |
---|---|
Median length | 8 |
Mean length | 9.40439228 |
Min length | 8 |
Characters and Unicode
Total characters | 27968578 |
---|---|
Distinct characters | 13 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | P10_39_147 |
---|---|
2nd row | a55475b1 |
3rd row | a55475b1 |
4th row | a55475b1 |
5th row | P10_39_147 |
Value | Count | Frequency (%) |
a55475b1 | 1505452 | 50.6% |
P10_39_147 | 848753 | 28.5% |
P209_127_106 | 619786 | 20.8% |
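Because language1_981M has only three distinct values and no missing rows, the "Total characters" and "Mean length" figures reported above can be cross-checked directly from this value table. The sketch below is a hedged consistency check in Python. The dictionary and variable names are illustrative, and the values are written with an uppercase "P" as they appear in the Sample rows.

```python
# Value counts copied from the language1_981M value table.
value_counts = {
    "a55475b1": 1505452,
    "P10_39_147": 848753,
    "P209_127_106": 619786,
}

# Every row has a value (Missing = 0), so the counts sum to the row total.
total_rows = sum(value_counts.values())

# Total characters = sum over values of (string length x occurrences).
total_chars = sum(len(value) * count for value, count in value_counts.items())

# Mean length is total characters divided by the number of rows.
mean_length = total_chars / total_rows

print(total_rows, total_chars, round(mean_length, 8))
```

The same arithmetic applies to any Text variable in the report whose full value table is listed, provided the denominator is the number of non-missing rows.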
Most occurring characters
Value | Count | Frequency (%) |
5 | 4516356 | |
1 | 4442530 | |
7 | 2973991 | |
_ | 2937078 | |
4 | 2354205 | |
0 | 2088325 | |
a | 1505452 | 5.4% |
b | 1505452 | 5.4% |
P | 1468539 | 5.3% |
9 | 1468539 | 5.3% |
Other values (3) | 2708111 |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 20552057 | |
Lowercase Letter | 3010904 | 10.8% |
Connector Punctuation | 2937078 | 10.5% |
Uppercase Letter | 1468539 | 5.3% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
5 | 4516356 | |
1 | 4442530 | |
7 | 2973991 | |
4 | 2354205 | |
0 | 2088325 | |
9 | 1468539 | 7.1% |
2 | 1239572 | 6.0% |
3 | 848753 | 4.1% |
6 | 619786 | 3.0% |
Lowercase Letter
Value | Count | Frequency (%) |
a | 1505452 | |
b | 1505452 |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 2937078 |
Uppercase Letter
Value | Count | Frequency (%) |
P | 1468539 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 23489135 | |
Latin | 4479443 | 16.0% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
5 | 4516356 | |
1 | 4442530 | |
7 | 2973991 | |
_ | 2937078 | |
4 | 2354205 | |
0 | 2088325 | |
9 | 1468539 | 6.3% |
2 | 1239572 | 5.3% |
3 | 848753 | 3.6% |
6 | 619786 | 2.6% |
Latin
Value | Count | Frequency (%) |
a | 1505452 | |
b | 1505452 | |
P | 1468539 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 27968578 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
5 | 4516356 | |
1 | 4442530 | |
7 | 2973991 | |
_ | 2937078 | |
4 | 2354205 | |
0 | 2088325 | |
a | 1505452 | 5.4% |
b | 1505452 | 5.4% |
P | 1468539 | 5.3% |
9 | 1468539 | 5.3% |
Other values (3) | 2708111 |
mainoccupationinc_384A
Real number (ℝ)
MISSING
 
Distinct | 6632 |
---|---|
Distinct (%) | 0.4% |
Missing | 1447332 |
Missing (%) | 48.7% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 57707.48346 |
Minimum | 0 |
---|---|
Maximum | 200000 |
Zeros | 9 |
Zeros (%) | < 0.1% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 22.7 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 20000 |
Q1 | 36000 |
median | 50000 |
Q3 | 70000 |
95-th percentile | 120000 |
Maximum | 200000 |
Range | 200000 |
Interquartile range (IQR) | 34000 |
Descriptive statistics
Standard deviation | 33348.30285 |
---|---|
Coefficient of variation (CV) | 0.5778852385 |
Kurtosis | 3.713818555 |
Mean | 57707.48346 |
Median Absolute Deviation (MAD) | 20000 |
Skewness | 1.664849512 |
Sum | 8.809964899 × 10^10 |
Variance | 1112109303 |
Monotonicity | Not monotonic |
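Several of the descriptive statistics above are derived from one another, so they can be sanity-checked with a few lines of arithmetic. The sketch below is an illustrative Python check, not the profiler's own code. It assumes the mean and sum are computed over non-missing rows only, and all variable names are hypothetical.

```python
import math

# Figures copied from the mainoccupationinc_384A summary.
n_observations = 2973991
n_missing = 1447332
mean = 57707.48346
std = 33348.30285
q1, q3 = 36000, 70000

n_valid = n_observations - n_missing   # rows that actually carry a value
cv = std / mean                        # coefficient of variation
iqr = q3 - q1                          # interquartile range
variance = std ** 2                    # variance is the squared standard deviation
total = mean * n_valid                 # sum recovered from mean and valid count

print(f"CV={cv:.10f} IQR={iqr} variance={variance:.0f} sum={total:.6e}")

# Compare against the values printed in the report.
assert math.isclose(cv, 0.5778852385, rel_tol=1e-6)
assert math.isclose(variance, 1112109303, rel_tol=1e-6)
assert math.isclose(total, 8.809964899e10, rel_tol=1e-6)
```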
Value | Count | Frequency (%) |
40000 | 149658 | 5.0% |
50000 | 141374 | 4.8% |
60000 | 133345 | 4.5% |
30000 | 116608 | 3.9% |
70000 | 89103 | 3.0% |
100000 | 60078 | 2.0% |
80000 | 58412 | 2.0% |
36000 | 46507 | 1.6% |
20000 | 44162 | 1.5% |
24000 | 40383 | 1.4% |
Other values (6622) | 647029 | 21.8% |
(Missing) | 1447332 | 48.7% |
Value | Count | Frequency (%) |
0 | 9 | < 0.1% |
0.2 | 2 | < 0.1% |
1 | 2 | < 0.1% |
1.2 | 1 | < 0.1% |
1.6 | 1 | < 0.1% |
Value | Count | Frequency (%) |
200000 | 12430 | 0.4% |
199999.8 | 5 | < 0.1% |
199998 | 2 | < 0.1% |
199980 | 1 | < 0.1% |
199971.61 | 1 | < 0.1% |
maritalst_703L
Text
MISSING
 
Distinct | 5 |
---|---|
Distinct (%) | < 0.1% |
Missing | 2962646 |
Missing (%) | 99.6% |
Memory size | 22.7 MiB |
Value | Count | Frequency (%) |
MARRIED | 5970 | 52.6% |
SINGLE | 4059 | 35.8% |
DIVORCED | 549 | 4.8% |
LIVING_WITH_PARTNER | 485 | 4.3% |
WIDOWED | 282 | 2.5% |
Most occurring characters
Value | Count | Frequency (%) |
R | 13459 | |
I | 12315 | |
E | 11345 | |
D | 7632 | |
A | 6455 | |
M | 5970 | |
N | 5029 | 6.2% |
G | 4544 | 5.6% |
L | 4544 | 5.6% |
S | 4059 | 5.0% |
Other values (8) | 6373 |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 80755 | |
Connector Punctuation | 970 | 1.2% |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
R | 13459 | |
I | 12315 | |
E | 11345 | |
D | 7632 | |
A | 6455 | |
M | 5970 | |
N | 5029 | 6.2% |
G | 4544 | 5.6% |
L | 4544 | 5.6% |
S | 4059 | 5.0% |
Other values (7) | 5403 |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 970 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 80755 | |
Common | 970 | 1.2% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
R | 13459 | |
I | 12315 | |
E | 11345 | |
D | 7632 | |
A | 6455 | |
M | 5970 | |
N | 5029 | 6.2% |
G | 4544 | 5.6% |
L | 4544 | 5.6% |
S | 4059 | 5.0% |
Other values (7) | 5403 |
Common
Value | Count | Frequency (%) |
_ | 970 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 81725 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
R | 13459 | |
I | 12315 | |
E | 11345 | |
D | 7632 | |
A | 6455 | |
M | 5970 | |
N | 5029 | 6.2% |
G | 4544 | 5.6% |
L | 4544 | 5.6% |
S | 4059 | 5.0% |
Other values (8) | 6373 |
num_group1
Real number (ℝ)
ZEROS
 
Distinct | 10 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 0.796531664 |
Minimum | 0 |
---|---|
Maximum | 9 |
Zeros | 1526659 |
Zeros (%) | 51.3% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 22.7 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
median | 0 |
Q3 | 1 |
95-th percentile | 3 |
Maximum | 9 |
Range | 9 |
Interquartile range (IQR) | 1 |
Descriptive statistics
Standard deviation | 0.9777888443 |
---|---|
Coefficient of variation (CV) | 1.227558035 |
Kurtosis | 0.2510041297 |
Mean | 0.796531664 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 1.038853795 |
Sum | 2368878 |
Variance | 0.956071024 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
0 | 1526659 | 51.3% |
1 | 757320 | 25.5% |
2 | 484214 | 16.3% |
3 | 181768 | 6.1% |
4 | 22453 | 0.8% |
5 | 1466 | < 0.1% |
6 | 99 | < 0.1% |
7 | 8 | < 0.1% |
8 | 2 | < 0.1% |
9 | 2 | < 0.1% |
Value | Count | Frequency (%) |
0 | 1526659 | 51.3% |
1 | 757320 | 25.5% |
2 | 484214 | 16.3% |
3 | 181768 | 6.1% |
4 | 22453 | 0.8% |
Value | Count | Frequency (%) |
9 | 2 | < 0.1% |
8 | 2 | < 0.1% |
7 | 8 | < 0.1% |
6 | 99 | < 0.1% |
5 | 1466 | < 0.1% |
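Since num_group1 takes only ten distinct values and has no missing rows, its Sum, Mean and Variance can be recomputed from the frequency table alone. The following Python sketch illustrates this. It assumes the reported variance uses the sample (n - 1) denominator, which is an assumption about the profiler rather than something stated in this report.

```python
import math

# Value -> count, copied from the num_group1 value table.
freq = {0: 1526659, 1: 757320, 2: 484214, 3: 181768, 4: 22453,
        5: 1466, 6: 99, 7: 8, 8: 2, 9: 2}

n = sum(freq.values())                                  # number of rows
total = sum(value * count for value, count in freq.items())
mean = total / n

# Sample variance (assumed n - 1 denominator).
squared_dev = sum(count * (value - mean) ** 2 for value, count in freq.items())
variance = squared_dev / (n - 1)

print(n, total, round(mean, 9), round(variance, 9))

# The recomputed figures should line up with the report's summary.
assert total == 2368878
assert math.isclose(mean, 0.796531664, rel_tol=1e-7)
```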
personindex_1023L
Real number (ℝ)
MISSING
  ZEROS
 
Distinct | 7 |
---|---|
Distinct (%) | < 0.1% |
Missing | 642283 |
Missing (%) | 21.6% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 0.4383567754 |
Minimum | 0 |
---|---|
Maximum | 6 |
Zeros | 1526659 |
Zeros (%) | 51.3% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 22.7 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
median | 0 |
Q3 | 1 |
95-th percentile | 2 |
Maximum | 6 |
Range | 6 |
Interquartile range (IQR) | 1 |
Descriptive statistics
Standard deviation | 0.6596617567 |
---|---|
Coefficient of variation (CV) | 1.504851285 |
Kurtosis | 0.432036164 |
Mean | 0.4383567754 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 1.245270671 |
Sum | 1022120 |
Variance | 0.4351536333 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
0 | 1526659 | 51.3% |
1 | 591033 | 19.9% |
2 | 211121 | 7.1% |
3 | 2740 | 0.1% |
4 | 151 | < 0.1% |
5 | 3 | < 0.1% |
6 | 1 | < 0.1% |
(Missing) | 642283 | 21.6% |
Value | Count | Frequency (%) |
0 | 1526659 | 51.3% |
1 | 591033 | 19.9% |
2 | 211121 | 7.1% |
3 | 2740 | 0.1% |
4 | 151 | < 0.1% |
Value | Count | Frequency (%) |
6 | 1 | < 0.1% |
5 | 3 | < 0.1% |
4 | 151 | < 0.1% |
3 | 2740 | 0.1% |
2 | 211121 | 7.1% |
persontype_1072L
Real number (ℝ)
Distinct | 3 |
---|---|
Distinct (%) | < 0.1% |
Missing | 6117 |
Missing (%) | 0.2% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 2.034861992 |
Minimum | 1 |
---|---|
Maximum | 5 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 22.7 MiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 1 |
Q1 | 1 |
median | 1 |
Q3 | 4 |
95-th percentile | 5 |
Maximum | 5 |
Range | 4 |
Interquartile range (IQR) | 3 |
Descriptive statistics
Standard deviation | 1.707170653 |
---|---|
Coefficient of variation (CV) | 0.8389613937 |
Kurtosis | -0.8075389231 |
Mean | 2.034861992 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 1.069831992 |
Sum | 6039214 |
Variance | 2.914431638 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
1 | 2161932 | 72.7% |
5 | 653514 | 22.0% |
4 | 152428 | 5.1% |
(Missing) | 6117 | 0.2% |
Value | Count | Frequency (%) |
1 | 2161932 | 72.7% |
4 | 152428 | 5.1% |
5 | 653514 | 22.0% |
Value | Count | Frequency (%) |
5 | 653514 | 22.0% |
4 | 152428 | 5.1% |
1 | 2161932 | 72.7% |
persontype_792L
Real number (ℝ)
MISSING
 
Distinct | 3 |
---|---|
Distinct (%) | < 0.1% |
Missing | 642283 |
Missing (%) | 21.6% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 2.315690901 |
Minimum | 1 |
---|---|
Maximum | 5 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 22.7 MiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 1 |
Q1 | 1 |
median | 1 |
Q3 | 5 |
95-th percentile | 5 |
Maximum | 5 |
Range | 4 |
Interquartile range (IQR) | 4 |
Descriptive statistics
Standard deviation | 1.826378165 |
---|---|
Coefficient of variation (CV) | 0.7886968695 |
Kurtosis | -1.47028854 |
Mean | 2.315690901 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 0.6951560876 |
Sum | 5399515 |
Variance | 3.3356572 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
1 | 1526659 | 51.3% |
5 | 652660 | 21.9% |
4 | 152389 | 5.1% |
(Missing) | 642283 | 21.6% |
Value | Count | Frequency (%) |
1 | 1526659 | 51.3% |
4 | 152389 | 5.1% |
5 | 652660 | 21.9% |
Value | Count | Frequency (%) |
5 | 652660 | 21.9% |
4 | 152389 | 5.1% |
1 | 1526659 | 51.3% |
Distinct | 991 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 22.7 MiB |
Length
Max length | 12 |
---|---|
Median length | 11 |
Mean length | 9.387631637 |
Min length | 8 |
Characters and Unicode
Total characters | 27918732 |
---|---|
Distinct characters | 18 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 152 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | P88_18_84 |
---|---|
2nd row | a55475b1 |
3rd row | a55475b1 |
4th row | a55475b1 |
5th row | P103_93_94 |
Value | Count | Frequency (%) |
a55475b1 | 1447423 | 48.7% |
P131_33_167 | 79574 | 2.7% |
P197_47_166 | 69825 | 2.3% |
P123_6_84 | 55824 | 1.9% |
P98_137_111 | 42880 | 1.4% |
P159_143_123 | 36207 | 1.2% |
P62_144_102 | 35563 | 1.2% |
P204_99_158 | 34868 | 1.2% |
P19_11_176 | 31849 | 1.1% |
P178_112_160 | 29988 | 1.0% |
Other values (981) | 1109990 | 37.3% |
Most occurring characters
Value | Count | Frequency (%) |
1 | 5180754 | |
5 | 5166806 | |
_ | 3053136 | |
7 | 2645105 | |
4 | 2372638 | |
P | 1526566 | 5.5% |
a | 1447423 | 5.2% |
b | 1447423 | 5.2% |
6 | 1024509 | 3.7% |
3 | 975246 | 3.5% |
Other values (8) | 3079126 |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 20444176 | |
Connector Punctuation | 3053136 | 10.9% |
Lowercase Letter | 2894852 | 10.4% |
Uppercase Letter | 1526568 | 5.5% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
1 | 5180754 | |
5 | 5166806 | |
7 | 2645105 | |
4 | 2372638 | |
6 | 1024509 | 5.0% |
3 | 975246 | 4.8% |
2 | 860122 | 4.2% |
8 | 819632 | 4.0% |
9 | 811448 | 4.0% |
0 | 587916 | 2.9% |
Lowercase Letter
Value | Count | Frequency (%) |
a | 1447423 | |
b | 1447423 | |
t | 2 | < 0.1% |
h | 2 | < 0.1% |
e | 2 | < 0.1% |
Uppercase Letter
Value | Count | Frequency (%) |
P | 1526566 | |
Q | 2 | < 0.1% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 3053136 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 23497312 | |
Latin | 4421420 | 15.8% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
1 | 5180754 | |
5 | 5166806 | |
_ | 3053136 | |
7 | 2645105 | |
4 | 2372638 | |
6 | 1024509 | 4.4% |
3 | 975246 | 4.2% |
2 | 860122 | 3.7% |
8 | 819632 | 3.5% |
9 | 811448 | 3.5% |
Latin
Value | Count | Frequency (%) |
P | 1526566 | |
a | 1447423 | |
b | 1447423 | |
Q | 2 | < 0.1% |
t | 2 | < 0.1% |
h | 2 | < 0.1% |
e | 2 | < 0.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 27918732 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
1 | 5180754 | |
5 | 5166806 | |
_ | 3053136 | |
7 | 2645105 | |
4 | 2372638 | |
P | 1526566 | 5.5% |
a | 1447423 | 5.2% |
b | 1447423 | 5.2% |
6 | 1024509 | 3.7% |
3 | 975246 | 3.5% |
Other values (8) | 3079126 |
Distinct | 3531 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 22.7 MiB |
Length
Max length | 13 |
---|---|
Median length | 8 |
Mean length | 9.259763395 |
Min length | 7 |
Characters and Unicode
Total characters | 27538453 |
---|---|
Distinct characters | 33 |
Distinct categories | 4 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 64 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | P167_100_165 |
---|---|
2nd row | a55475b1 |
3rd row | a55475b1 |
4th row | a55475b1 |
5th row | P176_37_166 |
Value | Count | Frequency (%) |
a55475b1 | 1515676 | 51.0% |
P161_14_174 | 96828 | 3.3% |
P144_138_111 | 81525 | 2.7% |
P46_103_143 | 54825 | 1.8% |
P91_47_168 | 53660 | 1.8% |
P62_116_179 | 33450 | 1.1% |
P212_16_169 | 30579 | 1.0% |
P157_35_170 | 29059 | 1.0% |
P11_15_81 | 28689 | 1.0% |
P85_138_173 | 26491 | 0.9% |
Other values (3521) | 1023209 | 34.4% |
Most occurring characters
Value | Count | Frequency (%) |
5 | 5325384 | |
1 | 5076787 | |
_ | 2916630 | |
4 | 2755042 | |
7 | 2434168 | |
a | 1516042 | 5.5% |
b | 1515676 | 5.5% |
P | 1456545 | 5.3% |
8 | 980009 | 3.6% |
6 | 966794 | 3.5% |
Other values (23) | 2595376 |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 20125819 | |
Lowercase Letter | 3037689 | 11.0% |
Connector Punctuation | 2916630 | 10.6% |
Uppercase Letter | 1458315 | 5.3% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
a | 1516042 | |
b | 1515676 | |
e | 1777 | 0.1% |
t | 753 | < 0.1% |
r | 670 | < 0.1% |
o | 597 | < 0.1% |
m | 577 | < 0.1% |
l | 282 | < 0.1% |
u | 215 | < 0.1% |
g | 177 | < 0.1% |
Other values (10) | 923 | < 0.1% |
Decimal Number
Value | Count | Frequency (%) |
5 | 5325384 | |
1 | 5076787 | |
4 | 2755042 | |
7 | 2434168 | |
8 | 980009 | 4.9% |
6 | 966794 | 4.8% |
3 | 829255 | 4.1% |
2 | 651988 | 3.2% |
9 | 579147 | 2.9% |
0 | 527245 | 2.6% |
Uppercase Letter
Value | Count | Frequency (%) |
P | 1456545 | |
Q | 1770 | 0.1% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 2916630 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 23042449 | |
Latin | 4496004 | 16.3% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
a | 1516042 | |
b | 1515676 | |
P | 1456545 | |
e | 1777 | < 0.1% |
Q | 1770 | < 0.1% |
t | 753 | < 0.1% |
r | 670 | < 0.1% |
o | 597 | < 0.1% |
m | 577 | < 0.1% |
l | 282 | < 0.1% |
Other values (12) | 1315 | < 0.1% |
Common
Value | Count | Frequency (%) |
5 | 5325384 | |
1 | 5076787 | |
_ | 2916630 | |
4 | 2755042 | |
7 | 2434168 | |
8 | 980009 | 4.3% |
6 | 966794 | 4.2% |
3 | 829255 | 3.6% |
2 | 651988 | 2.8% |
9 | 579147 | 2.5% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 27538453 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
5 | 5325384 | |
1 | 5076787 | |
_ | 2916630 | |
4 | 2755042 | |
7 | 2434168 | |
a | 1516042 | 5.5% |
b | 1515676 | 5.5% |
P | 1456545 | 5.3% |
8 | 980009 | 3.6% |
6 | 966794 | 3.5% |
Other values (23) | 2595376 |
relationshiptoclient_415T
Text
MISSING
 
Distinct | 10 |
---|---|
Distinct (%) | < 0.1% |
Missing | 2168942 |
Missing (%) | 72.9% |
Memory size | 22.7 MiB |
Length
Max length | 14 |
---|---|
Median length | 12 |
Mean length | 7.319615328 |
Min length | 5 |
Characters and Unicode
Total characters | 5892649 |
---|---|
Distinct characters | 19 |
Distinct categories | 2 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | SPOUSE |
---|---|
2nd row | COLLEAGUE |
3rd row | SIBLING |
4th row | OTHER_RELATIVE |
5th row | SIBLING |
Value | Count | Frequency (%) |
SPOUSE | 152389 | 18.9% |
SIBLING | 151484 | 18.8% |
CHILD | 143018 | 17.8% |
OTHER_RELATIVE | 112570 | 14.0% |
FRIEND | 100453 | 12.5% |
PARENT | 65009 | 8.1% |
COLLEAGUE | 48795 | 6.1% |
OTHER | 20154 | 2.5% |
NEIGHBOR | 9991 | 1.2% |
GRAND_PARENT | 1186 | 0.1% |
Most occurring characters
Value | Count | Frequency (%) |
E | 784482 | |
I | 669000 | |
L | 504662 | 8.6% |
S | 456262 | 7.7% |
R | 423119 | 7.2% |
O | 343899 | 5.8% |
N | 329309 | 5.6% |
T | 311489 | 5.3% |
H | 285733 | 4.8% |
D | 244657 | 4.2% |
Other values (9) | 1540037 |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 5778893 | |
Connector Punctuation | 113756 | 1.9% |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
E | 784482 | |
I | 669000 | |
L | 504662 | 8.7% |
S | 456262 | 7.9% |
R | 423119 | 7.3% |
O | 343899 | 6.0% |
N | 329309 | 5.7% |
T | 311489 | 5.4% |
H | 285733 | 4.9% |
D | 244657 | 4.2% |
Other values (8) | 1426281 |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 113756 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 5778893 | |
Common | 113756 | 1.9% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
E | 784482 | |
I | 669000 | |
L | 504662 | 8.7% |
S | 456262 | 7.9% |
R | 423119 | 7.3% |
O | 343899 | 6.0% |
N | 329309 | 5.7% |
T | 311489 | 5.4% |
H | 285733 | 4.9% |
D | 244657 | 4.2% |
Other values (8) | 1426281 |
Common
Value | Count | Frequency (%) |
_ | 113756 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 5892649 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
E | 784482 | |
I | 669000 | |
L | 504662 | 8.6% |
S | 456262 | 7.7% |
R | 423119 | 7.2% |
O | 343899 | 5.8% |
N | 329309 | 5.6% |
T | 311489 | 5.3% |
H | 285733 | 4.8% |
D | 244657 | 4.2% |
Other values (9) | 1540037 |
relationshiptoclient_642T
Text
MISSING
 
Distinct | 10 |
---|---|
Distinct (%) | < 0.1% |
Missing | 2168049 |
Missing (%) | 72.9% |
Memory size | 22.7 MiB |
Length
Max length | 14 |
---|---|
Median length | 12 |
Mean length | 7.320624809 |
Min length | 5 |
Characters and Unicode
Total characters | 5899999 |
---|---|
Distinct characters | 19 |
Distinct categories | 2 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | SPOUSE |
---|---|
2nd row | COLLEAGUE |
3rd row | SIBLING |
4th row | OTHER_RELATIVE |
5th row | SIBLING |
Value | Count | Frequency (%) |
SPOUSE | 152428 | 18.9% |
SIBLING | 151666 | 18.8% |
CHILD | 143151 | 17.8% |
OTHER_RELATIVE | 112775 | 14.0% |
FRIEND | 100595 | 12.5% |
PARENT | 65081 | 8.1% |
COLLEAGUE | 48847 | 6.1% |
OTHER | 20157 | 2.5% |
NEIGHBOR | 10051 | 1.2% |
GRAND_PARENT | 1191 | 0.1% |
Most occurring characters
Value | Count | Frequency (%) |
E | 785522 | |
I | 669904 | |
L | 505286 | 8.6% |
S | 456522 | 7.7% |
R | 423816 | 7.2% |
O | 344258 | 5.8% |
N | 329775 | 5.6% |
T | 311979 | 5.3% |
H | 286134 | 4.8% |
D | 244937 | 4.2% |
Other values (9) | 1541866 |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 5786033 | |
Connector Punctuation | 113966 | 1.9% |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
E | 785522 | |
I | 669904 | |
L | 505286 | 8.7% |
S | 456522 | 7.9% |
R | 423816 | 7.3% |
O | 344258 | 5.9% |
N | 329775 | 5.7% |
T | 311979 | 5.4% |
H | 286134 | 4.9% |
D | 244937 | 4.2% |
Other values (8) | 1427900 |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 113966 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 5786033 | |
Common | 113966 | 1.9% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
E | 785522 | |
I | 669904 | |
L | 505286 | 8.7% |
S | 456522 | 7.9% |
R | 423816 | 7.3% |
O | 344258 | 5.9% |
N | 329775 | 5.7% |
T | 311979 | 5.4% |
H | 286134 | 4.9% |
D | 244937 | 4.2% |
Other values (8) | 1427900 |
Common
Value | Count | Frequency (%) |
_ | 113966 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 5899999 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
E | 785522 | |
I | 669904 | |
L | 505286 | 8.6% |
S | 456522 | 7.7% |
R | 423816 | 7.2% |
O | 344258 | 5.8% |
N | 329775 | 5.6% |
T | 311979 | 5.3% |
H | 286134 | 4.8% |
D | 244937 | 4.2% |
Other values (9) | 1541866 |
remitter_829L
Boolean
CONSTANT
  MISSING
 
Distinct | 1 |
---|---|
Distinct (%) | < 0.1% |
Missing | 2168942 |
Missing (%) | 72.9% |
Memory size | 22.7 MiB |
Value | Count | Frequency (%) |
False | 805049 | 27.1% |
(Missing) | 2168942 | 72.9% |
role_1084L
Text
Distinct | 3 |
---|---|
Distinct (%) | < 0.1% |
Missing | 6117 |
Missing (%) | 0.2% |
Memory size | 22.7 MiB |
Value | Count | Frequency (%) |
CL | 1625689 | 54.8% |
PE | 805942 | 27.2% |
EM | 536243 | 18.1% |
Most occurring characters
Value | Count | Frequency (%) |
C | 1625689 | |
L | 1625689 | |
E | 1342185 | |
P | 805942 | |
M | 536243 | 9.0% |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 5935748 |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
C | 1625689 | |
L | 1625689 | |
E | 1342185 | |
P | 805942 | |
M | 536243 | 9.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 5935748 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
C | 1625689 | |
L | 1625689 | |
E | 1342185 | |
P | 805942 | |
M | 536243 | 9.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 5935748 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
C | 1625689 | |
L | 1625689 | |
E | 1342185 | |
P | 805942 | |
M | 536243 | 9.0% |
role_993L
Text
CONSTANT
  MISSING
 
Distinct | 1 |
---|---|
Distinct (%) | < 0.1% |
Missing | 2949075 |
Missing (%) | 99.2% |
Memory size | 22.7 MiB |
Value | Count | Frequency (%) |
FULL | 24916 | 100.0% |
Most occurring characters
Value | Count | Frequency (%) |
L | 49832 | |
F | 24916 | |
U | 24916 |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 99664 |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
L | 49832 | |
F | 24916 | |
U | 24916 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 99664 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
L | 49832 | |
F | 24916 | |
U | 24916 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 99664 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
L | 49832 | |
F | 24916 | |
U | 24916 |
safeguarantyflag_411L
Boolean
IMBALANCE
  MISSING
 
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 1447334 |
Missing (%) | 48.7% |
Memory size | 22.7 MiB |
Value | Count | Frequency (%) |
True | 1445065 | 48.6% |
False | 81592 | 2.7% |
(Missing) | 1447334 | 48.7% |
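The IMBALANCE alert for this variable (listed as 69.9% in the report's alerts) is consistent with a score of one minus the normalised Shannon entropy of the non-missing classes. The sketch below assumes that definition. It is an assumption about how the profiler derives the figure, not something stated in this report, and the function name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(class_counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits)
    of the class distribution and k is the number of classes.
    0 means perfectly balanced, values near 1 mean one class dominates."""
    k = len(class_counts)
    if k < 2:
        # Convention chosen for this sketch: a single class scores 0.
        return 0.0
    total = sum(class_counts)
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in class_counts if c > 0
    )
    return 1 - entropy / math.log2(k)

# Non-missing counts copied from the safeguarantyflag_411L value table.
counts = {"True": 1445065, "False": 81592}
score = imbalance_score(list(counts.values()))

print(f"imbalance = {100 * score:.1f}%")
```

Missing rows are left out of the class counts here. Including them as a third class would give a different score, so the agreement with the alert only holds under the non-missing assumption.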
sex_738L
Text
MISSING
 
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 1447332 |
Missing (%) | 48.7% |
Memory size | 22.7 MiB |
Value | Count | Frequency (%) |
F | 952776 | 62.4% |
M | 573883 | 37.6% |
Most occurring characters
Value | Count | Frequency (%) |
F | 952776 | |
M | 573883 |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 1526659 |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
F | 952776 | |
M | 573883 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 1526659 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
F | 952776 | |
M | 573883 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 1526659 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
F | 952776 | |
M | 573883 |
type_25L
Text
Distinct | 8 |
---|---|
Distinct (%) | < 0.1% |
Missing | 6117 |
Missing (%) | 0.2% |
Memory size | 22.7 MiB |
Length
Max length | 17 |
---|---|
Median length | 14 |
Mean length | 10.58595075 |
Min length | 5 |
Characters and Unicode
Total characters | 31417768 |
---|---|
Distinct characters | 19 |
Distinct categories | 2 |
Distinct scripts | 2 |
Distinct blocks | 1 |
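The Length and Characters figures for type_25L can be cross-checked against each other. The sketch below assumes that Mean length is the total character count divided by the number of non-missing rows; the report does not say so explicitly, but the listed values are consistent with that reading. The variable names are illustrative, and the observation count is taken from the dataset statistics at the top of the report.

```python
# Figures copied from the report.
total_characters = 31417768   # "Total characters" for type_25L
n_observations = 2973991      # "Number of observations" for the dataset
n_missing = 6117              # "Missing" for type_25L

# Assumption: only non-missing rows contribute to the mean length.
non_missing = n_observations - n_missing
mean_length = total_characters / non_missing

print(non_missing)             # prints 2967874
print(round(mean_length, 8))   # prints 10.58595075
```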
Unique
Unique | 1 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | PRIMARY_MOBILE |
---|---|
2nd row | PHONE |
3rd row | PHONE |
4th row | PHONE |
5th row | PRIMARY_MOBILE |
Value | Count | Frequency (%) |
PRIMARY_MOBILE | 1770812 | 59.5% |
PHONE | 1087729 | 36.6% |
HOME_PHONE | 95036 | 3.2% |
ALTERNATIVE_PHONE | 9932 | 0.3% |
SECONDARY_MOBILE | 3959 | 0.1% |
PRIMARY_EMAIL | 392 | < 0.1% |
(value not shown) | 13 | < 0.1% |
(value not shown) | 1 | < 0.1% |
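The Frequency (%) column in this table appears to be each count divided by the total number of observations (2973991, missing rows included), rounded to one decimal place. The sketch below recomputes the column under that assumption; the helper `format_frequency` and the "< 0.1%" cutoff are inferred from the rows shown, not taken from the profiling tool itself.

```python
N_OBSERVATIONS = 2973991  # dataset-level "Number of observations"

# Counts for the labelled values of type_25L, as listed in the table above.
counts = {
    "PRIMARY_MOBILE": 1770812,
    "PHONE": 1087729,
    "HOME_PHONE": 95036,
    "ALTERNATIVE_PHONE": 9932,
    "SECONDARY_MOBILE": 3959,
    "PRIMARY_EMAIL": 392,
}


def format_frequency(count, total=N_OBSERVATIONS):
    """Format a count as a percentage of all rows, mimicking the report's '< 0.1%' floor."""
    pct = 100.0 * count / total
    return "< 0.1%" if pct < 0.1 else f"{pct:.1f}%"


for value, count in counts.items():
    print(f"{value} | {count} | {format_frequency(count)} |")
```

Under this assumption the recomputed column agrees with the 3.2%, 0.3%, 0.1% and < 0.1% entries that the report lists.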
Most occurring characters
Value | Count | Frequency (%) |
M | 3641403 | 11.6% |
R | 3556300 | 11.3% |
I | 3556300 | 11.3% |
E | 3086720 | 9.8% |
O | 3066463 | 9.8% |
P | 2963927 | 9.4% |
_ | 1880131 | 6.0% |
A | 1795445 | 5.7% |
L | 1785095 | 5.7% |
Y | 1775163 | 5.7% |
Other values (9) | 4310821 | 13.7% |
Most occurring categories
Value | Count | Frequency (%) |
Uppercase Letter | 29537637 | 94.0% |
Connector Punctuation | 1880131 | 6.0% |
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
M | 3641403 | |
R | 3556300 | |
I | 3556300 | |
E | 3086720 | |
O | 3066463 | |
P | 2963927 | |
A | 1795445 | |
L | 1785095 | |
Y | 1775163 | |
B | 1774771 | |
Other values (8) | 2536050 |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 1880131 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 29537637 | 94.0% |
Common | 1880131 | 6.0% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
M | 3641403 | |
R | 3556300 | |
I | 3556300 | |
E | 3086720 | |
O | 3066463 | |
P | 2963927 | |
A | 1795445 | |
L | 1785095 | |
Y | 1775163 | |
B | 1774771 | |
Other values (8) | 2536050 |
Common
Value | Count | Frequency (%) |
_ | 1880131 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 31417768 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
M | 3641403 | 11.6% |
R | 3556300 | 11.3% |
I | 3556300 | 11.3% |
E | 3086720 | 9.8% |
O | 3066463 | 9.8% |
P | 2963927 | 9.4% |
_ | 1880131 | 6.0% |
A | 1795445 | 5.7% |
L | 1785095 | 5.7% |
Y | 1775163 | 5.7% |
Other values (9) | 4310821 | 13.7% |