Overview

Dataset statistics

Number of variables: 37
Number of observations: 2973991
Missing cells: 51051536
Missing cells (%): 46.4%
Total size in memory: 839.5 MiB
Average record size in memory: 296.0 B
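The percentage and per-record figures in the dataset statistics follow from the raw counts. The sketch below redoes that arithmetic. It assumes the missing percentage is taken over variables × observations and that 1 MiB is 2**20 bytes; the report does not state either formula.

```python
# Counts copied from the dataset statistics.
n_variables = 37
n_observations = 2_973_991
missing_cells = 51_051_536
total_size_mib = 839.5

# Assumption: every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 2**20 bytes.
avg_record_bytes = total_size_mib * 2**20 / n_observations

print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Both values round to the figures reported in the overview, which suggests the assumptions are at least consistent with how the report was produced.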

Variable types

Numeric: 7
Text: 25
Boolean: 5

Alerts

contaddr_matchlist_1032L has constant value "" [Constant]
remitter_829L has constant value "" [Constant]
role_993L has constant value "" [Constant]
contaddr_smempladdr_334L is highly imbalanced (95.8%) [Imbalance]
safeguarantyflag_411L is highly imbalanced (69.9%) [Imbalance]
birth_259D has 1447332 (48.7%) missing values [Missing]
birthdate_87D has 2949075 (99.2%) missing values [Missing]
childnum_185L has 2964084 (99.7%) missing values [Missing]
contaddr_matchlist_1032L has 1447773 (48.7%) missing values [Missing]
contaddr_smempladdr_334L has 1447773 (48.7%) missing values [Missing]
empl_employedfrom_271D has 2407290 (80.9%) missing values [Missing]
empl_employedtotal_800L has 2445676 (82.2%) missing values [Missing]
empl_industry_691L has 2451755 (82.4%) missing values [Missing]
familystate_447L has 2245378 (75.5%) missing values [Missing]
gender_992L has 2949075 (99.2%) missing values [Missing]
housetype_905L has 2873173 (96.6%) missing values [Missing]
housingtype_772L has 2964176 (99.7%) missing values [Missing]
incometype_1044T has 1447332 (48.7%) missing values [Missing]
isreference_387L has 2949075 (99.2%) missing values [Missing]
mainoccupationinc_384A has 1447332 (48.7%) missing values [Missing]
maritalst_703L has 2962646 (99.6%) missing values [Missing]
personindex_1023L has 642283 (21.6%) missing values [Missing]
persontype_792L has 642283 (21.6%) missing values [Missing]
relationshiptoclient_415T has 2168942 (72.9%) missing values [Missing]
relationshiptoclient_642T has 2168049 (72.9%) missing values [Missing]
remitter_829L has 2168942 (72.9%) missing values [Missing]
role_993L has 2949075 (99.2%) missing values [Missing]
safeguarantyflag_411L has 1447334 (48.7%) missing values [Missing]
sex_738L has 1447332 (48.7%) missing values [Missing]
num_group1 has 1526659 (51.3%) zeros [Zeros]
personindex_1023L has 1526659 (51.3%) zeros [Zeros]
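Several alerts report exactly the same missing count. The sketch below recomputes the percentages for a handful of the listed variables and groups them by missing count. The selection of variables is illustrative. Identical counts suggest, but do not prove, that those fields are missing on the same rows.

```python
from collections import defaultdict

N_OBSERVATIONS = 2_973_991

# Missing counts copied from the alerts list (illustrative subset).
missing_counts = {
    "birth_259D": 1_447_332,
    "incometype_1044T": 1_447_332,
    "sex_738L": 1_447_332,
    "birthdate_87D": 2_949_075,
    "gender_992L": 2_949_075,
    "role_993L": 2_949_075,
    "personindex_1023L": 642_283,
}

def missing_share(count, total=N_OBSERVATIONS):
    """Missing values as a percentage of all observations, one decimal."""
    return round(100 * count / total, 1)

shares = {name: missing_share(c) for name, c in missing_counts.items()}

# Group variables that share an identical missing count.
groups = defaultdict(list)
for name, count in missing_counts.items():
    groups[count].append(name)

for count, names in sorted(groups.items()):
    print(count, f"{missing_share(count)}%", names)
```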

Reproduction

Analysis started: 2024-02-13 19:53:39.005022
Analysis finished: 2024-02-13 19:54:00.981048
Duration: 21.98 seconds
Software version: ydata-profiling v4.6.4
Download configuration: config.json
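The reported duration can be checked against the two timestamps. A minimal sketch using only the standard library; the format string is an assumption based on how the timestamps are printed.

```python
from datetime import datetime

# Assumed timestamp layout, inferred from the printed values.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-02-13 19:53:39.005022", FMT)
finished = datetime.strptime("2024-02-13 19:54:00.981048", FMT)

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```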

Variables

case_id
Real number (ℝ)

Distinct: 1526659
Distinct (%): 51.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1055195.612
Minimum: 0
Maximum: 2703454
Zeros: 4
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 22.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 43181
Q1: 637353.5
Median: 890817
Q3: 1568333.5
95-th percentile: 2597849.5
Maximum: 2703454
Range: 2703454
Interquartile range (IQR): 930980

Descriptive statistics

Standard deviation: 724571.3851
Coefficient of variation (CV): 0.6866702033
Kurtosis: -0.3569768256
Mean: 1055195.612
Median Absolute Deviation (MAD): 580360
Skewness: 0.5677730015
Sum: 3.138142252 × 10^12
Variance: 5.250036921 × 10^11
Monotonicity: Increasing
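Several of the case_id statistics are derived from others in the same block. The sketch below recomputes range, IQR, coefficient of variation and variance from the reported minimum, maximum, quartiles, mean and standard deviation. The formulas are the standard textbook ones and are assumed, not confirmed, to be what the report uses.

```python
# Values copied from the case_id statistics.
minimum, maximum = 0, 2_703_454
q1, q3 = 637_353.5, 1_568_333.5
mean, std = 1_055_195.612, 724_571.3851

value_range = maximum - minimum   # Range = max - min
iqr = q3 - q1                     # Interquartile range = Q3 - Q1
cv = std / mean                   # Coefficient of variation = std / mean
variance = std ** 2               # Variance = std squared

print(value_range, iqr, f"{cv:.10f}", f"{variance:.9e}")
```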
Histogram with fixed size bins (bins=50)
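A histogram with fixed-size bins assigns each value to one of equally wide intervals between the minimum and maximum. The sketch below shows that assignment for case_id with 50 bins. The helper name `bin_index` is hypothetical, and placing the maximum in the last bin is an assumed convention, common in plotting libraries.

```python
def bin_index(value, minimum, maximum, bins):
    """Return the 0-based index of the equal-width bin containing value."""
    width = (maximum - minimum) / bins
    idx = int((value - minimum) / width)
    # Assumption: the upper edge belongs to the last bin.
    return min(idx, bins - 1)

# case_id spans 0..2703454 and the histogram uses 50 bins.
lo, hi, n_bins = 0, 2_703_454, 50
first = bin_index(0, lo, hi, n_bins)
median_bin = bin_index(890_817, lo, hi, n_bins)
last = bin_index(2_703_454, lo, hi, n_bins)
print(first, median_bin, last)
```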
Value                     Count     Frequency (%)
147982                    10        < 0.1%
706273                    10        < 0.1%
141673                    8         < 0.1%
25018                     8         < 0.1%
1757817                   8         < 0.1%
124748                    8         < 0.1%
608607                    8         < 0.1%
693470                    8         < 0.1%
611761                    7         < 0.1%
200877                    7         < 0.1%
Other values (1526649)    2973909   > 99.9%

Value     Count   Frequency (%)
0         4       < 0.1%
1         5       < 0.1%
2         5       < 0.1%
3         3       < 0.1%
4         4       < 0.1%

Value     Count   Frequency (%)
2703454   1       < 0.1%
2703453   2       < 0.1%
2703452   1       < 0.1%
2703451   2       < 0.1%
2703450   1       < 0.1%

birth_259D
Text

MISSING 

Distinct: 680
Distinct (%): < 0.1%
Missing: 1447332
Missing (%): 48.7%
Memory size: 22.7 MiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 15266590
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
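The character totals for birth_259D are consistent with every non-missing value being a 10-character date string containing two dashes. A minimal sketch of that check, using only counts reported for this variable; the fixed YYYY-MM-DD layout is an assumption based on the reported lengths and sample rows.

```python
n_observations = 2_973_991
n_missing = 1_447_332

# Assumption: each present value is "YYYY-MM-DD" (10 chars, 2 dashes).
value_length = 10
dashes_per_value = 2

n_present = n_observations - n_missing
total_characters = n_present * value_length
dash_characters = n_present * dashes_per_value
dash_share = 100 * dash_characters / total_characters

print(n_present, total_characters, dash_characters, f"{dash_share:.1f}%")
```

The results match the reported total of 15266590 characters and the 3053318 dash characters (20.0%) listed in the character tables for this variable.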

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1986-07-01
2nd row: 1957-08-01
3rd row: 1974-12-01
4th row: 1993-08-01
5th row: 1994-01-01
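birth_259D is profiled as Text even though its samples look like ISO dates. The sketch below parses the five sample rows with the standard library and checks whether they all fall on the first of the month. That this holds for the whole column is not established by the report; only the samples are tested here.

```python
from datetime import date

# Sample rows copied from the report.
samples = ["1986-07-01", "1957-08-01", "1974-12-01", "1993-08-01", "1994-01-01"]

# date.fromisoformat accepts the YYYY-MM-DD layout directly.
parsed = [date.fromisoformat(s) for s in samples]

# True when every sampled date is the first day of its month.
all_first_of_month = all(d.day == 1 for d in parsed)
print(parsed[0], all_first_of_month)
```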
Value                 Count     Frequency (%)
1988-07-01            3713      0.2%
1986-07-01            3655      0.2%
1987-05-01            3655      0.2%
1986-08-01            3626      0.2%
1988-08-01            3584      0.2%
1987-06-01            3581      0.2%
1987-08-01            3569      0.2%
1987-07-01            3554      0.2%
1989-03-01            3547      0.2%
1988-05-01            3545      0.2%
Other values (670)    1490630   97.6%

Most occurring characters

ValueCountFrequency (%)
1 3827240
25.1%
- 3053318
20.0%
0 2961792
19.4%
9 2059991
13.5%
8 672485
 
4.4%
7 615279
 
4.0%
6 536355
 
3.5%
5 526876
 
3.5%
2 393048
 
2.6%
4 331903
 
2.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 12213272
80.0%
Dash Punctuation 3053318
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 3827240
31.3%
0 2961792
24.3%
9 2059991
16.9%
8 672485
 
5.5%
7 615279
 
5.0%
6 536355
 
4.4%
5 526876
 
4.3%
2 393048
 
3.2%
4 331903
 
2.7%
3 288303
 
2.4%
Dash Punctuation
ValueCountFrequency (%)
- 3053318
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 15266590
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 3827240
25.1%
- 3053318
20.0%
0 2961792
19.4%
9 2059991
13.5%
8 672485
 
4.4%
7 615279
 
4.0%
6 536355
 
3.5%
5 526876
 
3.5%
2 393048
 
2.6%
4 331903
 
2.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 15266590
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 3827240
25.1%
- 3053318
20.0%
0 2961792
19.4%
9 2059991
13.5%
8 672485
 
4.4%
7 615279
 
4.0%
6 536355
 
3.5%
5 526876
 
3.5%
2 393048
 
2.6%
4 331903
 
2.2%

birthdate_87D
Text

MISSING 

Distinct659
Distinct (%)2.6%
Missing2949075
Missing (%)99.2%
Memory size22.7 MiB

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters249160
Distinct characters11
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique22 ?
Unique (%)0.1%

Sample

1st row1991-01-01
2nd row1991-01-01
3rd row1988-09-01
4th row1994-09-01
5th row1969-09-01
ValueCountFrequency (%)
1982-04-01 197
 
0.8%
1981-04-01 196
 
0.8%
1984-04-01 183
 
0.7%
1983-04-01 144
 
0.6%
1985-05-01 126
 
0.5%
1988-06-01 124
 
0.5%
1986-06-01 120
 
0.5%
1983-12-01 116
 
0.5%
1989-03-01 114
 
0.5%
1990-06-01 108
 
0.4%
Other values (649) 23488
94.3%

Most occurring characters

ValueCountFrequency (%)
1 62349
25.0%
- 49832
20.0%
0 48226
19.4%
9 35346
14.2%
8 14151
 
5.7%
7 9082
 
3.6%
6 7221
 
2.9%
2 6491
 
2.6%
5 6313
 
2.5%
4 5337
 
2.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 199328
80.0%
Dash Punctuation 49832
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 62349
31.3%
0 48226
24.2%
9 35346
17.7%
8 14151
 
7.1%
7 9082
 
4.6%
6 7221
 
3.6%
2 6491
 
3.3%
5 6313
 
3.2%
4 5337
 
2.7%
3 4812
 
2.4%
Dash Punctuation
ValueCountFrequency (%)
- 49832
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 249160
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 62349
25.0%
- 49832
20.0%
0 48226
19.4%
9 35346
14.2%
8 14151
 
5.7%
7 9082
 
3.6%
6 7221
 
2.9%
2 6491
 
2.6%
5 6313
 
2.5%
4 5337
 
2.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 249160
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 62349
25.0%
- 49832
20.0%
0 48226
19.4%
9 35346
14.2%
8 14151
 
5.7%
7 9082
 
3.6%
6 7221
 
2.9%
2 6491
 
2.6%
5 6313
 
2.5%
4 5337
 
2.1%

childnum_185L
Real number (ℝ)

MISSING 

Distinct11
Distinct (%)0.1%
Missing2964084
Missing (%)99.7%
Infinite0
Infinite (%)0.0%
Mean0.6160290704
Minimum0
Maximum11
Zeros6043
Zeros (%)0.2%
Negative0
Negative (%)0.0%
Memory size22.7 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q31
95-th percentile2
Maximum11
Range11
Interquartile range (IQR)1

Descriptive statistics

Standard deviation0.9660800059
Coefficient of variation (CV)1.568237689
Kurtosis7.396626101
Mean0.6160290704
Median Absolute Deviation (MAD)0
Skewness2.180601546
Sum6103
Variance0.9333105778
MonotonicityNot monotonic
Histogram with fixed size bins (bins=11)
ValueCountFrequency (%)
0 6043
 
0.2%
1 2371
 
0.1%
2 1021
 
< 0.1%
3 313
 
< 0.1%
4 88
 
< 0.1%
5 43
 
< 0.1%
6 20
 
< 0.1%
7 5
 
< 0.1%
11 1
 
< 0.1%
8 1
 
< 0.1%
(Missing) 2964084
99.7%
ValueCountFrequency (%)
0 6043
0.2%
1 2371
 
0.1%
2 1021
 
< 0.1%
3 313
 
< 0.1%
4 88
 
< 0.1%
ValueCountFrequency (%)
11 1
 
< 0.1%
10 1
 
< 0.1%
8 1
 
< 0.1%
7 5
 
< 0.1%
6 20
< 0.1%
Distinct975
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size22.7 MiB

Length

Max length12
Median length11
Mean length9.385123223
Min length8

Characters and Unicode

Total characters27911272
Distinct characters18
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique154 ?
Unique (%)< 0.1%

Sample

1st rowP88_18_84
2nd rowa55475b1
3rd rowa55475b1
4th rowa55475b1
5th rowP103_93_94
ValueCountFrequency (%)
a55475b1 1449674
48.7%
p131_33_167 80070
 
2.7%
p197_47_166 77354
 
2.6%
p123_6_84 55868
 
1.9%
p98_137_111 42799
 
1.4%
p159_143_123 35814
 
1.2%
p62_144_102 35641
 
1.2%
p204_99_158 34659
 
1.2%
p19_11_176 31987
 
1.1%
p178_112_160 30076
 
1.0%
Other values (965) 1100049
37.0%

Most occurring characters

ValueCountFrequency (%)
1 5171032
18.5%
5 5168667
18.5%
_ 3048634
10.9%
7 2652943
9.5%
4 2376505
8.5%
P 1524315
 
5.5%
a 1449674
 
5.2%
b 1449674
 
5.2%
6 1032051
 
3.7%
3 972874
 
3.5%
Other values (8) 3064903
11.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 20438967
73.2%
Connector Punctuation 3048634
 
10.9%
Lowercase Letter 2899354
 
10.4%
Uppercase Letter 1524317
 
5.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 5171032
25.3%
5 5168667
25.3%
7 2652943
13.0%
4 2376505
11.6%
6 1032051
 
5.0%
3 972874
 
4.8%
2 852697
 
4.2%
8 814902
 
4.0%
9 814404
 
4.0%
0 582892
 
2.9%
Lowercase Letter
ValueCountFrequency (%)
a 1449674
50.0%
b 1449674
50.0%
t 2
 
< 0.1%
h 2
 
< 0.1%
e 2
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
P 1524315
> 99.9%
Q 2
 
< 0.1%
Connector Punctuation
ValueCountFrequency (%)
_ 3048634
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 23487601
84.2%
Latin 4423671
 
15.8%

Most frequent character per script

Common
ValueCountFrequency (%)
1 5171032
22.0%
5 5168667
22.0%
_ 3048634
13.0%
7 2652943
11.3%
4 2376505
10.1%
6 1032051
 
4.4%
3 972874
 
4.1%
2 852697
 
3.6%
8 814902
 
3.5%
9 814404
 
3.5%
Latin
ValueCountFrequency (%)
P 1524315
34.5%
a 1449674
32.8%
b 1449674
32.8%
Q 2
 
< 0.1%
t 2
 
< 0.1%
h 2
 
< 0.1%
e 2
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 27911272
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 5171032
18.5%
5 5168667
18.5%
_ 3048634
10.9%
7 2652943
9.5%
4 2376505
8.5%
P 1524315
 
5.5%
a 1449674
 
5.2%
b 1449674
 
5.2%
6 1032051
 
3.7%
3 972874
 
3.5%
Other values (8) 3064903
11.0%

contaddr_matchlist_1032L
Boolean

CONSTANT  MISSING 

Distinct1
Distinct (%)< 0.1%
Missing1447773
Missing (%)48.7%
Memory size22.7 MiB
Value        Count     Frequency (%)
False        1526218   51.3%
(Missing)    1447773   48.7%

contaddr_smempladdr_334L
Boolean

IMBALANCE  MISSING 

Distinct2
Distinct (%)< 0.1%
Missing1447773
Missing (%)48.7%
Memory size22.7 MiB
Value        Count     Frequency (%)
False        1519286   51.1%
True         6932      0.2%
(Missing)    1447773   48.7%
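The report does not say how the 95.8% imbalance figure for contaddr_smempladdr_334L is computed. One definition that reproduces it is one minus the Shannon entropy of the non-missing class counts, normalised by log2 of the number of classes. The sketch below applies that definition. The function name `imbalance_score` and the choice to return 0.0 for fewer than two classes are assumptions made for illustration.

```python
import math

def imbalance_score(counts):
    """1 - normalised Shannon entropy of the class counts (assumed definition)."""
    n_classes = len(counts)
    if n_classes < 2:
        # Assumption: a single class is scored 0.0 rather than undefined.
        return 0.0
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(n_classes)

# Non-missing counts for False and True, copied from the report.
score = imbalance_score([1_519_286, 6_932])
print(f"{100 * score:.1f}%")
```

A score of 0 would mean perfectly balanced classes and a score near 1 means almost all observations fall in one class, which is the case here.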
Distinct3530
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size22.7 MiB

Length

Max length13
Median length8
Mean length9.297594714
Min length7

Characters and Unicode

Total characters27650963
Distinct characters33
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique68 ?
Unique (%)< 0.1%

Sample

1st rowP167_100_165
2nd rowa55475b1
3rd rowa55475b1
4th rowa55475b1
5th rowP176_37_166
ValueCountFrequency (%)
a55475b1 1474844
49.6%
p161_14_174 97924
 
3.3%
p144_138_111 90612
 
3.0%
p46_103_143 55529
 
1.9%
p91_47_168 53628
 
1.8%
p62_116_179 34396
 
1.2%
p212_16_169 32118
 
1.1%
p157_35_170 30610
 
1.0%
p11_15_81 30256
 
1.0%
p131_154_48 28360
 
1.0%
Other values (3520) 1045714
35.2%

Most occurring characters

ValueCountFrequency (%)
5 5225088
18.9%
1 5153745
18.6%
_ 2998294
10.8%
4 2755481
10.0%
7 2409867
8.7%
P 1497360
 
5.4%
a 1475205
 
5.3%
b 1474844
 
5.3%
8 1012664
 
3.7%
6 987117
 
3.6%
Other values (23) 2661298
9.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 20197415
73.0%
Connector Punctuation 2998294
 
10.8%
Lowercase Letter 2956107
 
10.7%
Uppercase Letter 1499147
 
5.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 1475205
49.9%
b 1474844
49.9%
e 1837
 
0.1%
t 771
 
< 0.1%
r 682
 
< 0.1%
m 610
 
< 0.1%
o 593
 
< 0.1%
l 287
 
< 0.1%
u 212
 
< 0.1%
g 170
 
< 0.1%
Other values (10) 896
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
5 5225088
25.9%
1 5153745
25.5%
4 2755481
13.6%
7 2409867
11.9%
8 1012664
 
5.0%
6 987117
 
4.9%
3 858532
 
4.3%
2 663914
 
3.3%
9 595785
 
2.9%
0 535222
 
2.6%
Uppercase Letter
ValueCountFrequency (%)
P 1497360
99.9%
Q 1787
 
0.1%
Connector Punctuation
ValueCountFrequency (%)
_ 2998294
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 23195709
83.9%
Latin 4455254
 
16.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
P 1497360
33.6%
a 1475205
33.1%
b 1474844
33.1%
e 1837
 
< 0.1%
Q 1787
 
< 0.1%
t 771
 
< 0.1%
r 682
 
< 0.1%
m 610
 
< 0.1%
o 593
 
< 0.1%
l 287
 
< 0.1%
Other values (12) 1278
 
< 0.1%
Common
ValueCountFrequency (%)
5 5225088
22.5%
1 5153745
22.2%
_ 2998294
12.9%
4 2755481
11.9%
7 2409867
10.4%
8 1012664
 
4.4%
6 987117
 
4.3%
3 858532
 
3.7%
2 663914
 
2.9%
9 595785
 
2.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 27650963
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5 5225088
18.9%
1 5153745
18.6%
_ 2998294
10.8%
4 2755481
10.0%
7 2409867
8.7%
P 1497360
 
5.4%
a 1475205
 
5.3%
b 1474844
 
5.3%
8 1012664
 
3.7%
6 987117
 
3.6%
Other values (23) 2661298
9.6%
Distinct6
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size22.7 MiB

Length

Max length11
Median length8
Mean length8.604439287
Min length8

Characters and Unicode

Total characters25589525
Distinct characters14
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowP97_36_170
2nd rowa55475b1
3rd rowa55475b1
4th rowa55475b1
5th rowP97_36_170
ValueCountFrequency (%)
a55475b1 2234573
75.1%
p97_36_170 415087
 
14.0%
p33_146_175 263189
 
8.8%
p106_81_188 54931
 
1.8%
p17_36_170 5570
 
0.2%
p157_18_172 641
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
5 6967549
27.2%
1 3353894
13.1%
7 3340358
13.1%
4 2497762
 
9.8%
a 2234573
 
8.7%
b 2234573
 
8.7%
_ 1478836
 
5.8%
3 947035
 
3.7%
P 739418
 
2.9%
6 738777
 
2.9%
Other values (4) 1056750
 
4.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 18902125
73.9%
Lowercase Letter 4469146
 
17.5%
Connector Punctuation 1478836
 
5.8%
Uppercase Letter 739418
 
2.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
5 6967549
36.9%
1 3353894
17.7%
7 3340358
17.7%
4 2497762
 
13.2%
3 947035
 
5.0%
6 738777
 
3.9%
0 475588
 
2.5%
9 415087
 
2.2%
8 165434
 
0.9%
2 641
 
< 0.1%
Lowercase Letter
ValueCountFrequency (%)
a 2234573
50.0%
b 2234573
50.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1478836
100.0%
Uppercase Letter
ValueCountFrequency (%)
P 739418
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 20380961
79.6%
Latin 5208564
 
20.4%

Most frequent character per script

Common
ValueCountFrequency (%)
5 6967549
34.2%
1 3353894
16.5%
7 3340358
16.4%
4 2497762
 
12.3%
_ 1478836
 
7.3%
3 947035
 
4.6%
6 738777
 
3.6%
0 475588
 
2.3%
9 415087
 
2.0%
8 165434
 
0.8%
Latin
ValueCountFrequency (%)
a 2234573
42.9%
b 2234573
42.9%
P 739418
 
14.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 25589525
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5 6967549
27.2%
1 3353894
13.1%
7 3340358
13.1%
4 2497762
 
9.8%
a 2234573
 
8.7%
b 2234573
 
8.7%
_ 1478836
 
5.8%
3 947035
 
3.7%
P 739418
 
2.9%
6 738777
 
2.9%
Other values (4) 1056750
 
4.1%

empl_employedfrom_271D
Text

MISSING 

Distinct8075
Distinct (%)1.4%
Missing2407290
Missing (%)80.9%
Memory size22.7 MiB

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters5667010
Distinct characters11
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2129 ?
Unique (%)0.4%

Sample

1st row2017-09-15
2nd row2008-10-29
3rd row2010-02-15
4th row2018-05-15
5th row2014-12-15
ValueCountFrequency (%)
2018-01-15 48503
 
8.6%
2017-01-15 40965
 
7.2%
2019-01-15 29346
 
5.2%
2016-01-15 28452
 
5.0%
2015-01-15 24818
 
4.4%
2014-01-15 20590
 
3.6%
2013-01-15 15362
 
2.7%
2012-01-15 12392
 
2.2%
2010-01-15 11556
 
2.0%
2011-01-15 9139
 
1.6%
Other values (8065) 325578
57.5%

Most occurring characters

ValueCountFrequency (%)
1 1390947
24.5%
0 1223499
21.6%
- 1133402
20.0%
2 667656
11.8%
5 579830
10.2%
9 178466
 
3.1%
8 143289
 
2.5%
7 108883
 
1.9%
6 100485
 
1.8%
4 75533
 
1.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 4533608
80.0%
Dash Punctuation 1133402
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 1390947
30.7%
0 1223499
27.0%
2 667656
14.7%
5 579830
12.8%
9 178466
 
3.9%
8 143289
 
3.2%
7 108883
 
2.4%
6 100485
 
2.2%
4 75533
 
1.7%
3 65020
 
1.4%
Dash Punctuation
ValueCountFrequency (%)
- 1133402
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 5667010
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 1390947
24.5%
0 1223499
21.6%
- 1133402
20.0%
2 667656
11.8%
5 579830
10.2%
9 178466
 
3.1%
8 143289
 
2.5%
7 108883
 
1.9%
6 100485
 
1.8%
4 75533
 
1.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5667010
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 1390947
24.5%
0 1223499
21.6%
- 1133402
20.0%
2 667656
11.8%
5 579830
10.2%
9 178466
 
3.1%
8 143289
 
2.5%
7 108883
 
1.9%
6 100485
 
1.8%
4 75533
 
1.3%

empl_employedtotal_800L
Text

MISSING 

Distinct3
Distinct (%)< 0.1%
Missing2445676
Missing (%)82.2%
Memory size22.7 MiB

Length

Max length9
Median length9
Mean length8.702840162
Min length8

Characters and Unicode

Total characters4597841
Distinct characters11
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowMORE_FIVE
2nd rowMORE_FIVE
3rd rowMORE_FIVE
4th rowMORE_FIVE
5th rowMORE_FIVE
ValueCountFrequency (%)
more_five 371321
70.3%
more_one 126527
 
23.9%
less_one 30467
 
5.8%

Most occurring characters

ValueCountFrequency (%)
E 1056630
23.0%
O 654842
14.2%
_ 528315
11.5%
M 497848
10.8%
R 497848
10.8%
F 371321
 
8.1%
I 371321
 
8.1%
V 371321
 
8.1%
N 156994
 
3.4%
S 60934
 
1.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 4069526
88.5%
Connector Punctuation 528315
 
11.5%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 1056630
26.0%
O 654842
16.1%
M 497848
12.2%
R 497848
12.2%
F 371321
 
9.1%
I 371321
 
9.1%
V 371321
 
9.1%
N 156994
 
3.9%
S 60934
 
1.5%
L 30467
 
0.7%
Connector Punctuation
ValueCountFrequency (%)
_ 528315
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4069526
88.5%
Common 528315
 
11.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 1056630
26.0%
O 654842
16.1%
M 497848
12.2%
R 497848
12.2%
F 371321
 
9.1%
I 371321
 
9.1%
V 371321
 
9.1%
N 156994
 
3.9%
S 60934
 
1.5%
L 30467
 
0.7%
Common
ValueCountFrequency (%)
_ 528315
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4597841
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
E 1056630
23.0%
O 654842
14.2%
_ 528315
11.5%
M 497848
10.8%
R 497848
10.8%
F 371321
 
8.1%
I 371321
 
8.1%
V 371321
 
8.1%
N 156994
 
3.4%
S 60934
 
1.3%

empl_industry_691L
Text

MISSING 

Distinct24
Distinct (%)< 0.1%
Missing2451755
Missing (%)82.4%
Memory size22.7 MiB

Length

Max length17
Median length5
Mean length5.967867018
Min length2

Characters and Unicode

Total characters3116635
Distinct characters22
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
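The mean length reported for empl_industry_691L can be recovered from the total character count and the number of non-missing values. A minimal sketch, assuming mean length is defined as total characters divided by non-missing observations:

```python
n_observations = 2_973_991
n_missing = 2_451_755
total_characters = 3_116_635

# Only non-missing values contribute characters.
n_present = n_observations - n_missing

# Assumed definition: average string length over present values.
mean_length = total_characters / n_present
print(n_present, f"{mean_length:.9f}")
```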

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowOTHER
2nd rowOTHER
3rd rowOTHER
4th rowOTHER
5th rowOTHER
ValueCountFrequency (%)
other 386837
74.1%
government 35440
 
6.8%
education 30346
 
5.8%
trade 20696
 
4.0%
health 13026
 
2.5%
manufacturing 9035
 
1.7%
agriculture 5288
 
1.0%
transportation 4318
 
0.8%
real_estate 3680
 
0.7%
mining 3582
 
0.7%
Other values (14) 9988
 
1.9%

Most occurring characters

ValueCountFrequency (%)
E 547739
17.6%
T 527185
16.9%
R 480420
15.4%
O 463035
14.9%
H 412914
13.2%
N 146199
 
4.7%
A 111725
 
3.6%
I 64572
 
2.1%
U 59832
 
1.9%
G 57309
 
1.8%
Other values (12) 245705
7.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 3112059
99.9%
Connector Punctuation 4576
 
0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 547739
17.6%
T 527185
16.9%
R 480420
15.4%
O 463035
14.9%
H 412914
13.3%
N 146199
 
4.7%
A 111725
 
3.6%
I 64572
 
2.1%
U 59832
 
1.9%
G 57309
 
1.8%
Other values (11) 241129
7.7%
Connector Punctuation
ValueCountFrequency (%)
_ 4576
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3112059
99.9%
Common 4576
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 547739
17.6%
T 527185
16.9%
R 480420
15.4%
O 463035
14.9%
H 412914
13.3%
N 146199
 
4.7%
A 111725
 
3.6%
I 64572
 
2.1%
U 59832
 
1.9%
G 57309
 
1.8%
Other values (11) 241129
7.7%
Common
ValueCountFrequency (%)
_ 4576
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3116635
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
E 547739
17.6%
T 527185
16.9%
R 480420
15.4%
O 463035
14.9%
H 412914
13.2%
N 146199
 
4.7%
A 111725
 
3.6%
I 64572
 
2.1%
U 59832
 
1.9%
G 57309
 
1.8%
Other values (12) 245705
7.9%
Distinct223
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size22.7 MiB

Length

Max length12
Median length8
Mean length8.493035453
Min length8

Characters and Unicode

Total characters25258211
Distinct characters14
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowP142_57_166
2nd rowa55475b1
3rd rowa55475b1
4th rowa55475b1
5th rowP49_46_174
ValueCountFrequency (%)
a55475b1 2448480
82.3%
p197_47_166 57521
 
1.9%
p131_33_167 41960
 
1.4%
p62_144_102 21172
 
0.7%
p98_137_111 20595
 
0.7%
p123_6_84 19021
 
0.6%
p159_143_123 15720
 
0.5%
p19_11_176 13883
 
0.5%
p109_162_152 12941
 
0.4%
p112_89_137 10583
 
0.4%
Other values (213) 312115
 
10.5%

Most occurring characters

ValueCountFrequency (%)
5 7581626
30.0%
1 3772799
14.9%
7 2892790
 
11.5%
4 2778339
 
11.0%
a 2448480
 
9.7%
b 2448480
 
9.7%
_ 1051022
 
4.2%
P 525511
 
2.1%
6 399949
 
1.6%
3 350910
 
1.4%
Other values (4) 1008305
 
4.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 18784718
74.4%
Lowercase Letter 4896960
 
19.4%
Connector Punctuation 1051022
 
4.2%
Uppercase Letter 525511
 
2.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
5 7581626
40.4%
1 3772799
20.1%
7 2892790
 
15.4%
4 2778339
 
14.8%
6 399949
 
2.1%
3 350910
 
1.9%
9 293310
 
1.6%
2 287439
 
1.5%
8 244274
 
1.3%
0 183282
 
1.0%
Lowercase Letter
ValueCountFrequency (%)
a 2448480
50.0%
b 2448480
50.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1051022
100.0%
Uppercase Letter
ValueCountFrequency (%)
P 525511
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 19835740
78.5%
Latin 5422471
 
21.5%

Most frequent character per script

Common
ValueCountFrequency (%)
5 7581626
38.2%
1 3772799
19.0%
7 2892790
 
14.6%
4 2778339
 
14.0%
_ 1051022
 
5.3%
6 399949
 
2.0%
3 350910
 
1.8%
9 293310
 
1.5%
2 287439
 
1.4%
8 244274
 
1.2%
Latin
ValueCountFrequency (%)
a 2448480
45.2%
b 2448480
45.2%
P 525511
 
9.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 25258211
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5 7581626
30.0%
1 3772799
14.9%
7 2892790
 
11.5%
4 2778339
 
11.0%
a 2448480
 
9.7%
b 2448480
 
9.7%
_ 1051022
 
4.2%
P 525511
 
2.1%
6 399949
 
1.6%
3 350910
 
1.4%
Other values (4) 1008305
 
4.0%
Distinct3339
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size22.7 MiB

Length

Max length13
Median length8
Mean length8.461853113
Min length7

Characters and Unicode

Total characters25165475
Distinct characters33
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique179 ?
Unique (%)< 0.1%

Sample

1st rowP167_100_165
2nd rowa55475b1
3rd rowa55475b1
4th rowa55475b1
5th rowP160_59_140
ValueCountFrequency (%)
a55475b1 2448349
82.3%
p144_138_111 61514
 
2.1%
p161_14_174 44960
 
1.5%
p91_47_168 22301
 
0.7%
p8_88_79 19321
 
0.6%
p46_103_143 18945
 
0.6%
p62_116_179 15640
 
0.5%
p11_15_81 13644
 
0.5%
p45_25_38 12915
 
0.4%
p118_161_181 10472
 
0.4%
Other values (3329) 305930
 
10.3%

Most occurring characters

ValueCountFrequency (%)
5 7594625
30.2%
1 3828238
15.2%
4 2955454
 
11.7%
7 2753246
 
10.9%
a 2448420
 
9.7%
b 2448349
 
9.7%
_ 1051284
 
4.2%
P 525336
 
2.1%
8 396794
 
1.6%
6 318412
 
1.3%
Other values (23) 845317
 
3.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 18690771
74.3%
Lowercase Letter 4897778
 
19.5%
Connector Punctuation 1051284
 
4.2%
Uppercase Letter 525642
 
2.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 2448420
50.0%
b 2448349
50.0%
e 261
 
< 0.1%
o 124
 
< 0.1%
r 113
 
< 0.1%
t 101
 
< 0.1%
m 74
 
< 0.1%
l 54
 
< 0.1%
g 51
 
< 0.1%
u 50
 
< 0.1%
Other values (10) 181
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
5 7594625
40.6%
1 3828238
20.5%
4 2955454
 
15.8%
7 2753246
 
14.7%
8 396794
 
2.1%
6 318412
 
1.7%
3 307112
 
1.6%
2 192096
 
1.0%
9 190232
 
1.0%
0 154562
 
0.8%
Uppercase Letter
ValueCountFrequency (%)
P 525336
99.9%
Q 306
 
0.1%
Connector Punctuation
ValueCountFrequency (%)
_ 1051284
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 19742055
78.4%
Latin 5423420
 
21.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 2448420
45.1%
b 2448349
45.1%
P 525336
 
9.7%
Q 306
 
< 0.1%
e 261
 
< 0.1%
o 124
 
< 0.1%
r 113
 
< 0.1%
t 101
 
< 0.1%
m 74
 
< 0.1%
l 54
 
< 0.1%
Other values (12) 282
 
< 0.1%
Common
ValueCountFrequency (%)
5 7594625
38.5%
1 3828238
19.4%
4 2955454
 
15.0%
7 2753246
 
13.9%
_ 1051284
 
5.3%
8 396794
 
2.0%
6 318412
 
1.6%
3 307112
 
1.6%
2 192096
 
1.0%
9 190232
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 25165475
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5 7594625
30.2%
1 3828238
15.2%
4 2955454
 
11.7%
7 2753246
 
10.9%
a 2448420
 
9.7%
b 2448349
 
9.7%
_ 1051284
 
4.2%
P 525336
 
2.1%
8 396794
 
1.6%
6 318412
 
1.3%
Other values (23) 845317
 
3.4%

familystate_447L
Text

MISSING 

Distinct5
Distinct (%)< 0.1%
Missing2245378
Missing (%)75.5%
Memory size22.7 MiB

Length

Max length19
Median length7
Mean length6.908958528
Min length6

Characters and Unicode

Total characters5033957
Distinct characters18
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowMARRIED
2nd rowDIVORCED
3rd rowMARRIED
4th rowMARRIED
5th rowMARRIED
Value                  Count    Frequency (%)
married                484846   66.5%
single                 183334   25.2%
widowed                32995    4.5%
divorced               19296    2.6%
living_with_partner    8142     1.1%
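The frequencies for familystate_447L are shares of the non-missing values, not of all observations. The sketch below recomputes them from the listed counts and confirms that the counts add up to the number of non-missing rows. The rounding to one decimal is assumed to mirror the report's display.

```python
n_observations = 2_973_991
n_missing = 2_245_378

# Counts copied from the value table for familystate_447L.
counts = {
    "married": 484_846,
    "single": 183_334,
    "widowed": 32_995,
    "divorced": 19_296,
    "living_with_partner": 8_142,
}

total = sum(counts.values())

# Share of each category among non-missing values, one decimal place.
shares = {name: round(100 * c / total, 1) for name, c in counts.items()}

print(total == n_observations - n_missing, shares)
```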

Most occurring characters

ValueCountFrequency (%)
R 1005272
20.0%
I 744897
14.8%
E 728613
14.5%
D 589428
11.7%
A 492988
9.8%
M 484846
9.6%
N 199618
 
4.0%
G 191476
 
3.8%
L 191476
 
3.8%
S 183334
 
3.6%
Other values (8) 222009
 
4.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 5017673
99.7%
Connector Punctuation 16284
 
0.3%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
R 1005272
20.0%
I 744897
14.8%
E 728613
14.5%
D 589428
11.7%
A 492988
9.8%
M 484846
9.7%
N 199618
 
4.0%
G 191476
 
3.8%
L 191476
 
3.8%
S 183334
 
3.7%
Other values (7) 205725
 
4.1%
Connector Punctuation
ValueCountFrequency (%)
_ 16284
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5017673
99.7%
Common 16284
 
0.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
R 1005272
20.0%
I 744897
14.8%
E 728613
14.5%
D 589428
11.7%
A 492988
9.8%
M 484846
9.7%
N 199618
 
4.0%
G 191476
 
3.8%
L 191476
 
3.8%
S 183334
 
3.7%
Other values (7) 205725
 
4.1%
Common
ValueCountFrequency (%)
_ 16284
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5033957
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
R 1005272
20.0%
I 744897
14.8%
E 728613
14.5%
D 589428
11.7%
A 492988
9.8%
M 484846
9.6%
N 199618
 
4.0%
G 191476
 
3.8%
L 191476
 
3.8%
S 183334
 
3.6%
Other values (8) 222009
 
4.4%

gender_992L
Text

MISSING 

Distinct2
Distinct (%)< 0.1%
Missing2949075
Missing (%)99.2%
Memory size22.7 MiB

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters24916
Distinct characters2
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowF
2nd rowF
3rd rowM
4th rowM
5th rowM
Value  Count  Frequency (%)
f      20981  84.2%
m      3935   15.8%

Most occurring characters

ValueCountFrequency (%)
F 20981
84.2%
M 3935
 
15.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 24916
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
F 20981
84.2%
M 3935
 
15.8%

Most occurring scripts

ValueCountFrequency (%)
Latin 24916
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
F 20981
84.2%
M 3935
 
15.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 24916
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
F 20981
84.2%
M 3935
 
15.8%

housetype_905L
Text

MISSING 

Distinct6
Distinct (%)< 0.1%
Missing2873173
Missing (%)96.6%
Memory size22.7 MiB

Length

Max length12
Median length5
Mean length5.171993096
Min length4

Characters and Unicode

Total characters521430
Distinct characters16
Distinct categories2
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowOWNED
2nd rowOWNED
3rd rowOWNED
4th rowOWNED
5th rowOWNED
Value         Count  Frequency (%)
owned         94076  93.3%
parental      4212   4.2%
flat          1501   1.5%
company_flat  641    0.6%
coop_flat     222    0.2%
state_flat    166    0.2%

Most occurring characters

ValueCountFrequency (%)
N 98929
19.0%
E 98454
18.9%
O 95161
18.3%
W 94076
18.0%
D 94076
18.0%
A 11761
 
2.3%
T 7074
 
1.4%
L 6742
 
1.3%
P 5075
 
1.0%
R 4212
 
0.8%
Other values (6) 5870
 
1.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 520401
99.8%
Connector Punctuation 1029
 
0.2%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 98929
19.0%
E 98454
18.9%
O 95161
18.3%
W 94076
18.1%
D 94076
18.1%
A 11761
 
2.3%
T 7074
 
1.4%
L 6742
 
1.3%
P 5075
 
1.0%
R 4212
 
0.8%
Other values (5) 4841
 
0.9%
Connector Punctuation
ValueCountFrequency (%)
_ 1029
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 520401
99.8%
Common 1029
 
0.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 98929
19.0%
E 98454
18.9%
O 95161
18.3%
W 94076
18.1%
D 94076
18.1%
A 11761
 
2.3%
T 7074
 
1.4%
L 6742
 
1.3%
P 5075
 
1.0%
R 4212
 
0.8%
Other values (5) 4841
 
0.9%
Common
ValueCountFrequency (%)
_ 1029
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 521430
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 98929
19.0%
E 98454
18.9%
O 95161
18.3%
W 94076
18.0%
D 94076
18.0%
A 11761
 
2.3%
T 7074
 
1.4%
L 6742
 
1.3%
P 5075
 
1.0%
R 4212
 
0.8%
Other values (6) 5870
 
1.1%

housingtype_772L
Text

MISSING 

Distinct6
Distinct (%)0.1%
Missing2964176
Missing (%)99.7%
Memory size22.7 MiB

Length

Max length12
Median length5
Mean length5.495873663
Min length4

Characters and Unicode

Total characters53942
Distinct characters16
Distinct categories2
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowPARENTAL
2nd rowFLAT
3rd rowOWNED
4th rowOWNED
5th rowFLAT
Value         Count  Frequency (%)
owned         7850   80.0%
parental      1436   14.6%
flat          376    3.8%
company_flat  95     1.0%
state_flat    38     0.4%
coop_flat     20     0.2%

Most occurring characters

ValueCountFrequency (%)
N 9381
17.4%
E 9324
17.3%
O 7985
14.8%
W 7850
14.6%
D 7850
14.6%
A 3534
 
6.6%
T 2041
 
3.8%
L 1965
 
3.6%
P 1551
 
2.9%
R 1436
 
2.7%
Other values (6) 1025
 
1.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 53789
99.7%
Connector Punctuation 153
 
0.3%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 9381
17.4%
E 9324
17.3%
O 7985
14.8%
W 7850
14.6%
D 7850
14.6%
A 3534
 
6.6%
T 2041
 
3.8%
L 1965
 
3.7%
P 1551
 
2.9%
R 1436
 
2.7%
Other values (5) 872
 
1.6%
Connector Punctuation
ValueCountFrequency (%)
_ 153
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 53789
99.7%
Common 153
 
0.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 9381
17.4%
E 9324
17.3%
O 7985
14.8%
W 7850
14.6%
D 7850
14.6%
A 3534
 
6.6%
T 2041
 
3.8%
L 1965
 
3.7%
P 1551
 
2.9%
R 1436
 
2.7%
Other values (5) 872
 
1.6%
Common
ValueCountFrequency (%)
_ 153
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 53942
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 9381
17.4%
E 9324
17.3%
O 7985
14.8%
W 7850
14.6%
D 7850
14.6%
A 3534
 
6.6%
T 2041
 
3.8%
L 1965
 
3.6%
P 1551
 
2.9%
R 1436
 
2.7%
Other values (6) 1025
 
1.9%

incometype_1044T
Text

MISSING 

Distinct9
Distinct (%)< 0.1%
Missing1447332
Missing (%)48.7%
Memory size22.7 MiB

Length

Max length23
Median length17
Mean length15.97266973
Min length5

Characters and Unicode

Total characters24384820
Distinct characters21
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1
Unique (%)< 0.1%

Sample

1st rowSALARIED_GOVT
2nd rowSALARIED_GOVT
3rd rowEMPLOYED
4th rowEMPLOYED
5th rowEMPLOYED
Value                    Count   Frequency (%)
private_sector_employee  490562  32.1%
salaried_govt            373646  24.5%
retired_pensioner        311028  20.4%
employed                 298158  19.5%
selfemployed             29199   1.9%
other                    11436   0.7%
handicapped_2            7371    0.5%
handicapped_3            5258    0.3%
handicapped              1       < 0.1%

Most occurring characters

ValueCountFrequency (%)
E 4778547
19.6%
R 2299290
9.4%
O 2004591
 
8.2%
_ 1678427
 
6.9%
T 1677234
 
6.9%
P 1644769
 
6.7%
I 1498894
 
6.1%
A 1263114
 
5.2%
L 1220764
 
5.0%
S 1204435
 
4.9%
Other values (11) 5114755
21.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 22693764
93.1%
Connector Punctuation 1678427
 
6.9%
Decimal Number 12629
 
0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 4778547
21.1%
R 2299290
10.1%
O 2004591
8.8%
T 1677234
 
7.4%
P 1644769
 
7.2%
I 1498894
 
6.6%
A 1263114
 
5.6%
L 1220764
 
5.4%
S 1204435
 
5.3%
D 1037291
 
4.6%
Other values (8) 4064835
17.9%
Decimal Number
ValueCountFrequency (%)
2 7371
58.4%
3 5258
41.6%
Connector Punctuation
ValueCountFrequency (%)
_ 1678427
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 22693764
93.1%
Common 1691056
 
6.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 4778547
21.1%
R 2299290
10.1%
O 2004591
8.8%
T 1677234
 
7.4%
P 1644769
 
7.2%
I 1498894
 
6.6%
A 1263114
 
5.6%
L 1220764
 
5.4%
S 1204435
 
5.3%
D 1037291
 
4.6%
Other values (8) 4064835
17.9%
Common
ValueCountFrequency (%)
_ 1678427
99.3%
2 7371
 
0.4%
3 5258
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 24384820
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
E 4778547
19.6%
R 2299290
9.4%
O 2004591
 
8.2%
_ 1678427
 
6.9%
T 1677234
 
6.9%
P 1644769
 
6.7%
I 1498894
 
6.1%
A 1263114
 
5.2%
L 1220764
 
5.0%
S 1204435
 
4.9%
Other values (11) 5114755
21.0%

isreference_387L
Boolean

MISSING 

Distinct2
Distinct (%)< 0.1%
Missing2949075
Missing (%)99.2%
Memory size22.7 MiB
Value      Count    Frequency (%)
True       12458    0.4%
False      12458    0.4%
(Missing)  2949075  99.2%
Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size22.7 MiB

Length

Max length12
Median length8
Mean length9.40439228
Min length8

Characters and Unicode

Total characters27968578
Distinct characters13
Distinct categories4
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowP10_39_147
2nd rowa55475b1
3rd rowa55475b1
4th rowa55475b1
5th rowP10_39_147
Value         Count    Frequency (%)
a55475b1      1505452  50.6%
p10_39_147    848753   28.5%
p209_127_106  619786   20.8%

Most occurring characters

ValueCountFrequency (%)
5 4516356
16.1%
1 4442530
15.9%
7 2973991
10.6%
_ 2937078
10.5%
4 2354205
8.4%
0 2088325
7.5%
a 1505452
 
5.4%
b 1505452
 
5.4%
P 1468539
 
5.3%
9 1468539
 
5.3%
Other values (3) 2708111
9.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 20552057
73.5%
Lowercase Letter 3010904
 
10.8%
Connector Punctuation 2937078
 
10.5%
Uppercase Letter 1468539
 
5.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
5 4516356
22.0%
1 4442530
21.6%
7 2973991
14.5%
4 2354205
11.5%
0 2088325
10.2%
9 1468539
 
7.1%
2 1239572
 
6.0%
3 848753
 
4.1%
6 619786
 
3.0%
Lowercase Letter
ValueCountFrequency (%)
a 1505452
50.0%
b 1505452
50.0%
Connector Punctuation
ValueCountFrequency (%)
_ 2937078
100.0%
Uppercase Letter
ValueCountFrequency (%)
P 1468539
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 23489135
84.0%
Latin 4479443
 
16.0%

Most frequent character per script

Common
ValueCountFrequency (%)
5 4516356
19.2%
1 4442530
18.9%
7 2973991
12.7%
_ 2937078
12.5%
4 2354205
10.0%
0 2088325
8.9%
9 1468539
 
6.3%
2 1239572
 
5.3%
3 848753
 
3.6%
6 619786
 
2.6%
Latin
ValueCountFrequency (%)
a 1505452
33.6%
b 1505452
33.6%
P 1468539
32.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 27968578
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5 4516356
16.1%
1 4442530
15.9%
7 2973991
10.6%
_ 2937078
10.5%
4 2354205
8.4%
0 2088325
7.5%
a 1505452
 
5.4%
b 1505452
 
5.4%
P 1468539
 
5.3%
9 1468539
 
5.3%
Other values (3) 2708111
9.7%

mainoccupationinc_384A
Real number (ℝ)

MISSING 

Distinct6632
Distinct (%)0.4%
Missing1447332
Missing (%)48.7%
Infinite0
Infinite (%)0.0%
Mean57707.48346
Minimum0
Maximum200000
Zeros9
Zeros (%)< 0.1%
Negative0
Negative (%)0.0%
Memory size22.7 MiB

Quantile statistics

Minimum0
5-th percentile20000
Q136000
median50000
Q370000
95-th percentile120000
Maximum200000
Range200000
Interquartile range (IQR)34000

Descriptive statistics

Standard deviation33348.30285
Coefficient of variation (CV)0.5778852385
Kurtosis3.713818555
Mean57707.48346
Median Absolute Deviation (MAD)20000
Skewness1.664849512
Sum8.809964899 × 10^10
Variance1112109303
MonotonicityNot monotonic
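
Several of the figures above are derived from one another, which gives a quick way to sanity-check the block. The sketch below recomputes the interquartile range, range, coefficient of variation, variance and sum from the reported quartiles, mean and standard deviation. The variable names are illustrative, and the non-missing count is taken as observations minus missing values.

```python
import math

n_rows, n_missing = 2973991, 1447332
mean, std = 57707.48346, 33348.30285
q1, q3 = 36000, 70000
minimum, maximum = 0, 200000

n_present = n_rows - n_missing   # rows that actually carry a value
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
total = mean * n_present         # sum recovered from mean and count

print(iqr, value_range, round(cv, 6), round(variance), f"{total:.4e}")
```

Small differences in the last digits are expected because the inputs are themselves rounded in the report.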
Histogram with fixed size bins (bins=50)
Value                Count    Frequency (%)
40000                149658   5.0%
50000                141374   4.8%
60000                133345   4.5%
30000                116608   3.9%
70000                89103    3.0%
100000               60078    2.0%
80000                58412    2.0%
36000                46507    1.6%
20000                44162    1.5%
24000                40383    1.4%
Other values (6622)  647029   21.8%
(Missing)            1447332  48.7%

Value  Count  Frequency (%)
0      9      < 0.1%
0.2    2      < 0.1%
1      2      < 0.1%
1.2    1      < 0.1%
1.6    1      < 0.1%

Value      Count  Frequency (%)
200000     12430  0.4%
199999.8   5      < 0.1%
199998     2      < 0.1%
199980     1      < 0.1%
199971.61  1      < 0.1%

maritalst_703L
Text

MISSING 

Distinct5
Distinct (%)< 0.1%
Missing2962646
Missing (%)99.6%
Memory size22.7 MiB

Length

Max length19
Median length7
Mean length7.203613927
Min length6

Characters and Unicode

Total characters81725
Distinct characters18
Distinct categories2
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowSINGLE
2nd rowMARRIED
3rd rowSINGLE
4th rowMARRIED
5th rowMARRIED
Value                Count  Frequency (%)
married              5970   52.6%
single               4059   35.8%
divorced             549    4.8%
living_with_partner  485    4.3%
widowed              282    2.5%

Most occurring characters

ValueCountFrequency (%)
R 13459
16.5%
I 12315
15.1%
E 11345
13.9%
D 7632
9.3%
A 6455
7.9%
M 5970
7.3%
N 5029
 
6.2%
G 4544
 
5.6%
L 4544
 
5.6%
S 4059
 
5.0%
Other values (8) 6373
7.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 80755
98.8%
Connector Punctuation 970
 
1.2%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
R 13459
16.7%
I 12315
15.2%
E 11345
14.0%
D 7632
9.5%
A 6455
8.0%
M 5970
7.4%
N 5029
 
6.2%
G 4544
 
5.6%
L 4544
 
5.6%
S 4059
 
5.0%
Other values (7) 5403
6.7%
Connector Punctuation
ValueCountFrequency (%)
_ 970
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 80755
98.8%
Common 970
 
1.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
R 13459
16.7%
I 12315
15.2%
E 11345
14.0%
D 7632
9.5%
A 6455
8.0%
M 5970
7.4%
N 5029
 
6.2%
G 4544
 
5.6%
L 4544
 
5.6%
S 4059
 
5.0%
Other values (7) 5403
6.7%
Common
ValueCountFrequency (%)
_ 970
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 81725
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
R 13459
16.5%
I 12315
15.1%
E 11345
13.9%
D 7632
9.3%
A 6455
7.9%
M 5970
7.3%
N 5029
 
6.2%
G 4544
 
5.6%
L 4544
 
5.6%
S 4059
 
5.0%
Other values (8) 6373
7.8%

num_group1
Real number (ℝ)

ZEROS 

Distinct10
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.796531664
Minimum0
Maximum9
Zeros1526659
Zeros (%)51.3%
Negative0
Negative (%)0.0%
Memory size22.7 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q31
95-th percentile3
Maximum9
Range9
Interquartile range (IQR)1

Descriptive statistics

Standard deviation0.9777888443
Coefficient of variation (CV)1.227558035
Kurtosis0.2510041297
Mean0.796531664
Median Absolute Deviation (MAD)0
Skewness1.038853795
Sum2368878
Variance0.956071024
MonotonicityNot monotonic
Histogram with fixed size bins (bins=10)
Value  Count    Frequency (%)
0      1526659  51.3%
1      757320   25.5%
2      484214   16.3%
3      181768   6.1%
4      22453    0.8%
5      1466     < 0.1%
6      99       < 0.1%
7      8        < 0.1%
8      2        < 0.1%
9      2        < 0.1%

Value  Count    Frequency (%)
0      1526659  51.3%
1      757320   25.5%
2      484214   16.3%
3      181768   6.1%
4      22453    0.8%

Value  Count  Frequency (%)
9      2      < 0.1%
8      2      < 0.1%
7      8      < 0.1%
6      99     < 0.1%
5      1466   < 0.1%
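
Because num_group1 takes only ten distinct values, its summary statistics can be recomputed from the frequency table alone. The sketch below does so in plain Python. It assumes the reported variance is the sample variance (n - 1 denominator), an assumption that happens to reproduce the reported figure. The variable names are illustrative.

```python
import math

# Value -> count, copied from the frequency table for num_group1
counts = {0: 1526659, 1: 757320, 2: 484214, 3: 181768, 4: 22453,
          5: 1466, 6: 99, 7: 8, 8: 2, 9: 2}

n = sum(counts.values())                         # number of observations
total = sum(v * c for v, c in counts.items())    # "Sum"
mean = total / n                                 # "Mean"

# Sum of squared deviations, weighted by how often each value occurs
ss = sum(c * (v - mean) ** 2 for v, c in counts.items())
variance = ss / (n - 1)                          # sample variance (assumed)
std = math.sqrt(variance)
zeros_pct = 100 * counts[0] / n                  # "Zeros (%)"

print(n, total, mean, variance, std, zeros_pct)
```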

personindex_1023L
Real number (ℝ)

MISSING  ZEROS 

Distinct7
Distinct (%)< 0.1%
Missing642283
Missing (%)21.6%
Infinite0
Infinite (%)0.0%
Mean0.4383567754
Minimum0
Maximum6
Zeros1526659
Zeros (%)51.3%
Negative0
Negative (%)0.0%
Memory size22.7 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q31
95-th percentile2
Maximum6
Range6
Interquartile range (IQR)1

Descriptive statistics

Standard deviation0.6596617567
Coefficient of variation (CV)1.504851285
Kurtosis0.432036164
Mean0.4383567754
Median Absolute Deviation (MAD)0
Skewness1.245270671
Sum1022120
Variance0.4351536333
MonotonicityNot monotonic
Histogram with fixed size bins (bins=7)
Value      Count    Frequency (%)
0          1526659  51.3%
1          591033   19.9%
2          211121   7.1%
3          2740     0.1%
4          151      < 0.1%
5          3        < 0.1%
6          1        < 0.1%
(Missing)  642283   21.6%

Value  Count    Frequency (%)
0      1526659  51.3%
1      591033   19.9%
2      211121   7.1%
3      2740     0.1%
4      151      < 0.1%

Value  Count   Frequency (%)
6      1       < 0.1%
5      3       < 0.1%
4      151     < 0.1%
3      2740    0.1%
2      211121  7.1%

persontype_1072L
Real number (ℝ)

Distinct3
Distinct (%)< 0.1%
Missing6117
Missing (%)0.2%
Infinite0
Infinite (%)0.0%
Mean2.034861992
Minimum1
Maximum5
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size22.7 MiB

Quantile statistics

Minimum1
5-th percentile1
Q11
median1
Q34
95-th percentile5
Maximum5
Range4
Interquartile range (IQR)3

Descriptive statistics

Standard deviation1.707170653
Coefficient of variation (CV)0.8389613937
Kurtosis-0.8075389231
Mean2.034861992
Median Absolute Deviation (MAD)0
Skewness1.069831992
Sum6039214
Variance2.914431638
MonotonicityNot monotonic
Histogram with fixed size bins (bins=3)
Value      Count    Frequency (%)
1          2161932  72.7%
5          653514   22.0%
4          152428   5.1%
(Missing)  6117     0.2%

Value  Count    Frequency (%)
1      2161932  72.7%
4      152428   5.1%
5      653514   22.0%

Value  Count    Frequency (%)
5      653514   22.0%
4      152428   5.1%
1      2161932  72.7%

persontype_792L
Real number (ℝ)

MISSING 

Distinct3
Distinct (%)< 0.1%
Missing642283
Missing (%)21.6%
Infinite0
Infinite (%)0.0%
Mean2.315690901
Minimum1
Maximum5
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size22.7 MiB

Quantile statistics

Minimum1
5-th percentile1
Q11
median1
Q35
95-th percentile5
Maximum5
Range4
Interquartile range (IQR)4

Descriptive statistics

Standard deviation1.826378165
Coefficient of variation (CV)0.7886968695
Kurtosis-1.47028854
Mean2.315690901
Median Absolute Deviation (MAD)0
Skewness0.6951560876
Sum5399515
Variance3.3356572
MonotonicityNot monotonic
Histogram with fixed size bins (bins=3)
Value      Count    Frequency (%)
1          1526659  51.3%
5          652660   21.9%
4          152389   5.1%
(Missing)  642283   21.6%

Value  Count    Frequency (%)
1      1526659  51.3%
4      152389   5.1%
5      652660   21.9%

Value  Count    Frequency (%)
5      652660   21.9%
4      152389   5.1%
1      1526659  51.3%
Distinct991
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size22.7 MiB

Length

Max length12
Median length11
Mean length9.387631637
Min length8

Characters and Unicode

Total characters27918732
Distinct characters18
Distinct categories4
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique152
Unique (%)< 0.1%

Sample

1st rowP88_18_84
2nd rowa55475b1
3rd rowa55475b1
4th rowa55475b1
5th rowP103_93_94
Value               Count    Frequency (%)
a55475b1            1447423  48.7%
p131_33_167         79574    2.7%
p197_47_166         69825    2.3%
p123_6_84           55824    1.9%
p98_137_111         42880    1.4%
p159_143_123        36207    1.2%
p62_144_102         35563    1.2%
p204_99_158         34868    1.2%
p19_11_176          31849    1.1%
p178_112_160        29988    1.0%
Other values (981)  1109990  37.3%

Most occurring characters

ValueCountFrequency (%)
1 5180754
18.6%
5 5166806
18.5%
_ 3053136
10.9%
7 2645105
9.5%
4 2372638
8.5%
P 1526566
 
5.5%
a 1447423
 
5.2%
b 1447423
 
5.2%
6 1024509
 
3.7%
3 975246
 
3.5%
Other values (8) 3079126
11.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 20444176
73.2%
Connector Punctuation 3053136
 
10.9%
Lowercase Letter 2894852
 
10.4%
Uppercase Letter 1526568
 
5.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 5180754
25.3%
5 5166806
25.3%
7 2645105
12.9%
4 2372638
11.6%
6 1024509
 
5.0%
3 975246
 
4.8%
2 860122
 
4.2%
8 819632
 
4.0%
9 811448
 
4.0%
0 587916
 
2.9%
Lowercase Letter
ValueCountFrequency (%)
a 1447423
50.0%
b 1447423
50.0%
t 2
 
< 0.1%
h 2
 
< 0.1%
e 2
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
P 1526566
> 99.9%
Q 2
 
< 0.1%
Connector Punctuation
ValueCountFrequency (%)
_ 3053136
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 23497312
84.2%
Latin 4421420
 
15.8%

Most frequent character per script

Common
ValueCountFrequency (%)
1 5180754
22.0%
5 5166806
22.0%
_ 3053136
13.0%
7 2645105
11.3%
4 2372638
10.1%
6 1024509
 
4.4%
3 975246
 
4.2%
2 860122
 
3.7%
8 819632
 
3.5%
9 811448
 
3.5%
Latin
ValueCountFrequency (%)
P 1526566
34.5%
a 1447423
32.7%
b 1447423
32.7%
Q 2
 
< 0.1%
t 2
 
< 0.1%
h 2
 
< 0.1%
e 2
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 27918732
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 5180754
18.6%
5 5166806
18.5%
_ 3053136
10.9%
7 2645105
9.5%
4 2372638
8.5%
P 1526566
 
5.5%
a 1447423
 
5.2%
b 1447423
 
5.2%
6 1024509
 
3.7%
3 975246
 
3.5%
Other values (8) 3079126
11.0%
Distinct3531
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size22.7 MiB

Length

Max length13
Median length8
Mean length9.259763395
Min length7

Characters and Unicode

Total characters27538453
Distinct characters33
Distinct categories4
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique64
Unique (%)< 0.1%

Sample

1st rowP167_100_165
2nd rowa55475b1
3rd rowa55475b1
4th rowa55475b1
5th rowP176_37_166
Value                Count    Frequency (%)
a55475b1             1515676  51.0%
p161_14_174          96828    3.3%
p144_138_111         81525    2.7%
p46_103_143          54825    1.8%
p91_47_168           53660    1.8%
p62_116_179          33450    1.1%
p212_16_169          30579    1.0%
p157_35_170          29059    1.0%
p11_15_81            28689    1.0%
p85_138_173          26491    0.9%
Other values (3521)  1023209  34.4%

Most occurring characters

ValueCountFrequency (%)
5 5325384
19.3%
1 5076787
18.4%
_ 2916630
10.6%
4 2755042
10.0%
7 2434168
8.8%
a 1516042
 
5.5%
b 1515676
 
5.5%
P 1456545
 
5.3%
8 980009
 
3.6%
6 966794
 
3.5%
Other values (23) 2595376
9.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 20125819
73.1%
Lowercase Letter 3037689
 
11.0%
Connector Punctuation 2916630
 
10.6%
Uppercase Letter 1458315
 
5.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 1516042
49.9%
b 1515676
49.9%
e 1777
 
0.1%
t 753
 
< 0.1%
r 670
 
< 0.1%
o 597
 
< 0.1%
m 577
 
< 0.1%
l 282
 
< 0.1%
u 215
 
< 0.1%
g 177
 
< 0.1%
Other values (10) 923
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
5 5325384
26.5%
1 5076787
25.2%
4 2755042
13.7%
7 2434168
12.1%
8 980009
 
4.9%
6 966794
 
4.8%
3 829255
 
4.1%
2 651988
 
3.2%
9 579147
 
2.9%
0 527245
 
2.6%
Uppercase Letter
ValueCountFrequency (%)
P 1456545
99.9%
Q 1770
 
0.1%
Connector Punctuation
ValueCountFrequency (%)
_ 2916630
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 23042449
83.7%
Latin 4496004
 
16.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 1516042
33.7%
b 1515676
33.7%
P 1456545
32.4%
e 1777
 
< 0.1%
Q 1770
 
< 0.1%
t 753
 
< 0.1%
r 670
 
< 0.1%
o 597
 
< 0.1%
m 577
 
< 0.1%
l 282
 
< 0.1%
Other values (12) 1315
 
< 0.1%
Common
ValueCountFrequency (%)
5 5325384
23.1%
1 5076787
22.0%
_ 2916630
12.7%
4 2755042
12.0%
7 2434168
10.6%
8 980009
 
4.3%
6 966794
 
4.2%
3 829255
 
3.6%
2 651988
 
2.8%
9 579147
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 27538453
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5 5325384
19.3%
1 5076787
18.4%
_ 2916630
10.6%
4 2755042
10.0%
7 2434168
8.8%
a 1516042
 
5.5%
b 1515676
 
5.5%
P 1456545
 
5.3%
8 980009
 
3.6%
6 966794
 
3.5%
Other values (23) 2595376
9.4%

relationshiptoclient_415T
Text

MISSING

Distinct10
Distinct (%)< 0.1%
Missing2168942
Missing (%)72.9%
Memory size22.7 MiB

Length

Max length14
Median length12
Mean length7.319615328
Min length5

Characters and Unicode

Total characters5892649
Distinct characters19
Distinct categories2
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowSPOUSE
2nd rowCOLLEAGUE
3rd rowSIBLING
4th rowOTHER_RELATIVE
5th rowSIBLING
Value           Count   Frequency (%)
spouse          152389  18.9%
sibling         151484  18.8%
child           143018  17.8%
other_relative  112570  14.0%
friend          100453  12.5%
parent          65009   8.1%
colleague       48795   6.1%
other           20154   2.5%
neighbor        9991    1.2%
grand_parent    1186    0.1%

Most occurring characters

ValueCountFrequency (%)
E 784482
13.3%
I 669000
11.4%
L 504662
 
8.6%
S 456262
 
7.7%
R 423119
 
7.2%
O 343899
 
5.8%
N 329309
 
5.6%
T 311489
 
5.3%
H 285733
 
4.8%
D 244657
 
4.2%
Other values (9) 1540037
26.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 5778893
98.1%
Connector Punctuation 113756
 
1.9%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 784482
13.6%
I 669000
11.6%
L 504662
 
8.7%
S 456262
 
7.9%
R 423119
 
7.3%
O 343899
 
6.0%
N 329309
 
5.7%
T 311489
 
5.4%
H 285733
 
4.9%
D 244657
 
4.2%
Other values (8) 1426281
24.7%
Connector Punctuation
ValueCountFrequency (%)
_ 113756
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5778893
98.1%
Common 113756
 
1.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 784482
13.6%
I 669000
11.6%
L 504662
 
8.7%
S 456262
 
7.9%
R 423119
 
7.3%
O 343899
 
6.0%
N 329309
 
5.7%
T 311489
 
5.4%
H 285733
 
4.9%
D 244657
 
4.2%
Other values (8) 1426281
24.7%
Common
ValueCountFrequency (%)
_ 113756
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5892649
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
E 784482
13.3%
I 669000
11.4%
L 504662
 
8.6%
S 456262
 
7.7%
R 423119
 
7.2%
O 343899
 
5.8%
N 329309
 
5.6%
T 311489
 
5.3%
H 285733
 
4.8%
D 244657
 
4.2%
Other values (9) 1540037
26.1%

relationshiptoclient_642T
Text

MISSING

Distinct10
Distinct (%)< 0.1%
Missing2168049
Missing (%)72.9%
Memory size22.7 MiB

Length

Max length14
Median length12
Mean length7.320624809
Min length5

Characters and Unicode

Total characters5899999
Distinct characters19
Distinct categories2
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowSPOUSE
2nd rowCOLLEAGUE
3rd rowSIBLING
4th rowOTHER_RELATIVE
5th rowSIBLING
Value           Count   Frequency (%)
spouse          152428  18.9%
sibling         151666  18.8%
child           143151  17.8%
other_relative  112775  14.0%
friend          100595  12.5%
parent          65081   8.1%
colleague       48847   6.1%
other           20157   2.5%
neighbor        10051   1.2%
grand_parent    1191    0.1%

Most occurring characters

ValueCountFrequency (%)
E 785522
13.3%
I 669904
11.4%
L 505286
 
8.6%
S 456522
 
7.7%
R 423816
 
7.2%
O 344258
 
5.8%
N 329775
 
5.6%
T 311979
 
5.3%
H 286134
 
4.8%
D 244937
 
4.2%
Other values (9) 1541866
26.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 5786033
98.1%
Connector Punctuation 113966
 
1.9%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 785522
13.6%
I 669904
11.6%
L 505286
 
8.7%
S 456522
 
7.9%
R 423816
 
7.3%
O 344258
 
5.9%
N 329775
 
5.7%
T 311979
 
5.4%
H 286134
 
4.9%
D 244937
 
4.2%
Other values (8) 1427900
24.7%
Connector Punctuation
ValueCountFrequency (%)
_ 113966
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5786033
98.1%
Common 113966
 
1.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 785522
13.6%
I 669904
11.6%
L 505286
 
8.7%
S 456522
 
7.9%
R 423816
 
7.3%
O 344258
 
5.9%
N 329775
 
5.7%
T 311979
 
5.4%
H 286134
 
4.9%
D 244937
 
4.2%
Other values (8) 1427900
24.7%
Common
ValueCountFrequency (%)
_ 113966
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5899999
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
E 785522
13.3%
I 669904
11.4%
L 505286
 
8.6%
S 456522
 
7.7%
R 423816
 
7.2%
O 344258
 
5.8%
N 329775
 
5.6%
T 311979
 
5.3%
H 286134
 
4.8%
D 244937
 
4.2%
Other values (9) 1541866
26.1%

remitter_829L
Boolean

CONSTANT  MISSING 

Distinct1
Distinct (%)< 0.1%
Missing2168942
Missing (%)72.9%
Memory size22.7 MiB
Value      Count    Frequency (%)
False      805049   27.1%
(Missing)  2168942  72.9%
Distinct3
Distinct (%)< 0.1%
Missing6117
Missing (%)0.2%
Memory size22.7 MiB

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters5935748
Distinct characters5
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowCL
2nd rowEM
3rd rowPE
4th rowPE
5th rowCL
Value  Count    Frequency (%)
cl     1625689  54.8%
pe     805942   27.2%
em     536243   18.1%

Most occurring characters

ValueCountFrequency (%)
C 1625689
27.4%
L 1625689
27.4%
E 1342185
22.6%
P 805942
13.6%
M 536243
 
9.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 5935748
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
C 1625689
27.4%
L 1625689
27.4%
E 1342185
22.6%
P 805942
13.6%
M 536243
 
9.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5935748
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
C 1625689
27.4%
L 1625689
27.4%
E 1342185
22.6%
P 805942
13.6%
M 536243
 
9.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5935748
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
C 1625689
27.4%
L 1625689
27.4%
E 1342185
22.6%
P 805942
13.6%
M 536243
 
9.0%

role_993L
Text

CONSTANT  MISSING 

Distinct1
Distinct (%)< 0.1%
Missing2949075
Missing (%)99.2%
Memory size22.7 MiB

Length

Max length4
Median length4
Mean length4
Min length4

Characters and Unicode

Total characters99664
Distinct characters3
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowFULL
2nd rowFULL
3rd rowFULL
4th rowFULL
5th rowFULL
Value  Count  Frequency (%)
full   24916  100.0%

Most occurring characters

Value  Count  Frequency (%)
L      49832  50.0%
F      24916  25.0%
U      24916  25.0%

Most occurring categories

Value             Count  Frequency (%)
Uppercase Letter  99664  100.0%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
L      49832  50.0%
F      24916  25.0%
U      24916  25.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  99664  100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
L      49832  50.0%
F      24916  25.0%
U      24916  25.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  99664  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
L      49832  50.0%
F      24916  25.0%
U      24916  25.0%

safeguarantyflag_411L
Boolean

IMBALANCE  MISSING 

Distinct      2
Distinct (%)  < 0.1%
Missing       1447334
Missing (%)   48.7%
Memory size   22.7 MiB

Value      Count    Frequency (%)
True       1445065  48.6%
False      81592    2.7%
(Missing)  1447334  48.7%
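
The IMBALANCE flag on safeguarantyflag_411L is listed in the report's alerts with a score of 69.9%. The sketch below shows one way such a score can be reproduced from the counts above: one minus the Shannon entropy of the non-missing classes, normalised by the maximum entropy for that number of classes. This is an assumption about how the score is derived, not a confirmed description of the profiler's internals, and the function name `imbalance_score` is hypothetical.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / H_max for a list of class counts.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    Hypothetical helper, written only to illustrate the alert value.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2 or total == 0:
        return 0.0  # assumption: degenerate inputs are treated as not imbalanced
    # Shannon entropy in bits over the observed (non-zero) classes.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)


# Non-missing counts for safeguarantyflag_411L (True, False) from the table above.
score = imbalance_score([1445065, 81592])
print(f"{score:.1%}")
```

Missing values are left out of the calculation here, which is what makes the result line up with the 69.9% shown in the alert.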

sex_738L
Text

MISSING 

Distinct      2
Distinct (%)  < 0.1%
Missing       1447332
Missing (%)   48.7%
Memory size   22.7 MiB

Length

Max length     1
Median length  1
Mean length    1
Min length     1

Characters and Unicode

Total characters     1526659
Distinct characters  2
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  F
2nd row  M
3rd row  F
4th row  F
5th row  F

Value  Count   Frequency (%)
f      952776  62.4%
m      573883  37.6%

Most occurring characters

Value  Count   Frequency (%)
F      952776  62.4%
M      573883  37.6%

Most occurring categories

Value             Count    Frequency (%)
Uppercase Letter  1526659  100.0%

Most frequent character per category

Uppercase Letter
Value  Count   Frequency (%)
F      952776  62.4%
M      573883  37.6%

Most occurring scripts

Value  Count    Frequency (%)
Latin  1526659  100.0%

Most frequent character per script

Latin
Value  Count   Frequency (%)
F      952776  62.4%
M      573883  37.6%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1526659  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
F      952776  62.4%
M      573883  37.6%
Distinct      8
Distinct (%)  < 0.1%
Missing       6117
Missing (%)   0.2%
Memory size   22.7 MiB

Length

Max length     17
Median length  14
Mean length    10.58595075
Min length     5

Characters and Unicode

Total characters     31417768
Distinct characters  19
Distinct categories  2
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
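
As a rough illustration of the sentence above, per-character and per-category tallies for a text column can be approximated with Python's standard `unicodedata` module. The input is the five sample rows listed under Sample for this variable, and the helper name `char_profile` is hypothetical. This is a sketch of the idea, not necessarily how the report's generator builds its tables.

```python
import unicodedata
from collections import Counter


def char_profile(values):
    """Count characters and their Unicode general categories across strings."""
    chars = Counter()
    categories = Counter()
    for value in values:
        for ch in value:
            chars[ch] += 1
            # unicodedata.category returns a two-letter code such as 'Lu' or 'Pc'.
            categories[unicodedata.category(ch)] += 1
    return chars, categories


# The five sample rows shown for this variable.
sample = ["PRIMARY_MOBILE", "PHONE", "PHONE", "PHONE", "PRIMARY_MOBILE"]
chars, categories = char_profile(sample)
print(categories)
print(chars.most_common(3))
```

In the category codes, `Lu` is Uppercase Letter and `Pc` is Connector Punctuation (the underscore), which are the two categories the report lists for this variable.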

Unique

Unique      1
Unique (%)  < 0.1%

Sample

1st row  PRIMARY_MOBILE
2nd row  PHONE
3rd row  PHONE
4th row  PHONE
5th row  PRIMARY_MOBILE

Value              Count    Frequency (%)
primary_mobile     1770812  59.7%
phone              1087729  36.7%
home_phone         95036    3.2%
alternative_phone  9932     0.3%
secondary_mobile   3959     0.1%
primary_email      392      < 0.1%
whatsapp           13       < 0.1%
twitter            1        < 0.1%

Most occurring characters

Value             Count    Frequency (%)
M                 3641403  11.6%
R                 3556300  11.3%
I                 3556300  11.3%
E                 3086720  9.8%
O                 3066463  9.8%
P                 2963927  9.4%
_                 1880131  6.0%
A                 1795445  5.7%
L                 1785095  5.7%
Y                 1775163  5.7%
Other values (9)  4310821  13.7%

Most occurring categories

Value                  Count     Frequency (%)
Uppercase Letter       29537637  94.0%
Connector Punctuation  1880131   6.0%

Most frequent character per category

Uppercase Letter
Value             Count    Frequency (%)
M                 3641403  12.3%
R                 3556300  12.0%
I                 3556300  12.0%
E                 3086720  10.5%
O                 3066463  10.4%
P                 2963927  10.0%
A                 1795445  6.1%
L                 1785095  6.0%
Y                 1775163  6.0%
B                 1774771  6.0%
Other values (8)  2536050  8.6%

Connector Punctuation
Value  Count    Frequency (%)
_      1880131  100.0%

Most occurring scripts

Value   Count     Frequency (%)
Latin   29537637  94.0%
Common  1880131   6.0%

Most frequent character per script

Latin
Value             Count    Frequency (%)
M                 3641403  12.3%
R                 3556300  12.0%
I                 3556300  12.0%
E                 3086720  10.5%
O                 3066463  10.4%
P                 2963927  10.0%
A                 1795445  6.1%
L                 1785095  6.0%
Y                 1775163  6.0%
B                 1774771  6.0%
Other values (8)  2536050  8.6%

Common
Value  Count    Frequency (%)
_      1880131  100.0%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  31417768  100.0%

Most frequent character per block

ASCII
Value             Count    Frequency (%)
M                 3641403  11.6%
R                 3556300  11.3%
I                 3556300  11.3%
E                 3086720  9.8%
O                 3066463  9.8%
P                 2963927  9.4%
_                 1880131  6.0%
A                 1795445  5.7%
L                 1785095  5.7%
Y                 1775163  5.7%
Other values (9)  4310821  13.7%