Overview

Dataset statistics

Number of variables: 11
Number of observations: 1643410
Missing cells: 4828073
Missing cells (%): 26.7%
Total size in memory: 137.9 MiB
Average record size in memory: 88.0 B
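
The derived figures in this block follow from the raw counts. The sketch below recomputes them as a sanity check, using only numbers copied from the report. The variable names are illustrative, and the conversion assumes 1 MiB = 2**20 bytes.

```python
# Figures copied from the "Dataset statistics" block of the report.
n_variables = 11
n_observations = 1_643_410
missing_cells = 4_828_073
total_size_mib = 137.9

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 2**20 bytes.
avg_record_bytes = total_size_mib * 2**20 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Both rounded results agree with the reported 26.7% and 88.0 B.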

Variable types

Numeric: 3
Text: 8

Alerts

addres_role_871L has 1575736 (95.9%) missing values [Missing]
empls_employedfrom_796D has 1637653 (99.6%) missing values [Missing]
relatedpersons_role_762T has 1614684 (98.3%) missing values [Missing]
num_group1 has 1463928 (89.1%) zeros [Zeros]
num_group2 has 1561280 (95.0%) zeros [Zeros]

Reproduction

Analysis started: 2024-02-13 19:54:21.492391
Analysis finished: 2024-02-13 19:54:26.576244
Duration: 5.08 seconds
Software version: ydata-profiling v4.6.4
Download configuration: config.json

Variables

case_id
Real number (ℝ)

Distinct: 1435105
Distinct (%): 87.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1264005.015
Minimum: 5
Maximum: 2703454
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.5 MiB

Quantile statistics

Minimum: 5
5-th percentile: 125222.45
Q1: 761958.25
median: 1323515.5
Q3: 1695936.75
95-th percentile: 2622667.55
Maximum: 2703454
Range: 2703449
Interquartile range (IQR): 933978.5

Descriptive statistics

Standard deviation: 699545.4755
Coefficient of variation (CV): 0.5534356803
Kurtosis: -0.458723444
Mean: 1264005.015
Median Absolute Deviation (MAD): 474319
Skewness: 0.2388533482
Sum: 2.077278482 × 10^12
Variance: 4.893638723 × 10^11
Monotonicity: Increasing
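
Several of the case_id statistics are simple functions of the others. As a rough cross-check, the sketch below recomputes the range, IQR, coefficient of variation and variance from the reported minimum, maximum, quartiles, mean and standard deviation. It works from the rounded values printed in the report, so agreement is only expected to the displayed precision.

```python
# Values copied from the case_id statistics in the report.
minimum, maximum = 5, 2_703_454
q1, q3 = 761_958.25, 1_695_936.75
mean = 1_264_005.015
std = 699_545.4755

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # Interquartile range = Q3 - Q1
cv = std / mean                   # Coefficient of variation = std / mean
variance = std ** 2               # Variance = std squared

print(f"range: {value_range}")
print(f"IQR: {iqr}")
print(f"CV: {cv:.10f}")
print(f"variance: {variance:.9e}")
```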
Histogram with fixed size bins (bins=50)
Value                   Count     Frequency (%)
140528                  34        < 0.1%
1336868                 34        < 0.1%
203731                  32        < 0.1%
169916                  31        < 0.1%
259366                  31        < 0.1%
2631616                 28        < 0.1%
254957                  26        < 0.1%
648623                  24        < 0.1%
1427304                 23        < 0.1%
971833                  23        < 0.1%
Other values (1435095)  1643124   > 99.9%

Value                   Count     Frequency (%)
5                       1         < 0.1%
6                       8         < 0.1%
7                       1         < 0.1%
8                       1         < 0.1%
9                       1         < 0.1%

Value                   Count     Frequency (%)
2703454                 1         < 0.1%
2703453                 1         < 0.1%
2703452                 1         < 0.1%
2703451                 1         < 0.1%
2703450                 1         < 0.1%

Distinct: 508
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 12.5 MiB

Length

Max length: 12
Median length: 8
Mean length: 8.092886133
Min length: 7

Characters and Unicode

Total characters: 13299930
Distinct characters: 20
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
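
To illustrate how such per-category counts can arise, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module over three values taken from the "Sample" rows of this report. It is not the profiler's own implementation. The helper name `category_counts` is made up for this example, and the standard library exposes general categories but not scripts or blocks. The two-letter codes correspond to the labels used in the report: `Nd` is Decimal Number, `Ll` is Lowercase Letter, `Lu` is Uppercase Letter and `Pc` is Connector Punctuation.

```python
import unicodedata
from collections import Counter

# Values copied from "Sample" rows of the text variables in this report.
samples = ["a55475b1", "P55_110_32", "P204_92_178"]


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


counts = category_counts(samples)
for code, n in counts.most_common():
    print(code, n)
```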

Unique

Unique: 56
Unique (%): < 0.1%

Sample

1st row: a55475b1
2nd row: P55_110_32
3rd row: P55_110_32
4th row: P204_92_178
5th row: P191_109_75
Value                   Count     Frequency (%)
a55475b1                1582872   96.3%
p125_48_164             9669      0.6%
p155_139_77             4093      0.2%
p114_74_190             2552      0.2%
p111_2_12               2468      0.2%
p215_163_136            1764      0.1%
p88_3_41                1537      0.1%
p37_84_33               1249      0.1%
p55_110_32              1163      0.1%
p107_131_181            1058      0.1%
Other values (498)      34985     2.1%

Most occurring characters

Value                   Count     Frequency (%)
5                       4789528   36.0%
1                       1717341   12.9%
4                       1632374   12.3%
7                       1621337   12.2%
a                       1582872   11.9%
b                       1582872   11.9%
_                       121076    0.9%
P                       60534     0.5%
8                       39380     0.3%
2                       39202     0.3%
Other values (10)       113414    0.9%

Most occurring categories

Value                   Count     Frequency (%)
Decimal Number          9952558   74.8%
Lowercase Letter        3165758   23.8%
Connector Punctuation   121076    0.9%
Uppercase Letter        60538     0.5%

Most frequent character per category

Decimal Number
Value                   Count     Frequency (%)
5                       4789528   48.1%
1                       1717341   17.3%
4                       1632374   16.4%
7                       1621337   16.3%
8                       39380     0.4%
2                       39202     0.4%
6                       34765     0.3%
3                       31907     0.3%
9                       24574     0.2%
0                       22150     0.2%
Lowercase Letter
Value                   Count     Frequency (%)
a                       1582872   50.0%
b                       1582872   50.0%
e                       6         < 0.1%
k                       2         < 0.1%
p                       2         < 0.1%
t                       2         < 0.1%
h                       2         < 0.1%
Uppercase Letter
Value                   Count     Frequency (%)
P                       60534     > 99.9%
Q                       4         < 0.1%
Connector Punctuation
Value                   Count     Frequency (%)
_                       121076    100.0%

Most occurring scripts

Value                   Count     Frequency (%)
Common                  10073634  75.7%
Latin                   3226296   24.3%

Most frequent character per script

Common
Value                   Count     Frequency (%)
5                       4789528   47.5%
1                       1717341   17.0%
4                       1632374   16.2%
7                       1621337   16.1%
_                       121076    1.2%
8                       39380     0.4%
2                       39202     0.4%
6                       34765     0.3%
3                       31907     0.3%
9                       24574     0.2%
Latin
Value                   Count     Frequency (%)
a                       1582872   49.1%
b                       1582872   49.1%
P                       60534     1.9%
e                       6         < 0.1%
Q                       4         < 0.1%
k                       2         < 0.1%
p                       2         < 0.1%
t                       2         < 0.1%
h                       2         < 0.1%

Most occurring blocks

Value                   Count     Frequency (%)
ASCII                   13299930  100.0%

Most frequent character per block

ASCII
Value                   Count     Frequency (%)
5                       4789528   36.0%
1                       1717341   12.9%
4                       1632374   12.3%
7                       1621337   12.2%
a                       1582872   11.9%
b                       1582872   11.9%
_                       121076    0.9%
P                       60534     0.5%
8                       39380     0.3%
2                       39202     0.3%
Other values (10)       113414    0.9%

addres_role_871L
Text

MISSING 

Distinct: 8
Distinct (%): < 0.1%
Missing: 1575736
Missing (%): 95.9%
Memory size: 12.5 MiB

Length

Max length: 21
Median length: 9
Mean length: 8.374678606
Min length: 7

Characters and Unicode

Total characters: 566748
Distinct characters: 20
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: CONTACT
2nd row: PERMANENT
3rd row: CONTACT
4th row: CONTACT
5th row: CONTACT
Value                   Count     Frequency (%)
permanent               37338     55.2%
contact                 21918     32.4%
temporary               7193      10.6%
registered              1187      1.8%
migrated_registration   19        < 0.1%
migrated_living         13        < 0.1%
migrated_work           5         < 0.1%
migrated_other          1         < 0.1%
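
The frequencies for addres_role_871L are taken relative to the non-missing rows rather than to all observations. The sketch below checks this against the reported counts. The variable names are illustrative only, and every number is copied from this report.

```python
# Counts copied from the addres_role_871L value table in the report.
value_counts = {
    "permanent": 37_338,
    "contact": 21_918,
    "temporary": 7_193,
    "registered": 1_187,
    "migrated_registration": 19,
    "migrated_living": 13,
    "migrated_work": 5,
    "migrated_other": 1,
}
n_observations = 1_643_410
n_missing = 1_575_736

# Rows that actually carry a value for this variable.
n_present = n_observations - n_missing

# Percentages relative to the non-missing rows.
frequency_pct = {
    value: 100 * count / n_present for value, count in value_counts.items()
}

for value, pct in frequency_pct.items():
    print(f"{value:<24}{pct:5.1f}%")
```

The counts sum to the 67674 non-missing rows, and the rounded percentages reproduce the reported 55.2%, 32.4%, 10.6% and 1.8%.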

Most occurring characters

Value                   Count     Frequency (%)
N                       96626     17.0%
T                       89631     15.8%
E                       85488     15.1%
A                       66506     11.7%
R                       54180     9.6%
M                       44569     7.9%
P                       44531     7.9%
C                       43836     7.7%
O                       29136     5.1%
Y                       7193      1.3%
Other values (10)       5052      0.9%

Most occurring categories

Value                   Count     Frequency (%)
Uppercase Letter        566710    > 99.9%
Connector Punctuation   38        < 0.1%

Most frequent character per category

Uppercase Letter
Value                   Count     Frequency (%)
N                       96626     17.1%
T                       89631     15.8%
E                       85488     15.1%
A                       66506     11.7%
R                       54180     9.6%
M                       44569     7.9%
P                       44531     7.9%
C                       43836     7.7%
O                       29136     5.1%
Y                       7193      1.3%
Other values (9)        5014      0.9%
Connector Punctuation
Value                   Count     Frequency (%)
_                       38        100.0%

Most occurring scripts

Value                   Count     Frequency (%)
Latin                   566710    > 99.9%
Common                  38        < 0.1%

Most frequent character per script

Latin
Value                   Count     Frequency (%)
N                       96626     17.1%
T                       89631     15.8%
E                       85488     15.1%
A                       66506     11.7%
R                       54180     9.6%
M                       44569     7.9%
P                       44531     7.9%
C                       43836     7.7%
O                       29136     5.1%
Y                       7193      1.3%
Other values (9)        5014      0.9%
Common
Value                   Count     Frequency (%)
_                       38        100.0%

Most occurring blocks

Value                   Count     Frequency (%)
ASCII                   566748    100.0%

Most frequent character per block

ASCII
Value                   Count     Frequency (%)
N                       96626     17.0%
T                       89631     15.8%
E                       85488     15.1%
A                       66506     11.7%
R                       54180     9.6%
M                       44569     7.9%
P                       44531     7.9%
C                       43836     7.7%
O                       29136     5.1%
Y                       7193      1.3%
Other values (10)       5052      0.9%

Distinct: 2027
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 12.5 MiB

Length

Max length: 13
Median length: 8
Mean length: 8.106648371
Min length: 7

Characters and Unicode

Total characters: 13322547
Distinct characters: 28
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 137
Unique (%): < 0.1%

Sample

1st row: a55475b1
2nd row: P10_68_40
3rd row: P10_68_40
4th row: P65_136_169
5th row: P10_68_40
Value                   Count     Frequency (%)
a55475b1                1576370   95.9%
p161_14_174             5968      0.4%
p144_138_111            3405      0.2%
p46_103_143             3296      0.2%
p85_138_173             2371      0.1%
p118_161_181            2132      0.1%
p11_15_81               2124      0.1%
p212_16_169             1980      0.1%
p133_34_165             1612      0.1%
p157_35_170             1536      0.1%
Other values (2017)     42616     2.6%

Most occurring characters

Value                   Count     Frequency (%)
5                       4764495   35.8%
1                       1746591   13.1%
4                       1635283   12.3%
7                       1616195   12.1%
a                       1576377   11.8%
b                       1576370   11.8%
_                       134080    1.0%
P                       67013     0.5%
6                       44505     0.3%
8                       44256     0.3%
Other values (18)       117382    0.9%

Most occurring categories

Value                   Count     Frequency (%)
Decimal Number          9968574   74.8%
Lowercase Letter        3152853   23.7%
Connector Punctuation   134080    1.0%
Uppercase Letter        67040     0.5%

Most frequent character per category

Lowercase Letter
Value                   Count     Frequency (%)
a                       1576377   50.0%
b                       1576370   50.0%
e                       30        < 0.1%
t                       14        < 0.1%
r                       14        < 0.1%
o                       11        < 0.1%
m                       9         < 0.1%
l                       5         < 0.1%
i                       5         < 0.1%
z                       5         < 0.1%
Other values (5)        13        < 0.1%
Decimal Number
Value                   Count     Frequency (%)
5                       4764495   47.8%
1                       1746591   17.5%
4                       1635283   16.4%
7                       1616195   16.2%
6                       44505     0.4%
8                       44256     0.4%
3                       42240     0.4%
2                       27427     0.3%
9                       24024     0.2%
0                       23558     0.2%
Uppercase Letter
Value                   Count     Frequency (%)
P                       67013     > 99.9%
Q                       27        < 0.1%
Connector Punctuation
Value                   Count     Frequency (%)
_                       134080    100.0%

Most occurring scripts

Value                   Count     Frequency (%)
Common                  10102654  75.8%
Latin                   3219893   24.2%

Most frequent character per script

Latin
Value                   Count     Frequency (%)
a                       1576377   49.0%
b                       1576370   49.0%
P                       67013     2.1%
e                       30        < 0.1%
Q                       27        < 0.1%
t                       14        < 0.1%
r                       14        < 0.1%
o                       11        < 0.1%
m                       9         < 0.1%
l                       5         < 0.1%
Other values (7)        23        < 0.1%
Common
Value                   Count     Frequency (%)
5                       4764495   47.2%
1                       1746591   17.3%
4                       1635283   16.2%
7                       1616195   16.0%
_                       134080    1.3%
6                       44505     0.4%
8                       44256     0.4%
3                       42240     0.4%
2                       27427     0.3%
9                       24024     0.2%

Most occurring blocks

Value                   Count     Frequency (%)
ASCII                   13322547  100.0%

Most frequent character per block

ASCII
Value                   Count     Frequency (%)
5                       4764495   35.8%
1                       1746591   13.1%
4                       1635283   12.3%
7                       1616195   12.1%
a                       1576377   11.8%
b                       1576370   11.8%
_                       134080    1.0%
P                       67013     0.5%
6                       44505     0.3%
8                       44256     0.3%
Other values (18)       117382    0.9%

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 12.5 MiB

Length

Max length: 12
Median length: 8
Mean length: 8.077852149
Min length: 8

Characters and Unicode

Total characters: 13275223
Distinct characters: 14
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: a55475b1
2nd row: P38_92_157
3rd row: a55475b1
4th row: P38_92_157
5th row: P7_147_157
Value                   Count     Frequency (%)
a55475b1                1587829   96.6%
p38_92_157              29333     1.8%
p177_137_98             9179      0.6%
p7_147_157              9120      0.6%
p125_105_50             4962      0.3%
p115_147_77             1231      0.1%
p125_14_176             1088      0.1%
p58_79_51               307       < 0.1%
p124_137_181            271       < 0.1%
p206_38_166             86        < 0.1%

Most occurring characters

Value                   Count     Frequency (%)
5                       4819759   36.3%
7                       1677418   12.6%
1                       1672126   12.6%
4                       1599547   12.0%
a                       1587829   12.0%
b                       1587829   12.0%
_                       111162    0.8%
P                       55581     0.4%
8                       39176     0.3%
3                       38873     0.3%
Other values (4)        85923     0.6%

Most occurring categories

Value                   Count     Frequency (%)
Decimal Number          9932822   74.8%
Lowercase Letter        3175658   23.9%
Connector Punctuation   111162    0.8%
Uppercase Letter        55581     0.4%

Most frequent character per category

Decimal Number
Value                   Count     Frequency (%)
5                       4819759   48.5%
7                       1677418   16.9%
1                       1672126   16.8%
4                       1599547   16.1%
8                       39176     0.4%
3                       38873     0.4%
9                       38823     0.4%
2                       35744     0.4%
0                       10010     0.1%
6                       1346      < 0.1%
Lowercase Letter
Value                   Count     Frequency (%)
a                       1587829   50.0%
b                       1587829   50.0%
Connector Punctuation
Value                   Count     Frequency (%)
_                       111162    100.0%
Uppercase Letter
Value                   Count     Frequency (%)
P                       55581     100.0%

Most occurring scripts

Value                   Count     Frequency (%)
Common                  10043984  75.7%
Latin                   3231239   24.3%

Most frequent character per script

Common
Value                   Count     Frequency (%)
5                       4819759   48.0%
7                       1677418   16.7%
1                       1672126   16.6%
4                       1599547   15.9%
_                       111162    1.1%
8                       39176     0.4%
3                       38873     0.4%
9                       38823     0.4%
2                       35744     0.4%
0                       10010     0.1%
Latin
Value                   Count     Frequency (%)
a                       1587829   49.1%
b                       1587829   49.1%
P                       55581     1.7%

Most occurring blocks

Value                   Count     Frequency (%)
ASCII                   13275223  100.0%

Most frequent character per block

ASCII
Value                   Count     Frequency (%)
5                       4819759   36.3%
7                       1677418   12.6%
1                       1672126   12.6%
4                       1599547   12.0%
a                       1587829   12.0%
b                       1587829   12.0%
_                       111162    0.8%
P                       55581     0.4%
8                       39176     0.3%
3                       38873     0.3%
Other values (4)        85923     0.6%

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 12.5 MiB

Length

Max length: 11
Median length: 8
Mean length: 8.041208828
Min length: 8

Characters and Unicode

Total characters: 13215003
Distinct characters: 14
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: a55475b1
2nd row: P164_110_33
3rd row: a55475b1
4th row: P164_110_33
5th row: a55475b1
Value                   Count     Frequency (%)
a55475b1                1618686   98.5%
p22_131_138             9310      0.6%
p164_110_33             6527      0.4%
p28_32_178              6291      0.4%
p148_57_109             1871      0.1%
p112_86_147             468       < 0.1%
p191_80_124             114       < 0.1%
p7_47_145               79        < 0.1%
p164_122_65             61        < 0.1%
p82_144_169             3         < 0.1%

Most occurring characters

Value                   Count     Frequency (%)
5                       4858069   36.8%
1                       1678183   12.7%
4                       1627891   12.3%
7                       1627474   12.3%
a                       1618686   12.2%
b                       1618686   12.2%
_                       49448     0.4%
3                       37965     0.3%
2                       31909     0.2%
P                       24724     0.2%
Other values (4)        41968     0.3%

Most occurring categories

Value                   Count     Frequency (%)
Decimal Number          9903459   74.9%
Lowercase Letter        3237372   24.5%
Connector Punctuation   49448     0.4%
Uppercase Letter        24724     0.2%

Most frequent character per category

Decimal Number
Value                   Count     Frequency (%)
5                       4858069   49.1%
1                       1678183   16.9%
4                       1627891   16.4%
7                       1627474   16.4%
3                       37965     0.4%
2                       31909     0.3%
8                       24348     0.2%
0                       8512      0.1%
6                       7120      0.1%
9                       1988      < 0.1%
Lowercase Letter
Value                   Count     Frequency (%)
a                       1618686   50.0%
b                       1618686   50.0%
Connector Punctuation
Value                   Count     Frequency (%)
_                       49448     100.0%
Uppercase Letter
Value                   Count     Frequency (%)
P                       24724     100.0%

Most occurring scripts

Value                   Count     Frequency (%)
Common                  9952907   75.3%
Latin                   3262096   24.7%

Most frequent character per script

Common
Value                   Count     Frequency (%)
5                       4858069   48.8%
1                       1678183   16.9%
4                       1627891   16.4%
7                       1627474   16.4%
_                       49448     0.5%
3                       37965     0.4%
2                       31909     0.3%
8                       24348     0.2%
0                       8512      0.1%
6                       7120      0.1%
Latin
Value                   Count     Frequency (%)
a                       1618686   49.6%
b                       1618686   49.6%
P                       24724     0.8%

Most occurring blocks

Value                   Count     Frequency (%)
ASCII                   13215003  100.0%

Most frequent character per block

ASCII
Value                   Count     Frequency (%)
5                       4858069   36.8%
1                       1678183   12.7%
4                       1627891   12.3%
7                       1627474   12.3%
a                       1618686   12.2%
b                       1618686   12.2%
_                       49448     0.4%
3                       37965     0.3%
2                       31909     0.2%
P                       24724     0.2%
Other values (4)        41968     0.3%

empls_employedfrom_796D
Text

MISSING 

Distinct: 801
Distinct (%): 13.9%
Missing: 1637653
Missing (%): 99.6%
Memory size: 12.5 MiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 57570
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 453
Unique (%): 7.9%

Sample

1st row: 2018-06-15
2nd row: 2011-08-15
3rd row: 1994-05-15
4th row: 2013-01-15
5th row: 2014-09-15
Value                   Count     Frequency (%)
2017-01-15              228       4.0%
2016-01-15              196       3.4%
2015-01-15              181       3.1%
2018-01-15              125       2.2%
2013-01-15              113       2.0%
2014-01-15              106       1.8%
2012-01-15              102       1.8%
2007-09-15              71        1.2%
2010-01-15              69        1.2%
2007-01-15              56        1.0%
Other values (791)      4510      78.3%

Most occurring characters

Value                   Count     Frequency (%)
0                       13459     23.4%
1                       12287     21.3%
-                       11514     20.0%
2                       6876      11.9%
5                       6417      11.1%
9                       1699      3.0%
6                       1289      2.2%
7                       1244      2.2%
8                       1163      2.0%
4                       866       1.5%

Most occurring categories

Value                   Count     Frequency (%)
Decimal Number          46056     80.0%
Dash Punctuation        11514     20.0%

Most frequent character per category

Decimal Number
Value                   Count     Frequency (%)
0                       13459     29.2%
1                       12287     26.7%
2                       6876      14.9%
5                       6417      13.9%
9                       1699      3.7%
6                       1289      2.8%
7                       1244      2.7%
8                       1163      2.5%
4                       866       1.9%
3                       756       1.6%
Dash Punctuation
Value                   Count     Frequency (%)
-                       11514     100.0%

Most occurring scripts

Value                   Count     Frequency (%)
Common                  57570     100.0%

Most frequent character per script

Common
Value                   Count     Frequency (%)
0                       13459     23.4%
1                       12287     21.3%
-                       11514     20.0%
2                       6876      11.9%
5                       6417      11.1%
9                       1699      3.0%
6                       1289      2.2%
7                       1244      2.2%
8                       1163      2.0%
4                       866       1.5%

Most occurring blocks

Value                   Count     Frequency (%)
ASCII                   57570     100.0%

Most frequent character per block

ASCII
Value                   Count     Frequency (%)
0                       13459     23.4%
1                       12287     21.3%
-                       11514     20.0%
2                       6876      11.9%
5                       6417      11.1%
9                       1699      3.0%
6                       1289      2.2%
7                       1244      2.2%
8                       1163      2.0%
4                       866       1.5%

Distinct: 7153
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 12.5 MiB

Length

Max length: 16
Median length: 8
Mean length: 8.011236393
Min length: 7

Characters and Unicode

Total characters: 13165746
Distinct characters: 34
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6991
Unique (%): 0.4%

Sample

1st row: a55475b1
2nd row: a55475b1
3rd row: a55475b1
4th row: a55475b1
5th row: a55475b1
Value                   Count     Frequency (%)
a55475b1                1636009   99.5%
p114_118_163            17        < 0.1%
p179_55_175             7         < 0.1%
p26_112_122             6         < 0.1%
p133_138_183            6         < 0.1%
p9_69_94                6         < 0.1%
p74_31_177              6         < 0.1%
p38_11_59               6         < 0.1%
p149_35_169             5         < 0.1%
p204_145_180            5         < 0.1%
Other values (7143)     7337      0.4%

Most occurring characters

Value                   Count     Frequency (%)
5                       4912435   37.3%
1                       1651792   12.5%
7                       1641239   12.5%
4                       1640352   12.5%
a                       1636016   12.4%
b                       1636010   12.4%
_                       14802     0.1%
P                       7366      0.1%
2                       4892      < 0.1%
6                       4858      < 0.1%
Other values (24)       15984     0.1%

Most occurring categories

Value                   Count     Frequency (%)
Decimal Number          9871374   75.0%
Lowercase Letter        3272169   24.9%
Connector Punctuation   14802     0.1%
Uppercase Letter        7401      0.1%

Most frequent character per category

Lowercase Letter
Value                   Count     Frequency (%)
a                       1636016   50.0%
b                       1636010   50.0%
e                       20        < 0.1%
o                       15        < 0.1%
t                       12        < 0.1%
i                       10        < 0.1%
h                       10        < 0.1%
s                       9         < 0.1%
u                       9         < 0.1%
n                       9         < 0.1%
Other values (11)       49        < 0.1%
Decimal Number
Value                   Count     Frequency (%)
5                       4912435   49.8%
1                       1651792   16.7%
7                       1641239   16.6%
4                       1640352   16.6%
2                       4892      < 0.1%
6                       4858      < 0.1%
3                       4191      < 0.1%
8                       4114      < 0.1%
0                       3765      < 0.1%
9                       3736      < 0.1%
Uppercase Letter
Value                   Count     Frequency (%)
P                       7366      99.5%
Q                       35        0.5%
Connector Punctuation
Value                   Count     Frequency (%)
_                       14802     100.0%

Most occurring scripts

Value                   Count     Frequency (%)
Common                  9886176   75.1%
Latin                   3279570   24.9%

Most frequent character per script

Latin
Value                   Count     Frequency (%)
a                       1636016   49.9%
b                       1636010   49.9%
P                       7366      0.2%
Q                       35        < 0.1%
e                       20        < 0.1%
o                       15        < 0.1%
t                       12        < 0.1%
i                       10        < 0.1%
h                       10        < 0.1%
s                       9         < 0.1%
Other values (13)       67        < 0.1%
Common
Value                   Count     Frequency (%)
5                       4912435   49.7%
1                       1651792   16.7%
7                       1641239   16.6%
4                       1640352   16.6%
_                       14802     0.1%
2                       4892      < 0.1%
6                       4858      < 0.1%
3                       4191      < 0.1%
8                       4114      < 0.1%
0                       3765      < 0.1%

Most occurring blocks

Value                   Count     Frequency (%)
ASCII                   13165746  100.0%

Most frequent character per block

ASCII
Value                   Count     Frequency (%)
5                       4912435   37.3%
1                       1651792   12.5%
7                       1641239   12.5%
4                       1640352   12.5%
a                       1636016   12.4%
b                       1636010   12.4%
_                       14802     0.1%
P                       7366      0.1%
2                       4892      < 0.1%
6                       4858      < 0.1%
Other values (24)       15984     0.1%

num_group1
Real number (ℝ)

ZEROS 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1115424635
Minimum: 0
Maximum: 4
Zeros: 1463928
Zeros (%): 89.1%
Negative: 0
Negative (%): 0.0%
Memory size: 12.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 4
Range: 4
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.3224078582
Coefficient of variation (CV): 2.890449502
Kurtosis: 6.316380546
Mean: 0.1115424635
Median Absolute Deviation (MAD): 0
Skewness: 2.699905629
Sum: 183310
Variance: 0.103946827
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=5)
Value                   Count     Frequency (%)
0                       1463928   89.1%
1                       175805    10.7%
2                       3529      0.2%
3                       145       < 0.1%
4                       3         < 0.1%

Value                   Count     Frequency (%)
0                       1463928   89.1%
1                       175805    10.7%
2                       3529      0.2%
3                       145       < 0.1%
4                       3         < 0.1%

Value                   Count     Frequency (%)
4                       3         < 0.1%
3                       145       < 0.1%
2                       3529      0.2%
1                       175805    10.7%
0                       1463928   89.1%
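
Because num_group1 takes only five distinct values, its summary statistics can be recovered from the frequency table alone. The sketch below recomputes the row count, sum, mean and share of zeros from the reported counts. The names are illustrative, and every figure is copied from this report.

```python
# Value -> count, copied from the num_group1 frequency table in the report.
value_counts = {0: 1_463_928, 1: 175_805, 2: 3_529, 3: 145, 4: 3}

# Total number of rows covered by the table.
n_rows = sum(value_counts.values())

# Sum of the column: each value weighted by how often it occurs.
total = sum(value * count for value, count in value_counts.items())

mean = total / n_rows
zeros_pct = 100 * value_counts[0] / n_rows

print(f"rows: {n_rows}")
print(f"sum: {total}")
print(f"mean: {mean:.10f}")
print(f"zeros: {zeros_pct:.1f}%")
```

The results match the reported number of observations (1643410), Sum (183310), Mean (0.1115424635) and Zeros (89.1%).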

num_group2
Real number (ℝ)

ZEROS 

Distinct: 32
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1237013283
Minimum: 0
Maximum: 31
Zeros: 1561280
Zeros (%): 95.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 31
Range: 31
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.7612453743
Coefficient of variation (CV): 6.15389814
Kurtosis: 156.705602
Mean: 0.1237013283
Median Absolute Deviation (MAD): 0
Skewness: 10.32953806
Sum: 203292
Variance: 0.57949452
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=32)
Value                   Count     Frequency (%)
0                       1561280   95.0%
1                       44507     2.7%
2                       11110     0.7%
3                       8411      0.5%
4                       5356      0.3%
5                       4213      0.3%
6                       2831      0.2%
7                       1979      0.1%
8                       1232      0.1%
9                       860       0.1%
Other values (22)       1631      0.1%

Value                   Count     Frequency (%)
0                       1561280   95.0%
1                       44507     2.7%
2                       11110     0.7%
3                       8411      0.5%
4                       5356      0.3%

Value                   Count     Frequency (%)
31                      2         < 0.1%
30                      3         < 0.1%
29                      5         < 0.1%
28                      5         < 0.1%
27                      5         < 0.1%

relatedpersons_role_762T
Text

MISSING 

Distinct: 10
Distinct (%): < 0.1%
Missing: 1614684
Missing (%): 98.3%
Memory size: 12.5 MiB

Length

Max length: 14
Median length: 12
Mean length: 8.217259625
Min length: 5

Characters and Unicode

Total characters: 236049
Distinct characters: 19
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: OTHER_RELATIVE
2nd row: OTHER_RELATIVE
3rd row: PARENT
4th row: PARENT
5th row: COLLEAGUE
Value                   Count     Frequency (%)
other_relative          6211      21.6%
sibling                 5406      18.8%
friend                  5293      18.4%
colleague               3160      11.0%
parent                  2098      7.3%
other                   2080      7.2%
child                   1955      6.8%
spouse                  1476      5.1%
neighbor                782       2.7%
grand_parent            265       0.9%

Most occurring characters

Value                   Count     Frequency (%)
E                       36947     15.7%
I                       25053     10.6%
R                       23205     9.8%
L                       19892     8.4%
T                       16865     7.1%
N                       14109     6.0%
O                       13709     5.8%
A                       11999     5.1%
H                       11028     4.7%
G                       9613      4.1%
Other values (9)        53629     22.7%

Most occurring categories

Value                   Count     Frequency (%)
Uppercase Letter        229573    97.3%
Connector Punctuation   6476      2.7%

Most frequent character per category

Uppercase Letter
Value                   Count     Frequency (%)
E                       36947     16.1%
I                       25053     10.9%
R                       23205     10.1%
L                       19892     8.7%
T                       16865     7.3%
N                       14109     6.1%
O                       13709     6.0%
A                       11999     5.2%
H                       11028     4.8%
G                       9613      4.2%
Other values (8)        47153     20.5%
Connector Punctuation
Value                   Count     Frequency (%)
_                       6476      100.0%

Most occurring scripts

Value                   Count     Frequency (%)
Latin                   229573    97.3%
Common                  6476      2.7%

Most frequent character per script

Latin
Value                   Count     Frequency (%)
E                       36947     16.1%
I                       25053     10.9%
R                       23205     10.1%
L                       19892     8.7%
T                       16865     7.3%
N                       14109     6.1%
O                       13709     6.0%
A                       11999     5.2%
H                       11028     4.8%
G                       9613      4.2%
Other values (8)        47153     20.5%
Common
Value                   Count     Frequency (%)
_                       6476      100.0%

Most occurring blocks

Value                   Count     Frequency (%)
ASCII                   236049    100.0%

Most frequent character per block

ASCII
Value                   Count     Frequency (%)
E                       36947     15.7%
I                       25053     10.6%
R                       23205     9.8%
L                       19892     8.4%
T                       16865     7.1%
N                       14109     6.0%
O                       13709     5.8%
A                       11999     5.1%
H                       11028     4.7%
G                       9613      4.1%
Other values (9)        53629     22.7%