Overview

Dataset statistics

Number of variables: 6
Number of observations: 14075487
Missing cells: 16236709
Missing cells (%): 19.2%
Total size in memory: 644.3 MiB
Average record size in memory: 48.0 B
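
The derived figures in this table follow from the raw counts. A minimal sketch of the arithmetic, assuming missing cells are counted over the full rows-by-columns grid and that 1 MiB is 2**20 bytes (the variable names are illustrative and not part of the report):

```python
# Values copied from the dataset statistics table.
n_variables = 6
n_observations = 14_075_487
missing_cells = 16_236_709
total_size_mib = 644.3

# Share of missing cells over the full rows-by-columns grid.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average bytes per record, taking 1 MiB = 2**20 bytes (assumption).
avg_record_bytes = total_size_mib * 2**20 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

An average of about 48 B per record is consistent with six columns at 8 bytes each, which also agrees with the 107.4 MiB reported for each variable.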

Variable types

Numeric: 3
Text: 3

Alerts

conts_type_509L has 2394056 (17.0%) missing values [Missing]
credacc_cards_status_52L has 13733404 (97.6%) missing values [Missing]
num_group1 has 2276307 (16.2%) zeros [Zeros]
num_group2 has 6525978 (46.4%) zeros [Zeros]

Reproduction

Analysis started: 2024-02-13 19:37:46.943271
Analysis finished: 2024-02-13 19:38:08.967824
Duration: 22.02 seconds
Software version: ydata-profiling v4.6.4
Download configuration: config.json
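
The reported duration can be checked against the two timestamps. A small sketch using only the Python standard library (the variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the reproduction block.
started = datetime.fromisoformat("2024-02-13 19:37:46.943271")
finished = datetime.fromisoformat("2024-02-13 19:38:08.967824")

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```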

Variables

case_id
Real number (ℝ)

Distinct: 1221522
Distinct (%): 8.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1454198.017
Minimum: 2
Maximum: 2703454
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 107.4 MiB

Quantile statistics

Minimum: 2
5-th percentile: 144108
Q1: 1237440
Median: 1575626
Q3: 1861301
95-th percentile: 2657763
Maximum: 2703454
Range: 2703452
Interquartile range (IQR): 623861

Descriptive statistics

Standard deviation: 787508.3638
Coefficient of variation (CV): 0.5415413545
Kurtosis: -0.7224045076
Mean: 1454198.017
Median Absolute Deviation (MAD): 307009
Skewness: -0.3252931327
Sum: 2.046854528 × 10^13
Variance: 6.201694231 × 10^11
Monotonicity: Increasing
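
Several of these statistics are tied together by simple identities, which gives a quick consistency check. A sketch under the assumption that the values are rounded as printed (names are illustrative):

```python
import math

# Values copied from the case_id statistics.
n = 14_075_487
mean = 1454198.017
std = 787508.3638
q1, q3 = 1237440, 1861301
minimum, maximum = 2, 2703454

cv = std / mean          # coefficient of variation
variance = std ** 2      # variance is the squared standard deviation
total = mean * n         # sum of all values
iqr = q3 - q1            # interquartile range
value_range = maximum - minimum

print(f"CV={cv:.10f} variance={variance:.6e} sum={total:.6e}")
print(f"IQR={iqr} range={value_range}")
```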
Histogram with fixed size bins (bins=50)
Value                   Count     Frequency (%)
1246034                 79        < 0.1%
1390922                 79        < 0.1%
1344039                 78        < 0.1%
1585694                 77        < 0.1%
2647430                 77        < 0.1%
1495888                 77        < 0.1%
1260644                 76        < 0.1%
1416022                 76        < 0.1%
158480                  74        < 0.1%
1393522                 74        < 0.1%
Other values (1221512)  14074720  > 99.9%

Value    Count  Frequency (%)
2        4      < 0.1%
3        3      < 0.1%
4        2      < 0.1%
5        1      < 0.1%
6        5      < 0.1%

Value    Count  Frequency (%)
2703454  6      < 0.1%
2703453  24     < 0.1%
2703452  6      < 0.1%
2703451  17     < 0.1%
2703450  29     < 0.1%

Distinct: 9
Distinct (%): < 0.1%
Missing: 109249
Missing (%): 0.8%
Memory size: 107.4 MiB

Length

Max length: 11
Median length: 8
Mean length: 8.001376677
Min length: 8

Characters and Unicode

Total characters: 111749131
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
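
As an illustration of these character properties, the general category of each code point can be read with Python's standard `unicodedata` module. This is a sketch only: the sample strings are assumptions based on the value table for this variable, with the capital `P` suggested by the Uppercase Letter counts, and the function name is hypothetical.

```python
import unicodedata
from collections import Counter

def category_counts(text: str) -> Counter:
    """Count Unicode general categories (e.g. 'Nd', 'Ll') in a string."""
    return Counter(unicodedata.category(ch) for ch in text)

# Nd = Decimal Number, Ll = Lowercase Letter,
# Lu = Uppercase Letter, Pc = Connector Punctuation.
print(category_counts("a55475b1"))
print(category_counts("P33_145_161"))
```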

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: a55475b1
2nd row: a55475b1
3rd row: a55475b1
4th row: a55475b1
5th row: a55475b1

Value        Count     Frequency (%)
a55475b1     13958443  99.9%
p33_145_161  3309      < 0.1%
p201_63_60   2960      < 0.1%
p19_60_110   1135      < 0.1%
p23_105_103  117       < 0.1%
p133_119_56  100       < 0.1%
p41_107_150  92        < 0.1%
p17_56_144   63        < 0.1%
p127_74_114  19        < 0.1%

Most occurring characters

Value             Count     Frequency (%)
5                 41879010  37.5%
1                 13975728  12.5%
4                 13962008  12.5%
7                 13958636  12.5%
a                 13958443  12.5%
b                 13958443  12.5%
_                 15590     < 0.1%
6                 10527     < 0.1%
3                 10012     < 0.1%
0                 8608      < 0.1%
Other values (3)  12126     < 0.1%

Most occurring categories

Value                  Count     Frequency (%)
Decimal Number         83808860  75.0%
Lowercase Letter       27916886  25.0%
Connector Punctuation  15590     < 0.1%
Uppercase Letter       7795      < 0.1%

Most frequent character per category

Decimal Number
Value  Count     Frequency (%)
5      41879010  50.0%
1      13975728  16.7%
4      13962008  16.7%
7      13958636  16.7%
6      10527     < 0.1%
3      10012     < 0.1%
0      8608      < 0.1%
2      3096      < 0.1%
9      1235      < 0.1%

Lowercase Letter
Value  Count     Frequency (%)
a      13958443  50.0%
b      13958443  50.0%

Connector Punctuation
Value  Count  Frequency (%)
_      15590  100.0%

Uppercase Letter
Value  Count  Frequency (%)
P      7795   100.0%

Most occurring scripts

Value   Count     Frequency (%)
Common  83824450  75.0%
Latin   27924681  25.0%

Most frequent character per script

Common
Value  Count     Frequency (%)
5      41879010  50.0%
1      13975728  16.7%
4      13962008  16.7%
7      13958636  16.7%
_      15590     < 0.1%
6      10527     < 0.1%
3      10012     < 0.1%
0      8608      < 0.1%
2      3096      < 0.1%
9      1235      < 0.1%

Latin
Value  Count     Frequency (%)
a      13958443  50.0%
b      13958443  50.0%
P      7795      < 0.1%

Most occurring blocks

Value  Count      Frequency (%)
ASCII  111749131  100.0%

Most frequent character per block

ASCII
Value             Count     Frequency (%)
5                 41879010  37.5%
1                 13975728  12.5%
4                 13962008  12.5%
7                 13958636  12.5%
a                 13958443  12.5%
b                 13958443  12.5%
_                 15590     < 0.1%
6                 10527     < 0.1%
3                 10012     < 0.1%
0                 8608      < 0.1%
Other values (3)  12126     < 0.1%

conts_type_509L
Text

MISSING 

Distinct: 9
Distinct (%): < 0.1%
Missing: 2394056
Missing (%): 17.0%
Memory size: 107.4 MiB

Length

Max length: 17
Median length: 14
Mean length: 12.71871049
Min length: 5

Characters and Unicode

Total characters: 148572739
Distinct characters: 20
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: EMPLOYMENT_PHONE
2nd row: EMPLOYMENT_PHONE
3rd row: PRIMARY_MOBILE
4th row: PRIMARY_MOBILE
5th row: PRIMARY_MOBILE

Value              Count    Frequency (%)
primary_mobile     6294386  53.9%
home_phone         2472180  21.2%
employment_phone   1679417  14.4%
phone              924958   7.9%
primary_email      257126   2.2%
alternative_phone  37833    0.3%
secondary_mobile   15504    0.1%
whatsapp           25       < 0.1%
skype              2        < 0.1%
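
The percentages in this table appear to be computed over the non-missing values rather than over all observations, and the mean length follows from the total character count in the same way. A sketch of that arithmetic, assuming the counts as printed (names are illustrative):

```python
# Values copied from the conts_type_509L section.
n_observations = 14_075_487
n_missing = 2_394_056
total_characters = 148_572_739

counts = {
    "primary_mobile": 6_294_386,
    "home_phone": 2_472_180,
    "employment_phone": 1_679_417,
    "phone": 924_958,
    "primary_email": 257_126,
    "alternative_phone": 37_833,
    "secondary_mobile": 15_504,
    "whatsapp": 25,
    "skype": 2,
}

# Denominator: rows where the variable is present.
n_present = n_observations - n_missing

# Share of each value among the non-missing rows, in percent.
shares = {value: round(100 * c / n_present, 1) for value, c in counts.items()}
mean_length = total_characters / n_present

print(shares)
print(f"Mean length: {mean_length:.8f}")
```

The value counts add up exactly to the number of non-missing rows, which supports reading the denominator this way.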

Most occurring characters

Value              Count     Frequency (%)
M                  18949542  12.8%
E                  17603590  11.8%
O                  15591379  10.5%
P                  13345369  9.0%
I                  13156361  8.9%
R                  13156361  8.9%
_                  10756446  7.2%
L                  8284266   5.6%
Y                  8246435   5.6%
H                  7586593   5.1%
Other values (10)  21896397  14.7%

Most occurring categories

Value                  Count      Frequency (%)
Uppercase Letter       137816293  92.8%
Connector Punctuation  10756446   7.2%

Most frequent character per category

Uppercase Letter
Value             Count     Frequency (%)
M                 18949542  13.7%
E                 17603590  12.8%
O                 15591379  11.3%
P                 13345369  9.7%
I                 13156361  9.5%
R                 13156361  9.5%
L                 8284266   6.0%
Y                 8246435   6.0%
H                 7586593   5.5%
A                 6899858   5.0%
Other values (9)  14996539  10.9%

Connector Punctuation
Value  Count     Frequency (%)
_      10756446  100.0%

Most occurring scripts

Value   Count      Frequency (%)
Latin   137816293  92.8%
Common  10756446   7.2%

Most frequent character per script

Latin
Value             Count     Frequency (%)
M                 18949542  13.7%
E                 17603590  12.8%
O                 15591379  11.3%
P                 13345369  9.7%
I                 13156361  9.5%
R                 13156361  9.5%
L                 8284266   6.0%
Y                 8246435   6.0%
H                 7586593   5.5%
A                 6899858   5.0%
Other values (9)  14996539  10.9%

Common
Value  Count     Frequency (%)
_      10756446  100.0%

Most occurring blocks

Value  Count      Frequency (%)
ASCII  148572739  100.0%

Most frequent character per block

ASCII
Value              Count     Frequency (%)
M                  18949542  12.8%
E                  17603590  11.8%
O                  15591379  10.5%
P                  13345369  9.0%
I                  13156361  8.9%
R                  13156361  8.9%
_                  10756446  7.2%
L                  8284266   5.6%
Y                  8246435   5.6%
H                  7586593   5.1%
Other values (10)  21896397  14.7%

credacc_cards_status_52L
Text

MISSING

Distinct: 6
Distinct (%): < 0.1%
Missing: 13733404
Missing (%): 97.6%
Memory size: 107.4 MiB

Length

Max length: 11
Median length: 9
Mean length: 7.840579625
Min length: 6

Characters and Unicode

Total characters: 2682129
Distinct characters: 17
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: CANCELLED
2nd row: CANCELLED
3rd row: CANCELLED
4th row: CANCELLED
5th row: INACTIVE

Value        Count   Frequency (%)
cancelled    167031  48.8%
active       109642  32.1%
inactive     62968   18.4%
blocked      2098    0.6%
renewed      304     0.1%
unconfirmed  40      < 0.1%

Most occurring characters

Value             Count   Frequency (%)
E                 509722  19.0%
C                 508810  19.0%
A                 339641  12.7%
L                 336160  12.5%
I                 235618  8.8%
N                 230383  8.6%
V                 172610  6.4%
T                 172610  6.4%
D                 169473  6.3%
O                 2138    0.1%
Other values (7)  4964    0.2%

Most occurring categories

Value             Count    Frequency (%)
Uppercase Letter  2682129  100.0%

Most frequent character per category

Uppercase Letter
Value             Count   Frequency (%)
E                 509722  19.0%
C                 508810  19.0%
A                 339641  12.7%
L                 336160  12.5%
I                 235618  8.8%
N                 230383  8.6%
V                 172610  6.4%
T                 172610  6.4%
D                 169473  6.3%
O                 2138    0.1%
Other values (7)  4964    0.2%

Most occurring scripts

Value  Count    Frequency (%)
Latin  2682129  100.0%

Most frequent character per script

Latin
Value             Count   Frequency (%)
E                 509722  19.0%
C                 508810  19.0%
A                 339641  12.7%
L                 336160  12.5%
I                 235618  8.8%
N                 230383  8.6%
V                 172610  6.4%
T                 172610  6.4%
D                 169473  6.3%
O                 2138    0.1%
Other values (7)  4964    0.2%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  2682129  100.0%

Most frequent character per block

ASCII
Value             Count   Frequency (%)
E                 509722  19.0%
C                 508810  19.0%
A                 339641  12.7%
L                 336160  12.5%
I                 235618  8.8%
N                 230383  8.6%
V                 172610  6.4%
T                 172610  6.4%
D                 169473  6.3%
O                 2138    0.1%
Other values (7)  4964    0.2%

num_group1
Real number (ℝ)

ZEROS 

Distinct: 20
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.551473281
Minimum: 0
Maximum: 19
Zeros: 2276307
Zeros (%): 16.2%
Negative: 0
Negative (%): 0.0%
Memory size: 107.4 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 3
Q3: 7
95-th percentile: 14
Maximum: 19
Range: 19
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.390422118
Coefficient of variation (CV): 0.9646155974
Kurtosis: 0.8367524554
Mean: 4.551473281
Median Absolute Deviation (MAD): 2
Skewness: 1.183532817
Sum: 64064203
Variance: 19.27580637
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=20)
Value              Count    Frequency (%)
0                  2276307  16.2%
1                  1976391  14.0%
2                  1685107  12.0%
3                  1421703  10.1%
4                  1190551  8.5%
5                  990136   7.0%
6                  820840   5.8%
7                  680934   4.8%
8                  563599   4.0%
9                  467994   3.3%
Other values (10)  2001925  14.2%

Value  Count    Frequency (%)
0      2276307  16.2%
1      1976391  14.0%
2      1685107  12.0%
3      1421703  10.1%
4      1190551  8.5%

Value  Count   Frequency (%)
19     84591   0.6%
18     98875   0.7%
17     115979  0.8%
16     136458  1.0%
15     161700  1.1%

num_group2
Real number (ℝ)

ZEROS 

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.7403147756
Minimum: 0
Maximum: 11
Zeros: 6525978
Zeros (%): 46.4%
Negative: 0
Negative (%): 0.0%
Memory size: 107.4 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 1
95-th percentile: 2
Maximum: 11
Range: 11
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.8025843052
Coefficient of variation (CV): 1.084112234
Kurtosis: -0.2042270868
Mean: 0.7403147756
Median Absolute Deviation (MAD): 1
Skewness: 0.7684751112
Sum: 10420291
Variance: 0.6441415669
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
Value             Count    Frequency (%)
0                 6525978  46.4%
1                 4976173  35.4%
2                 2286939  16.2%
3                 276223   2.0%
4                 9394     0.1%
5                 717      < 0.1%
6                 46       < 0.1%
7                 9        < 0.1%
8                 4        < 0.1%
9                 2        < 0.1%
Other values (2)  2        < 0.1%

Value  Count    Frequency (%)
0      6525978  46.4%
1      4976173  35.4%
2      2286939  16.2%
3      276223   2.0%
4      9394     0.1%

Value  Count  Frequency (%)
11     1      < 0.1%
10     1      < 0.1%
9      2      < 0.1%
8      4      < 0.1%
7      9      < 0.1%
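
Because num_group2 takes only 12 distinct values, its summary statistics can be rebuilt from the frequency tables alone. The sketch below assumes the two "Other values" are 10 and 11 with one occurrence each, as the extreme-values table indicates, and uses the sample variance (n - 1 denominator), which is an assumption about how the report computes it; at this sample size the choice of denominator makes no visible difference.

```python
# Value -> count, taken from the num_group2 frequency tables.
freq = {
    0: 6_525_978, 1: 4_976_173, 2: 2_286_939, 3: 276_223,
    4: 9_394, 5: 717, 6: 46, 7: 9, 8: 4, 9: 2, 10: 1, 11: 1,
}

n = sum(freq.values())                         # number of observations
total = sum(v * c for v, c in freq.items())    # sum of all values
mean = total / n

# Sample variance from the sum of squares (n - 1 denominator, assumed).
sum_sq = sum(v * v * c for v, c in freq.items())
variance = (sum_sq - n * mean**2) / (n - 1)
std = variance ** 0.5

print(f"n={n} sum={total} mean={mean:.10f}")
print(f"variance={variance:.6f} std={std:.6f}")
```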