Overview

Dataset statistics

Number of variables: 5
Number of observations: 3343800
Missing cells: 0
Missing cells (%): 0.0%
Total size in memory: 127.6 MiB
Average record size in memory: 40.0 B
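
The two memory figures above agree with each other. The following is a minimal arithmetic check, assuming that MiB means 2^20 bytes and that each of the five columns costs 8 bytes per row. The 8-byte figure is an assumption inferred from the 25.5 MiB memory size listed for each variable; the report does not state it.

```python
MIB = 1024 ** 2  # bytes per MiB (binary unit)

n_rows = 3_343_800   # "Number of observations"
total_mib = 127.6    # "Total size in memory"

# Average bytes per record, derived from the two reported totals.
avg_record_bytes = total_mib * MIB / n_rows

# Assumption: every column stores one 8-byte value (or pointer) per row.
per_column_mib = n_rows * 8 / MIB

print(f"average record size: {avg_record_bytes:.1f} B")
print(f"one 8-byte column:   {per_column_mib:.1f} MiB")
```

Under that assumption, five columns at 8 bytes each give the 40 B per record shown in the report.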

Variable types

Numeric: 3
Text: 2

Alerts

num_group1 has 482265 (14.4%) zeros (Zeros)
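
The alert percentage is the zero count divided by the number of observations. The zero count also equals the number of distinct case_id values in this report, which would be consistent with num_group1 being a zero-based row index within each case. That reading is a guess and is not stated by the report. The sketch below recomputes the share and shows, on a few hypothetical toy rows, how such a "one zero per case" check could look.

```python
from collections import Counter

n_rows = 3_343_800   # number of observations in the report
n_zeros = 482_265    # zeros reported for num_group1

zero_share = n_zeros / n_rows
print(f"{zero_share:.1%}")

# Hypothetical toy rows (case_id, num_group1); not taken from the dataset.
rows = [(357, 0), (357, 1), (357, 2), (381, 0), (381, 1), (388, 0)]

zeros_per_case = Counter(case_id for case_id, group in rows if group == 0)
distinct_cases = {case_id for case_id, _ in rows}

# True when every case_id has exactly one row with num_group1 == 0.
each_case_has_one_zero = all(zeros_per_case[c] == 1 for c in distinct_cases)
print(each_case_has_one_zero)
```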

Reproduction

Analysis started: 2024-02-13 19:58:15.065329
Analysis finished: 2024-02-13 19:58:20.376181
Duration: 5.31 seconds
Software version: ydata-profiling v4.6.4
Download configuration: config.json

Variables

case_id
Real number (ℝ)

Distinct: 482265
Distinct (%): 14.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1161306.38
Minimum: 357
Maximum: 2629815
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 25.5 MiB

Quantile statistics

Minimum: 357
5-th percentile: 114792
Q1: 700623
Median: 1301411
Q3: 1471673
95-th percentile: 2585638
Maximum: 2629815
Range: 2629458
Interquartile range (IQR): 771050

Descriptive statistics

Standard deviation: 657994.9559
Coefficient of variation (CV): 0.5665989331
Kurtosis: 0.109580793
Mean: 1161306.38
Median Absolute Deviation (MAD): 493315
Skewness: 0.449679431
Sum: 3.883176274 × 10^12
Variance: 4.32957362 × 10^11
Monotonicity: Increasing
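
Several of the case_id figures are derived from others in the same report. The sketch below recomputes them from the reported mean, standard deviation, quartiles and extremes, using the usual textbook definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, sum = mean × count). The exact estimators used by the profiling tool are not shown in the report, so small rounding differences are to be expected.

```python
import math

n = 3_343_800                      # number of observations
mean = 1161306.38
std = 657994.9559
q1, q3 = 700623, 1471673
minimum, maximum = 357, 2629815

cv = std / mean                    # coefficient of variation
variance = std ** 2
iqr = q3 - q1                      # interquartile range
value_range = maximum - minimum
total = mean * n                   # sum of all values

print(f"CV       = {cv:.10f}")
print(f"variance = {variance:.6e}")
print(f"IQR      = {iqr}")
print(f"range    = {value_range}")
print(f"sum      = {total:.6e}")
```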
Histogram with fixed size bins (bins=50)
Value                  Count    Frequency (%)
608018                 121      < 0.1%
627764                 121      < 0.1%
700836                 111      < 0.1%
659955                 99       < 0.1%
1339712                69       < 0.1%
1443276                64       < 0.1%
161770                 63       < 0.1%
677615                 63       < 0.1%
1569035                60       < 0.1%
1318861                59       < 0.1%
Other values (482255)  3342970  > 99.9%

Value    Count  Frequency (%)
357      6      < 0.1%
381      6      < 0.1%
388      6      < 0.1%
405      6      < 0.1%
409      7      < 0.1%

Value    Count  Frequency (%)
2629815  11     < 0.1%
2629812  2      < 0.1%
2629809  3      < 0.1%
2629808  9      < 0.1%
2629807  8      < 0.1%

Distinct: 152835
Distinct (%): 4.6%
Missing: 0
Missing (%): 0.0%
Memory size: 25.5 MiB

Length

Max length: 12
Median length: 8
Mean length: 8.048733477
Min length: 8

Characters and Unicode

Total characters: 26913355
Distinct characters: 25
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
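
As an illustration of these character properties, the sketch below uses Python's standard `unicodedata` module to count general categories for two values that appear in this variable's frequency table. The `CATEGORY_NAMES` mapping and the `category_counts` helper are hypothetical names introduced for the example; the profiling tool's own implementation is not shown in the report.

```python
import unicodedata
from collections import Counter

# Short Unicode general-category codes mapped to the names used in the report.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Pc": "Connector Punctuation",
}


def category_counts(value: str) -> Counter:
    """Count characters of `value` per Unicode general category."""
    codes = (unicodedata.category(ch) for ch in value)
    return Counter(CATEGORY_NAMES.get(code, code) for code in codes)


hex_like = category_counts("5e180ef0")
code_like = category_counts("p114_118_163")

print(hex_like)
print(code_like)
```

Summing such per-value counts over the whole column would give category totals of the kind listed under "Most occurring categories".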

Unique

Unique: 22669
Unique (%): 0.7%

Sample

1st row: c91b12ff
2nd row: c91b12ff
3rd row: c91b12ff
4th row: c91b12ff
5th row: c91b12ff

Value                  Count    Frequency (%)
5e180ef0               209085   6.3%
6a3d9351               18121    0.5%
f10df922               14002    0.4%
p114_118_163           11723    0.4%
p157_88_183            10887    0.3%
7444479d               10504    0.3%
a645aae1               8835     0.3%
b09374c3               8508     0.3%
36a9355c               8316     0.2%
a409d8fa               8263     0.2%
Other values (152825)  3035556  90.8%

Most occurring characters

Value              Count    Frequency (%)
0                  1956238  7.3%
e                  1947697  7.2%
1                  1936881  7.2%
5                  1811933  6.7%
8                  1795831  6.7%
f                  1713339  6.4%
4                  1633356  6.1%
3                  1629676  6.1%
d                  1597050  5.9%
9                  1580769  5.9%
Other values (15)  9310585  34.6%

Most occurring categories

Value                  Count     Frequency (%)
Decimal Number         16949878  63.0%
Lowercase Letter       9790872   36.4%
Connector Punctuation  115070    0.4%
Uppercase Letter       57535     0.2%

Most frequent character per category

Lowercase Letter
Value             Count    Frequency (%)
e                 1947697  19.9%
f                 1713339  17.5%
d                 1597050  16.3%
a                 1574649  16.1%
c                 1504566  15.4%
b                 1453491  14.8%
l                 28       < 0.1%
w                 14       < 0.1%
i                 14       < 0.1%
p                 8        < 0.1%
Other values (2)  16       < 0.1%

Decimal Number
Value  Count    Frequency (%)
0      1956238  11.5%
1      1936881  11.4%
5      1811933  10.7%
8      1795831  10.6%
4      1633356  9.6%
3      1629676  9.6%
9      1580769  9.3%
7      1553021  9.2%
6      1528513  9.0%
2      1523660  9.0%

Uppercase Letter
Value  Count  Frequency (%)
P      57513  > 99.9%
Q      22     < 0.1%

Connector Punctuation
Value  Count   Frequency (%)
_      115070  100.0%

Most occurring scripts

Value   Count     Frequency (%)
Common  17064948  63.4%
Latin   9848407   36.6%

Most frequent character per script

Latin
Value             Count    Frequency (%)
e                 1947697  19.8%
f                 1713339  17.4%
d                 1597050  16.2%
a                 1574649  16.0%
c                 1504566  15.3%
b                 1453491  14.8%
P                 57513    0.6%
l                 28       < 0.1%
Q                 22       < 0.1%
w                 14       < 0.1%
Other values (4)  38       < 0.1%

Common
Value  Count    Frequency (%)
0      1956238  11.5%
1      1936881  11.4%
5      1811933  10.6%
8      1795831  10.5%
4      1633356  9.6%
3      1629676  9.5%
9      1580769  9.3%
7      1553021  9.1%
6      1528513  9.0%
2      1523660  8.9%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  26913355  100.0%

Most frequent character per block

ASCII
Value              Count    Frequency (%)
0                  1956238  7.3%
e                  1947697  7.2%
1                  1936881  7.2%
5                  1811933  6.7%
8                  1795831  6.7%
f                  1713339  6.4%
4                  1633356  6.1%
3                  1629676  6.1%
d                  1597050  5.9%
9                  1580769  5.9%
Other values (15)  9310585  34.6%

num_group1
Real number (ℝ)

ZEROS 

Distinct: 121
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.893117411
Minimum: 0
Maximum: 120
Zeros: 482265
Zeros (%): 14.4%
Negative: 0
Negative (%): 0.0%
Memory size: 25.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 3
Q3: 5
95-th percentile: 11
Maximum: 120
Range: 120
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.736778505
Coefficient of variation (CV): 0.9598422319
Kurtosis: 28.23892001
Mean: 3.893117411
Median Absolute Deviation (MAD): 2
Skewness: 2.791610254
Sum: 13017806
Variance: 13.9635136
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count   Frequency (%)
0                   482265  14.4%
1                   463176  13.9%
2                   447333  13.4%
3                   428710  12.8%
4                   407285  12.2%
5                   356851  10.7%
6                   177060  5.3%
7                   130208  3.9%
8                   105525  3.2%
9                   87921   2.6%
Other values (111)  257466  7.7%

Value  Count   Frequency (%)
0      482265  14.4%
1      463176  13.9%
2      447333  13.4%
3      428710  12.8%
4      407285  12.2%

Value  Count  Frequency (%)
120    2      < 0.1%
119    2      < 0.1%
118    2      < 0.1%
117    2      < 0.1%
116    2      < 0.1%

pmtamount_36A
Real number (ℝ)

Distinct: 572321
Distinct (%): 17.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2260.537387
Minimum: 0
Maximum: 87115.6
Zeros: 28
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 25.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 200
Q1: 745.46
Median: 1365.454
Q3: 2632.2
95-th percentile: 6931.88547
Maximum: 87115.6
Range: 87115.6
Interquartile range (IQR): 1886.74

Descriptive statistics

Standard deviation: 3161.294121
Coefficient of variation (CV): 1.398470177
Kurtosis: 46.27256428
Mean: 2260.537387
Median Absolute Deviation (MAD): 799.454
Skewness: 5.472864184
Sum: 7558784916
Variance: 9993780.517
Monotonicity: Not monotonic
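
The high skewness and kurtosis point to a long right tail. One way to make that concrete is the 1.5 × IQR upper fence (Tukey's rule), a common outlier heuristic. The rule is brought in here purely for illustration and is not part of the report; only the quantile values below come from it.

```python
import math

# Quantiles reported for pmtamount_36A.
q1, q3 = 745.46, 2632.2
p95 = 6931.88547

iqr = q3 - q1                   # interquartile range
upper_fence = q3 + 1.5 * iqr    # Tukey's upper fence (illustrative rule)

print(f"IQR         = {iqr:.2f}")
print(f"upper fence = {upper_fence:.2f}")
print(f"95th percentile above fence: {p95 > upper_fence}")
```

Because the 95th percentile already lies above the fence, more than 5% of the observations would be flagged by this rule, which fits the heavy right tail suggested by the skewness.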
Histogram with fixed size bins (bins=50)
Value                  Count    Frequency (%)
850                    156210   4.7%
1000                   66534    2.0%
600                    37836    1.1%
565.60004              35777    1.1%
2000                   30371    0.9%
1200                   27687    0.8%
900                    23582    0.7%
1400                   18448    0.6%
800                    17254    0.5%
1600                   16166    0.5%
Other values (572311)  2913935  87.1%

Value  Count  Frequency (%)
0      28     < 0.1%
0.002  33     < 0.1%
0.004  7      < 0.1%
0.006  1      < 0.1%
0.008  6      < 0.1%

Value      Count  Frequency (%)
87115.6    1      < 0.1%
68760.805  1      < 0.1%
43134.2    1      < 0.1%
42500      1670   < 0.1%
42499.8    2      < 0.1%

Distinct: 325
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.5 MiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 33438000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 2018-08-08
2nd row: 2018-11-28
3rd row: 2018-09-10
4th row: 2019-01-04
5th row: 2018-10-08

Value               Count    Frequency (%)
2019-04-02          43358    1.3%
2019-04-03          41765    1.2%
2019-03-08          41188    1.2%
2019-03-07          39847    1.2%
2019-01-07          39016    1.2%
2019-03-11          37690    1.1%
2019-04-09          36223    1.1%
2019-02-06          33320    1.0%
2019-01-04          33155    1.0%
2019-03-14          31469    0.9%
Other values (315)  2966769  88.7%

Most occurring characters

Value  Count    Frequency (%)
0      7941380  23.7%
-      6687600  20.0%
1      6086606  18.2%
2      5081042  15.2%
9      3294466  9.9%
8      1185812  3.5%
3      829824   2.5%
4      646209   1.9%
5      580237   1.7%
7      561028   1.7%

Most occurring categories

Value             Count     Frequency (%)
Decimal Number    26750400  80.0%
Dash Punctuation  6687600   20.0%
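
The exact 80/20 split follows from the fixed layout of the values. The sketch below assumes every value has the same YYYY-MM-DD shape as the five sample rows (the report lists minimum and maximum length as 10, but does not confirm the format for every row). It parses the samples with the standard library and recomputes the category totals.

```python
from datetime import date

# Sample rows shown in the report for this variable.
samples = ["2018-08-08", "2018-11-28", "2018-09-10", "2019-01-04", "2018-10-08"]

# fromisoformat raises ValueError if a string is not a valid YYYY-MM-DD date.
parsed = [date.fromisoformat(s) for s in samples]

n_rows = 3_343_800
digits_per_value = sum(ch.isdigit() for ch in samples[0])
dashes_per_value = samples[0].count("-")

# Assumption: all rows share the layout of the samples.
total_digits = n_rows * digits_per_value
total_dashes = n_rows * dashes_per_value

print(parsed[0], total_digits, total_dashes)
```

Since the values parse cleanly as dates, converting this text column to a date type before further analysis may be worth considering.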

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
0      7941380  29.7%
1      6086606  22.8%
2      5081042  19.0%
9      3294466  12.3%
8      1185812  4.4%
3      829824   3.1%
4      646209   2.4%
5      580237   2.2%
7      561028   2.1%
6      543796   2.0%

Dash Punctuation
Value  Count    Frequency (%)
-      6687600  100.0%

Most occurring scripts

Value   Count     Frequency (%)
Common  33438000  100.0%

Most frequent character per script

Common
Value  Count    Frequency (%)
0      7941380  23.7%
-      6687600  20.0%
1      6086606  18.2%
2      5081042  15.2%
9      3294466  9.9%
8      1185812  3.5%
3      829824   2.5%
4      646209   1.9%
5      580237   1.7%
7      561028   1.7%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  33438000  100.0%

Most frequent character per block

ASCII
Value  Count    Frequency (%)
0      7941380  23.7%
-      6687600  20.0%
1      6086606  18.2%
2      5081042  15.2%
9      3294466  9.9%
8      1185812  3.5%
3      829824   2.5%
4      646209   1.9%
5      580237   1.7%
7      561028   1.7%