Overview

Dataset statistics

Number of variables: 6
Number of observations: 1286755
Missing cells: 10722
Missing cells (%): 0.1%
Total size in memory: 58.9 MiB
Average record size in memory: 48.0 B
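
The derived figures in this block follow from the raw counts. A minimal sketch of that arithmetic, assuming the percentage is taken over all cells (variables × observations) and that 1 MiB = 1024 × 1024 bytes; the variable names are illustrative only:

```python
# Figures copied from the dataset statistics above.
n_variables = 6
n_observations = 1_286_755
missing_cells = 10_722
total_mib = 58.9

# Every observation contributes one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Convert MiB to bytes, then divide by the number of rows.
bytes_per_record = total_mib * 1024 * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.2f}%")
print(f"average record size: {bytes_per_record:.1f} B")
```

The missing share is well below one percent of all cells, which is consistent with the 0.1% shown after rounding.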

Variable types

Numeric: 5
Text: 1

Alerts

pmts_dpdvalue_108P is highly skewed (γ1 = 217.5514208) [Skewed]
pmts_pmtsoverdue_635A is highly skewed (γ1 = 318.3365368) [Skewed]
num_group1 has 723766 (56.2%) zeros [Zeros]
num_group2 has 81889 (6.4%) zeros [Zeros]
pmts_dpdvalue_108P has 1137032 (88.4%) zeros [Zeros]
pmts_pmtsoverdue_635A has 1136910 (88.4%) zeros [Zeros]
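
The two alert types are related: a column that is almost entirely zero with a few large values will have a very large skewness. The sketch below illustrates this on a hypothetical toy column. It uses the plain moment-based coefficient g1 = m3 / m2^1.5 with no bias correction, which is an assumption; the estimator behind the γ1 values in the report may differ slightly.

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# Hypothetical toy column: 99 zeros and one large value.
toy = [0] * 99 + [1000]

print(f"zeros: {zero_share(toy):.1%}")
print(f"skewness: {skewness(toy):.2f}")
```

A symmetric sample such as [1, 2, 3] gives a skewness of zero under the same function, so the large value comes from the long right tail rather than from the zeros alone.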

Reproduction

Analysis started: 2024-02-13 19:53:21.322245
Analysis finished: 2024-02-13 19:53:22.209261
Duration: 0.89 seconds
Software version: ydata-profiling v4.6.4
Download configuration: config.json
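
A report of this kind can be regenerated with ydata-profiling roughly as sketched below. The function name, the CSV input format, and the file paths are assumptions for illustration; the original data source and the settings stored in config.json are not shown in this report.

```python
def build_profile(csv_path, html_path):
    """Profile a CSV file and write the HTML report to html_path."""
    # Imported lazily so the function can be defined even when the
    # third-party packages are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)  # assumption: the data is available as CSV
    report = ProfileReport(df, title="Profiling Report")
    report.to_file(html_path)
    return report


# Example call (both paths are placeholders):
# build_profile("input.csv", "report.html")
```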

Variables

case_id
Real number (ℝ)

Distinct: 36447
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1229443.91
Minimum: 467
Maximum: 2703436
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 9.8 MiB

Quantile statistics

Minimum: 467
5-th percentile: 127544
Q1: 741898
Median: 1416105
Q3: 1781534
95-th percentile: 1939410
Maximum: 2703436
Range: 2702969
Interquartile range (IQR): 1039636

Descriptive statistics

Standard deviation: 679992.3043
Coefficient of variation (CV): 0.5530893267
Kurtosis: -0.8850089794
Mean: 1229443.91
Median Absolute Deviation (MAD): 462324
Skewness: -0.2751192657
Sum: 1.581993098 × 10^12
Variance: 4.62389534 × 10^11
Monotonicity: Increasing
Histogram with fixed size bins (bins=50)
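
Several of the case_id statistics are simple functions of the others. The sketch below recomputes them from the reported values as a consistency check; it only uses the standard definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared), and small differences are expected because the inputs are rounded.

```python
import math

# Values copied from the case_id statistics above.
minimum, maximum = 467, 2_703_436
q1, q3 = 741_898, 1_781_534
mean = 1_229_443.91
std = 679_992.3043

value_range = maximum - minimum  # reported: 2702969
iqr = q3 - q1                    # reported: 1039636
cv = std / mean                  # reported: 0.5530893267
variance = std ** 2              # reported: 4.62389534 × 10^11

print(value_range, iqr)
print(f"CV = {cv:.6f}")
print(f"variance = {variance:.6e}")
print(math.isclose(variance, 4.62389534e11, rel_tol=1e-6))
```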
Common values

Value                   Count      Frequency (%)
1828337                 301        < 0.1%
938117                  297        < 0.1%
1690585                 243        < 0.1%
1394674                 242        < 0.1%
1027034                 241        < 0.1%
1640901                 233        < 0.1%
33087                   230        < 0.1%
1923653                 225        < 0.1%
1596778                 216        < 0.1%
925676                  212        < 0.1%
Other values (36437)    1284315    99.8%

Minimum 5 values

Value                   Count      Frequency (%)
467                     30         < 0.1%
1445                    83         < 0.1%
1934                    79         < 0.1%
3159                    3          < 0.1%
3208                    15         < 0.1%

Maximum 5 values

Value                   Count      Frequency (%)
2703436                 66         < 0.1%
2703377                 36         < 0.1%
2703357                 54         < 0.1%
2702661                 5          < 0.1%
2701996                 30         < 0.1%

num_group1
Real number (ℝ)

ZEROS 

Distinct: 21
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.746740444
Minimum: 0
Maximum: 20
Zeros: 723766
Zeros (%): 56.2%
Negative: 0
Negative (%): 0.0%
Memory size: 9.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 3
Maximum: 20
Range: 20
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.121661243
Coefficient of variation (CV): 1.50207646
Kurtosis: 11.21017492
Mean: 0.746740444
Median Absolute Deviation (MAD): 0
Skewness: 2.39057725
Sum: 960872
Variance: 1.258123943
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=21)
Common values

Value                   Count      Frequency (%)
0                       723766     56.2%
1                       327250     25.4%
2                       139937     10.9%
3                       58034      4.5%
4                       22693      1.8%
5                       8694       0.7%
6                       3354       0.3%
7                       1414       0.1%
8                       780        0.1%
9                       349        < 0.1%
Other values (11)       484        < 0.1%

Minimum 5 values

Value                   Count      Frequency (%)
0                       723766     56.2%
1                       327250     25.4%
2                       139937     10.9%
3                       58034      4.5%
4                       22693      1.8%

Maximum 5 values

Value                   Count      Frequency (%)
20                      6          < 0.1%
19                      36         < 0.1%
18                      10         < 0.1%
17                      5          < 0.1%
16                      16         < 0.1%

num_group2
Real number (ℝ)

ZEROS 

Distinct: 37
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.31972909
Minimum: 0
Maximum: 36
Zeros: 81889
Zeros (%): 6.4%
Negative: 0
Negative (%): 0.0%
Memory size: 9.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 4
Median: 10
Q3: 19
95-th percentile: 32
Maximum: 36
Range: 36
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 10.01712949
Coefficient of variation (CV): 0.813096572
Kurtosis: -0.6684618538
Mean: 12.31972909
Median Absolute Deviation (MAD): 7
Skewness: 0.6669823053
Sum: 15852473
Variance: 100.3428832
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=37)
Common values

Value                   Count      Frequency (%)
0                       81889      6.4%
1                       78332      6.1%
2                       73867      5.7%
3                       69081      5.4%
4                       64582      5.0%
5                       60414      4.7%
6                       56217      4.4%
7                       52446      4.1%
8                       49081      3.8%
9                       45908      3.6%
Other values (27)       654938     50.9%

Minimum 5 values

Value                   Count      Frequency (%)
0                       81889      6.4%
1                       78332      6.1%
2                       73867      5.7%
3                       69081      5.4%
4                       64582      5.0%

Maximum 5 values

Value                   Count      Frequency (%)
36                      8191       0.6%
35                      15029      1.2%
34                      15496      1.2%
33                      15961      1.2%
32                      16457      1.3%

Distinct: 58
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.8 MiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 12867550
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
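
As an illustration of the character properties mentioned here, the following sketch counts characters and Unicode general categories for the five sample rows listed for this variable, using Python's standard unicodedata module. It is not the profiler's own implementation; the category codes Nd and Pd correspond to "Decimal Number" and "Dash Punctuation".

```python
import unicodedata
from collections import Counter

# The five sample rows shown in the report for this variable.
samples = ["2018-05-15", "2018-11-15", "2018-04-15", "2016-10-15", "2017-04-15"]

# Count individual characters across all samples.
chars = Counter("".join(samples))

# Count Unicode general categories (e.g. "Nd", "Pd") across all samples.
categories = Counter(unicodedata.category(c) for s in samples for c in s)

total_chars = sum(chars.values())
print(total_chars)
print(dict(categories))
```

Each 10-character date contributes eight digits and two hyphens, which matches the 80.0% / 20.0% split between the two categories in the full column.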

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2018-05-15
2nd row: 2018-11-15
3rd row: 2018-04-15
4th row: 2016-10-15
5th row: 2017-04-15

Value                   Count      Frequency (%)
2019-04-15              49266      3.8%
2019-05-15              48776      3.8%
2019-03-15              47654      3.7%
2019-06-15              46214      3.6%
2019-02-15              45438      3.5%
2019-01-15              44165      3.4%
2019-09-15              43817      3.4%
2019-10-15              43259      3.4%
2018-12-15              42182      3.3%
2019-07-15              41856      3.3%
Other values (48)       834128     64.8%

Most occurring characters

Value                   Count      Frequency (%)
1                       2965267    23.0%
-                       2573510    20.0%
0                       2504213    19.5%
2                       1646205    12.8%
5                       1396423    10.9%
9                       630499     4.9%
8                       477252     3.7%
7                       306323     2.4%
6                       153010     1.2%
4                       107752     0.8%

Most occurring categories

Value                   Count      Frequency (%)
Decimal Number          10294040   80.0%
Dash Punctuation        2573510    20.0%

Most frequent character per category

Decimal Number

Value                   Count      Frequency (%)
1                       2965267    28.8%
0                       2504213    24.3%
2                       1646205    16.0%
5                       1396423    13.6%
9                       630499     6.1%
8                       477252     4.6%
7                       306323     3.0%
6                       153010     1.5%
4                       107752     1.0%
3                       107096     1.0%

Dash Punctuation

Value                   Count      Frequency (%)
-                       2573510    100.0%

Most occurring scripts

Value                   Count      Frequency (%)
Common                  12867550   100.0%

Most frequent character per script

Common

Value                   Count      Frequency (%)
1                       2965267    23.0%
-                       2573510    20.0%
0                       2504213    19.5%
2                       1646205    12.8%
5                       1396423    10.9%
9                       630499     4.9%
8                       477252     3.7%
7                       306323     2.4%
6                       153010     1.2%
4                       107752     0.8%

Most occurring blocks

Value                   Count      Frequency (%)
ASCII                   12867550   100.0%

Most frequent character per block

ASCII

Value                   Count      Frequency (%)
1                       2965267    23.0%
-                       2573510    20.0%
0                       2504213    19.5%
2                       1646205    12.8%
5                       1396423    10.9%
9                       630499     4.9%
8                       477252     3.7%
7                       306323     2.4%
6                       153010     1.2%
4                       107752     0.8%

pmts_dpdvalue_108P
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 62861
Distinct (%): 4.9%
Missing: 5361
Missing (%): 0.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 24370.4546
Minimum: 0
Maximum: 185124192
Zeros: 1137032
Zeros (%): 88.4%
Negative: 0
Negative (%): 0.0%
Memory size: 9.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 47816
Maximum: 185124192
Range: 185124192
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 574795.5394
Coefficient of variation (CV): 23.58575369
Kurtosis: 62904.50888
Mean: 24370.4546
Median Absolute Deviation (MAD): 0
Skewness: 217.5514208
Sum: 3.12281543 × 10^10
Variance: 3.303899121 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
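
The IQR and MAD of 0 alongside a large mean are what one would expect when more than three quarters of the values are zero: the quartiles and the median all sit on zero, while the mean is driven by the tail. A minimal sketch on a hypothetical toy column, using only the standard library (the "inclusive" quantile method is an arbitrary choice here and not necessarily what the profiler uses):

```python
import statistics


def mad(values):
    """Median absolute deviation from the median."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


def iqr(values):
    """Interquartile range Q3 - Q1."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical toy column: 90 zeros and ten positive values.
toy = [0] * 90 + [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

print("mean:", statistics.mean(toy))
print("MAD:", mad(toy))
print("IQR:", iqr(toy))
```

For columns like this, robust spread measures carry no information about the non-zero part, so it can be more useful to summarise the non-zero values separately.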
Common values

Value                   Count      Frequency (%)
0                       1137032    88.4%
4500                    418        < 0.1%
13500                   318        < 0.1%
9000                    309        < 0.1%
1                       287        < 0.1%
17550                   172        < 0.1%
50400                   129        < 0.1%
45                      128        < 0.1%
35100                   120        < 0.1%
21600                   117        < 0.1%
Other values (62851)    142364     11.1%
(Missing)               5361       0.4%

Minimum 5 values

Value                   Count      Frequency (%)
0                       1137032    88.4%
1                       287        < 0.1%
2                       94         < 0.1%
3                       73         < 0.1%
4                       80         < 0.1%

Maximum 5 values

Value                   Count      Frequency (%)
185124192               7          < 0.1%
120050072               1          < 0.1%
65954828                16         < 0.1%
38912444                1          < 0.1%
38749176                1          < 0.1%

pmts_pmtsoverdue_635A
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 4163
Distinct (%): 0.3%
Missing: 5361
Missing (%): 0.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.85945366
Minimum: 0
Maximum: 147470.61
Zeros: 1136910
Zeros (%): 88.4%
Negative: 0
Negative (%): 0.0%
Memory size: 9.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 18.2
Maximum: 147470.61
Range: 147470.61
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 455.1762424
Coefficient of variation (CV): 38.38087786
Kurtosis: 103110.9284
Mean: 11.85945366
Median Absolute Deviation (MAD): 0
Skewness: 318.3365368
Sum: 15196632.76
Variance: 207185.4116
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                   Count      Frequency (%)
0                       1136910    88.4%
0.2                     14403      1.1%
0.4                     5374       0.4%
0.8                     5142       0.4%
0.6                     4677       0.4%
1                       3578       0.3%
1.6                     2977       0.2%
1.4                     2565       0.2%
2.2                     2160       0.2%
1.2                     2115       0.2%
Other values (4153)     101493     7.9%
(Missing)               5361       0.4%

Minimum 5 values

Value                   Count      Frequency (%)
0                       1136910    88.4%
0.2                     14403      1.1%
0.4                     5374       0.4%
0.6                     4677       0.4%
0.8                     5142       0.4%

Maximum 5 values

Value                   Count      Frequency (%)
147470.61               2          < 0.1%
147463.4                3          < 0.1%
147456.8                3          < 0.1%
147448.8                4          < 0.1%
1016.2                  1          < 0.1%