Overview

Dataset statistics

Number of variables: 5
Number of observations: 145086
Missing cells: 79682
Missing cells (%): 11.0%
Total size in memory: 5.5 MiB
Average record size in memory: 40.0 B

Variable types

Numeric: 3
Text: 2

Alerts

contractenddate_991D has 79682 (54.9%) missing values [Missing]
amount_416A is highly skewed (γ1 = 50.47357923) [Skewed]
amount_416A has 58993 (40.7%) zeros [Zeros]
num_group1 has 105111 (72.4%) zeros [Zeros]
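
The percentages in the overview and alerts can be recomputed from the reported counts. The following is a minimal sketch, assuming each percentage is the count divided by its denominator and rounded to one decimal place (the rounding convention and the helper name `pct` are assumptions, not taken from the report):

```python
# Counts copied from the dataset statistics and alerts above.
N_VARIABLES = 5
N_OBSERVATIONS = 145086
MISSING_CELLS = 79682


def pct(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Missing cells are measured against every cell in the table (rows x columns).
total_cells = N_VARIABLES * N_OBSERVATIONS
missing_cells_pct = pct(MISSING_CELLS, total_cells)

# Per-column alerts are measured against the number of observations.
missing_contractenddate_pct = pct(79682, N_OBSERVATIONS)
zeros_amount_pct = pct(58993, N_OBSERVATIONS)
zeros_num_group1_pct = pct(105111, N_OBSERVATIONS)

print(missing_cells_pct, missing_contractenddate_pct,
      zeros_amount_pct, zeros_num_group1_pct)
```

All missing cells come from a single column, which is why the dataset-level figure (11.0%) is roughly one fifth of the column-level figure (54.9%).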

Reproduction

Analysis started: 2024-02-13 19:53:29.415534
Analysis finished: 2024-02-13 19:53:29.766475
Duration: 0.35 seconds
Software version: ydata-profiling v4.6.4
Download configuration: config.json

Variables

case_id
Real number (ℝ)

Distinct: 105111
Distinct (%): 72.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1466214.05
Minimum: 225
Maximum: 2703453
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 225
5-th percentile: 127833
Q1: 660041
Median: 1556939
Q3: 2530539
95-th percentile: 2666481
Maximum: 2703453
Range: 2703228
Interquartile range (IQR): 1870498

Descriptive statistics

Standard deviation: 886528.9589
Coefficient of variation (CV): 0.6046381558
Kurtosis: -1.15861654
Mean: 1466214.05
Median Absolute Deviation (MAD): 906694
Skewness: -0.2196994507
Sum: 2.127271316 × 10^11
Variance: 7.85933595 × 10^11
Monotonicity: Increasing
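
Several of the case_id figures are derived from the others. The sketch below recomputes them from the reported values, assuming the usual definitions (range = maximum − minimum, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared); the variable names are illustrative only:

```python
# Values copied from the case_id section of the report.
minimum, maximum = 225, 2703453
q1, q3 = 660041, 2530539
mean = 1466214.05
std = 886528.9589

value_range = maximum - minimum  # compare with the reported Range
iqr = q3 - q1                    # compare with the reported IQR
cv = std / mean                  # compare with the reported CV
variance = std ** 2              # compare with the reported Variance

print(value_range, iqr, round(cv, 6), f"{variance:.6e}")
```

The same relations can be used as a quick sanity check on the other numeric variables in the report.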
Histogram with fixed size bins (bins=50)
Value                  Count   Frequency (%)
1377353                65      < 0.1%
1306349                32      < 0.1%
783268                 32      < 0.1%
151842                 31      < 0.1%
246503                 31      < 0.1%
1494474                30      < 0.1%
160829                 29      < 0.1%
1590262                29      < 0.1%
1722823                28      < 0.1%
1617931                28      < 0.1%
Other values (105101)  144751  99.8%
Value  Count  Frequency (%)
225    1      < 0.1%
331    1      < 0.1%
358    1      < 0.1%
390    3      < 0.1%
445    5      < 0.1%

Value    Count  Frequency (%)
2703453  2      < 0.1%
2703439  1      < 0.1%
2703430  9      < 0.1%
2703427  1      < 0.1%
2703426  1      < 0.1%

amount_416A
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 40724
Distinct (%): 28.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8422.304482
Minimum: -40000
Maximum: 12213286
Zeros: 58993
Zeros (%): 40.7%
Negative: 10
Negative (%): < 0.1%
Memory size: 1.1 MiB

Quantile statistics

Minimum: -40000
5-th percentile: 0
Q1: 0
Median: 223.658
Q3: 478.34
95-th percentile: 18437.611
Maximum: 12213286
Range: 12253286
Interquartile range (IQR): 478.34

Descriptive statistics

Standard deviation: 86232.12048
Coefficient of variation (CV): 10.23854227
Kurtosis: 5111.536895
Mean: 8422.304482
Median Absolute Deviation (MAD): 223.658
Skewness: 50.47357923
Sum: 1221958468
Variance: 7435978602
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
0                     58993  40.7%
202.008               946    0.7%
204.04001             783    0.5%
204.03801             696    0.5%
202.00601             612    0.4%
202.01                308    0.2%
202.00401             274    0.2%
1010.04803            145    0.1%
202.002               115    0.1%
1010.05               109    0.1%
Other values (40714)  82105  56.6%
Value       Count  Frequency (%)
-40000      1      < 0.1%
-33779.152  1      < 0.1%
-10000      3      < 0.1%
-8000       1      < 0.1%
-4000       3      < 0.1%

Value      Count  Frequency (%)
12213286   1      < 0.1%
9502137    2      < 0.1%
4216085.5  1      < 0.1%
4045444.5  1      < 0.1%
4020477.2  1      < 0.1%

contractenddate_991D
Text

MISSING 

Distinct: 1524
Distinct (%): 2.3%
Missing: 79682
Missing (%): 54.9%
Memory size: 1.1 MiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 654040
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
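
The character totals for contractenddate_991D follow from the row counts. This is a minimal sketch, assuming every non-missing value is a 10-character date of the form YYYY-MM-DD with two dashes (consistent with the reported lengths and sample rows); the function name `date_text_counts` is hypothetical:

```python
def date_text_counts(n_rows, n_missing, length=10, dashes_per_value=2):
    """Return (total, digit, dash) character counts for fixed-width date strings."""
    non_missing = n_rows - n_missing
    total_chars = non_missing * length
    dash_chars = non_missing * dashes_per_value
    digit_chars = total_chars - dash_chars
    return total_chars, digit_chars, dash_chars


# 145086 observations, 79682 of them missing in contractenddate_991D.
total_chars, digit_chars, dash_chars = date_text_counts(145086, 79682)
print(total_chars, digit_chars, dash_chars)
```

Under these assumptions the result matches the reported total of 654040 characters, split into 523232 decimal digits (80.0%) and 130808 dashes (20.0%).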

Unique

Unique: 354
Unique (%): 0.5%

Sample

1st row: 2018-03-18
2nd row: 2017-07-22
3rd row: 2017-09-30
4th row: 2017-07-31
5th row: 2018-02-08
Value                Count  Frequency (%)
2017-08-03           295    0.5%
2017-12-31           293    0.4%
2017-09-01           292    0.4%
2017-09-08           282    0.4%
2017-12-23           280    0.4%
2017-09-30           279    0.4%
2017-12-03           277    0.4%
2017-12-22           261    0.4%
2017-09-03           259    0.4%
2017-12-24           257    0.4%
Other values (1514)  62629  95.8%

Most occurring characters

Value  Count   Frequency (%)
0      142764  21.8%
-      130808  20.0%
1      126554  19.3%
2      106783  16.3%
8      44433   6.8%
7      40025   6.1%
3      15826   2.4%
9      15458   2.4%
6      12228   1.9%
4      9776    1.5%

Most occurring categories

Value             Count   Frequency (%)
Decimal Number    523232  80.0%
Dash Punctuation  130808  20.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      142764  27.3%
1      126554  24.2%
2      106783  20.4%
8      44433   8.5%
7      40025   7.6%
3      15826   3.0%
9      15458   3.0%
6      12228   2.3%
4      9776    1.9%
5      9385    1.8%
Dash Punctuation
Value  Count   Frequency (%)
-      130808  100.0%

Most occurring scripts

ValueCountFrequency (%)
Value   Count   Frequency (%)
Common  654040  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      142764  21.8%
-      130808  20.0%
1      126554  19.3%
2      106783  16.3%
8      44433   6.8%
7      40025   6.1%
3      15826   2.4%
9      15458   2.4%
6      12228   1.9%
4      9776    1.5%

Most occurring blocks

ValueCountFrequency (%)
Value  Count   Frequency (%)
ASCII  654040  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      142764  21.8%
-      130808  20.0%
1      126554  19.3%
2      106783  16.3%
8      44433   6.8%
7      40025   6.1%
3      15826   2.4%
9      15458   2.4%
6      12228   1.9%
4      9776    1.5%

num_group1
Real number (ℝ)

ZEROS 

Distinct: 65
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5225314641
Minimum: 0
Maximum: 64
Zeros: 105111
Zeros (%): 72.4%
Negative: 0
Negative (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 2
Maximum: 64
Range: 64
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.620954397
Coefficient of variation (CV): 3.102118261
Kurtosis: 274.5852851
Mean: 0.5225314641
Median Absolute Deviation (MAD): 0
Skewness: 12.30842882
Sum: 75812
Variance: 2.627493157
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count   Frequency (%)
0                  105111  72.4%
1                  26045   18.0%
2                  7878    5.4%
3                  2701    1.9%
4                  1190    0.8%
5                  568     0.4%
6                  348     0.2%
7                  251     0.2%
8                  173     0.1%
9                  131     0.1%
Other values (55)  690     0.5%
Value  Count   Frequency (%)
0      105111  72.4%
1      26045   18.0%
2      7878    5.4%
3      2701    1.9%
4      1190    0.8%

Value  Count  Frequency (%)
64     1      < 0.1%
63     1      < 0.1%
62     1      < 0.1%
61     1      < 0.1%
60     1      < 0.1%
Distinct: 1579
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 1450860
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 162
Unique (%): 0.1%

Sample

1st row: 2016-08-16
2nd row: 2015-03-19
3rd row: 2014-09-02
4th row: 2014-07-23
5th row: 2016-06-08
Value                Count   Frequency (%)
2014-07-11           368     0.3%
2014-04-11           306     0.2%
2014-03-28           304     0.2%
2013-12-26           301     0.2%
2014-04-09           301     0.2%
2014-04-14           295     0.2%
2014-04-02           292     0.2%
2014-01-06           289     0.2%
2014-05-30           283     0.2%
2013-12-23           282     0.2%
Other values (1569)  142065  97.9%

Most occurring characters

Value  Count   Frequency (%)
0      321975  22.2%
-      290172  20.0%
1      268553  18.5%
2      233905  16.1%
4      70022   4.8%
5      62496   4.3%
6      61629   4.2%
3      47351   3.3%
7      43192   3.0%
9      27336   1.9%

Most occurring categories

Value             Count    Frequency (%)
Decimal Number    1160688  80.0%
Dash Punctuation  290172   20.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      321975  27.7%
1      268553  23.1%
2      233905  20.2%
4      70022   6.0%
5      62496   5.4%
6      61629   5.3%
3      47351   4.1%
7      43192   3.7%
9      27336   2.4%
8      24229   2.1%
Dash Punctuation
Value  Count   Frequency (%)
-      290172  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  1450860  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      321975  22.2%
-      290172  20.0%
1      268553  18.5%
2      233905  16.1%
4      70022   4.8%
5      62496   4.3%
6      61629   4.2%
3      47351   3.3%
7      43192   3.0%
9      27336   1.9%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1450860  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      321975  22.2%
-      290172  20.0%
1      268553  18.5%
2      233905  16.1%
4      70022   4.8%
5      62496   4.3%
6      61629   4.2%
3      47351   3.3%
7      43192   3.0%
9      27336   1.9%