Overview

Dataset statistics

Number of variables: 6
Number of observations: 157302
Missing cells: 450239
Missing cells (%): 47.7%
Total size in memory: 7.2 MiB
Average record size in memory: 48.0 B
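
The derived figures in this block follow from the raw counts. The sketch below redoes that arithmetic. The 8 bytes per stored value is an assumption (the report does not state it), chosen because it matches the reported 48.0 B per record across 6 variables.

```python
# Counts copied from the dataset statistics above.
n_variables = 6
n_observations = 157302
missing_cells = 450239

# Share of missing cells over all cells in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: every value occupies 8 bytes (a 64-bit number or an object pointer).
bytes_per_value = 8
record_bytes = n_variables * bytes_per_value
total_mib = record_bytes * n_observations / 2**20

print(f"total cells:   {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"record size:   {record_bytes} B, total: {total_mib:.1f} MiB")
```

Under that assumption the results round to the reported 47.7%, 48 B, and 7.2 MiB.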

Variable types

Numeric: 5
Text: 1

Alerts

last180dayaveragebalance_704A has 145086 (92.2%) missing values [Missing]
last180dayturnover_1134A has 146221 (93.0%) missing values [Missing]
last30dayturnover_651A has 146221 (93.0%) missing values [Missing]
openingdate_857D has 12711 (8.1%) missing values [Missing]
last180dayaveragebalance_704A is highly skewed (γ1 = 38.97588207) [Skewed]
last180dayaveragebalance_704A has 8050 (5.1%) zeros [Zeros]
last30dayturnover_651A has 9808 (6.2%) zeros [Zeros]
num_group1 has 111772 (71.1%) zeros [Zeros]
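
The four missing-value alerts account for every missing cell in the dataset. A short check, using only the counts listed in the alerts and the number of observations from the dataset statistics:

```python
n_observations = 157302

# Missing-value counts copied from the alerts above.
missing = {
    "last180dayaveragebalance_704A": 145086,
    "last180dayturnover_1134A": 146221,
    "last30dayturnover_651A": 146221,
    "openingdate_857D": 12711,
}

# Per-variable share of missing rows.
for name, count in missing.items():
    print(f"{name}: {100 * count / n_observations:.1f}% missing")

# The sum equals the dataset-level "Missing cells" figure.
total_missing = sum(missing.values())
print(f"total missing cells: {total_missing}")
```

The total is 450239, which matches the dataset-level count, so case_id and num_group1 contribute no missing cells.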

Reproduction

Analysis started: 2024-02-13 19:53:25.443889
Analysis finished: 2024-02-13 19:53:25.711245
Duration: 0.27 seconds
Software version: ydata-profiling v4.6.4
Download configuration: config.json

Variables

case_id
Real number (ℝ)

Distinct: 111772
Distinct (%): 71.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1468783.783
Minimum: 225
Maximum: 2703453
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 225
5-th percentile: 127698
Q1: 649173
Median: 1560121
Q3: 2531589.5
95-th percentile: 2666770.7
Maximum: 2703453
Range: 2703228
Interquartile range (IQR): 1882416.5

Descriptive statistics

Standard deviation: 888331.5764
Coefficient of variation (CV): 0.6048075876
Kurtosis: -1.161136405
Mean: 1468783.783
Median Absolute Deviation (MAD): 938384.5
Skewness: -0.2257875647
Sum: 2.310426266 × 10^11
Variance: 7.891329896 × 10^11
Monotonicity: Increasing
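
Several of the statistics reported for case_id are derived from others. The sketch below recomputes them from the reported values, as a consistency check rather than a reproduction of the profiler's own code:

```python
import math

# Values copied from the case_id statistics above.
minimum, maximum = 225, 2703453
q1, q3 = 649173, 2531589.5
mean, std = 1468783.783, 888331.5764
n = 157302  # number of observations; case_id has no missing values

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance
total = mean * n                  # Sum

print(value_range, iqr)
print(f"CV = {cv:.10f}")
print(f"variance = {variance:.9e}, sum = {total:.9e}")
```

Each result agrees with the reported figure to the precision shown in the report.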
Histogram with fixed size bins (bins=50)

Value                   Count    Frequency (%)
1377353                 66       < 0.1%
1494474                 33       < 0.1%
151842                  32       < 0.1%
246503                  32       < 0.1%
783268                  32       < 0.1%
1306349                 32       < 0.1%
1590262                 29       < 0.1%
160829                  29       < 0.1%
216742                  29       < 0.1%
1617931                 29       < 0.1%
Other values (111762)   156959   99.8%

Value                   Count    Frequency (%)
225                     1        < 0.1%
331                     1        < 0.1%
358                     1        < 0.1%
390                     3        < 0.1%
445                     5        < 0.1%

Value                   Count    Frequency (%)
2703453                 2        < 0.1%
2703439                 1        < 0.1%
2703430                 9        < 0.1%
2703427                 1        < 0.1%
2703426                 1        < 0.1%

last180dayaveragebalance_704A
Real number (ℝ)

MISSING  SKEWED  ZEROS 

Distinct: 2495
Distinct (%): 20.4%
Missing: 145086
Missing (%): 92.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 109.6358841
Minimum: -308.79413
Maximum: 67777.77
Zeros: 8050
Zeros (%): 5.1%
Negative: 1
Negative (%): < 0.1%
Memory size: 1.2 MiB

Quantile statistics

Minimum: -308.79413
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1.053898025
95-th percentile: 352.450845
Maximum: 67777.77
Range: 68086.56413
Interquartile range (IQR): 1.053898025

Descriptive statistics

Standard deviation: 949.9974577
Coefficient of variation (CV): 8.665023005
Kurtosis: 2350.480947
Mean: 109.6358841
Median Absolute Deviation (MAD): 0
Skewness: 38.97588207
Sum: 1339311.96
Variance: 902495.1697
Monotonicity: Not monotonic
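
The skewness of 38.98 reflects a distribution where most values are zero and a few are very large. The sketch below shows one common way to compute sample skewness, the adjusted Fisher-Pearson coefficient. That this is the estimator behind the reported figure is an assumption; the report does not name it. The two input lists are hypothetical illustration data, not values from this dataset.

```python
import math

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (G1); needs at least 3 non-constant values."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5                            # biased skewness
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)   # small-sample correction

# Hypothetical data: a symmetric sample and a zero-heavy sample with one large value.
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
right_tailed = [0.0] * 8 + [1.0, 50.0]

print(sample_skewness(symmetric))
print(sample_skewness(right_tailed))
```

The symmetric sample gives a skewness of zero, while the zero-heavy sample gives a clearly positive value, the same pattern that triggers the Skewed alert for this variable.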
Histogram with fixed size bins (bins=50)

Value                   Count    Frequency (%)
0                       8050     5.1%
6                       52       < 0.1%
0.080000006             47       < 0.1%
0.120000005             46       < 0.1%
0.16000001              36       < 0.1%
0.040000003             35       < 0.1%
2                       29       < 0.1%
10                      26       < 0.1%
4                       25       < 0.1%
0.1                     21       < 0.1%
Other values (2485)     3849     2.4%
(Missing)               145086   92.2%

Value                   Count    Frequency (%)
-308.79413              1        < 0.1%
0                       8050     5.1%
0.00020000001           1        < 0.1%
0.00088799995           1        < 0.1%
0.001334                1        < 0.1%

Value                   Count    Frequency (%)
67777.77                1        < 0.1%
32115.504               2        < 0.1%
14724.3545              2        < 0.1%
14712.222               1        < 0.1%
14587.108               1        < 0.1%

last180dayturnover_1134A
Real number (ℝ)

MISSING 

Distinct: 2581
Distinct (%): 23.3%
Missing: 146221
Missing (%): 93.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38494.50852
Minimum: -187780
Maximum: 1161820
Zeros: 2
Zeros (%): < 0.1%
Negative: 9
Negative (%): < 0.1%
Memory size: 1.2 MiB

Quantile statistics

Minimum: -187780
5-th percentile: 25
Q1: 7878
Median: 30000
Q3: 60000
95-th percentile: 100000
Maximum: 1161820
Range: 1349600
Interquartile range (IQR): 52122

Descriptive statistics

Standard deviation: 41400.58901
Coefficient of variation (CV): 1.075493378
Kurtosis: 75.55276997
Mean: 38494.50852
Median Absolute Deviation (MAD): 25980
Skewness: 4.418855267
Sum: 426557648.9
Variance: 1714008771
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                   Count    Frequency (%)
60000                   650      0.4%
100000                  574      0.4%
20000                   474      0.3%
40000                   325      0.2%
30000                   302      0.2%
24000                   241      0.2%
44000                   236      0.2%
34000                   234      0.1%
10000                   156      0.1%
54000                   131      0.1%
Other values (2571)     7758     4.9%
(Missing)               146221   93.0%

Value                   Count    Frequency (%)
-187780                 1        < 0.1%
-61300                  1        < 0.1%
-35600                  1        < 0.1%
-20000                  1        < 0.1%
-13400                  2        < 0.1%

Value                   Count    Frequency (%)
1161820                 1        < 0.1%
900000                  1        < 0.1%
547765                  2        < 0.1%
518296.22               2        < 0.1%
401598.8                1        < 0.1%

last30dayturnover_651A
Real number (ℝ)

MISSING  ZEROS 

Distinct: 482
Distinct (%): 4.3%
Missing: 146221
Missing (%): 93.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4955.383495
Minimum: -477.506
Maximum: 390000
Zeros: 9808
Zeros (%): 6.2%
Negative: 1
Negative (%): < 0.1%
Memory size: 1.2 MiB

Quantile statistics

Minimum: -477.506
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 37819
Maximum: 390000
Range: 390477.506
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 19217.73695
Coefficient of variation (CV): 3.878153319
Kurtosis: 41.10797216
Mean: 4955.383495
Median Absolute Deviation (MAD): 0
Skewness: 5.440935324
Sum: 54910604.51
Variance: 369321413.4
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                   Count    Frequency (%)
0                       9808     6.2%
100000                  70       < 0.1%
60000                   55       < 0.1%
34000                   37       < 0.1%
20000                   33       < 0.1%
40000                   31       < 0.1%
24000                   31       < 0.1%
44000                   28       < 0.1%
64000                   24       < 0.1%
150000                  23       < 0.1%
Other values (472)      941      0.6%
(Missing)               146221   93.0%

Value                   Count    Frequency (%)
-477.506                1        < 0.1%
0                       9808     6.2%
0.040000003             1        < 0.1%
0.080000006             3        < 0.1%
0.102                   4        < 0.1%

Value                   Count    Frequency (%)
390000                  1        < 0.1%
200000                  2        < 0.1%
199980                  2        < 0.1%
197980                  2        < 0.1%
191930                  1        < 0.1%

num_group1
Real number (ℝ)

ZEROS 

Distinct: 66
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5493064297
Minimum: 0
Maximum: 65
Zeros: 111772
Zeros (%): 71.1%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 2
Maximum: 65
Range: 65
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.639081645
Coefficient of variation (CV): 2.983911267
Kurtosis: 265.7858492
Mean: 0.5493064297
Median Absolute Deviation (MAD): 0
Skewness: 12.02821433
Sum: 86407
Variance: 2.686588638
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                   Count    Frequency (%)
0                       111772   71.1%
1                       29309    18.6%
2                       9172     5.8%
3                       3231     2.1%
4                       1412     0.9%
5                       639      0.4%
6                       397      0.3%
7                       279      0.2%
8                       191      0.1%
9                       140      0.1%
Other values (56)       760      0.5%

Value                   Count    Frequency (%)
0                       111772   71.1%
1                       29309    18.6%
2                       9172     5.8%
3                       3231     2.1%
4                       1412     0.9%

Value                   Count    Frequency (%)
65                      1        < 0.1%
64                      1        < 0.1%
63                      1        < 0.1%
62                      1        < 0.1%
61                      1        < 0.1%
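
The num_group1 frequency counts can be cross-checked against the dataset totals. The sketch below does so using only reported figures. The number of rows with num_group1 = 0 (111772) equals the number of distinct case_id values. This is consistent with num_group1 being a zero-based row index within each case, but that reading is an assumption the report does not confirm.

```python
n_observations = 157302
distinct_case_ids = 111772  # "Distinct" reported for case_id

# Counts copied from the first num_group1 frequency table.
counts = {0: 111772, 1: 29309, 2: 9172, 3: 3231, 4: 1412,
          5: 639, 6: 397, 7: 279, 8: 191, 9: 140}
other_values = 760  # rows spread over the remaining 56 distinct values

# The listed counts should cover every row.
total_rows = sum(counts.values()) + other_values

# Mean recomputed from the reported sum.
mean = 86407 / n_observations

zero_share = 100 * counts[0] / n_observations
rows_per_case = n_observations / distinct_case_ids

print(total_rows, f"{mean:.10f}", f"{zero_share:.1f}%", f"{rows_per_case:.2f}")
```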

openingdate_857D
Text

MISSING 

Distinct: 1578
Distinct (%): 1.1%
Missing: 12711
Missing (%): 8.1%
Memory size: 1.2 MiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 1445910
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 164
Unique (%): 0.1%

Sample

1st row: 2016-08-16
2nd row: 2015-03-19
3rd row: 2014-09-02
4th row: 2014-07-23
5th row: 2016-06-08
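
The profiler classified openingdate_857D as Text, but every value has length 10 and the sample rows look like ISO 8601 dates. The sketch below parses the five sample rows with the standard library and rechecks the character total. It assumes the rest of the column follows the same YYYY-MM-DD layout, which the sample alone cannot guarantee.

```python
from datetime import date

# Sample rows copied from the report.
sample = ["2016-08-16", "2015-03-19", "2014-09-02", "2014-07-23", "2016-06-08"]

# date.fromisoformat raises ValueError on anything that is not YYYY-MM-DD.
parsed = [date.fromisoformat(s) for s in sample]
earliest, latest = min(parsed), max(parsed)
print(earliest, latest)

# Character total implied by the length statistics:
# every non-missing value has exactly 10 characters.
n_observations, n_missing = 157302, 12711
non_missing = n_observations - n_missing
total_chars = non_missing * 10
dash_chars = non_missing * 2  # two dashes per date
print(total_chars, dash_chars)
```

The computed totals match the reported 1445910 characters and 289182 dashes, which supports treating the column as dates before any modelling.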

Value                   Count    Frequency (%)
2014-07-11              368      0.3%
2014-04-11              306      0.2%
2014-03-28              304      0.2%
2014-04-09              301      0.2%
2013-12-26              301      0.2%
2014-04-14              295      0.2%
2014-04-02              292      0.2%
2014-01-06              289      0.2%
2013-12-23              282      0.2%
2014-05-30              281      0.2%
Other values (1568)     141572   97.9%

Most occurring characters

Value                   Count    Frequency (%)
0                       321048   22.2%
-                       289182   20.0%
1                       267223   18.5%
2                       233065   16.1%
4                       69996    4.8%
5                       63818    4.4%
6                       60488    4.2%
3                       47179    3.3%
7                       42483    2.9%
9                       27284    1.9%

Most occurring categories

Value                   Count    Frequency (%)
Decimal Number          1156728  80.0%
Dash Punctuation        289182   20.0%

Most frequent character per category

Decimal Number
Value                   Count    Frequency (%)
0                       321048   27.8%
1                       267223   23.1%
2                       233065   20.1%
4                       69996    6.1%
5                       63818    5.5%
6                       60488    5.2%
3                       47179    4.1%
7                       42483    3.7%
9                       27284    2.4%
8                       24144    2.1%

Dash Punctuation
Value                   Count    Frequency (%)
-                       289182   100.0%

Most occurring scripts

Value                   Count    Frequency (%)
Common                  1445910  100.0%

Most frequent character per script

Common
Value                   Count    Frequency (%)
0                       321048   22.2%
-                       289182   20.0%
1                       267223   18.5%
2                       233065   16.1%
4                       69996    4.8%
5                       63818    4.4%
6                       60488    4.2%
3                       47179    3.3%
7                       42483    2.9%
9                       27284    1.9%

Most occurring blocks

Value                   Count    Frequency (%)
ASCII                   1445910  100.0%

Most frequent character per block

ASCII
Value                   Count    Frequency (%)
0                       321048   22.2%
-                       289182   20.0%
1                       267223   18.5%
2                       233065   16.1%
4                       69996    4.8%
5                       63818    4.4%
6                       60488    4.2%
3                       47179    3.3%
7                       42483    2.9%
9                       27284    1.9%