Dataset statistics
| Number of variables | 6 |
|---|---|
| Number of observations | 157302 |
| Missing cells | 450239 |
| Missing cells (%) | 47.7% |
| Total size in memory | 7.2 MiB |
| Average record size in memory | 48.0 B |
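The missing-cell total can be cross-checked against the per-variable missing counts listed in this report's alerts. The sketch below assumes the percentage is missing cells divided by observations × variables, and that each of the 6 columns takes 8 bytes per record. That byte width is an assumption that happens to match the reported 48.0 B, not something the report states.

```python
# Per-variable missing counts copied from this report.
missing_by_variable = {
    "last180dayaveragebalance_704A": 145086,
    "last180dayturnover_1134A": 146221,
    "last30dayturnover_651A": 146221,
    "openingdate_857D": 12711,
}
n_observations = 157302
n_variables = 6

missing_cells = sum(missing_by_variable.values())
total_cells = n_observations * n_variables
missing_pct = 100 * missing_cells / total_cells

# Assumption: 8 bytes per value (e.g. a float64 or an object pointer).
bytes_per_record = 8 * n_variables
total_mib = n_observations * bytes_per_record / 2**20

print(missing_cells)           # 450239
print(f"{missing_pct:.1f}%")   # 47.7%
print(f"{total_mib:.1f} MiB")  # 7.2 MiB
```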
Variable types
| Numeric | 5 |
|---|---|
| Text | 1 |
Alerts
| last180dayaveragebalance_704A has 145086 (92.2%) missing values | Missing |
|---|---|
| last180dayturnover_1134A has 146221 (93.0%) missing values | Missing |
| last30dayturnover_651A has 146221 (93.0%) missing values | Missing |
| openingdate_857D has 12711 (8.1%) missing values | Missing |
| last180dayaveragebalance_704A is highly skewed (γ1 = 38.97588207) | Skewed |
| last180dayaveragebalance_704A has 8050 (5.1%) zeros | Zeros |
| last30dayturnover_651A has 9808 (6.2%) zeros | Zeros |
| num_group1 has 111772 (71.1%) zeros | Zeros |
Reproduction
| Analysis started | 2024-02-13 19:53:25.443889 |
|---|---|
| Analysis finished | 2024-02-13 19:53:25.711245 |
| Duration | 0.27 seconds |
| Software version | ydata-profiling v4.6.4 |
| Download configuration | config.json |
case_id
Real number (ℝ)
| Distinct | 111772 |
|---|---|
| Distinct (%) | 71.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1468783.783 |

| Minimum | 225 |
|---|---|
| Maximum | 2703453 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.2 MiB |
Quantile statistics
| Minimum | 225 |
|---|---|
| 5-th percentile | 127698 |
| Q1 | 649173 |
| median | 1560121 |
| Q3 | 2531589.5 |
| 95-th percentile | 2666770.7 |
| Maximum | 2703453 |
| Range | 2703228 |
| Interquartile range (IQR) | 1882416.5 |
Descriptive statistics
| Standard deviation | 888331.5764 |
|---|---|
| Coefficient of variation (CV) | 0.6048075876 |
| Kurtosis | -1.161136405 |
| Mean | 1468783.783 |
| Median Absolute Deviation (MAD) | 938384.5 |
| Skewness | -0.2257875647 |
| Sum | 2.310426266 × 10¹¹ |
| Variance | 7.891329896 × 10¹¹ |
| Monotonicity | Increasing |
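Several of the figures above are derived from others in the same tables. As a quick consistency check, this sketch recomputes the range, IQR, coefficient of variation and variance for case_id from the reported minimum, maximum, quartiles, mean and standard deviation. The variable names are illustrative only.

```python
# Values copied from the case_id tables in this report.
minimum, maximum = 225, 2703453
q1, q3 = 649173, 2531589.5
mean, std = 1468783.783, 888331.5764

value_range = maximum - minimum  # spread between the extremes
iqr = q3 - q1                    # spread of the middle 50% of values
cv = std / mean                  # dimensionless relative spread
variance = std ** 2              # square of the standard deviation

print(value_range)      # 2703228
print(iqr)              # 1882416.5
print(round(cv, 4))     # 0.6048
print(f"{variance:.4e}")
```

The recomputed values agree with the reported Range, IQR, CV and Variance up to the rounding of the inputs.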
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1377353 | 66 | < 0.1% |
| 1494474 | 33 | < 0.1% |
| 151842 | 32 | < 0.1% |
| 246503 | 32 | < 0.1% |
| 783268 | 32 | < 0.1% |
| 1306349 | 32 | < 0.1% |
| 1590262 | 29 | < 0.1% |
| 160829 | 29 | < 0.1% |
| 216742 | 29 | < 0.1% |
| 1617931 | 29 | < 0.1% |
| Other values (111762) | 156959 | 99.8% |
| Value | Count | Frequency (%) |
| 225 | 1 | < 0.1% |
| 331 | 1 | < 0.1% |
| 358 | 1 | < 0.1% |
| 390 | 3 | < 0.1% |
| 445 | 5 | < 0.1% |
| Value | Count | Frequency (%) |
| 2703453 | 2 | < 0.1% |
| 2703439 | 1 | < 0.1% |
| 2703430 | 9 | < 0.1% |
| 2703427 | 1 | < 0.1% |
| 2703426 | 1 | < 0.1% |
last180dayaveragebalance_704A
Real number (ℝ)
MISSING  SKEWED  ZEROS 
| Distinct | 2495 |
|---|---|
| Distinct (%) | 20.4% |
| Missing | 145086 |
| Missing (%) | 92.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 109.6358841 |

| Minimum | -308.79413 |
|---|---|
| Maximum | 67777.77 |
| Zeros | 8050 |
| Zeros (%) | 5.1% |
| Negative | 1 |
| Negative (%) | < 0.1% |
| Memory size | 1.2 MiB |
Quantile statistics
| Minimum | -308.79413 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1.053898025 |
| 95-th percentile | 352.450845 |
| Maximum | 67777.77 |
| Range | 68086.56413 |
| Interquartile range (IQR) | 1.053898025 |
Descriptive statistics
| Standard deviation | 949.9974577 |
|---|---|
| Coefficient of variation (CV) | 8.665023005 |
| Kurtosis | 2350.480947 |
| Mean | 109.6358841 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 38.97588207 |
| Sum | 1339311.96 |
| Variance | 902495.1697 |
| Monotonicity | Not monotonic |
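The skewness of 38.98 is what triggers the "Skewed" alert for this variable. This sketch shows how such a value can be computed, assuming the report uses the bias-adjusted Fisher-Pearson estimator (the one pandas' `Series.skew` implements); the report itself does not say which estimator it uses. The function name and the toy data are hypothetical.

```python
import math

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (G1), skipping missing (None) values."""
    xs = [x for x in values if x is not None]
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three non-missing values")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant data: treat as unskewed
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)

# Hypothetical toy data: mostly zeros plus one very large balance.
balances = [0, 0, 0, 0, 0, 0, 1.05, 6, None, 350, 67777.77]
print(sample_skewness(balances) > 1)       # True: strong right skew
print(sample_skewness([-2, -1, 0, 1, 2]))  # 0.0: symmetric data
```

A distribution where most values sit at zero and a few are very large, as the quantiles above suggest, yields a large positive skewness under this estimator.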
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 8050 | 5.1% |
| 6 | 52 | < 0.1% |
| 0.080000006 | 47 | < 0.1% |
| 0.120000005 | 46 | < 0.1% |
| 0.16000001 | 36 | < 0.1% |
| 0.040000003 | 35 | < 0.1% |
| 2 | 29 | < 0.1% |
| 10 | 26 | < 0.1% |
| 4 | 25 | < 0.1% |
| 0.1 | 21 | < 0.1% |
| Other values (2485) | 3849 | 2.4% |
| (Missing) | 145086 | 92.2% |
| Value | Count | Frequency (%) |
| -308.79413 | 1 | < 0.1% |
| 0 | 8050 | 5.1% |
| 0.00020000001 | 1 | < 0.1% |
| 0.00088799995 | 1 | < 0.1% |
| 0.001334 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 67777.77 | 1 | < 0.1% |
| 32115.504 | 2 | < 0.1% |
| 14724.3545 | 2 | < 0.1% |
| 14712.222 | 1 | < 0.1% |
| 14587.108 | 1 | < 0.1% |
last180dayturnover_1134A
Real number (ℝ)
MISSING 
| Distinct | 2581 |
|---|---|
| Distinct (%) | 23.3% |
| Missing | 146221 |
| Missing (%) | 93.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 38494.50852 |

| Minimum | -187780 |
|---|---|
| Maximum | 1161820 |
| Zeros | 2 |
| Zeros (%) | < 0.1% |
| Negative | 9 |
| Negative (%) | < 0.1% |
| Memory size | 1.2 MiB |
Quantile statistics
| Minimum | -187780 |
|---|---|
| 5-th percentile | 25 |
| Q1 | 7878 |
| median | 30000 |
| Q3 | 60000 |
| 95-th percentile | 100000 |
| Maximum | 1161820 |
| Range | 1349600 |
| Interquartile range (IQR) | 52122 |
Descriptive statistics
| Standard deviation | 41400.58901 |
|---|---|
| Coefficient of variation (CV) | 1.075493378 |
| Kurtosis | 75.55276997 |
| Mean | 38494.50852 |
| Median Absolute Deviation (MAD) | 25980 |
| Skewness | 4.418855267 |
| Sum | 426557648.9 |
| Variance | 1714008771 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 60000 | 650 | 0.4% |
| 100000 | 574 | 0.4% |
| 20000 | 474 | 0.3% |
| 40000 | 325 | 0.2% |
| 30000 | 302 | 0.2% |
| 24000 | 241 | 0.2% |
| 44000 | 236 | 0.2% |
| 34000 | 234 | 0.1% |
| 10000 | 156 | 0.1% |
| 54000 | 131 | 0.1% |
| Other values (2571) | 7758 | 4.9% |
| (Missing) | 146221 | 93.0% |
| Value | Count | Frequency (%) |
| -187780 | 1 | < 0.1% |
| -61300 | 1 | < 0.1% |
| -35600 | 1 | < 0.1% |
| -20000 | 1 | < 0.1% |
| -13400 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 1161820 | 1 | < 0.1% |
| 900000 | 1 | < 0.1% |
| 547765 | 2 | < 0.1% |
| 518296.22 | 2 | < 0.1% |
| 401598.8 | 1 | < 0.1% |
last30dayturnover_651A
Real number (ℝ)
MISSING  ZEROS 
| Distinct | 482 |
|---|---|
| Distinct (%) | 4.3% |
| Missing | 146221 |
| Missing (%) | 93.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4955.383495 |

| Minimum | -477.506 |
|---|---|
| Maximum | 390000 |
| Zeros | 9808 |
| Zeros (%) | 6.2% |
| Negative | 1 |
| Negative (%) | < 0.1% |
| Memory size | 1.2 MiB |
Quantile statistics
| Minimum | -477.506 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 37819 |
| Maximum | 390000 |
| Range | 390477.506 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 19217.73695 |
|---|---|
| Coefficient of variation (CV) | 3.878153319 |
| Kurtosis | 41.10797216 |
| Mean | 4955.383495 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 5.440935324 |
| Sum | 54910604.51 |
| Variance | 369321413.4 |
| Monotonicity | Not monotonic |
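The median, Q3, IQR and MAD above are all 0 even though the mean is close to 5000. That pattern arises when most non-missing values are exactly zero. This sketch illustrates it on hypothetical toy data, with a hand-written median absolute deviation helper that is not taken from ydata-profiling.

```python
from statistics import mean, median

def median_abs_deviation(values):
    """Median of absolute deviations from the median, skipping None."""
    xs = [x for x in values if x is not None]
    med = median(xs)
    return median([abs(x - med) for x in xs])

# Hypothetical toy data: zero turnover for most accounts, a few large values.
turnover = [0, 0, 0, 0, 0, 0, 0, 0, 34000, 100000, None]
present = [x for x in turnover if x is not None]

print(median(present))                 # 0
print(median_abs_deviation(turnover))  # 0
print(mean(present))                   # 13400
```

Because more than half of the values equal the median, the MAD collapses to 0. The mean and standard deviation are driven by the few large values instead.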
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 9808 | 6.2% |
| 100000 | 70 | < 0.1% |
| 60000 | 55 | < 0.1% |
| 34000 | 37 | < 0.1% |
| 20000 | 33 | < 0.1% |
| 40000 | 31 | < 0.1% |
| 24000 | 31 | < 0.1% |
| 44000 | 28 | < 0.1% |
| 64000 | 24 | < 0.1% |
| 150000 | 23 | < 0.1% |
| Other values (472) | 941 | 0.6% |
| (Missing) | 146221 | 93.0% |
| Value | Count | Frequency (%) |
| -477.506 | 1 | < 0.1% |
| 0 | 9808 | 6.2% |
| 0.040000003 | 1 | < 0.1% |
| 0.080000006 | 3 | < 0.1% |
| 0.102 | 4 | < 0.1% |
| Value | Count | Frequency (%) |
| 390000 | 1 | < 0.1% |
| 200000 | 2 | < 0.1% |
| 199980 | 2 | < 0.1% |
| 197980 | 2 | < 0.1% |
| 191930 | 1 | < 0.1% |
num_group1
Real number (ℝ)
ZEROS 
| Distinct | 66 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.5493064297 |

| Minimum | 0 |
|---|---|
| Maximum | 65 |
| Zeros | 111772 |
| Zeros (%) | 71.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.2 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 65 |
| Range | 65 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.639081645 |
|---|---|
| Coefficient of variation (CV) | 2.983911267 |
| Kurtosis | 265.7858492 |
| Mean | 0.5493064297 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 12.02821433 |
| Sum | 86407 |
| Variance | 2.686588638 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 111772 | 71.1% |
| 1 | 29309 | 18.6% |
| 2 | 9172 | 5.8% |
| 3 | 3231 | 2.1% |
| 4 | 1412 | 0.9% |
| 5 | 639 | 0.4% |
| 6 | 397 | 0.3% |
| 7 | 279 | 0.2% |
| 8 | 191 | 0.1% |
| 9 | 140 | 0.1% |
| Other values (56) | 760 | 0.5% |
| Value | Count | Frequency (%) |
| 0 | 111772 | 71.1% |
| 1 | 29309 | 18.6% |
| 2 | 9172 | 5.8% |
| 3 | 3231 | 2.1% |
| 4 | 1412 | 0.9% |
| Value | Count | Frequency (%) |
| 65 | 1 | < 0.1% |
| 64 | 1 | < 0.1% |
| 63 | 1 | < 0.1% |
| 62 | 1 | < 0.1% |
| 61 | 1 | < 0.1% |
openingdate_857D
Text
MISSING 
| Distinct | 1578 |
|---|---|
| Distinct (%) | 1.1% |
| Missing | 12711 |
| Missing (%) | 8.1% |
| Memory size | 1.2 MiB |
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
Characters and Unicode
| Total characters | 1445910 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 164 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | 2016-08-16 |
|---|---|
| 2nd row | 2015-03-19 |
| 3rd row | 2014-09-02 |
| 4th row | 2014-07-23 |
| 5th row | 2016-06-08 |
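The profiler classifies openingdate_857D as Text, but every value is 10 characters long and the samples follow the ISO 8601 `YYYY-MM-DD` pattern. This sketch converts such strings to dates using only the standard library, assuming missing or malformed entries should map to `None`. The helper name is hypothetical.

```python
from datetime import date

def parse_opening_date(raw):
    """Return a date for an ISO 'YYYY-MM-DD' string, or None if missing or invalid."""
    if raw is None or raw == "":
        return None
    try:
        return date.fromisoformat(raw)
    except ValueError:
        return None

# The five sample rows from this report, plus one missing value.
sample = ["2016-08-16", "2015-03-19", "2014-09-02", "2014-07-23", "2016-06-08", None]
parsed = [parse_opening_date(s) for s in sample]

print(min(d for d in parsed if d is not None))  # 2014-07-23
print(sum(d is None for d in parsed))           # 1
```

Once parsed, the column can be treated as a date variable, which allows ordering and date arithmetic that the raw text does not.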
| Value | Count | Frequency (%) |
| 2014-07-11 | 368 | 0.3% |
| 2014-04-11 | 306 | 0.2% |
| 2014-03-28 | 304 | 0.2% |
| 2014-04-09 | 301 | 0.2% |
| 2013-12-26 | 301 | 0.2% |
| 2014-04-14 | 295 | 0.2% |
| 2014-04-02 | 292 | 0.2% |
| 2014-01-06 | 289 | 0.2% |
| 2013-12-23 | 282 | 0.2% |
| 2014-05-30 | 281 | 0.2% |
| Other values (1568) | 141572 | 97.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 321048 | 22.2% |
| - | 289182 | 20.0% |
| 1 | 267223 | 18.5% |
| 2 | 233065 | 16.1% |
| 4 | 69996 | 4.8% |
| 5 | 63818 | 4.4% |
| 6 | 60488 | 4.2% |
| 3 | 47179 | 3.3% |
| 7 | 42483 | 2.9% |
| 9 | 27284 | 1.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1156728 | 80.0% |
| Dash Punctuation | 289182 | 20.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 321048 | 27.8% |
| 1 | 267223 | 23.1% |
| 2 | 233065 | 20.1% |
| 4 | 69996 | 6.1% |
| 5 | 63818 | 5.5% |
| 6 | 60488 | 5.2% |
| 3 | 47179 | 4.1% |
| 7 | 42483 | 3.7% |
| 9 | 27284 | 2.4% |
| 8 | 24144 | 2.1% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 289182 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1445910 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 321048 | 22.2% |
| - | 289182 | 20.0% |
| 1 | 267223 | 18.5% |
| 2 | 233065 | 16.1% |
| 4 | 69996 | 4.8% |
| 5 | 63818 | 4.4% |
| 6 | 60488 | 4.2% |
| 3 | 47179 | 3.3% |
| 7 | 42483 | 2.9% |
| 9 | 27284 | 1.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1445910 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 321048 | 22.2% |
| - | 289182 | 20.0% |
| 1 | 267223 | 18.5% |
| 2 | 233065 | 16.1% |
| 4 | 69996 | 4.8% |
| 5 | 63818 | 4.4% |
| 6 | 60488 | 4.2% |
| 3 | 47179 | 3.3% |
| 7 | 42483 | 2.9% |
| 9 | 27284 | 1.9% |