Dataset statistics
| Number of variables | 5 |
|---|---|
| Number of observations | 145086 |
| Missing cells | 79682 |
| Missing cells (%) | 11.0% |
| Total size in memory | 5.5 MiB |
| Average record size in memory | 40.0 B |
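Missing cells (%) is computed over every cell in the table (variables × observations), not over rows. That is why a single column that is 54.9% missing shows up as 11.0% overall. The alerts below attribute all 79682 missing cells to contractenddate_991D, and every other variable reports 0 missing. The following minimal check uses only the figures in this report:

```python
# Figures copied from the report: dataset statistics plus the per-column
# missing count of contractenddate_991D (the only column with missing values).
n_vars = 5
n_obs = 145086
missing_by_column = {"contractenddate_991D": 79682}

total_cells = n_vars * n_obs                      # every cell in the table
missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / total_cells   # share of all cells
column_pct = 100 * missing_by_column["contractenddate_991D"] / n_obs  # share of rows

print(f"Missing cells: {missing_cells} ({missing_pct:.1f}%)")
print(f"contractenddate_991D: {column_pct:.1f}% of rows")
```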
Variable types
| Numeric | 3 |
|---|---|
| Text | 2 |
contractenddate_991D has 79682 (54.9%) missing values | Missing |
amount_416A is highly skewed (γ1 = 50.47357923) | Skewed |
amount_416A has 58993 (40.7%) zeros | Zeros |
num_group1 has 105111 (72.4%) zeros | Zeros |
Reproduction
| Analysis started | 2024-02-13 19:53:29.415534 |
|---|---|
| Analysis finished | 2024-02-13 19:53:29.766475 |
| Duration | 0.35 seconds |
| Software version | ydata-profiling v4.6.4 |
case_id
Real number (ℝ)
| Distinct | 105111 |
|---|---|
| Distinct (%) | 72.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1466214.05 |
| Minimum | 225 |
|---|---|
| Maximum | 2703453 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 225 |
|---|---|
| 5-th percentile | 127833 |
| Q1 | 660041 |
| median | 1556939 |
| Q3 | 2530539 |
| 95-th percentile | 2666481 |
| Maximum | 2703453 |
| Range | 2703228 |
| Interquartile range (IQR) | 1870498 |
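Range and IQR are plain differences of the order statistics above: 2703453 − 225 = 2703228 and 2530539 − 660041 = 1870498. The sketch below recomputes the whole block from raw values. It uses linear interpolation between order statistics (`statistics.quantiles(..., method="inclusive")`, the same rule as pandas' default `Series.quantile`). Whether the profiler uses exactly this rule is an assumption.

```python
import statistics

def quantile_summary(values):
    """Recompute the quantile-statistics block from raw numeric values.

    Percentiles use linear interpolation between order statistics
    (method="inclusive"), assumed here to match the report. Needs >= 2 values.
    """
    xs = sorted(values)
    # 99 cut points: pct[k - 1] is the k-th percentile.
    pct = statistics.quantiles(xs, n=100, method="inclusive")
    q1, median, q3 = pct[24], pct[49], pct[74]
    return {
        "Minimum": xs[0],
        "5-th percentile": pct[4],
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "95-th percentile": pct[94],
        "Maximum": xs[-1],
        "Range": xs[-1] - xs[0],
        "IQR": q3 - q1,
    }
```

For example, `quantile_summary(range(1, 11))` gives Q1 = 3.25, median = 5.5, Q3 = 7.75 and IQR = 4.5.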
Descriptive statistics
| Standard deviation | 886528.9589 |
|---|---|
| Coefficient of variation (CV) | 0.6046381558 |
| Kurtosis | -1.15861654 |
| Mean | 1466214.05 |
| Median Absolute Deviation (MAD) | 906694 |
| Skewness | -0.2196994507 |
| Sum | 2.127271316 × 10¹¹ |
| Variance | 7.85933595 × 10¹¹ |
| Monotonicity | Increasing |
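CV is the standard deviation divided by the mean (886528.9589 / 1466214.05 ≈ 0.6046). MAD is the median of absolute deviations from the median, reported without a normal-consistency factor; the integer value 906694 suggests no scaling was applied. Monotonicity holds only in the non-strict sense: each value is at least the previous one. That is the only sense possible here, because case_id repeats (105111 distinct values across 145086 rows). The sketch below assumes the sample standard deviation (divisor n − 1):

```python
import statistics

def spread_summary(values):
    """CV, unscaled MAD and a monotonicity label for values in row order."""
    xs = list(values)
    mean = statistics.fmean(xs)
    std = statistics.stdev(xs)  # sample standard deviation, divisor n - 1
    med = statistics.median(xs)
    # Median absolute deviation, without the 1.4826 normal-consistency factor.
    mad = statistics.median(abs(x - med) for x in xs)

    pairs = list(zip(xs, xs[1:]))
    if all(a <= b for a, b in pairs):
        mono = "Increasing"        # non-strict: repeated values are allowed
    elif all(a >= b for a, b in pairs):
        mono = "Decreasing"
    else:
        mono = "Not monotonic"
    return {"CV": std / mean, "MAD": mad, "Monotonicity": mono}
```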
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1377353 | 65 | < 0.1% |
| 1306349 | 32 | < 0.1% |
| 783268 | 32 | < 0.1% |
| 151842 | 31 | < 0.1% |
| 246503 | 31 | < 0.1% |
| 1494474 | 30 | < 0.1% |
| 160829 | 29 | < 0.1% |
| 1590262 | 29 | < 0.1% |
| 1722823 | 28 | < 0.1% |
| 1617931 | 28 | < 0.1% |
| Other values (105101) | 144751 | 99.8% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 225 | 1 | < 0.1% |
| 331 | 1 | < 0.1% |
| 358 | 1 | < 0.1% |
| 390 | 3 | < 0.1% |
| 445 | 5 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2703453 | 2 | < 0.1% |
| 2703439 | 1 | < 0.1% |
| 2703430 | 9 | < 0.1% |
| 2703427 | 1 | < 0.1% |
| 2703426 | 1 | < 0.1% |
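The value tables for case_id are plain value counts. The first lists the ten most frequent values and folds everything else into one "Other values" row: 105111 − 10 = 105101 distinct values covering 145086 − 335 = 144751 rows. The other two list the five smallest and five largest distinct values with their counts. The sketch below uses `collections.Counter`. Among equally frequent values, it keeps first-occurrence order, which may differ from the report:

```python
from collections import Counter

def value_tables(values, top=10, extreme=5):
    """Rebuild the common-value and extreme-value tables from raw values.

    Returns (common, minimum, maximum); each row is (value, count, percent),
    with percent relative to the number of values passed in.
    """
    counts = Counter(values)
    n = sum(counts.values())

    def row(value, count):
        return (value, count, 100 * count / n)

    common = [row(v, c) for v, c in counts.most_common(top)]
    other_distinct = len(counts) - len(common)
    if other_distinct:
        # All values outside the top `top` collapse into a single row.
        other_count = n - sum(c for _, c, _ in common)
        common.append(row(f"Other values ({other_distinct})", other_count))

    ordered = sorted(counts)  # distinct values, ascending
    minimum = [row(v, counts[v]) for v in ordered[:extreme]]
    maximum = [row(v, counts[v]) for v in reversed(ordered[-extreme:])]
    return common, minimum, maximum
```

For a column with missing values, pass only the non-missing values. The contractenddate_991D frequencies further down are relative to its 65404 non-missing rows (295 / 65404 ≈ 0.5%).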
amount_416A
Real number (ℝ)
SKEWED  ZEROS 
| Distinct | 40724 |
|---|---|
| Distinct (%) | 28.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8422.304482 |
| Minimum | -40000 |
|---|---|
| Maximum | 12213286 |
| Zeros | 58993 |
| Zeros (%) | 40.7% |
| Negative | 10 |
| Negative (%) | < 0.1% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | -40000 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 223.658 |
| Q3 | 478.34 |
| 95-th percentile | 18437.611 |
| Maximum | 12213286 |
| Range | 12253286 |
| Interquartile range (IQR) | 478.34 |
Descriptive statistics
| Standard deviation | 86232.12048 |
|---|---|
| Coefficient of variation (CV) | 10.23854227 |
| Kurtosis | 5111.536895 |
| Mean | 8422.304482 |
| Median Absolute Deviation (MAD) | 223.658 |
| Skewness | 50.47357923 |
| Sum | 1221958468 |
| Variance | 7435978602 |
| Monotonicity | Not monotonic |
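The SKEWED alert reflects the third standardized moment. A handful of very large amounts (maximum 12213286 against Q3 = 478.34) dominate both the skewness (50.47) and the kurtosis (5111.5). The sketch below computes the bias-adjusted skewness (G1) and excess kurtosis (G2) that pandas' `Series.skew()` and `Series.kurt()` return. It assumes the report uses the same estimators:

```python
import math

def sample_skew_kurt(values):
    """Adjusted Fisher-Pearson skewness (G1) and excess kurtosis (G2).

    Same bias-corrected estimators as pandas' Series.skew()/Series.kurt();
    that the report uses them is an assumption. Needs >= 4 values with
    non-zero variance.
    """
    xs = list(values)
    n = len(xs)
    mean = math.fsum(xs) / n
    # Central moments (population form, divisor n).
    m2 = math.fsum((x - mean) ** 2 for x in xs) / n
    m3 = math.fsum((x - mean) ** 3 for x in xs) / n
    m4 = math.fsum((x - mean) ** 4 for x in xs) / n
    g1 = m3 / m2 ** 1.5          # biased skewness
    g2 = m4 / m2 ** 2 - 3.0      # biased excess kurtosis
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return skew, kurt
```

For `[0, 0, 0, 10]` this returns skewness 2.0 and excess kurtosis 4.0. A symmetric sample such as `[1, 2, 3, 4, 5]` gives skewness 0.0 and excess kurtosis −1.2.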
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 58993 | 40.7% |
| 202.008 | 946 | 0.7% |
| 204.04001 | 783 | 0.5% |
| 204.03801 | 696 | 0.5% |
| 202.00601 | 612 | 0.4% |
| 202.01 | 308 | 0.2% |
| 202.00401 | 274 | 0.2% |
| 1010.04803 | 145 | 0.1% |
| 202.002 | 115 | 0.1% |
| 1010.05 | 109 | 0.1% |
| Other values (40714) | 82105 | 56.6% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -40000 | 1 | < 0.1% |
| -33779.152 | 1 | < 0.1% |
| -10000 | 3 | < 0.1% |
| -8000 | 1 | < 0.1% |
| -4000 | 3 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 12213286 | 1 | < 0.1% |
| 9502137 | 2 | < 0.1% |
| 4216085.5 | 1 | < 0.1% |
| 4045444.5 | 1 | < 0.1% |
| 4020477.2 | 1 | < 0.1% |
contractenddate_991D
Text
MISSING
| Distinct | 1524 |
|---|---|
| Distinct (%) | 2.3% |
| Missing | 79682 |
| Missing (%) | 54.9% |
| Memory size | 1.1 MiB |
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
Characters and Unicode
| Total characters | 654040 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
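Every value in this column is an ISO date of the form YYYY-MM-DD. Each value has eight digits (general category Nd, Decimal Number) and two hyphens (Pd, Dash Punctuation), which gives 2 categories and the 80% / 20% split below. Python's `unicodedata.category` returns these general-category codes. The standard library does not expose scripts or blocks, so the sketch below covers categories only. Its long-name mapping is limited to the two codes seen here:

```python
import unicodedata
from collections import Counter

# Long names for the two categories that occur in ISO dates; any other
# category is reported by its two-letter code.
CATEGORY_NAMES = {"Nd": "Decimal Number", "Pd": "Dash Punctuation"}

def category_counts(strings):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for s in strings:
        for ch in s:
            cat = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(cat, cat)] += 1
    return counts
```

`category_counts(["2018-03-18", "2017-07-22"])` yields 16 Decimal Number and 4 Dash Punctuation characters.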
Unique
| Unique | 354 |
|---|---|
| Unique (%) | 0.5% |
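Distinct counts the different values; Unique counts the values that occur exactly once. Both percentages here are taken over the 65404 non-missing values (1524 / 65404 ≈ 2.3%, 354 / 65404 ≈ 0.5%). The sketch below assumes missing entries are represented as `None`:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, distinct %, unique, unique %) over non-missing values."""
    present = [v for v in values if v is not None]  # drop missing entries
    counts = Counter(present)
    n = len(present)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)  # seen exactly once
    return distinct, 100 * distinct / n, unique, 100 * unique / n
```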
Sample
| 1st row | 2018-03-18 |
|---|---|
| 2nd row | 2017-07-22 |
| 3rd row | 2017-09-30 |
| 4th row | 2017-07-31 |
| 5th row | 2018-02-08 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2017-08-03 | 295 | 0.5% |
| 2017-12-31 | 293 | 0.4% |
| 2017-09-01 | 292 | 0.4% |
| 2017-09-08 | 282 | 0.4% |
| 2017-12-23 | 280 | 0.4% |
| 2017-09-30 | 279 | 0.4% |
| 2017-12-03 | 277 | 0.4% |
| 2017-12-22 | 261 | 0.4% |
| 2017-09-03 | 259 | 0.4% |
| 2017-12-24 | 257 | 0.4% |
| Other values (1514) | 62629 | 95.8% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 142764 | 21.8% |
| - | 130808 | 20.0% |
| 1 | 126554 | 19.3% |
| 2 | 106783 | 16.3% |
| 8 | 44433 | 6.8% |
| 7 | 40025 | 6.1% |
| 3 | 15826 | 2.4% |
| 9 | 15458 | 2.4% |
| 6 | 12228 | 1.9% |
| 4 | 9776 | 1.5% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 523232 | 80.0% |
| Dash Punctuation | 130808 | 20.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 142764 | 27.3% |
| 1 | 126554 | 24.2% |
| 2 | 106783 | 20.4% |
| 8 | 44433 | 8.5% |
| 7 | 40025 | 7.6% |
| 3 | 15826 | 3.0% |
| 9 | 15458 | 3.0% |
| 6 | 12228 | 2.3% |
| 4 | 9776 | 1.9% |
| 5 | 9385 | 1.8% |
Dash Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| - | 130808 | 100.0% |
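The same count gets a different percentage depending on the table. In "Most occurring characters" the denominator is all 654040 characters, while under "Most frequent character per category" it is the category total (523232 Decimal Number characters). So the digit 8 is 6.8% of all characters but 8.5% of digits. The sketch below reproduces that arithmetic. Its formatter also reproduces the "< 0.1%" entries seen in these tables, which appear for non-zero shares that would round to 0.0%. That rule is inferred from the tables, not taken from the profiler's code:

```python
def pct(count, total):
    """Format count/total as a one-decimal percentage, like the report."""
    share = 100 * count / total
    if count > 0 and round(share, 1) == 0:
        return "< 0.1%"  # inferred rule: non-zero share that rounds to 0.0%
    return f"{share:.1f}%"

# Counts copied from the contractenddate_991D tables above.
count_8 = 44433
all_chars = 654040       # "Total characters"
decimal_chars = 523232   # "Decimal Number" category total

overall = pct(count_8, all_chars)      # share of all characters
within = pct(count_8, decimal_chars)   # share within Decimal Number
print(overall, within)
```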
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 654040 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 142764 | 21.8% |
| - | 130808 | 20.0% |
| 1 | 126554 | 19.3% |
| 2 | 106783 | 16.3% |
| 8 | 44433 | 6.8% |
| 7 | 40025 | 6.1% |
| 3 | 15826 | 2.4% |
| 9 | 15458 | 2.4% |
| 6 | 12228 | 1.9% |
| 4 | 9776 | 1.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 654040 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 142764 | 21.8% |
| - | 130808 | 20.0% |
| 1 | 126554 | 19.3% |
| 2 | 106783 | 16.3% |
| 8 | 44433 | 6.8% |
| 7 | 40025 | 6.1% |
| 3 | 15826 | 2.4% |
| 9 | 15458 | 2.4% |
| 6 | 12228 | 1.9% |
| 4 | 9776 | 1.5% |
num_group1
Real number (ℝ)
ZEROS 
| Distinct | 65 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.5225314641 |
| Minimum | 0 |
|---|---|
| Maximum | 64 |
| Zeros | 105111 |
| Zeros (%) | 72.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.1 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 64 |
| Range | 64 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.620954397 |
|---|---|
| Coefficient of variation (CV) | 3.102118261 |
| Kurtosis | 274.5852851 |
| Mean | 0.5225314641 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 12.30842882 |
| Sum | 75812 |
| Variance | 2.627493157 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 105111 | 72.4% |
| 1 | 26045 | 18.0% |
| 2 | 7878 | 5.4% |
| 3 | 2701 | 1.9% |
| 4 | 1190 | 0.8% |
| 5 | 568 | 0.4% |
| 6 | 348 | 0.2% |
| 7 | 251 | 0.2% |
| 8 | 173 | 0.1% |
| 9 | 131 | 0.1% |
| Other values (55) | 690 | 0.5% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 105111 | 72.4% |
| 1 | 26045 | 18.0% |
| 2 | 7878 | 5.4% |
| 3 | 2701 | 1.9% |
| 4 | 1190 | 0.8% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 64 | 1 | < 0.1% |
| 63 | 1 | < 0.1% |
| 62 | 1 | < 0.1% |
| 61 | 1 | < 0.1% |
| 60 | 1 | < 0.1% |
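The num_group1 counts are consistent with it being a 0-based row index within each case_id. This is an interpretation; the report does not say so. Its 105111 zeros equal the number of distinct case_id values. Its 65 distinct values (0 to 64) match the largest case_id group (1377353, 65 rows), and each of 60 to 64 occurs exactly once. The following is a hypothetical sketch of such an index:

```python
from collections import Counter

def within_group_index(group_ids):
    """0-based position of each row within its group, in row order.

    Hypothetical reconstruction: assumes num_group1 numbers the rows that
    share a case_id, starting from 0.
    """
    seen = Counter()
    index = []
    for g in group_ids:
        index.append(seen[g])  # rows of this group seen so far
        seen[g] += 1
    return index
```

`within_group_index([7, 7, 8, 7])` returns `[0, 1, 0, 2]`. Under this reading, the number of zeros always equals the number of distinct groups, and the largest index is one less than the largest group size.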
openingdate_313D
Text
| Distinct | 1579 |
|---|---|
| Distinct (%) | 1.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.1 MiB |
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
Characters and Unicode
| Total characters | 1450860 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 162 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | 2016-08-16 |
|---|---|
| 2nd row | 2015-03-19 |
| 3rd row | 2014-09-02 |
| 4th row | 2014-07-23 |
| 5th row | 2016-06-08 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2014-07-11 | 368 | 0.3% |
| 2014-04-11 | 306 | 0.2% |
| 2014-03-28 | 304 | 0.2% |
| 2013-12-26 | 301 | 0.2% |
| 2014-04-09 | 301 | 0.2% |
| 2014-04-14 | 295 | 0.2% |
| 2014-04-02 | 292 | 0.2% |
| 2014-01-06 | 289 | 0.2% |
| 2014-05-30 | 283 | 0.2% |
| 2013-12-23 | 282 | 0.2% |
| Other values (1569) | 142065 | 97.9% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 321975 | 22.2% |
| - | 290172 | 20.0% |
| 1 | 268553 | 18.5% |
| 2 | 233905 | 16.1% |
| 4 | 70022 | 4.8% |
| 5 | 62496 | 4.3% |
| 6 | 61629 | 4.2% |
| 3 | 47351 | 3.3% |
| 7 | 43192 | 3.0% |
| 9 | 27336 | 1.9% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 1160688 | 80.0% |
| Dash Punctuation | 290172 | 20.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 321975 | 27.7% |
| 1 | 268553 | 23.1% |
| 2 | 233905 | 20.2% |
| 4 | 70022 | 6.0% |
| 5 | 62496 | 5.4% |
| 6 | 61629 | 5.3% |
| 3 | 47351 | 4.1% |
| 7 | 43192 | 3.7% |
| 9 | 27336 | 2.4% |
| 8 | 24229 | 2.1% |
Dash Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| - | 290172 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 1450860 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 321975 | 22.2% |
| - | 290172 | 20.0% |
| 1 | 268553 | 18.5% |
| 2 | 233905 | 16.1% |
| 4 | 70022 | 4.8% |
| 5 | 62496 | 4.3% |
| 6 | 61629 | 4.2% |
| 3 | 47351 | 3.3% |
| 7 | 43192 | 3.0% |
| 9 | 27336 | 1.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 1450860 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 321975 | 22.2% |
| - | 290172 | 20.0% |
| 1 | 268553 | 18.5% |
| 2 | 233905 | 16.1% |
| 4 | 70022 | 4.8% |
| 5 | 62496 | 4.3% |
| 6 | 61629 | 4.2% |
| 3 | 47351 | 3.3% |
| 7 | 43192 | 3.0% |
| 9 | 27336 | 1.9% |