Dataset statistics
| Number of variables | 5 |
|---|---|
| Number of observations | 3343800 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Total size in memory | 127.6 MiB |
| Average record size in memory | 40.0 B |
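The memory figures above can be roughly cross-checked with simple arithmetic. The sketch below assumes every cell costs 8 bytes (a 64-bit number, or a pointer for text columns under shallow accounting) and ignores any index overhead; that per-cell size is an assumption, not something the report states.

```python
# Back-of-the-envelope check of the memory figures in the overview table.
MIB = 1024 ** 2  # bytes per mebibyte

n_rows = 3_343_800   # "Number of observations"
n_cols = 5           # "Number of variables"
bytes_per_cell = 8   # ASSUMPTION: 8-byte value or pointer per cell

record_bytes = n_cols * bytes_per_cell        # bytes per row
column_mib = n_rows * bytes_per_cell / MIB    # one column, in MiB
total_mib = n_rows * record_bytes / MIB       # whole table, in MiB

print(f"record size : {record_bytes} B")
print(f"per column  : {column_mib:.1f} MiB")
print(f"total       : {total_mib:.1f} MiB")
```

Under that assumption the results line up with the reported average record size, the total size in memory, and the 25.5 MiB listed as "Memory size" for each variable.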
Variable types
| Numeric | 3 |
|---|---|
| Text | 2 |
Alerts
| num_group1 has 482265 (14.4%) zeros | Zeros |
Reproduction
| Analysis started | 2024-02-13 19:58:15.065329 |
|---|---|
| Analysis finished | 2024-02-13 19:58:20.376181 |
| Duration | 5.31 seconds |
| Software version | ydata-profiling v4.6.4 |
case_id
Real number (ℝ)
| Distinct | 482265 |
|---|---|
| Distinct (%) | 14.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1161306.38 |
| Minimum | 357 |
|---|---|
| Maximum | 2629815 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 25.5 MiB |
Quantile statistics
| Minimum | 357 |
|---|---|
| 5-th percentile | 114792 |
| Q1 | 700623 |
| median | 1301411 |
| Q3 | 1471673 |
| 95-th percentile | 2585638 |
| Maximum | 2629815 |
| Range | 2629458 |
| Interquartile range (IQR) | 771050 |
Descriptive statistics
| Standard deviation | 657994.9559 |
|---|---|
| Coefficient of variation (CV) | 0.5665989331 |
| Kurtosis | 0.109580793 |
| Mean | 1161306.38 |
| Median Absolute Deviation (MAD) | 493315 |
| Skewness | 0.449679431 |
| Sum | 3.883176274 × 10¹² |
| Variance | 4.32957362 × 10¹¹ |
| Monotonicity | Increasing |
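Several rows in the case_id statistics are derived from other rows, so they can be sanity-checked against each other. The sketch below recomputes range, IQR, coefficient of variation, variance, and sum from the reported values; the variable names are illustrative only, and small differences are expected because the inputs are already rounded.

```python
import math

# Values copied from the case_id tables above.
mean, std = 1161306.38, 657994.9559
minimum, maximum = 357, 2629815
q1, q3 = 700623, 1471673
n_rows = 3_343_800  # "Number of observations" from the dataset overview

derived = {
    "range": maximum - minimum,   # max - min
    "iqr": q3 - q1,               # Q3 - Q1
    "cv": std / mean,             # coefficient of variation
    "variance": std ** 2,         # square of the standard deviation
    "sum": mean * n_rows,         # mean times number of rows
}

for name, value in derived.items():
    print(f"{name:>8}: {value:,.6f}")

# Compare against a reported figure with a tolerance for rounding.
print(math.isclose(derived["cv"], 0.5665989331, rel_tol=1e-6))
```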
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 608018 | 121 | < 0.1% |
| 627764 | 121 | < 0.1% |
| 700836 | 111 | < 0.1% |
| 659955 | 99 | < 0.1% |
| 1339712 | 69 | < 0.1% |
| 1443276 | 64 | < 0.1% |
| 161770 | 63 | < 0.1% |
| 677615 | 63 | < 0.1% |
| 1569035 | 60 | < 0.1% |
| 1318861 | 59 | < 0.1% |
| Other values (482255) | 3342970 | > 99.9% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 357 | 6 | < 0.1% |
| 381 | 6 | < 0.1% |
| 388 | 6 | < 0.1% |
| 405 | 6 | < 0.1% |
| 409 | 7 | < 0.1% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 2629815 | 11 | < 0.1% |
| 2629812 | 2 | < 0.1% |
| 2629809 | 3 | < 0.1% |
| 2629808 | 9 | < 0.1% |
| 2629807 | 8 | < 0.1% |
| Distinct | 152835 |
|---|---|
| Distinct (%) | 4.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 25.5 MiB |
Length
| Max length | 12 |
|---|---|
| Median length | 8 |
| Mean length | 8.048733477 |
| Min length | 8 |
Characters and Unicode
| Total characters | 26913355 |
|---|---|
| Distinct characters | 25 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
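As a rough illustration of how such per-character properties can be tallied, the sketch below uses Python's standard `unicodedata` module to count general categories. The helper name `category_counts` is hypothetical, the first string comes from the Sample table, and the second string is a made-up value used only to exercise the uppercase and connector-punctuation categories. This is not necessarily how the report itself computes its counts.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over all characters in `values`."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts

# "c91b12ff" appears in the Sample table; "P12_34" is a hypothetical value.
sample = ["c91b12ff", "P12_34"]
counts = category_counts(sample)

# Ll = Lowercase Letter, Lu = Uppercase Letter,
# Nd = Decimal Number, Pc = Connector Punctuation
for code, n in counts.most_common():
    print(code, n)
```

The two-letter codes map onto the category names used in the tables that follow (Decimal Number, Lowercase Letter, Connector Punctuation, Uppercase Letter).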
Unique
| Unique | 22669 |
|---|---|
| Unique (%) | 0.7% |
Sample
| 1st row | c91b12ff |
|---|---|
| 2nd row | c91b12ff |
| 3rd row | c91b12ff |
| 4th row | c91b12ff |
| 5th row | c91b12ff |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5e180ef0 | 209085 | 6.3% |
| 6a3d9351 | 18121 | 0.5% |
| f10df922 | 14002 | 0.4% |
| P114_118_163 | 11723 | 0.4% |
| P157_88_183 | 10887 | 0.3% |
| 7444479d | 10504 | 0.3% |
| a645aae1 | 8835 | 0.3% |
| b09374c3 | 8508 | 0.3% |
| 36a9355c | 8316 | 0.2% |
| a409d8fa | 8263 | 0.2% |
| Other values (152825) | 3035556 | 90.8% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1956238 | 7.3% |
| e | 1947697 | 7.2% |
| 1 | 1936881 | 7.2% |
| 5 | 1811933 | 6.7% |
| 8 | 1795831 | 6.7% |
| f | 1713339 | 6.4% |
| 4 | 1633356 | 6.1% |
| 3 | 1629676 | 6.1% |
| d | 1597050 | 5.9% |
| 9 | 1580769 | 5.9% |
| Other values (15) | 9310585 | 34.6% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 16949878 | 63.0% |
| Lowercase Letter | 9790872 | 36.4% |
| Connector Punctuation | 115070 | 0.4% |
| Uppercase Letter | 57535 | 0.2% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 1947697 | |
| f | 1713339 | |
| d | 1597050 | |
| a | 1574649 | |
| c | 1504566 | |
| b | 1453491 | |
| l | 28 | < 0.1% |
| w | 14 | < 0.1% |
| i | 14 | < 0.1% |
| p | 8 | < 0.1% |
| Other values (2) | 16 | < 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 1956238 | |
| 1 | 1936881 | |
| 5 | 1811933 | |
| 8 | 1795831 | |
| 4 | 1633356 | |
| 3 | 1629676 | |
| 9 | 1580769 | |
| 7 | 1553021 | |
| 6 | 1528513 | |
| 2 | 1523660 |
Uppercase Letter
| Value | Count | Frequency (%) |
| P | 57513 | |
| Q | 22 | < 0.1% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 115070 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 17064948 | |
| Latin | 9848407 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 1947697 | |
| f | 1713339 | |
| d | 1597050 | |
| a | 1574649 | |
| c | 1504566 | |
| b | 1453491 | |
| P | 57513 | 0.6% |
| l | 28 | < 0.1% |
| Q | 22 | < 0.1% |
| w | 14 | < 0.1% |
| Other values (4) | 38 | < 0.1% |
Common
| Value | Count | Frequency (%) |
| 0 | 1956238 | |
| 1 | 1936881 | |
| 5 | 1811933 | |
| 8 | 1795831 | |
| 4 | 1633356 | |
| 3 | 1629676 | |
| 9 | 1580769 | |
| 7 | 1553021 | |
| 6 | 1528513 | |
| 2 | 1523660 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 26913355 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 1956238 | 7.3% |
| e | 1947697 | 7.2% |
| 1 | 1936881 | 7.2% |
| 5 | 1811933 | 6.7% |
| 8 | 1795831 | 6.7% |
| f | 1713339 | 6.4% |
| 4 | 1633356 | 6.1% |
| 3 | 1629676 | 6.1% |
| d | 1597050 | 5.9% |
| 9 | 1580769 | 5.9% |
| Other values (15) | 9310585 |
num_group1
Real number (ℝ)
ZEROS 
| Distinct | 121 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.893117411 |
| Minimum | 0 |
|---|---|
| Maximum | 120 |
| Zeros | 482265 |
| Zeros (%) | 14.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 25.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 3 |
| Q3 | 5 |
| 95-th percentile | 11 |
| Maximum | 120 |
| Range | 120 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 3.736778505 |
|---|---|
| Coefficient of variation (CV) | 0.9598422319 |
| Kurtosis | 28.23892001 |
| Mean | 3.893117411 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 2.791610254 |
| Sum | 13017806 |
| Variance | 13.9635136 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 482265 | 14.4% |
| 1 | 463176 | 13.9% |
| 2 | 447333 | 13.4% |
| 3 | 428710 | 12.8% |
| 4 | 407285 | 12.2% |
| 5 | 356851 | 10.7% |
| 6 | 177060 | 5.3% |
| 7 | 130208 | 3.9% |
| 8 | 105525 | 3.2% |
| 9 | 87921 | 2.6% |
| Other values (111) | 257466 | 7.7% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 482265 | 14.4% |
| 1 | 463176 | 13.9% |
| 2 | 447333 | 13.4% |
| 3 | 428710 | 12.8% |
| 4 | 407285 | 12.2% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 120 | 2 | < 0.1% |
| 119 | 2 | < 0.1% |
| 118 | 2 | < 0.1% |
| 117 | 2 | < 0.1% |
| 116 | 2 | < 0.1% |
pmtamount_36A
Real number (ℝ)
| Distinct | 572321 |
|---|---|
| Distinct (%) | 17.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2260.537387 |
| Minimum | 0 |
|---|---|
| Maximum | 87115.6 |
| Zeros | 28 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 25.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 200 |
| Q1 | 745.46 |
| median | 1365.454 |
| Q3 | 2632.2 |
| 95-th percentile | 6931.88547 |
| Maximum | 87115.6 |
| Range | 87115.6 |
| Interquartile range (IQR) | 1886.74 |
Descriptive statistics
| Standard deviation | 3161.294121 |
|---|---|
| Coefficient of variation (CV) | 1.398470177 |
| Kurtosis | 46.27256428 |
| Mean | 2260.537387 |
| Median Absolute Deviation (MAD) | 799.454 |
| Skewness | 5.472864184 |
| Sum | 7558784916 |
| Variance | 9993780.517 |
| Monotonicity | Not monotonic |
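The median absolute deviation (MAD) in the table above is a robust spread measure. Unlike the standard deviation, it is barely moved by a handful of very large payments, which matters for a distribution this right-skewed (mean well above the median, skewness above 5). The sketch below assumes the unscaled definition, median(|x - median(x)|), and runs it on a small hypothetical list of amounts. The list is illustrative, not data from the report.

```python
from statistics import mean, median, pstdev

def median_abs_deviation(values):
    """Unscaled MAD: median of absolute deviations from the median."""
    values = list(values)
    center = median(values)
    return median(abs(v - center) for v in values)

# HYPOTHETICAL payment amounts with one extreme value at the end.
payments = [200, 600, 850, 850, 1000, 1400, 2000, 42500]

print("mean  :", mean(payments))
print("median:", median(payments))
print("stdev :", round(pstdev(payments), 1))
print("MAD   :", median_abs_deviation(payments))
```

In this toy list the single large value pulls the mean and standard deviation far above the median and MAD, the same qualitative pattern the pmtamount_36A statistics show.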
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 850 | 156210 | 4.7% |
| 1000 | 66534 | 2.0% |
| 600 | 37836 | 1.1% |
| 565.60004 | 35777 | 1.1% |
| 2000 | 30371 | 0.9% |
| 1200 | 27687 | 0.8% |
| 900 | 23582 | 0.7% |
| 1400 | 18448 | 0.6% |
| 800 | 17254 | 0.5% |
| 1600 | 16166 | 0.5% |
| Other values (572311) | 2913935 | 87.1% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 28 | < 0.1% |
| 0.002 | 33 | < 0.1% |
| 0.004 | 7 | < 0.1% |
| 0.006 | 1 | < 0.1% |
| 0.008 | 6 | < 0.1% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 87115.6 | 1 | < 0.1% |
| 68760.805 | 1 | < 0.1% |
| 43134.2 | 1 | < 0.1% |
| 42500 | 1670 | < 0.1% |
| 42499.8 | 2 | < 0.1% |
| Distinct | 325 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 25.5 MiB |
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
Characters and Unicode
| Total characters | 33438000 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | 2018-08-08 |
|---|---|
| 2nd row | 2018-11-28 |
| 3rd row | 2018-09-10 |
| 4th row | 2019-01-04 |
| 5th row | 2018-10-08 |
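The sample rows above look like ISO 8601 calendar dates stored as text (fixed length 10, digits and hyphens only). If that reading is right, they could be parsed with the standard library before any date arithmetic. The sketch below does so for the five sample rows; treating the column as dates is an assumption, and the variable names are illustrative.

```python
from collections import Counter
from datetime import date

# The five sample rows copied from the table above.
raw = ["2018-08-08", "2018-11-28", "2018-09-10", "2019-01-04", "2018-10-08"]

# date.fromisoformat raises ValueError on anything that is not YYYY-MM-DD,
# which doubles as a cheap validity check.
parsed = [date.fromisoformat(s) for s in raw]

per_month = Counter((d.year, d.month) for d in parsed)  # rows per calendar month
span_days = (max(parsed) - min(parsed)).days            # earliest to latest

print("earliest:", min(parsed))
print("latest  :", max(parsed))
print("span    :", span_days, "days")
```

Once parsed, the values can be sorted, bucketed by month, or differenced, none of which is meaningful on the raw strings beyond lexical ordering.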
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2019-04-02 | 43358 | 1.3% |
| 2019-04-03 | 41765 | 1.2% |
| 2019-03-08 | 41188 | 1.2% |
| 2019-03-07 | 39847 | 1.2% |
| 2019-01-07 | 39016 | 1.2% |
| 2019-03-11 | 37690 | 1.1% |
| 2019-04-09 | 36223 | 1.1% |
| 2019-02-06 | 33320 | 1.0% |
| 2019-01-04 | 33155 | 1.0% |
| 2019-03-14 | 31469 | 0.9% |
| Other values (315) | 2966769 | 88.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 7941380 | |
| - | 6687600 | |
| 1 | 6086606 | |
| 2 | 5081042 | |
| 9 | 3294466 | |
| 8 | 1185812 | 3.5% |
| 3 | 829824 | 2.5% |
| 4 | 646209 | 1.9% |
| 5 | 580237 | 1.7% |
| 7 | 561028 | 1.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 26750400 | |
| Dash Punctuation | 6687600 | 20.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 7941380 | |
| 1 | 6086606 | |
| 2 | 5081042 | |
| 9 | 3294466 | |
| 8 | 1185812 | 4.4% |
| 3 | 829824 | 3.1% |
| 4 | 646209 | 2.4% |
| 5 | 580237 | 2.2% |
| 7 | 561028 | 2.1% |
| 6 | 543796 | 2.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 6687600 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 33438000 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 7941380 | |
| - | 6687600 | |
| 1 | 6086606 | |
| 2 | 5081042 | |
| 9 | 3294466 | |
| 8 | 1185812 | 3.5% |
| 3 | 829824 | 2.5% |
| 4 | 646209 | 1.9% |
| 5 | 580237 | 1.7% |
| 7 | 561028 | 1.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 33438000 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 7941380 | |
| - | 6687600 | |
| 1 | 6086606 | |
| 2 | 5081042 | |
| 9 | 3294466 | |
| 8 | 1185812 | 3.5% |
| 3 | 829824 | 2.5% |
| 4 | 646209 | 1.9% |
| 5 | 580237 | 1.7% |
| 7 | 561028 | 1.7% |