Dataset statistics
| Number of variables | 5 |
|---|---|
| Number of observations | 1107933 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Total size in memory | 42.3 MiB |
| Average record size in memory | 40.0 B |
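The memory figures above are consistent with five columns of 8 bytes per value. The following is a minimal sketch of that arithmetic. `BYTES_PER_VALUE` is an assumption (64-bit numbers or object pointers); the report itself does not state the column dtypes.

```python
# Row and column counts are copied from the report.
# BYTES_PER_VALUE is an assumption: 64-bit values or object pointers.
N_ROWS = 1_107_933
N_COLS = 5
BYTES_PER_VALUE = 8
MIB = 1024 ** 2


def column_mib(n_rows: int, bytes_per_value: int = BYTES_PER_VALUE) -> float:
    """Approximate in-memory size of one column, in MiB."""
    return n_rows * bytes_per_value / MIB


per_column = column_mib(N_ROWS)          # size of a single column
total = per_column * N_COLS              # size of the whole table
record_bytes = total * MIB / N_ROWS      # average bytes per row

print(f"per column: {per_column:.1f} MiB")
print(f"total: {total:.1f} MiB")
print(f"per record: {record_bytes:.1f} B")
```

Under that assumption the sketch gives the same rounded values the report lists for total size, record size, and per-variable memory size.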
Variable types
| Numeric | 3 |
|---|---|
| Text | 2 |
Alerts
| num_group1 has 150732 (13.6%) zeros | Zeros |
Reproduction
| Analysis started | 2024-02-13 19:58:08.823892 |
|---|---|
| Analysis finished | 2024-02-13 19:58:11.071055 |
| Duration | 2.25 seconds |
| Software version | ydata-profiling v4.6.4 |
| Download configuration | config.json |
case_id
Real number (ℝ)
| Distinct | 150732 |
|---|---|
| Distinct (%) | 13.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1469876.122 |
| Minimum | 49435 |
|---|---|
| Maximum | 2703452 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.5 MiB |
Quantile statistics
| Minimum | 49435 |
|---|---|
| 5-th percentile | 229480 |
| Q1 | 997668 |
| median | 1854645 |
| Q3 | 1907416 |
| 95-th percentile | 2686146 |
| Maximum | 2703452 |
| Range | 2654017 |
| Interquartile range (IQR) | 909748 |
Descriptive statistics
| Standard deviation | 705344.7771 |
|---|---|
| Coefficient of variation (CV) | 0.4798668178 |
| Kurtosis | -0.6728545046 |
| Mean | 1469876.122 |
| Median Absolute Deviation (MAD) | 84248 |
| Skewness | -0.5004240807 |
| Sum | 1.628524261 × 10¹² |
| Variance | 4.975112545 × 10¹¹ |
| Monotonicity | Increasing |
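Several rows in the case_id tables can be derived from others, which gives a quick consistency check on the report. The sketch below recomputes them from the published values only; it does not touch the underlying data, and the dictionary keys are illustrative names.

```python
# Values copied from the case_id tables in the report.
n = 1_107_933
mean = 1469876.122
std = 705344.7771
minimum, maximum = 49435, 2703452
q1, q3 = 997668, 1907416

derived = {
    "range": maximum - minimum,   # Maximum - Minimum
    "iqr": q3 - q1,               # Q3 - Q1
    "cv": std / mean,             # coefficient of variation
    "variance": std ** 2,         # square of the standard deviation
    "sum": mean * n,              # mean times number of observations
}

for name, value in derived.items():
    print(f"{name}: {value:.10g}")
```

The recomputed values agree with the Range, IQR, CV, Variance, and Sum rows to the precision shown in the report.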
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1869915 | 101 | < 0.1% |
| 2681835 | 83 | < 0.1% |
| 1917900 | 70 | < 0.1% |
| 229270 | 65 | < 0.1% |
| 1863796 | 64 | < 0.1% |
| 1009026 | 60 | < 0.1% |
| 1861451 | 59 | < 0.1% |
| 1853166 | 58 | < 0.1% |
| 990891 | 56 | < 0.1% |
| 242302 | 54 | < 0.1% |
| Other values (150722) | 1107263 | 99.9% |
| Value | Count | Frequency (%) |
| 49435 | 11 | < 0.1% |
| 49490 | 6 | < 0.1% |
| 49526 | 2 | < 0.1% |
| 49563 | 11 | < 0.1% |
| 49576 | 11 | < 0.1% |
| Value | Count | Frequency (%) |
| 2703452 | 6 | < 0.1% |
| 2703449 | 8 | < 0.1% |
| 2703448 | 6 | < 0.1% |
| 2703445 | 5 | < 0.1% |
| 2703443 | 6 | < 0.1% |
amount_4917619A
Real number (ℝ)
| Distinct | 191635 |
|---|---|
| Distinct (%) | 17.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 20104.96572 |
| Minimum | 0 |
|---|---|
| Maximum | 344250 |
| Zeros | 32 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1735 |
| Q1 | 6885 |
| median | 13130.2 |
| Q3 | 24300 |
| 95-th percentile | 58140.5612 |
| Maximum | 344250 |
| Range | 344250 |
| Interquartile range (IQR) | 17415 |
Descriptive statistics
| Standard deviation | 25201.74513 |
|---|---|
| Coefficient of variation (CV) | 1.253508485 |
| Kurtosis | 43.42465887 |
| Mean | 20104.96572 |
| Median Absolute Deviation (MAD) | 7124.3997 |
| Skewness | 5.169141853 |
| Sum | 2.227495499 × 10¹⁰ |
| Variance | 635127957.8 |
| Monotonicity | Not monotonic |
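For this variable the median absolute deviation (7124.3997) is far smaller than the standard deviation (25201.74513), which fits the strong right skew. The sketch below shows how MAD is defined and why a single extreme value barely moves it. The `toy` list is a hypothetical sample built from a few values that appear in the report's tables, not the real data, and `median_absolute_deviation` is an illustrative helper name.

```python
import statistics


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    values = list(values)
    center = statistics.median(values)
    return statistics.median(abs(v - center) for v in values)


# Hypothetical toy sample (not the dataset): mostly small amounts plus one extreme value.
toy = [6885, 8100, 644.2, 16200, 9720, 7290, 344250]

mad = median_absolute_deviation(toy)
spread = statistics.pstdev(toy)  # population standard deviation, dominated by the outlier

print(f"MAD: {mad}")
print(f"standard deviation: {spread:.1f}")
```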
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 6885 | 57533 | 5.2% |
| 8100 | 16295 | 1.5% |
| 644.2 | 9243 | 0.8% |
| 16200 | 8442 | 0.8% |
| 9720 | 7070 | 0.6% |
| 7290 | 5882 | 0.5% |
| 1288.4 | 5426 | 0.5% |
| 11340 | 4669 | 0.4% |
| 12960 | 4209 | 0.4% |
| 24300 | 3908 | 0.4% |
| Other values (191625) | 985256 | 88.9% |
| Value | Count | Frequency (%) |
| 0 | 32 | < 0.1% |
| 0.2 | 23 | < 0.1% |
| 0.4 | 23 | < 0.1% |
| 0.6 | 8 | < 0.1% |
| 0.8 | 88 | < 0.1% |
| Value | Count | Frequency (%) |
| 344250 | 755 | 0.1% |
| 344248.4 | 3 | < 0.1% |
| 344162 | 1 | < 0.1% |
| 344036.22 | 1 | < 0.1% |
| 344018.4 | 1 | < 0.1% |
| Distinct | 260 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 8.5 MiB |
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
Characters and Unicode
| Total characters | 11079330 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
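As a rough illustration of the character properties mentioned above, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It covers categories only, since that module does not expose scripts or blocks, and it is not necessarily how the profiling tool computes its tables. `category_counts` is an illustrative helper name.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories (e.g. 'Nd', 'Pd') over all characters."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


# The five sample rows shown in the report for this variable.
sample = ["2019-10-16"] * 5
counts = category_counts(sample)

for category, count in counts.most_common():
    print(category, count)
```

Here `Nd` is Decimal Number and `Pd` is Dash Punctuation, the two categories the report lists for this variable.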
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2019-10-16 |
|---|---|
| 2nd row | 2019-10-16 |
| 3rd row | 2019-10-16 |
| 4th row | 2019-10-16 |
| 5th row | 2019-10-16 |
| Value | Count | Frequency (%) |
| 2020-04-03 | 20426 | 1.8% |
| 2020-04-09 | 19943 | 1.8% |
| 2020-05-06 | 16006 | 1.4% |
| 2020-06-04 | 15974 | 1.4% |
| 2020-06-05 | 15145 | 1.4% |
| 2020-06-08 | 14735 | 1.3% |
| 2020-05-05 | 14713 | 1.3% |
| 2020-04-02 | 14533 | 1.3% |
| 2020-05-08 | 13750 | 1.2% |
| 2020-05-11 | 13408 | 1.2% |
| Other values (250) | 949300 | 85.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 3855766 | 34.8% |
| 2 | 2612860 | 23.6% |
| - | 2215866 | 20.0% |
| 1 | 618372 | 5.6% |
| 3 | 321552 | 2.9% |
| 4 | 281439 | 2.5% |
| 6 | 270104 | 2.4% |
| 5 | 254847 | 2.3% |
| 7 | 241968 | 2.2% |
| 9 | 214568 | 1.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 8863464 | 80.0% |
| Dash Punctuation | 2215866 | 20.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 3855766 | 43.5% |
| 2 | 2612860 | 29.5% |
| 1 | 618372 | 7.0% |
| 3 | 321552 | 3.6% |
| 4 | 281439 | 3.2% |
| 6 | 270104 | 3.0% |
| 5 | 254847 | 2.9% |
| 7 | 241968 | 2.7% |
| 9 | 214568 | 2.4% |
| 8 | 191988 | 2.2% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 2215866 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 11079330 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 3855766 | 34.8% |
| 2 | 2612860 | 23.6% |
| - | 2215866 | 20.0% |
| 1 | 618372 | 5.6% |
| 3 | 321552 | 2.9% |
| 4 | 281439 | 2.5% |
| 6 | 270104 | 2.4% |
| 5 | 254847 | 2.3% |
| 7 | 241968 | 2.2% |
| 9 | 214568 | 1.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 11079330 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 3855766 | 34.8% |
| 2 | 2612860 | 23.6% |
| - | 2215866 | 20.0% |
| 1 | 618372 | 5.6% |
| 3 | 321552 | 2.9% |
| 4 | 281439 | 2.5% |
| 6 | 270104 | 2.4% |
| 5 | 254847 | 2.3% |
| 7 | 241968 | 2.2% |
| 9 | 214568 | 1.9% |
name_4917606M
Text
| Distinct | 55857 |
|---|---|
| Distinct (%) | 5.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 8.5 MiB |
Length
| Max length | 12 |
|---|---|
| Median length | 8 |
| Mean length | 8.056925825 |
| Min length | 8 |
Characters and Unicode
| Total characters | 8926534 |
|---|---|
| Distinct characters | 18 |
| Distinct categories | 4 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 6154 |
|---|---|
| Unique (%) | 0.6% |
Sample
| 1st row | 6b730375 |
|---|---|
| 2nd row | 6b730375 |
| 3rd row | 6b730375 |
| 4th row | 6b730375 |
| 5th row | 6b730375 |
| Value | Count | Frequency (%) |
| 5e180ef0 | 85284 | 7.7% |
| p114_118_163 | 7205 | 0.7% |
| 74ca9587 | 7153 | 0.6% |
| 7444479d | 5173 | 0.5% |
| 3613fb71 | 4079 | 0.4% |
| a409d8fa | 3529 | 0.3% |
| 36a9355c | 3499 | 0.3% |
| e304888c | 3465 | 0.3% |
| cda1fd10 | 3278 | 0.3% |
| c75d2f47 | 3203 | 0.3% |
| Other values (55847) | 982065 | 88.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 675640 | 7.6% |
| 0 | 673508 | 7.5% |
| e | 664650 | 7.4% |
| 5 | 605680 | 6.8% |
| 8 | 599952 | 6.7% |
| f | 562940 | 6.3% |
| 7 | 527523 | 5.9% |
| d | 524884 | 5.9% |
| 4 | 523107 | 5.9% |
| 3 | 518634 | 5.8% |
| Other values (8) | 3050016 | 34.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 5632463 | 63.1% |
| Lowercase Letter | 3232751 | 36.2% |
| Connector Punctuation | 40880 | 0.5% |
| Uppercase Letter | 20440 | 0.2% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 675640 | 12.0% |
| 0 | 673508 | 12.0% |
| 5 | 605680 | 10.8% |
| 8 | 599952 | 10.7% |
| 7 | 527523 | 9.4% |
| 4 | 523107 | 9.3% |
| 3 | 518634 | 9.2% |
| 9 | 512812 | 9.1% |
| 6 | 502308 | 8.9% |
| 2 | 493299 | 8.8% |
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 664650 | 20.6% |
| f | 562940 | 17.4% |
| d | 524884 | 16.2% |
| c | 499668 | 15.5% |
| a | 491977 | 15.2% |
| b | 488632 | 15.1% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 40880 | 100.0% |
Uppercase Letter
| Value | Count | Frequency (%) |
| P | 20440 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 5673343 | 63.6% |
| Latin | 3253191 | 36.4% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1 | 675640 | 11.9% |
| 0 | 673508 | 11.9% |
| 5 | 605680 | 10.7% |
| 8 | 599952 | 10.6% |
| 7 | 527523 | 9.3% |
| 4 | 523107 | 9.2% |
| 3 | 518634 | 9.1% |
| 9 | 512812 | 9.0% |
| 6 | 502308 | 8.9% |
| 2 | 493299 | 8.7% |
Latin
| Value | Count | Frequency (%) |
| e | 664650 | 20.4% |
| f | 562940 | 17.3% |
| d | 524884 | 16.1% |
| c | 499668 | 15.4% |
| a | 491977 | 15.1% |
| b | 488632 | 15.0% |
| P | 20440 | 0.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 8926534 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 675640 | 7.6% |
| 0 | 673508 | 7.5% |
| e | 664650 | 7.4% |
| 5 | 605680 | 6.8% |
| 8 | 599952 | 6.7% |
| f | 562940 | 6.3% |
| 7 | 527523 | 5.9% |
| d | 524884 | 5.9% |
| 4 | 523107 | 5.9% |
| 3 | 518634 | 5.8% |
| Other values (8) | 3050016 | 34.2% |
num_group1
Real number (ℝ)
ZEROS 
| Distinct | 101 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.144719942 |
| Minimum | 0 |
|---|---|
| Maximum | 100 |
| Zeros | 150732 |
| Zeros (%) | 13.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 8.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 3 |
| Q3 | 5 |
| 95-th percentile | 12 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 4.108048495 |
|---|---|
| Coefficient of variation (CV) | 0.9911522495 |
| Kurtosis | 16.79635319 |
| Mean | 4.144719942 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 2.609646326 |
| Sum | 4592072 |
| Variance | 16.87606243 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 150732 | 13.6% |
| 1 | 149210 | 13.5% |
| 2 | 146224 | 13.2% |
| 3 | 141814 | 12.8% |
| 4 | 135063 | 12.2% |
| 5 | 116602 | 10.5% |
| 6 | 59418 | 5.4% |
| 7 | 42568 | 3.8% |
| 8 | 34229 | 3.1% |
| 9 | 28762 | 2.6% |
| Other values (91) | 103311 | 9.3% |
| Value | Count | Frequency (%) |
| 0 | 150732 | 13.6% |
| 1 | 149210 | 13.5% |
| 2 | 146224 | 13.2% |
| 3 | 141814 | 12.8% |
| 4 | 135063 | 12.2% |
| Value | Count | Frequency (%) |
| 100 | 1 | < 0.1% |
| 99 | 1 | < 0.1% |
| 98 | 1 | < 0.1% |
| 97 | 1 | < 0.1% |
| 96 | 1 | < 0.1% |
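The Frequency (%) column in the value tables is each count divided by the number of observations. A minimal sketch of that calculation follows. The `frequency_percent` helper and its "< 0.1%" cut-off are assumptions inferred from how small shares are printed in this report, not a documented API.

```python
N_OBSERVATIONS = 1_107_933  # number of observations, from the dataset statistics


def frequency_percent(count: int, total: int) -> str:
    """Format count/total as a one-decimal percentage.

    Non-zero shares that round to 0.000 are shown as "< 0.1%"
    (assumed convention, inferred from the report's tables).
    """
    share = count / total
    if share > 0 and round(share, 3) == 0:
        return "< 0.1%"
    return f"{share * 100:.1f}%"


# A few num_group1 counts copied from the tables above.
counts = {0: 150732, 6: 59418, 9: 28762, 100: 1}
for value, count in counts.items():
    print(value, count, frequency_percent(count, N_OBSERVATIONS))
```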