Overview

Dataset statistics

Number of variables: 5
Number of observations: 3275770
Missing cells: 0
Missing cells (%): 0.0%
Total size in memory: 125.0 MiB
Average record size in memory: 40.0 B
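
The two memory figures above are consistent with each other, which a quick calculation can confirm. The sketch below assumes binary units (1 MiB = 1,048,576 bytes) and 8 bytes per cell (a 64-bit number or an object pointer); neither assumption is stated in the report.

```python
# Figures copied from the dataset statistics above.
n_rows = 3_275_770
n_cols = 5
total_mib = 125.0

MIB = 1024 ** 2  # assumption: the report uses binary mebibytes

# Average record size implied by the reported total.
avg_record_bytes = total_mib * MIB / n_rows

# Total implied by 8 bytes per cell (assumed: 64-bit values or object pointers).
expected_total_mib = n_rows * n_cols * 8 / MIB

print(f"{avg_record_bytes:.1f} B per record")
print(f"{expected_total_mib:.1f} MiB total")
```

Under these assumptions both numbers round to the values shown in the report, and each of the five variables accounts for the 25.0 MiB listed in its own section.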

Variable types

Numeric: 3
Text: 2

Alerts

num_group1 has 457934 (14.0%) zeros (alert: Zeros)
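
The percentage in this alert follows directly from the counts in the report. The number of zeros (457934) also equals the number of distinct case_id values, which is what one would see if num_group1 were a zero-based row index within each case_id. That reading is an assumption, and the toy data below is hypothetical.

```python
# Counts copied from the report.
n_rows = 3_275_770
n_zeros = 457_934
zero_share = 100 * n_zeros / n_rows
print(f"{zero_share:.1f}% zeros")

# Hypothetical toy data: (case_id, num_group1) pairs where num_group1
# restarts at 0 for every case_id.
case_sizes = {101: 3, 102: 1, 103: 4}
rows = [(case, i) for case, size in case_sizes.items() for i in range(size)]

# With a per-case index, every case contributes exactly one zero.
zeros_in_toy = sum(1 for _, idx in rows if idx == 0)
distinct_cases = len({case for case, _ in rows})
print(zeros_in_toy, distinct_cases)
```

If that interpretation holds, the zeros are structural rather than a data-quality problem, and the alert can be read as informational.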

Reproduction

Analysis started: 2024-02-13 19:57:58.772372
Analysis finished: 2024-02-13 19:58:04.874062
Duration: 6.1 seconds
Software version: ydata-profiling v4.6.4
Download configuration: config.json

Variables

case_id
Real number (ℝ)

Distinct: 457934
Distinct (%): 14.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1341396.653
Minimum: 28631
Maximum: 2702290
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 25.0 MiB

Quantile statistics

Minimum: 28631
5-th percentile: 173231
Q1: 877291
Median: 1571545
Q3: 1728466
95-th percentile: 2639631
Maximum: 2702290
Range: 2673659
Interquartile range (IQR): 851175
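
The range and IQR above are simple differences of the listed quantiles, as the short check below shows. The 1.5 × IQR fences at the end are a common rule of thumb for flagging outliers; they are added here for illustration and are not something the report computes.

```python
# Quantiles copied from the report for case_id.
minimum, q1, q3, maximum = 28_631, 877_291, 1_728_466, 2_702_290

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # IQR = Q3 - Q1
print(value_range, iqr)

# Illustrative Tukey fences (not part of the report).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
has_outliers = minimum < lower_fence or maximum > upper_fence
print(lower_fence, upper_fence, has_outliers)
```

Both the minimum and the maximum fall inside the fences, so this rule would flag no case_id value as extreme.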

Descriptive statistics

Standard deviation: 649263.4513
Coefficient of variation (CV): 0.4840204796
Kurtosis: -0.2658752081
Mean: 1341396.653
Median Absolute Deviation (MAD): 256657
Skewness: -0.1169124108
Sum: 4.394106914 × 10^12
Variance: 4.215430292 × 10^11
Monotonicity: Increasing
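
Several of these descriptive statistics are derived from one another, so they can be cross-checked from the rounded values printed in the report. A minimal sketch, using only numbers shown above and in the overview (the tolerance is an arbitrary choice to absorb rounding):

```python
import math

# Values copied from the report for case_id.
n = 3_275_770
mean = 1_341_396.653
std = 649_263.4513

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance is the squared standard deviation
total = mean * n       # sum = mean × number of observations

print(f"CV       = {cv:.10f}")
print(f"Variance = {variance:.9e}")
print(f"Sum      = {total:.9e}")

# Compare against the reported figures, allowing for rounding.
checks = [
    math.isclose(cv, 0.4840204796, rel_tol=1e-6),
    math.isclose(variance, 4.215430292e11, rel_tol=1e-6),
    math.isclose(total, 4.394106914e12, rel_tol=1e-6),
]
print(all(checks))
```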
Histogram with fixed size bins (bins=50)

Common values

Value                  Count    Frequency (%)
1709323                99       < 0.1%
1748469                91       < 0.1%
1609436                76       < 0.1%
1604411                68       < 0.1%
949395                 68       < 0.1%
879768                 67       < 0.1%
834432                 66       < 0.1%
184250                 64       < 0.1%
161770                 63       < 0.1%
1654934                63       < 0.1%
Other values (457924)  3275045  > 99.9%

Minimum 5 values

Value  Count  Frequency (%)
28631  4      < 0.1%
28632  10     < 0.1%
28633  1      < 0.1%
28635  2      < 0.1%
28636  7      < 0.1%

Maximum 5 values

Value    Count  Frequency (%)
2702290  3      < 0.1%
2701515  9      < 0.1%
2701074  6      < 0.1%
2700297  1      < 0.1%
2697303  1      < 0.1%

amount_4527230A
Real number (ℝ)

Distinct: 92743
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2360.421376
Minimum: 0
Maximum: 87115.6
Zeros: 165
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 25.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 215
Q1: 850
Median: 1400
Q3: 2778
95-th percentile: 7240.6
Maximum: 87115.6
Range: 87115.6
Interquartile range (IQR): 1928

Descriptive statistics

Standard deviation: 3254.871236
Coefficient of variation (CV): 1.378936519
Kurtosis: 44.23407366
Mean: 2360.421376
Median Absolute Deviation (MAD): 813
Skewness: 5.343380447
Sum: 7732197530
Variance: 10594186.76
Monotonicity: Not monotonic
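
For amount_4527230A the MAD (813) is far smaller than the standard deviation (about 3255), and the mean (2360.42) sits well above the median (1400). Together with the positive skewness, this points to a long right tail. The sketch below shows how MAD is computed and why it reacts less to extreme values than the standard deviation. The sample is hypothetical: its values are borrowed from amounts that appear in this report, but it is not a real subset of the data.

```python
from statistics import median, pstdev

def mad(values):
    """Median absolute deviation: the median of |x - median(x)|."""
    center = median(values)
    return median(abs(v - center) for v in values)

# Hypothetical sample with one very large amount.
toy = [850, 850, 1000, 1400, 2000, 3000, 42500]

print("median:", median(toy))
print("MAD:", mad(toy))
print("population std:", round(pstdev(toy)))

# The reported CV is the reported standard deviation over the reported mean.
cv = 3254.871236 / 2360.421376
print("CV:", round(cv, 4))
```

A single large value inflates the standard deviation while leaving the MAD close to the spread of the typical amounts, which matches the pattern in the reported statistics.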
Histogram with fixed size bins (bins=50)

Common values

Value                 Count    Frequency (%)
850                   261068   8.0%
1000                  72903    2.2%
2000                  31955    1.0%
900                   28218    0.9%
1200                  27040    0.8%
1400                  18459    0.6%
499.4                 18202    0.6%
1600                  16308    0.5%
3000                  15206    0.5%
860                   12183    0.4%
Other values (92733)  2774228  84.7%

Minimum 5 values

Value  Count  Frequency (%)
0      165    < 0.1%
0.2    343    < 0.1%
0.4    91     < 0.1%
0.6    65     < 0.1%
0.8    83     < 0.1%

Maximum 5 values

Value      Count  Frequency (%)
87115.6    1      < 0.1%
68760.805  1      < 0.1%
43134.2    1      < 0.1%
42500      2791   0.1%
42499.8    5      < 0.1%

(variable name missing from the extracted report)
Text

Distinct: 147037
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Memory size: 25.0 MiB

Length

Max length: 12
Median length: 8
Mean length: 8.055986226
Min length: 8

Characters and Unicode

Total characters: 26389558
Distinct characters: 22
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
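
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below classifies two values of the kinds seen in this variable. The second value is written with an uppercase `P`, which is an assumption inferred from the Uppercase Letter counts in this report; the value table itself shows it in lowercase.

```python
import unicodedata
from collections import Counter

def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

# Category codes: Nd = Decimal Number, Ll = Lowercase Letter,
# Lu = Uppercase Letter, Pc = Connector Punctuation.
hex_like = category_counts("f980a1ea")
prefixed = category_counts("P114_118_163")  # assumed original casing

print(dict(hex_like))
print(dict(prefixed))
```

The four codes that appear here correspond to the four distinct categories listed for this variable.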

Unique

Unique: 18639
Unique (%): 0.6%

Sample

1st row: f980a1ea
2nd row: f980a1ea
3rd row: f980a1ea
4th row: f980a1ea
5th row: 5f9b74f5

Value                  Count    Frequency (%)
5e180ef0               204371   6.2%
p114_118_163           15751    0.5%
74ca9587               13196    0.4%
7444479d               12313    0.4%
p157_88_183            10025    0.3%
a409d8fa               9601     0.3%
f10df922               9408     0.3%
6bd6aa12               9239     0.3%
e304888c               8957     0.3%
3d4aae4c               8510     0.3%
Other values (147027)  2974399  90.8%

Most occurring characters

Value              Count    Frequency (%)
1                  1942604  7.4%
0                  1912119  7.2%
e                  1910740  7.2%
8                  1765395  6.7%
5                  1755172  6.7%
f                  1667799  6.3%
4                  1601087  6.1%
3                  1582260  6.0%
d                  1559238  5.9%
9                  1546882  5.9%
Other values (12)  9146262  34.7%

Most occurring categories

Value                  Count     Frequency (%)
Decimal Number         16598498  62.9%
Lowercase Letter       9605327   36.4%
Connector Punctuation  123822    0.5%
Uppercase Letter       61911     0.2%

Most frequent character per category

Decimal Number

Value  Count    Frequency (%)
1      1942604  11.7%
0      1912119  11.5%
8      1765395  10.6%
5      1755172  10.6%
4      1601087  9.6%
3      1582260  9.5%
9      1546882  9.3%
7      1543487  9.3%
6      1477022  8.9%
2      1472470  8.9%

Lowercase Letter

Value  Count    Frequency (%)
e      1910740  19.9%
f      1667799  17.4%
d      1559238  16.2%
a      1539813  16.0%
c      1481947  15.4%
b      1445766  15.1%
l      12       < 0.1%
w      6        < 0.1%
i      6        < 0.1%

Uppercase Letter

Value  Count  Frequency (%)
P      61905  > 99.9%
Q      6      < 0.1%

Connector Punctuation

Value  Count   Frequency (%)
_      123822  100.0%

Most occurring scripts

Value   Count     Frequency (%)
Common  16722320  63.4%
Latin   9667238   36.6%

Most frequent character per script

Common

Value  Count    Frequency (%)
1      1942604  11.6%
0      1912119  11.4%
8      1765395  10.6%
5      1755172  10.5%
4      1601087  9.6%
3      1582260  9.5%
9      1546882  9.3%
7      1543487  9.2%
6      1477022  8.8%
2      1472470  8.8%

Latin

Value  Count    Frequency (%)
e      1910740  19.8%
f      1667799  17.3%
d      1559238  16.1%
a      1539813  15.9%
c      1481947  15.3%
b      1445766  15.0%
P      61905    0.6%
l      12       < 0.1%
Q      6        < 0.1%
w      6        < 0.1%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  26389558  100.0%

Most frequent character per block

ASCII

Value              Count    Frequency (%)
1                  1942604  7.4%
0                  1912119  7.2%
e                  1910740  7.2%
8                  1765395  6.7%
5                  1755172  6.7%
f                  1667799  6.3%
4                  1601087  6.1%
3                  1582260  6.0%
d                  1559238  5.9%
9                  1546882  5.9%
Other values (12)  9146262  34.7%

num_group1
Real number (ℝ)

ZEROS 

Distinct: 99
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.095615687
Minimum: 0
Maximum: 98
Zeros: 457934
Zeros (%): 14.0%
Negative: 0
Negative (%): 0.0%
Memory size: 25.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 3
Q3: 5
95-th percentile: 11
Maximum: 98
Range: 98
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.933318764
Coefficient of variation (CV): 0.9603730097
Kurtosis: 12.89455961
Mean: 4.095615687
Median Absolute Deviation (MAD): 2
Skewness: 2.292191088
Sum: 13416295
Variance: 15.4709965
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value              Count   Frequency (%)
0                  457934  14.0%
1                  441416  13.5%
2                  426953  13.0%
3                  409203  12.5%
4                  388239  11.9%
5                  341022  10.4%
6                  174157  5.3%
7                  133804  4.1%
8                  112012  3.4%
9                  95505   2.9%
Other values (89)  295525  9.0%

Minimum 5 values

Value  Count   Frequency (%)
0      457934  14.0%
1      441416  13.5%
2      426953  13.0%
3      409203  12.5%
4      388239  11.9%

Maximum 5 values

Value  Count  Frequency (%)
98     1      < 0.1%
97     1      < 0.1%
96     1      < 0.1%
95     1      < 0.1%
94     1      < 0.1%

(variable name missing from the extracted report)
Text

Distinct: 397
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.0 MiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 32757700
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 2019-09-13
2nd row: 2019-09-13
3rd row: 2019-09-13
4th row: 2019-09-13
5th row: 2019-09-13
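
Every sampled value has the fixed-width form YYYY-MM-DD, which agrees with the length and character statistics of this variable: 10 characters per row and two dashes per row. A minimal sketch of that arithmetic, plus parsing a few values from this report with the standard library (treating the variable as ISO 8601 dates is an assumption; the report profiles it only as text):

```python
from datetime import date

n_rows = 3_275_770

# Fixed-width strings: 10 characters and 2 dashes per row.
total_chars = 10 * n_rows
total_dashes = 2 * n_rows
print(total_chars, total_dashes)

# Values taken from the sample rows and value counts in this report.
samples = ["2019-09-13", "2019-12-14", "2020-01-13"]
parsed = [date.fromisoformat(s) for s in samples]

# Days between the earliest and latest of these three samples only.
span_days = (max(parsed) - min(parsed)).days
print(span_days)
```

Parsing the column as dates before profiling would let a report describe it with a date range and time-based histogram instead of character counts.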

Value               Count    Frequency (%)
2019-12-14          49509    1.5%
2019-12-13          47891    1.5%
2020-01-11          40515    1.2%
2020-01-12          38615    1.2%
2019-12-01          33947    1.0%
2020-01-13          33623    1.0%
2019-12-28          33166    1.0%
2019-12-16          32494    1.0%
2019-11-30          30798    0.9%
2019-11-23          28803    0.9%
Other values (387)  2906409  88.7%

Most occurring characters

Value  Count    Frequency (%)
0      8127577  24.8%
2      7116560  21.7%
-      6551540  20.0%
1      5956477  18.2%
9      2391991  7.3%
3      889737   2.7%
4      510281   1.6%
8      317062   1.0%
6      316578   1.0%
7      309918   0.9%

Most occurring categories

Value             Count     Frequency (%)
Decimal Number    26206160  80.0%
Dash Punctuation  6551540   20.0%

Most frequent character per category

Decimal Number

Value  Count    Frequency (%)
0      8127577  31.0%
2      7116560  27.2%
1      5956477  22.7%
9      2391991  9.1%
3      889737   3.4%
4      510281   1.9%
8      317062   1.2%
6      316578   1.2%
7      309918   1.2%
5      269979   1.0%

Dash Punctuation

Value  Count    Frequency (%)
-      6551540  100.0%

Most occurring scripts

Value   Count     Frequency (%)
Common  32757700  100.0%

Most frequent character per script

Common

Value  Count    Frequency (%)
0      8127577  24.8%
2      7116560  21.7%
-      6551540  20.0%
1      5956477  18.2%
9      2391991  7.3%
3      889737   2.7%
4      510281   1.6%
8      317062   1.0%
6      316578   1.0%
7      309918   0.9%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  32757700  100.0%

Most frequent character per block

ASCII

Value  Count    Frequency (%)
0      8127577  24.8%
2      7116560  21.7%
-      6551540  20.0%
1      5956477  18.2%
9      2391991  7.3%
3      889737   2.7%
4      510281   1.6%
8      317062   1.0%
6      316578   1.0%
7      309918   0.9%