Questions tagged [sql]

SQL (Structured Query Language) is a special-purpose programming language designed for managing data held in a relational database management system (RDBMS).

SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987. [Wikipedia]

37 questions
18 votes, 2 answers

Calculating the 95th percentile: Comparing normal distribution, R Quantile, and Excel approaches

I was trying to compute the 95th percentile of the following dataset, and I came across a few online references for doing it. Approach 1: Based on sample data. The first one tells me to obtain the TOP 95 Percent of the dataset and then choose the MIN or…
Legend (4,232 reputation)
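
The approaches in this question correspond to different percentile definitions, so they need not agree. As a rough illustration, here is a minimal Python sketch of three of them on a small made-up sample (not the question's data): a nearest-rank rule, which is approximately what taking the TOP 95 PERCENT of rows in ascending order and then their maximum returns, assuming the row count is rounded up; linear interpolation between order statistics, which matches R's default `quantile()` (type 7) and Excel's `PERCENTILE.INC`; and a normal-distribution estimate built from the sample mean and standard deviation. The function names are illustrative only.

```python
import math
import statistics


def nearest_rank(values, p):
    """Smallest value with at least a fraction p of the data at or below it."""
    xs = sorted(values)
    rank = math.ceil(p * len(xs))  # 1-based rank, rounded up
    return xs[max(rank, 1) - 1]


def linear_interp(values, p):
    """Interpolate between order statistics (R type 7, Excel PERCENTILE.INC)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def normal_approx(values, p):
    """Percentile of a normal distribution fitted with the sample mean and sd."""
    dist = statistics.NormalDist(statistics.mean(values), statistics.stdev(values))
    return dist.inv_cdf(p)


sample = [12, 15, 17, 20, 22, 25, 28, 30, 35, 41]  # hypothetical data
for name, fn in [("nearest rank", nearest_rank),
                 ("type 7", linear_interp),
                 ("normal", normal_approx)]:
    print(f"{name:>12}: {fn(sample, 0.95):.2f}")
```

On this sample the nearest-rank and type 7 rules give 41 and 38.3, while the normal-based estimate is only meaningful if the data are roughly normal. Python's `statistics.quantiles(sample, n=100, method="inclusive")[94]` should return the same value as `linear_interp`.
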
7 votes, 8 answers

How do I group a list of numeric values into ranges?

I have a large list of numeric values (including duplicates) and I want to group them into ranges in order to see how they are distributed. Let's say there are 1000 values ranging from 0 to 2,000,000 and I want to group them. How can I…
sorin (181 reputation)
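
One common way to bucket values into fixed-width ranges is integer division inside the GROUP BY. Below is a minimal sketch using Python's built-in `sqlite3` with randomly generated values standing in for the real list; the table name `vals` and the 250,000 bucket width are assumptions.

```python
import random
import sqlite3

# In-memory table standing in for the list of values (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vals (v INTEGER)")
random.seed(1)
conn.executemany("INSERT INTO vals VALUES (?)",
                 [(random.randint(0, 2_000_000),) for _ in range(1000)])

BUCKET = 250_000  # assumed bucket width

# Integer division maps each value to the lower edge of its bucket.
rows = conn.execute(
    """
    SELECT (v / :w) * :w          AS bucket_start,
           (v / :w) * :w + :w - 1 AS bucket_end,
           COUNT(*)               AS n
    FROM vals
    GROUP BY bucket_start
    ORDER BY bucket_start
    """,
    {"w": BUCKET},
).fetchall()

for start, end, n in rows:
    print(f"{start:>9,} - {end:>9,}: {n}")
```

SQLite, like PostgreSQL and SQL Server, truncates when dividing two integers; in an engine where `/` returns a decimal (MySQL, for example), wrapping the division in `FLOOR()` should give the same buckets for non-negative values. Ranges with no values simply do not appear in the output.
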
4 votes, 1 answer

Electoral college simulation using SQL Server

As I'm sure will be apparent to most here, I'm not a statistician or a programmer, but one of my hobbies is politics. I've created a (very simple) USA electoral college simulation, seen below. The idea is that if you think you have a good estimate…
user269144 (41 reputation)
4 votes, 2 answers

Conducting correlation and one-way ANOVA using data from a PostgreSQL database

I am trying to discern the best way to calculate a correlation and perform a one-way ANOVA on data I am taking from a PostgreSQL database. What tools should I use? Can I do this using the SQL language itself? Is there an easy way to export the…
Spencer (221 reputation)
2 votes, 2 answers

Split dataset randomly

I have a database with 500 records. I want to split these records into 75% and 25% *randomly* in order to use the different datasets for training and testing machine learning algorithms. Does anyone know how to do that? For example using an SQL…
user21849
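
A simple way to do this in SQL is to shuffle the rows with a random sort key, take the first 75% as the training set, and leave the rest for testing. This sketch runs the queries through Python's `sqlite3` with 500 generated rows; the `records` and `train_ids` table names are hypothetical, and `RANDOM()` is SQLite's function (other engines use, for example, `random()` in PostgreSQL or `RAND()` in MySQL).

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, x REAL)")
conn.executemany("INSERT INTO records (x) VALUES (?)",
                 [(random.random(),) for _ in range(500)])

n_total = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
n_train = round(0.75 * n_total)  # 375 of 500 rows

# Shuffle with a random sort key and keep the first 75% of ids for training.
conn.execute("CREATE TABLE train_ids (id INTEGER PRIMARY KEY)")
conn.execute(
    "INSERT INTO train_ids SELECT id FROM records ORDER BY RANDOM() LIMIT ?",
    (n_train,),
)

train = conn.execute(
    "SELECT r.* FROM records AS r JOIN train_ids AS t ON r.id = t.id"
).fetchall()
# Every row not picked for training goes to the test set.
test = conn.execute(
    "SELECT * FROM records WHERE id NOT IN (SELECT id FROM train_ids)"
).fetchall()
print(len(train), len(test))
```

SQLite's `RANDOM()` cannot be seeded from SQL, so this split changes on every run; for a reproducible split, one option is to shuffle the ids in application code with a fixed seed instead.
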
2 votes, 1 answer

Machine Learning with Aggregated Frequency Data as Training

I am trying to build a Deep Learning model in which I have the following structure:

user     feature  binary_label
1        100      0
2        200      1
3        140      0
...      ...      ...
6000000  188      1

But the problem is that when I try to use all data I am running out of…
2 votes, 2 answers

Does anybody use star-schema databases to collect and organize their data?

I've been reading about the star schema (or dimensional) database structure, which puts all measurements in a main 'facts' table, and all context for those measurements in 'dimension' tables linked to the facts table (I'm doing a horrible job…
biofreezer (255 reputation)
2 votes, 0 answers

Calculate value needed to reach target by the end of a year

I have a set of values (scores for a particular KPI), and these are reported monthly. I need to calculate what I'd have to score every month from now until the end of the year to reach that target. I'm more interested in the algorithm/maths involved in…
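
If the target applies to the average of all twelve monthly scores (an assumption; the excerpt does not say how the target is defined), the required score is the shortfall in the year's total divided by the number of months left: (12 × target − sum of scores so far) / (months remaining). A minimal Python sketch with hypothetical scores:

```python
def required_monthly_score(scores_so_far, target_average, months_in_year=12):
    """Score needed in each remaining month so the year's average hits the target.

    Assumes the KPI target applies to the plain average of all monthly scores.
    """
    remaining = months_in_year - len(scores_so_far)
    if remaining <= 0:
        raise ValueError("no months left in the year")
    # Total the whole year must reach, minus what has already been scored.
    needed_total = target_average * months_in_year - sum(scores_so_far)
    return needed_total / remaining


# Hypothetical: five months reported, target average of 90 for the year.
print(required_monthly_score([85, 88, 92, 80, 87], 90))
```

Here the five months sum to 432, so the remaining seven months must supply 1080 − 432 = 648 points, about 92.57 per month. If the target is instead a cumulative total for the year, replace `target_average * months_in_year` with that total.
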
2 votes, 0 answers

How can I (numerically) approximate the quantile in a beta distribution in SQL?

I wrote some code in SAS that, among other things, used the BETAINV function (or BETA.INV, as it's called in Microsoft Excel) to calculate the quantile in a beta distribution corresponding to a random input value. Now I'm looking for a way to…
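
Without a built-in inverse, one generic approach is to evaluate the beta CDF numerically and search for the input that produces the desired probability, which works because the CDF is increasing. Below is a minimal Python sketch of that idea, using Simpson's rule for the CDF and bisection for the quantile. It assumes both shape parameters are at least 1 (so the integrand stays bounded) and is an outline of the algorithm, not a replacement for SAS `BETAINV` or Excel `BETA.INV`.

```python
import math


def beta_cdf(x, a, b, steps=2000):
    """Regularized incomplete beta I_x(a, b) by composite Simpson's rule.

    Only intended for a >= 1 and b >= 1, where the density stays bounded.
    """
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    beta_fn = math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

    def density(t):
        return t ** (a - 1) * (1 - t) ** (b - 1) / beta_fn

    h = x / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return min(max(total * h / 3, 0.0), 1.0)


def beta_inv(p, a, b, tol=1e-10):
    """Quantile of Beta(a, b): bisect on [0, 1] until beta_cdf(x) is close to p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(beta_inv(0.25, 2, 1))
```

For Beta(2, 1) the CDF is x², so `beta_inv(0.25, 2, 1)` should return about 0.5. Porting the two loops to SQL would need a loop construct or recursive query, which depends on the database.
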
1 vote, 1 answer

When should I update a recommendation engine?

[I asked this on StackOverflow and was told it would be a better fit here] I am including a basic recommendation engine in a very small project for my final exam. I understand the code and the math but I am not too certain as to when I should update…
The_Cthulhu_Kid (253 reputation)
1 vote, 1 answer

Approximate binomial distribution in SQL

What is the best way of approximating a binomial distribution, given that I have the following functions available: normal_cdf and beta_cdf (see the presto-docs for the full list)? Is there any other good way to approximate a binomial distribution, given I am…
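
The usual normal approximation replaces Binomial(n, p) with a normal distribution of mean np and standard deviation √(np(1 − p)), adding 0.5 to k as a continuity correction. This Python sketch, with hypothetical n and p, compares that approximation with the exact CDF. In Presto it would correspond to something like `normal_cdf(n * p, sqrt(n * p * (1 - p)), k + 0.5)`, assuming the `(mean, sd, value)` argument order given in the presto-docs.

```python
import math
from statistics import NormalDist


def binom_cdf_exact(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def binom_cdf_normal(k, n, p):
    """Normal approximation to P(X <= k) with a 0.5 continuity correction."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return NormalDist(mean, sd).cdf(k + 0.5)


n, p = 100, 0.3  # hypothetical parameters
for k in (20, 30, 40):
    print(k, round(binom_cdf_exact(k, n, p), 4), round(binom_cdf_normal(k, n, p), 4))
```

The approximation is usually considered reasonable only when np and n(1 − p) are both fairly large (rules of thumb range from 5 to 10). The binomial CDF can also be written exactly with the regularized incomplete beta function, P(X ≤ k) = I_(1−p)(n − k, k + 1), so `beta_cdf` may give exact values rather than an approximation; check its argument order in the documentation before relying on this.
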
1 vote, 0 answers

Single SQL query for case-control matching

I am running a case-control study for which I wish to choose 5 controls for each case, stratifying by age, sex, and date of measurement. Each case or control has a unique serial number and the controls will be given a stratum_id that matches the…
barnhillec (147 reputation)
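
Assuming every subject already carries a `stratum_id` shared by a case and its eligible controls, one single-query approach is to join cases to controls on the stratum, rank the controls randomly with `ROW_NUMBER()`, and keep the first five per case. The schema (`subjects`, `is_case`) and the data below are hypothetical; the sketch runs the query through Python's `sqlite3`, which needs SQLite 3.25 or later for window functions.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per subject; stratum_id encodes age/sex/date stratum.
conn.execute(
    "CREATE TABLE subjects (serial INTEGER PRIMARY KEY, is_case INTEGER, stratum_id INTEGER)"
)
random.seed(0)
rows = []
for stratum in range(1, 4):
    rows.append((1, stratum))                       # one case per stratum
    rows += [(0, stratum)] * random.randint(5, 12)  # several candidate controls
conn.executemany("INSERT INTO subjects (is_case, stratum_id) VALUES (?, ?)", rows)

# Randomly rank the controls that share each case's stratum; keep up to 5 per case.
matched = conn.execute(
    """
    WITH ranked AS (
        SELECT k.serial AS case_serial,
               c.serial AS control_serial,
               ROW_NUMBER() OVER (PARTITION BY k.serial ORDER BY RANDOM()) AS rn
        FROM subjects AS k
        JOIN subjects AS c
          ON c.stratum_id = k.stratum_id AND c.is_case = 0
        WHERE k.is_case = 1
    )
    SELECT case_serial, control_serial
    FROM ranked
    WHERE rn <= 5
    ORDER BY case_serial, control_serial
    """
).fetchall()

for case_serial, control_serial in matched:
    print(case_serial, control_serial)
```

Because this partitions by case, two cases in the same stratum could be matched to the same control; avoiding that needs an extra rule, for example numbering cases and controls separately within each stratum and giving case j the controls numbered 5(j − 1) + 1 to 5j.
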
1 vote, 0 answers

How do we create a query in MS Access that gives us the cumulative difference?

I tried using this query:

Running Total: CDbl(DSum("[Cost_Replacement]","Cost of Replacement","[RR_Pipe_ID]<=" & [RR_Pipe_ID] & ""))

The query above provides the sum, not the difference. My target is to calculate the cumulative difference of the…
1 vote, 1 answer

Newbie Data Analyst Questions

I recently applied for a job at a startup that wants to hire a data analyst. I have been going through the rounds and it seems like they want to hire me for the position. I recently graduated with a bachelor's degree in statistics and have not yet…
Nick (23 reputation)
1 vote, 1 answer

Weekly sales for different start dates automate report

I’m trying to develop a report on customer sales behaviour each time a new product is launched. The columns in the report would be: sales in time t (the time the product was launched), sales in time t-1 (the week before), sales in time t-2 (2 weeks…
Mafalda (11 reputation)