Questions tagged [database]

A comprehensive collection of related data organized for convenient access, generally associated with software to update and query the data.

From Wikipedia:

A database is an organized collection of data. The data is typically organized to model relevant aspects of reality (for example, the availability of rooms in hotels), in a way that supports processes requiring this information (for example, finding a hotel with vacancies).

A large proportion of websites and applications rely on databases. They are a crucial component of telecommunications systems, banking systems, video games, and just about any other software system or electronic device that maintains some amount of persistent information. In addition to persistence, database systems provide a number of other properties that make them exceptionally useful and convenient: reliability, efficiency, scalability, concurrency control, data abstraction, and high-level query languages. Databases are so ubiquitous and important that computer science graduates frequently cite their database class as the one most useful to them in their industry or graduate-school careers.

The term database should not be confused with Database Management System (DBMS). A DBMS is the system software used to create and manage databases and provide users and applications with access to the database(s). A database is to a DBMS as a document is to a word processor.
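As a minimal sketch of the distinction, the hotel example above can be modeled with Python's built-in `sqlite3` module, where SQLite acts as the DBMS and the in-memory store it manages is the database. The table name `rooms`, its columns, the sample rows, and the helper `hotels_with_vacancies` are all hypothetical and chosen only for illustration.

```python
import sqlite3

# SQLite is the DBMS here; the in-memory store it manages is the database.
conn = sqlite3.connect(":memory:")

# Hypothetical schema modeling room availability in hotels.
conn.execute("CREATE TABLE rooms (hotel TEXT, room_no INTEGER, vacant INTEGER)")
conn.executemany(
    "INSERT INTO rooms VALUES (?, ?, ?)",
    [("Alpha", 101, 0), ("Alpha", 102, 1), ("Beta", 201, 0)],
)


def hotels_with_vacancies(connection):
    """Return the names of hotels that have at least one vacant room."""
    rows = connection.execute(
        "SELECT DISTINCT hotel FROM rooms WHERE vacant = 1 ORDER BY hotel"
    ).fetchall()
    return [hotel for (hotel,) in rows]


vacancies = hotels_with_vacancies(conn)
print(vacancies)
```

The query is expressed in a high-level language (SQL) and the DBMS decides how to execute it, which is one of the properties listed above.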

Reference: adapted from Stack Overflow SE.

Excerpt reference: adapted from dictionary.com and GIS SE.

38 questions
18
votes
2 answers

Quality assurance and quality control (QA/QC) guidelines for a database

Background I am overseeing the input of data from primary literature into a database. The data entry process is error prone, particularly because users must interpret experimental design, extract data from graphics and tables, and transform results…
David LeBauer
5
votes
3 answers

Suggestions on how to merge multiple datasets with an imperfect ID across databases?

I have four databases of books that I have assembled from various sources, websites, etc. I would like to merge the databases, but I face a significant merging issue in that there is no "perfect" match ID among the databases. Each database has the…
user12798
5
votes
2 answers

How to make an effective sampling from a database of text documents?

Problem: I want to know methods to perform an effective sampling from a database. The size of the database is about 250K text documents and in this case each text document is related to some majors (Electrical Engineering, Medicine and so on). So…
4
votes
2 answers

Public databases of learned HMM models for NLP

I understand that HMM models model language with Parts of Speech (POS) as hidden states and words as observations. These HMM models are usually learned from large text corpora, and many of these corpora are publicly available. Where can I find such…
4
votes
1 answer

What is the purpose of using a Laplacian distribution in adding noise for Differential Privacy?

I am reading up on Differential Privacy and it is mentioned that the technique relies on adding some controlled noise to the release of responses to queries towards a statistical database. This is done so as to preserve the privacy of the owners…
3
votes
0 answers

A non-technical visualization of my database (a layman's ER Diagram?)

I would like to provide an intuitive visualization of my database for readers and users of the database that does not require technical knowledge of database structure. The figure will be included in a journal article, and the audience is primarily…
Abe
3
votes
2 answers

Organizing cluster analysis results in a database

I'm a newbie in cluster analysis so please excuse me if my question seems to be very basic. I'm using SPSS and Matlab for performing cluster analysis in a variety of datasets. Dendrograms are great for visualising the results. However, they are not…
Diego
3
votes
0 answers

Which duration model to use with a fully right censored database

Currently I'm examining the duration of residence of households. I have a database at my disposal that indicates how long a specific household resides in its current home. I want to explain their duration of residence by the household…
EQuaker
3
votes
2 answers

Statistical analysis of relational database: is it possible and how?

I have been struggling with flat file databases and corresponding statistical packages for almost 20 years now (from Excel to SPSS, then Stata, and currently R). However, I have always had to convert complex and multidimensional relational databases…
Giuseppe Biondi-Zoccai
2
votes
0 answers

Probability: Estimating Database Size with N Smallest Random Values

Note: This was first posted on StackOverflow, but I did not have much luck there. I have a very large database. Each item in the database is assigned a uniformly distributed random value $\geq 0$ and $<1$. This database is so large that performing a…
speedplane
2
votes
1 answer

Why does "sticky noise" defy averaging attack?

I have read an interesting paper (pdf) describing how a privacy preserving technique might be breached, but I am having trouble understanding the following paragraph describing one of several layers of noise added to an observation. Let C be a…
Omry Atia
2
votes
1 answer

Is outlier detection in two separate databases equivalent to detection in one combined database?

Suppose that we have two databases: Database_1 and Database_2. Database_1 has 300 samples and Database_2 has 700 samples. Database_all is the combination of the two databases. Is finding outliers using abs(X-mean(X))>=1.9*std(X) in Database_all equal…
user2991243
1
vote
1 answer

What is a nonreductive database?

A database like GenBank is said to be a nonreductive database. What does that mean?
lalal
1
vote
1 answer

Statistical package that works with an SQL database

At my firm we have been calculating fundamental stock factors each week for over a decade. We have the performance of each of these factors. I'd like to enhance the statistical analysis of these factors and improve the multi-factor optimization. …
1
vote
2 answers

Mann-Whitney U test on "histogram-compressed" data?

I want to perform a very large number of Mann-Whitney U-tests between many groups from data that I will get in from a database. I would prefer to use a pre-implemented version of this test. I think that I have three options: Perform the U-test on…