Single SQL query for case-control matching

Question

I am running a case-control study for which I wish to choose 5 controls for each case, stratifying by age, sex, and date of measurement. Each case or control has a unique serial number and the controls will be given a stratum_id that matches the case. My Python routine is approximately as follows:

control_cases = {}
for stratum_id, case in case_dict.items():
    query = (
        "SELECT serial_number "
        "FROM foo_db "
        "WHERE AGE = {} "
        "AND SEX = {} "
        "AND DATE = {} "
        "AND SERIAL_NUMBER NOT LIKE {} "
        "LIMIT 5;"
    ).format(case['age'], case['sex'], case['date'], case['serial_number']
    records = run_query(query)
    control_cases[stratum_id] = records

The problem is, I have hundreds of cases, and the SQL server seems to be highly throttled and doesn't respond well to hundreds of queries. I would like to somehow loop through all the case records, matching each with 5 controls, within a single SQL query. Right now I work around this by exporting them as strings and creating a single meta-string (QUERY_1) UNION ALL (QUERY_2) ... UNION ALL (QUERY_N). It gets the job done. But this is surely not very good SQL.

Why not just do `SELECT serial_number, age, sex, date FROM foo_db` and introduce a relatively simple `WHERE` clause, so that you get all the data you need in one query? Then filter down in Python? — Adrian Keister, Sep 30 '20 at 22:46
Thanks, the DB is much too big. If I selected all the records from all the dates I need to match on, to filter down later, that is many millions of records. I need that limit 5 for each record. — barnhillec, Oct 01 '20 at 18:12

Single SQL query for case-control matching

0 Answers0