9

This has always bothered me. It seems that string functions in SQL based servers always start at position 1 (at least this is the case with MySQL, SQL Server, Oracle and Postgres). For example the following query would be used to select the first letter of a column called first_name in the names database:

SELECT SUBSTRING(first_name,1,1) FROM names;

Why does the position for string functions not start at 0 as is the norm in nearly every programming language?

I'm looking for more than just this is the ANSI standard. Why is the standard?

EDIT: Okay, so 0 is not the "norm in nearly every programming language" as has been pointed out below. 1 is also used.

VKK
  • 203
  • 1
  • 2
  • 4
  • 1
    I find zero-based indexing extremely strange. After all this is how I count in real life. A word starts with the first character, not the "zeroed" character. – a_horse_with_no_name Jan 12 '16 at 21:53

1 Answers1

9

Considering that there is no zeroeth position in a string outside of computers, shouldn't the question really be: why are strings 0-based in some of the more common programming languages? (I am not sure about the statement of "nearly every programming language" since there are many more languages than most people are aware of)

Strings in C and other languages are simply an array of characters (i.e. char[]) that are null-terminated. This is why you can reference individual characters by using the index notation (i.e. stringVariable[index]). Variables are an address to a location in memory. The index is the offset to the starting address of the array. So when thinking in terms of strings being an array, it makes sense enough to interact with them in a 0-based fashion because it is at least consistent even if sometimes a little awkward.

Why is this different in SQL? I would guess that it has to do with SQL being more about physical storage than memory allocation. While some RDBMS's do support arrays (such as PostgreSQL), that is not standard. SQL is also a high-level declarative language that hides the operational specifics of what the query engine is really doing, and so the concepts of addresses and pointers just isn't there. Hence it doesn't really make sense to think in terms of 0-based indexes when working with SQL.

As another poster notes, the source of zero-based indexing is addressing. The first address in any data block ends in zero (whether it occupies the last digit in physical memory or not). And it's not just computers - the address of the first house on your neighborhood block likely is a number like 300 - not 301.

When programming iterative functions where moduli are used (to make something happen every 5 iterations, etc.) it's handy - and faster - to work with zero-based arrays.

Please also see:

Solomon Rutzky
  • 65,256
  • 6
  • 135
  • 267
  • 1
    Thanks srutzky. I think, based on the languages I'm familiar with (which is not "nearly every programming language") I assumed, incorrectly, that 0 was the norm. I never really questioned that... but you're right. Why do some languages start at 0 is a fair question? ... and the point about arrays makes sense. In many ways 1 is more intuitive. – VKK Jan 12 '16 at 20:30
  • Nonsense, if majority of programming languages have adopted 0 as the start of indices, then SQL has absolutely no right to throw this convention out of the window. – metabuddy Oct 25 '18 at 13:33
  • @metabuddy Of course SQL, and any other language, has every right to approach anything in any way that seems best for that particular world-view. There are _a lot_ of programming languages, with a lot of variation between them on syntax, variable scope, etc. Perhaps it was a poor choice that any language settled on 0-based indices in the first place (it isn't very intuitive). – Solomon Rutzky Oct 25 '18 at 14:35