InvalidRequest: code=2200 [Invalid query] message=”Invalid operator >= for PRIMARY KEY part “

cassandra, cql, cqlsh

I have a following dataset in cassandra :

Table Structure

CREATE TABLE userlog ( term text, ts timestamp, year int, month int, day int, hour int, weekofyear int, dayofyear int, count counter, PRIMARY KEY (term, ts, year,month,day,hour,weekofyear,dayofyear));

.

term             | ts                       | year | month | day | hour | weekofyear | dayofyear | count------------------+--------------------------+------+-------+-----+------+------------+-----------+-------www.datastax.com | 2028-07-13 17:06:28+0530 | 2015 |     7 |  28 |   16 |         31 |       209 |     2www.datastax.com | 2015-07-28 16:17:36+0530 | 2015 |     7 |  28 |   16 |         31 |       209 |     6www.datastax.com | 2015-07-28 16:17:36+0530 | 2015 |     7 |  28 |   16 |         31 |       209 |     2www.datastax.com | 2015-07-28 16:17:36+0530 | 2015 |     7 |  28 |   16 |         31 |       209 |     2www.datastax.com | 2015-07-28 16:21:15+0530 | 2015 |     7 |  28 |   16 |         31 |       209 |     2www.datastax.com | 2015-07-28 16:21:33+0530 | 2015 |     7 |  28 |   16 |         31 |       209 |     2www.datastax.com | 2015-07-28 16:21:50+0530 | 2015 |     7 |  28 |   16 |         31 |       209 |     2www.datastax.com | 2015-07-28 16:21:52+0530 | 2015 |     7 |  28 |   16 |         31 |       209 |     2www.datastax.com | 2015-07-28 16:21:53+0530 | 2015 |     7 |  28 |   16 |         31 |       209 |     2             www | 2015-07-28 16:46:00+0530 | 2015 |     7 |  28 |   16 |         31 |       209 |     2            www. | 2015-07-28 16:47:00+0530 | 2015 |     7 |  28 |   16 |         31 |       209 |     2            www. | 2015-07-28 16:48:00+0530 | 2015 |     7 |  28 |   16 |         31 |       209 |     2            www. | 2015-07-28 16:50:00+0530 | 2015 |     7 |  28 |   16 |         31 |       209 |     2            www. | 2015-07-28 16:55:00+0530 | 2015 |     7 |  28 |   16 |         31 |       209 |     2

When I run this query :

SELECT * FROM userlog  WHERE ts >= '2015-07-28 16:46' AND  ts <= '2015-07-28 16:55' ALLOW FILTERING;

I get correct result :

term | ts                       | year | month | day | hour | weekofyear | dayofyear | count------+--------------------------+------+-------+-----+------+------------+-----------+-------www  | 2015-07-28 16:46:00+0530 | 2015 |     7 |  28 |   16 |         31 |       209 |     2www. | 2015-07-28 16:47:00+0530 | 2015 |     7 |  28 |   16 |         31 |       209 |     2www. | 2015-07-28 16:48:00+0530 | 2015 |     7 |  28 |   16 |         31 |       209 |     2www. | 2015-07-28 16:50:00+0530 | 2015 |     7 |  28 |   16 |         31 |       209 |     2www. | 2015-07-28 16:55:00+0530 | 2015 |     7 |  28 |   16 |         31 |       209 |     2

but when I try to delete the rows with the same conditions

DELETE FROM userlog WHERE ts >= '2015-07-28 16:46' AND  ts <= '2015-07-28 16:55';

It throws following error :

InvalidRequest: code=2200 [Invalid query] message="Invalid operator >= for PRIMARY KEY part ts"

Am i missing something ? How to delete data in the specified time-range ? Also, is there any way to get the data in the specified time range (other than what I am doing?)

Best Solution

The DELETE command does not support range queries or the ALLOW FILTERING clause. From the documentation it only supports the = and IN operators:

You can delete an individual row by fully specifying both the partition and clustering columns.

You can delete a whole partition by just specifying the partition key.

And you can use the IN operator to do a few of these at once.

If you want to selectively delete rows from within a partition, you could first query for them using SELECT and then in your application issue a delete for each row returned from the SELECT.

Usually you wouldn't want to use ALLOW FILTERING on your SELECT statements since that is very inefficient, so normally you would specify the partition key when doing a range query SELECT. You would structure your schema so that the information you need for an operation is in known partitions so that you wouldn't need to do a full table scan to find things.