You're not just "reading 0.5MB sequentially"; you're asking Cassandra to turn that data into rows, filter out tombstones (markers for deleted data), and assemble the rows into a result set. 0.04ms per row is pretty reasonable; my rule of thumb is about 0.5ms per 10 rows (0.05ms per row) for an entire query.
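As a back-of-the-envelope check, the rule of thumb can be turned into a quick estimate. This is a minimal sketch that assumes cost scales linearly with the number of rows returned. The function names and the `MS_PER_ROW` constant are hypothetical and only restate the heuristic above. They are not a Cassandra API and not a measured constant.

```python
# Back-of-the-envelope latency estimate from the rule of thumb:
# roughly 0.5 ms per 10 rows, i.e. about 0.05 ms per row, for a whole query.
# Assumption: cost grows linearly with rows returned. This is a heuristic,
# not a measured Cassandra constant.

MS_PER_ROW = 0.5 / 10  # hypothetical constant: 0.5 ms per 10 rows


def estimate_query_ms(rows: int, ms_per_row: float = MS_PER_ROW) -> float:
    """Estimate total query time in milliseconds for `rows` returned rows."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    return rows * ms_per_row


def observed_ms_per_row(total_ms: float, rows: int) -> float:
    """Back out the per-row cost from a measured query duration."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    return total_ms / rows


if __name__ == "__main__":
    for n in (10, 100, 10_000):
        print(f"{n:>6} rows -> ~{estimate_query_ms(n):.1f} ms")
```

By this estimate a 10,000-row query lands around 500 ms. If your measured per-row cost (from `observed_ms_per_row`) is close to 0.05ms, you are in the expected range and the query shape, not Cassandra, is the thing to change.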
Remember that Cassandra is optimized for short requests suited to online applications, where result sets of 10 to 100 rows are typical. There is no parallelization within a single query.