
Iterating over Redis data with SCAN

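Redis supports glob-style wildcards such as '?' and '*', so all keys in a given db can be fetched with a command like `keys *`.
The problem is that this performs very poorly when the data set is large. The author of Redis makes the same point in a blog post: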

Your collection is very small? Fine, use SMEMBERS and you get Redis-alike performances anyway. Your collection is big? Don’t use O(N) commands if not for “debugging” purposes. A popular example is the misused KEYS command, source of troubles for non-experts, and top hit among the Redis slow log entries.

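For this reason a new command, SCAN, was introduced. It avoids the server stalls caused by walking a large database with KEYS, because each call traverses only a small part of the data: a single call is O(1), while a complete iteration is still O(N).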

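SCAN is a cursor based iterator: every call returns a new cursor to the client, and the client must pass that cursor as the cursor argument of its next SCAN call in order to continue the iteration. Calling SCAN with a cursor of 0 makes the server start a new iteration, and when the server returns a cursor of 0 the iteration is finished.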
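The cursor hand-off described above can be modeled with a small, self-contained Java sketch. It is only a toy: `ToyServer`, `Page`, and `scanAll` are hypothetical names invented for illustration, and the cursor here is a plain list offset, which is not how Redis computes its cursors internally. What it does show is the client-side contract: start with "0", pass each returned cursor back in, and stop when "0" comes back.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a cursor-based iterator; NOT how Redis computes cursors internally. */
class CursorScanSketch {

    /** One reply: the cursor for the next call plus a batch of keys. */
    static final class Page {
        final String nextCursor;
        final List<String> keys;

        Page(String nextCursor, List<String> keys) {
            this.nextCursor = nextCursor;
            this.keys = keys;
        }
    }

    /** Hypothetical in-memory "server" that hands out keys in fixed-size batches. */
    static final class ToyServer {
        private final List<String> keys = new ArrayList<>();
        private final int batchSize;

        ToyServer(int keyCount, int batchSize) {
            for (int i = 0; i < keyCount; i++) {
                keys.add("key:" + i);
            }
            this.batchSize = batchSize;
        }

        /** Cursor "0" starts a new iteration; a returned "0" means it is finished. */
        Page scan(String cursor) {
            // In this toy the cursor is simply an offset into the list.
            int from = Integer.parseInt(cursor);
            int to = Math.min(from + batchSize, keys.size());
            List<String> batch = new ArrayList<>(keys.subList(from, to));
            String next = (to >= keys.size()) ? "0" : String.valueOf(to);
            return new Page(next, batch);
        }
    }

    /** Client loop: feed each returned cursor back in until the server returns "0". */
    static List<String> scanAll(ToyServer server) {
        List<String> all = new ArrayList<>();
        String cursor = "0";
        do {
            Page page = server.scan(cursor);
            all.addAll(page.keys);
            cursor = page.nextCursor;
        } while (!"0".equals(cursor));
        return all;
    }

    public static void main(String[] args) {
        List<String> all = scanAll(new ToyServer(33, 10));
        System.out.println(all.size()); // prints 33
    }
}
```

The loop is a do/while rather than a while because "0" is both the start value and the stop value, so the cursor has to be checked after the first call, not before it.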

A simple example:

127.0.0.1:6379> debug populate 33
OK
127.0.0.1:6379> keys *
 1) "key:15"
 2) "key:10"
 3) "key:31"
 4) "key:19"
 5) "key:7"
 6) "key:12"
 7) "key:28"
 8) "key:9"
 9) "key:13"
10) "key:20"
11) "key:4"
12) "key:5"
13) "key:21"
14) "key:25"
15) "key:22"
16) "key:6"
17) "key:11"
18) "key:26"
19) "key:16"
20) "key:0"
21) "key:8"
22) "key:3"
23) "key:29"
24) "key:24"
25) "key:30"
26) "key:27"
27) "key:2"
28) "key:23"
29) "key:17"
30) "key:32"
31) "key:18"
32) "key:1"
33) "key:14"
127.0.0.1:6379> scan 0
1) "42"
2)  1) "key:2"
    2) "key:31"
    3) "key:16"
    4) "key:4"
    5) "key:17"
    6) "key:12"
    7) "key:29"
    8) "key:24"
    9) "key:19"
   10) "key:7"
127.0.0.1:6379> scan 42
1) "25"
2)  1) "key:11"
    2) "key:20"
    3) "key:32"
    4) "key:25"
    5) "key:14"
    6) "key:22"
    7) "key:6"
    8) "key:9"
    9) "key:23"
   10) "key:0"
127.0.0.1:6379> scan 25
1) "7"
2)  1) "key:10"
    2) "key:13"
    3) "key:30"
    4) "key:27"
    5) "key:21"
    6) "key:15"
    7) "key:8"
    8) "key:3"
    9) "key:5"
   10) "key:18"
   11) "key:1"
127.0.0.1:6379> scan 7
1) "0"
2) 1) "key:26"
   2) "key:28"

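With Jedis, following the definition of the SCAN command, each call returns a structure with two parts: the cursor position for the next call and the batch of elements found by this call.
A returned cursor of 0 means that every element has been traversed.

So we can write it like this: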

import com.lin.jedisFactory.JedisPoolUtil;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.ScanResult;

import java.util.ArrayList;
import java.util.List;

/**
 * Created by Kevin on 2015/1/30.
 */
public class ScanTest {
    public static void main(String[] args) {
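        long startTime = System.currentTimeMillis();
        List<String> retList = new ArrayList<String>();
        // JedisPoolUtil is the author's own pool helper, not part of Jedis.
        Jedis jedis = JedisPoolUtil.getJedis();
        try {
            // Cursor "0" starts a new iteration.
            String cursor = "0";
            do {
                ScanResult<String> ret = jedis.scan(cursor);
                // Feed the returned cursor into the next call.
                cursor = ret.getStringCursor();
                retList.addAll(ret.getResult());
            } while (!"0".equals(cursor)); // a returned "0" means the iteration is complete
            System.out.println(retList.size());
            long endTime = System.currentTimeMillis();
            System.out.println("using time is:" + (endTime - startTime));
        } finally {
            // Return the connection to the pool even if the scan fails.
            JedisPoolUtil.release(jedis);
        }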
    }
}

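Note: do not use the scan(int) method. It has a bug: the cursor parameter should be an unsigned long rather than an int. The method will be removed in a future major Jedis release.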
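To see why an int cursor is a problem, here is a small sketch using only the Java standard library. The cursor values are made up for illustration and do not come from a real server; `CursorWidthSketch` and `fitsInInt` are hypothetical names. Redis documents the cursor as an unsigned 64 bit number, so a valid cursor can exceed the range of int (and even of a signed long), which is one reason to keep it as a String on the client side.

```java
class CursorWidthSketch {

    /** Returns true if the cursor string can be parsed as a Java int. */
    static boolean fitsInInt(String cursor) {
        try {
            Integer.parseInt(cursor);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Illustrative cursor values, not taken from a real server.
        String small = "42";
        String large = "18446744073709551615"; // 2^64 - 1, the largest unsigned 64-bit value

        System.out.println(fitsInInt(small)); // prints true
        System.out.println(fitsInInt(large)); // prints false

        // Java has no unsigned long type, but the bit pattern can still be
        // parsed and printed with the unsigned helpers added in Java 8.
        long bits = Long.parseUnsignedLong(large);
        System.out.println(Long.toUnsignedString(bits)); // prints 18446744073709551615
    }
}
```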
