MongoDB: A Brief Look at ntoreturn and batchSize

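While writing mongo-proxy recently, I noticed ntoreturn. After reading the official documentation I was a little confused, because it seems to do the same job as batchSize. I then read the source code, finally worked out how the two relate, and came away with a better understanding of MongoDB.

First, some background:

For a data query, the result set can be very large, so the db splits it into several parts and sends them to the client in batches.

In MongoDB, ntoreturn and batchSize both affect the number of documents in the first batch of results.

A MongoDB query is defined as follows: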

struct OP_QUERY {
    MsgHeader header;                   // standard message header
    int32     flags;                    // bit vector of query options.  See below for details.
    cstring   fullCollectionName;       // "dbname.collectionname"
    int32     numberToSkip;             // number of documents to skip
    int32     numberToReturn;           // number of documents to return
                                        //  in the first OP_REPLY batch
    document  query;                    // query object.  See below for details.
    [ document  returnFieldsSelector; ] // Optional. Selector indicating the fields
                                        //  to return.  See below for details.
}

The official documentation describes numberToReturn as follows:

Limits the number of documents in the first OP_REPLY message to the query. However, the database will still establish a cursor and return the cursorID to the client if there are more results than numberToReturn. If the client driver offers ‘limit’ functionality (like the SQL LIMIT keyword), then it is up to the client driver to ensure that no more than the specified number of document are returned to the calling application. If numberToReturn is 0, the db will use the default return size. If the number is negative, then the database will return that number and close the cursor. No further results for that query can be fetched. If numberToReturn is 1 the server will treat it as -1 (closing the cursor automatically).

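numberToReturn limits the number of documents in the first batch of results. Its value can be:

  • A positive integer

    Returns the specified number of matching documents. If the result set contains more documents than numberToReturn, the database creates a cursor internally and returns its cursorID to the client so that the client can read the remaining documents.

    1 is the exception: MongoDB treats 1 as -1, returning the document and closing the cursor.

  • 0

    Returns the default number of matching documents.

  • A negative integer

    Returns the specified number of matching documents (the absolute value of numberToReturn) and then closes the cursor, so no further documents can be fetched.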
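The three cases above can be sketched as a small decoding function. This is only an illustration of the documented rules, not MongoDB's own code: `FirstBatchPlan` and `interpretNumberToReturn` are hypothetical names, and `std::nullopt` stands in for "use the server's default size".

```cpp
#include <cstdint>
#include <optional>

// Hypothetical result type: how the first OP_REPLY batch is bounded.
struct FirstBatchPlan {
    std::optional<std::int64_t> maxDocs;  // nullopt: server default size
    bool closeCursorAfterBatch;           // true: no cursor is left open
};

// Hypothetical helper mirroring the documented numberToReturn rules.
FirstBatchPlan interpretNumberToReturn(std::int32_t numberToReturn) {
    // Widen first so that negating INT32_MIN cannot overflow.
    const std::int64_t n = numberToReturn;
    if (n == 0) {
        return FirstBatchPlan{std::nullopt, false};
    }
    if (n == 1 || n < 0) {
        // 1 is treated like -1; negative values return |n| documents and close the cursor.
        return FirstBatchPlan{n < 0 ? -n : n, true};
    }
    return FirstBatchPlan{n, false};
}
```

For example, `interpretNumberToReturn(-5)` yields a limit of 5 documents with the cursor closed afterwards, while `interpretNumberToReturn(20)` yields a limit of 20 and leaves the cursor open for later batches.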

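One point deserves special mention: a command is a special kind of query, and a command's numberToReturn may only be 1 or -1. By default, the server closes a cursor once it has been read to the end or has been idle for 600 seconds. A cursor is a database resource, so avoid leaving too many idle cursors open.

At this point a question arises: numberToReturn does much the same thing as batchSize. Let's look at how the first batch of results is assembled: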

src/mongo/db/query/find.cpp

// Run the query.
// bb is used to hold query results
// this buffer should contain either requested documents per query or
// explain information, but not both
BufBuilder bb(FindCommon::kInitReplyBufferSize);
bb.skip(sizeof(QueryResult::Value));

// How many results have we obtained from the executor?
int numResults = 0;

// If we're replaying the oplog, we save the last time that we read.
Timestamp slaveReadTill;

BSONObj obj;
PlanExecutor::ExecState state;
// uint64_t numMisplacedDocs = 0;

// Get summary info about which plan the executor is using.
{
    stdx::lock_guard<Client> lk(*txn->getClient());
    curop.setPlanSummary_inlock(Explain::getPlanSummary(exec.get()));
}

while (PlanExecutor::ADVANCED == (state = exec->getNext(&obj, NULL))) {
    // Add result to output buffer.
    bb.appendBuf((void*)obj.objdata(), obj.objsize());

    // Count the result.
    ++numResults;

    // Possibly note slave's position in the oplog.
    if (pq.isOplogReplay()) {
        BSONElement e = obj["ts"];
        if (Date == e.type() || bsonTimestamp == e.type()) {
            slaveReadTill = e.timestamp(); 
        }
    }

    if (FindCommon::enoughForFirstBatch(pq, numResults, bb.len())) {
        LOG(5) << "Enough for first batch, wantMore=" << pq.wantMore()
               << " ntoreturn=" << pq.getNToReturn().value_or(0) << " numResults=" << numResults
               << endl;
        break;
    }
}

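The code above assembles the first batch of results. After appending each document, it checks whether the return condition is met (the FindCommon::enoughForFirstBatch() call on line 39 of the excerpt); if so, it breaks out of the loop.

Now look at FindCommon::enoughForFirstBatch():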

src/mongo/db/query/find_common.cpp

bool FindCommon::enoughForFirstBatch(const LiteParsedQuery& pq,
                                     long long numDocs,
                                     int bytesBuffered) {
    if (!pq.getEffectiveBatchSize()) {
        // If there is no batch size, we stop generating additional results as soon as we have
        // either 101 documents or at least 1MB of data.
        return (bytesBuffered > 1024 * 1024) || numDocs >= LiteParsedQuery::kDefaultBatchSize;
    }

    // If there is a batch size, we add results until either satisfying this batch size or exceeding
    // the 4MB size threshold.
    return numDocs >= pq.getEffectiveBatchSize().value() ||
        bytesBuffered > kMaxBytesToReturnToClientAtOnce;
}

src/mongo/db/query/lite_parsed_query.cpp

const long long LiteParsedQuery::kDefaultBatchSize = 101;

src/mongo/db/query/find_common.h

static const int kMaxBytesToReturnToClientAtOnce = 4 * 1024 * 1024;

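If no effective batch size is set, the function returns true once the document count reaches 101 or the buffered data exceeds 1MB. If an effective batch size is set, it returns true once the document count reaches that batch size or the buffered data exceeds 4MB. So the returned results are subject to a hard limit on data size.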
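To see these thresholds in action, here is a standalone sketch of the same predicate using `std::optional` instead of `boost::optional`. The helper `docsInFirstBatch` is hypothetical: it assumes every document has the same size and ignores the reply header bytes that the real buffer also contains.

```cpp
#include <optional>

// Constants copied from the excerpts above.
constexpr long long kDefaultBatchSize = 101;
constexpr int kMaxBytesToReturnToClientAtOnce = 4 * 1024 * 1024;

// Standalone analogue of FindCommon::enoughForFirstBatch; the effective batch
// size is passed in directly instead of being read from a LiteParsedQuery.
bool enoughForFirstBatch(std::optional<long long> effectiveBatchSize,
                         long long numDocs,
                         int bytesBuffered) {
    if (!effectiveBatchSize) {
        // No batch size: stop at 101 documents or once more than 1MB is buffered.
        return bytesBuffered > 1024 * 1024 || numDocs >= kDefaultBatchSize;
    }
    // Batch size given: stop at that count or once more than 4MB is buffered.
    return numDocs >= *effectiveBatchSize ||
           bytesBuffered > kMaxBytesToReturnToClientAtOnce;
}

// Hypothetical loop: counts how many fixed-size documents land in the first batch.
long long docsInFirstBatch(std::optional<long long> effectiveBatchSize,
                           long long available,
                           int docBytes) {
    long long numDocs = 0;
    int bytesBuffered = 0;
    while (numDocs < available) {
        ++numDocs;
        bytesBuffered += docBytes;
        if (enoughForFirstBatch(effectiveBatchSize, numDocs, bytesBuffered)) {
            break;
        }
    }
    return numDocs;
}
```

Under these assumptions, 1000 available documents of 100 bytes each give a first batch of 101 documents when no batch size is set, and 20 when the batch size is 20. With 512KB documents and no batch size, the batch stops at the third document, the first point at which the buffer exceeds 1MB.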

src/mongo/db/query/lite_parsed_query.cpp

boost::optional<long long> LiteParsedQuery::getEffectiveBatchSize() const {
    return _batchSize ? _batchSize : _ntoreturn;
}

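For a query, batchSize is optional and unset by default. If it is not specified, ntoreturn determines the number of documents in the first batch; in other words, batchSize takes precedence over ntoreturn.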
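The precedence rule can be tried out in isolation. The sketch below uses `std::optional` in place of `boost::optional`, and `ParsedQuerySketch` is a hypothetical stand-in for the two `LiteParsedQuery` fields involved. Note that the test is whether batchSize is set, not whether it is non-zero, so an explicit batchSize of 0 still wins over ntoreturn.

```cpp
#include <optional>

// Hypothetical stand-in for the two LiteParsedQuery fields involved.
struct ParsedQuerySketch {
    std::optional<long long> batchSize;
    std::optional<long long> ntoreturn;

    // Same selection rule as LiteParsedQuery::getEffectiveBatchSize():
    // a set batchSize wins, otherwise fall back to ntoreturn (which may also be unset).
    std::optional<long long> getEffectiveBatchSize() const {
        return batchSize ? batchSize : ntoreturn;
    }
};
```

With both fields set to 50 and 10 the effective value is 50; with only ntoreturn set to 10 it is 10; with neither set the result is empty, which is the case where the 101-document default applies.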

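Later I read the comments on these variables: ntoreturn is normally computed by the driver or shell from the batchSize and limit the user specifies. As a db user you don't really need to care about it; ntoreturn is, after all, a low-level implementation detail.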

src/mongo/db/query/lite_parsed_query.h

// Must be either unset or positive. Negative limit is illegal and a limit value of zero
// received from the client is interpreted as the absence of a limit value.
boost::optional<long long> _limit;

// Must be either unset or non-negative. Negative batchSize is illegal but batchSize of 0 is
// allowed.
boost::optional<long long> _batchSize;   

// Set only when parsed from an OP_QUERY find message. The value is computed by driver or shell
// and is set to be a min of batchSize and limit provided by user. LPQ can have set either
// ntoreturn or batchSize / limit.
boost::optional<long long> _ntoreturn;
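As a rough illustration of the comment on `_ntoreturn`, a driver-side computation might look like the sketch below. `computeNToReturn` is a hypothetical helper that follows only the "min of batchSize and limit" wording; it does not reproduce any particular driver, and the negative-value convention for closing the cursor described earlier is not modeled.

```cpp
#include <algorithm>
#include <optional>

// Hypothetical driver-side helper, following only the comment on _ntoreturn.
long long computeNToReturn(std::optional<long long> limit,
                           std::optional<long long> batchSize) {
    if (limit && batchSize) {
        // Both given by the user: send the smaller of the two.
        return std::min(*limit, *batchSize);
    }
    if (limit) {
        return *limit;
    }
    if (batchSize) {
        return *batchSize;
    }
    return 0;  // 0 asks the server for its default first-batch size
}
```

For instance, a limit of 10 with a batchSize of 50 gives an ntoreturn of 10, while a batchSize of 50 alone gives 50.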

References