Improve schema and store original document in a JSON blob