
NoSQL MongoDB

prden 2024. 9. 18. 17:15

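1. Data Modeling and Indexing

 1) Relationships between collections

    The choice is between embedding, where another object (or an array of objects) is nested inside the parent object, and referencing, where related data is linked through an FK as in an RDB.

    Pros and cons: see p. 270 of 맛있는 MongoDB.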

 

In MongoDB, when modeling a “Catalog” database (or any related schema design), the decision between embedding and referencing depends on the specific use case and the nature of your data. Here’s an overview of when to use each approach:

 

1. Embedding

 

In this approach, related data is stored within the same document as nested sub-documents. This method is often referred to as denormalization.

 

When to Use:

 

One-to-few relationships: When the related data is small and unlikely to grow significantly.

Read-heavy operations: Embedding can improve read performance since all data is retrieved in a single query.

Data that’s tightly coupled: If the embedded data will always be retrieved and updated together with the parent document.

Low update frequency: Embedded documents are best when you don’t frequently need to update individual pieces of the embedded data.

{
  "_id": 1,
  "name": "Product 1",
  "category": {
    "id": 101,
    "name": "Electronics"
  }
}

 

In this example, the category is embedded directly within the product document.

 

Pros of Embedding:

 

Faster reads: All related data is in one place.

Atomic updates: A write to a single document is atomic in MongoDB, so the main document and its embedded documents are modified together in one operation.

 

Cons:

 

Document size: MongoDB has a document size limit of 16MB. If embedded data grows too large, you might hit this limit.

Duplication: When the same embedded data (like a category) is copied into many documents, changing it means updating every parent document that holds a copy.
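The duplication cost can be made concrete with a minimal sketch. It uses plain Python dictionaries as stand-ins for product documents rather than a real MongoDB connection; the `rename_embedded_category` helper and the sample data are assumptions made for illustration.

```python
# Plain dicts stand in for documents in a hypothetical "products" collection.
products = [
    {"_id": 1, "name": "Product 1", "category": {"id": 101, "name": "Electronics"}},
    {"_id": 2, "name": "Product 2", "category": {"id": 101, "name": "Electronics"}},
    {"_id": 3, "name": "Product 3", "category": {"id": 102, "name": "Books"}},
]


def rename_embedded_category(docs, category_id, new_name):
    """Rename a category embedded in every matching product.

    Returns the number of documents that had to be rewritten.
    """
    touched = 0
    for doc in docs:
        if doc["category"]["id"] == category_id:
            doc["category"]["name"] = new_name
            touched += 1
    return touched


touched = rename_embedded_category(products, 101, "Consumer Electronics")
print(touched)  # number of product documents rewritten for one rename
```

Against a real deployment this would be a multi-document update rather than a single-document write, so the single-document atomicity mentioned under the pros no longer covers the whole change.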

 

2. Referencing

 

In this approach, documents reference each other by storing the ObjectId or some unique identifier of the related document. This method is akin to normalization in relational databases.

 

When to Use:

 

One-to-many or many-to-many relationships: When the related data can grow indefinitely or if you need flexibility.

Write-heavy operations: If you need to update the related data frequently, keeping it in a separate document avoids overwriting large parent documents.

Data that’s loosely coupled: If the related data is only occasionally needed or can change independently from the main document.

Avoiding duplication: If the same related data (e.g., categories) is used across many parent documents.

 

Example:

 

Products collection:
{
  "_id": 1,
  "name": "Product 1",
  "categoryId": 101
}

Categories collection:
{
  "_id": 101,
  "name": "Electronics"
}

 

Here, the product document references the category by categoryId.
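A rough sketch of what reading a referenced category involves, again using in-memory dictionaries instead of a live database. The `load_product_with_category` function is a hypothetical helper, and the two dictionaries mirror the Products and Categories documents above.

```python
# Dicts keyed by _id stand in for the two collections from the example above.
products = {1: {"_id": 1, "name": "Product 1", "categoryId": 101}}
categories = {101: {"_id": 101, "name": "Electronics"}}


def load_product_with_category(product_id):
    """Resolve a product's category reference with a second lookup."""
    product = products.get(product_id)  # first lookup: the product itself
    if product is None:
        return None
    # Second lookup: follow the reference. A dangling reference yields None.
    category = categories.get(product["categoryId"])
    # Return a merged copy so the stored product is left unchanged.
    return {**product, "category": category}


result = load_product_with_category(1)
print(result["category"]["name"])
```

The two lookups are the price of storing the category only once; the embedded version above needs just the first one.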

 

Pros of Referencing:

 

Scalability: Since referenced data is in a separate document, you avoid the document size limit and can scale your collections independently.

Data consistency: The referenced document (like a category) is stored only once, so an update to it is immediately visible to every document that references it.

 

Cons:

 

Slower reads: You might need to perform multiple queries (or use a $lookup aggregation stage) to fetch related documents.

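Data fragmentation: When related documents are spread across different collections, keeping the relationships consistent is more complex, because MongoDB does not enforce foreign-key constraints and the application must guard against dangling references itself.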
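For reference, the `$lookup` stage mentioned above could look roughly like the pipeline below, written as plain Python data. The collection and field names come from the example documents; `emulate_lookup` is a hypothetical in-memory imitation of an equality-match `$lookup`, included only to show the shape of the result, and is not how MongoDB executes the stage.

```python
# Aggregation pipeline expressed as plain Python data.
pipeline = [
    {
        "$lookup": {
            "from": "categories",        # collection to join with
            "localField": "categoryId",  # field in the products documents
            "foreignField": "_id",       # field in the categories documents
            "as": "category",            # name of the output array field
        }
    }
]


def emulate_lookup(local_docs, foreign_docs, stage):
    """In-memory imitation of an equality-match $lookup (illustration only)."""
    spec = stage["$lookup"]
    joined = []
    for doc in local_docs:
        matches = [
            f for f in foreign_docs
            if f.get(spec["foreignField"]) == doc.get(spec["localField"])
        ]
        # The matches are attached as an array, even when there is only one.
        joined.append({**doc, spec["as"]: matches})
    return joined


products = [{"_id": 1, "name": "Product 1", "categoryId": 101}]
categories = [{"_id": 101, "name": "Electronics"}]
joined = emulate_lookup(products, categories, pipeline[0])
print(joined[0]["category"])
```

With a driver such as PyMongo, the same `pipeline` list would typically be passed to the products collection's `aggregate` method.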

 

Best Practices

 

Embed when data is tightly coupled and grows moderately: For example, embedding category data into products makes sense if categories rarely change and you always retrieve the product with its category.

Reference when data is loosely coupled or grows large: If categories are shared among many products, and categories can be updated independently, referencing will prevent duplication and large document sizes.

Hybrid approach: In some cases, you can combine both. For example, you can embed frequently accessed, immutable fields of a related document (like the name of the category) and reference less frequently used or mutable fields.

 

Example of Hybrid Approach:

{
  "_id": 1,
  "name": "Product 1",
  "category": {
    "name": "Electronics"
  },
  "categoryId": 101
}

In this case, you store the category name directly for quick reads but also reference the categoryId for updates or deeper queries.
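The two read paths of the hybrid document can be sketched as follows. This is a minimal in-memory illustration: `list_row` and `detail_view` are hypothetical helpers, and the `description` field is an assumed extra attribute that only the category document holds.

```python
# Plain dicts stand in for the two collections; "description" is a
# hypothetical extra field that only the categories collection holds.
products = {
    1: {
        "_id": 1,
        "name": "Product 1",
        "category": {"name": "Electronics"},
        "categoryId": 101,
    },
}
categories = {
    101: {
        "_id": 101,
        "name": "Electronics",
        "description": "Phones, laptops, and accessories",
    },
}


def list_row(product):
    """Fast path: the cached category name needs no second lookup."""
    return f'{product["name"]} ({product["category"]["name"]})'


def detail_view(product):
    """Slow path: follow categoryId to load the full category document."""
    full_category = categories[product["categoryId"]]
    return {"name": product["name"], "category": full_category}


row = list_row(products[1])
detail = detail_view(products[1])
print(row)
print(detail["category"]["description"])
```

The cached name is a copy, so it can go stale: a rename has to update the category document and every product that caches its name.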

 

Conclusion:

 

Embedding is best when you have tight coupling and frequent read operations with little need for updates to the embedded data.

Referencing is better for flexible, loosely coupled relationships, especially when data grows or is frequently updated.

Consider a hybrid approach for balancing read performance with flexibility, especially in cases of shared or frequently updated data.

 

By understanding your data access patterns, frequency of updates, and the relationship between the entities, you can choose the best approach for your catalog database modeling.

 

Each document within a collection has a 16MB size limit.

In MongoDB, the total size of a single document, including its embedded data and subdocuments, cannot exceed 16MB.

 

Key Points:

 

1. Document Size: Each individual document in MongoDB is capped at 16MB. This includes the entire document’s data, including any nested or embedded subdocuments.

2. Collection Size: There’s no explicit size limit on an entire collection, which can grow to multiple terabytes. The limit applies only at the document level.

3. Workarounds: If you need to store larger amounts of data:

Referencing: Break the data into smaller documents and use references between them.

GridFS: For very large files (e.g., large media files), you can use MongoDB’s GridFS, which allows you to store and retrieve files larger than 16MB by splitting them into smaller chunks.
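The chunking idea behind GridFS can be illustrated with a short sketch. This is not the GridFS API; `split_into_chunks` is a hypothetical helper that only imitates the splitting step, and the 255 KiB chunk size is taken from GridFS's documented default.

```python
import math

# Assumed chunk size: GridFS's documented default of 255 KiB.
DEFAULT_CHUNK_SIZE = 255 * 1024


def split_into_chunks(data, chunk_size=DEFAULT_CHUNK_SIZE):
    """Split a byte string into numbered chunks, loosely imitating GridFS.

    Each chunk records its sequence number "n" and its slice of the payload.
    """
    return [
        {"n": index, "data": data[offset:offset + chunk_size]}
        for index, offset in enumerate(range(0, len(data), chunk_size))
    ]


payload = bytes(17 * 1024 * 1024)  # 17 MiB of zero bytes, above the 16MB limit
chunks = split_into_chunks(payload)

# Every chunk is far below the document size limit, and joining the chunks
# in order restores the original payload.
restored = b"".join(chunk["data"] for chunk in chunks)
print(len(chunks), math.ceil(len(payload) / DEFAULT_CHUNK_SIZE))
```

Because each chunk is stored as its own small document, the file as a whole can exceed the per-document limit.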

 

If you embed too much data in a document and exceed the 16MB limit, MongoDB will throw an error when you try to insert or update that document. Therefore, it’s essential to monitor document sizes and use referencing or alternative strategies when necessary.
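One way to watch for oversized documents on the application side is a guard like the sketch below. It is deliberately approximate: the UTF-8 length of the JSON encoding is used as a stand-in for the real BSON size, which is an assumption and can differ from what the server measures. The helper names and the 80% headroom factor are hypothetical.

```python
import json

MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024  # 16,777,216 bytes


def approx_size_bytes(doc):
    """Rough proxy: UTF-8 length of the JSON encoding, NOT the true BSON size."""
    return len(json.dumps(doc).encode("utf-8"))


def fits_in_one_document(doc, headroom=0.8):
    """Return True if the document stays under a safety fraction of the limit."""
    return approx_size_bytes(doc) <= MAX_BSON_DOCUMENT_BYTES * headroom


small = {"_id": 1, "name": "Product 1", "category": {"id": 101, "name": "Electronics"}}
huge = {"_id": 2, "name": "Product 2", "reviews": "x" * (17 * 1024 * 1024)}

print(fits_in_one_document(small), fits_in_one_document(huge))
```

A document that fails such a check is a candidate for the referencing or GridFS workarounds described above. For exact numbers, measure the encoded BSON with the driver or on the server instead of relying on this proxy.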

 

 

 2) Indexing

2. Replication

 

3. Sharding

 
