// Copyright (c) 2015-present Mattermost, Inc. All Rights Reserved.
// See LICENSE.enterprise for license information.

package shared

import (
	"encoding/json"
	"fmt"
	"path"
	"strconv"
	"time"

	"github.com/pkg/errors"

	"github.com/mattermost/mattermost/server/v8/platform/shared/filestore"
	"github.com/mattermost/mattermost/server/v8/platform/shared/templates"

	"github.com/mattermost/mattermost/server/public/model"
	"github.com/mattermost/mattermost/server/public/shared/mlog"
	"github.com/mattermost/mattermost/server/public/shared/request"
)

const (
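	// Messages used when a post's attached file cannot be found while copying files into the export archive.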
	MissingFileMessageDuringBackendRead = "File backend read: File missing for post; cannot copy file to archive"
	MissingFileMessageDuringCopy = "Copy buffer: File missing for post; cannot copy file to archive"

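	// EstimatedPostCount stands in for TotalPostsExpected when counting the posts in the export period
	// fails (see GetInitialExportPeriodData); it only affects progress reporting.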
	EstimatedPostCount = 10_000_000

	// JobDataBatchStartTime is the posts.updateat value from the previous batch. Posts are selected using
	// keyset pagination sorted by (posts.updateat, posts.id).
	JobDataBatchStartTime = "batch_start_time"

	// JobDataJobStartTime is the start of the job (it doesn't change across batches).
	JobDataJobStartTime = "job_start_time"

	// JobDataBatchStartId is the posts.id value from the previous batch.
	JobDataBatchStartId = "batch_start_id"

	// JobDataJobEndTime is the point up to which this job is exporting. It is the time the job was started,
	// i.e., we export everything from the end of the previous batch up to the moment this job started.
	JobDataJobEndTime = "job_end_time"

	JobDataJobStartId = "job_start_id"
	JobDataExportType = "export_type"
	JobDataInitiatedBy = "initiated_by"
	JobDataBatchSize = "batch_size"
	JobDataChannelBatchSize = "channel_batch_size"
	JobDataChannelHistoryBatchSize = "channel_history_batch_size"
	JobDataMessagesExported = "messages_exported"
	JobDataWarningCount = "warning_count"
	JobDataIsDownloadable = "is_downloadable"
	JobDataExportDir = "export_dir"
	JobDataBatchNumber = "job_batch_number"
	JobDataTotalPostsExpected = "total_posts_expected"
)

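// PostUpdatedType records why an exported post was updated: edited (original or new message), updated
// without a message change, deleted, or had a file deleted.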
type PostUpdatedType string

const (
	EditedOriginalMsg PostUpdatedType = "EditedOriginalMsg"
	EditedNewMsg PostUpdatedType = "EditedNewMsg"
	UpdatedNoMsgChange PostUpdatedType = "UpdatedNoMsgChange"
	Deleted PostUpdatedType = "Deleted"
	FileDeleted PostUpdatedType = "FileDeleted"
)

// JobData keeps the current state of the job.
// When used by a worker, all fields in JobDataExported are exported to the job's job.Data prop bag.
type JobData struct {
	JobDataExported

	ExportPeriodStartTime int64

	// This section is the current state of the export.
	ChannelMetadata map[string]*MetadataChannel
	ChannelMemberHistories map[string][]*model.ChannelMemberHistoryResult
	Cursor model.MessageExportCursor
	PostsToExport []*model.MessageExport
	BatchEndTime int64
	BatchPath string
	MessageExportMs []int64
	ProcessingPostsMs []int64
	ProcessingXmlMs []int64
	TransferringFilesMs []int64
	TransferringZipMs []int64
	TotalBatchMs []int64
	Finished bool
}

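// JobDataExported is the subset of job state that is round-tripped through the job's Data prop bag
// via JobDataToStringMap and StringMapToJobDataWithZeroValues.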
type JobDataExported struct {
	ExportType string
	ExportDir string
	BatchStartTime int64
	BatchStartId string
	JobStartTime int64
	JobEndTime int64
	JobStartId string
	BatchSize int
	ChannelBatchSize int
	ChannelHistoryBatchSize int
	BatchNumber int
	TotalPostsExpected int
	MessagesExported int
	WarningCount int
	IsDownloadable bool
}

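// JobDataToStringMap flattens the exported job fields into the string map stored in the job's Data prop bag.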
func JobDataToStringMap(jd JobData) map[string]string {
	ret := make(map[string]string)
	ret[JobDataExportType] = jd.ExportType
	ret[JobDataExportDir] = jd.ExportDir
	ret[JobDataBatchStartTime] = strconv.FormatInt(jd.BatchStartTime, 10)
	ret[JobDataBatchStartId] = jd.BatchStartId
	ret[JobDataJobStartTime] = strconv.FormatInt(jd.JobStartTime, 10)
	ret[JobDataJobEndTime] = strconv.FormatInt(jd.JobEndTime, 10)
	ret[JobDataJobStartId] = jd.JobStartId
	ret[JobDataBatchSize] = strconv.Itoa(jd.BatchSize)
	ret[JobDataChannelBatchSize] = strconv.Itoa(jd.ChannelBatchSize)
	ret[JobDataChannelHistoryBatchSize] = strconv.Itoa(jd.ChannelHistoryBatchSize)
	ret[JobDataBatchNumber] = strconv.Itoa(jd.BatchNumber)
	ret[JobDataTotalPostsExpected] = strconv.Itoa(jd.TotalPostsExpected)
	ret[JobDataMessagesExported] = strconv.Itoa(jd.MessagesExported)
	ret[JobDataWarningCount] = strconv.Itoa(jd.WarningCount)
	ret[JobDataIsDownloadable] = strconv.FormatBool(jd.IsDownloadable)
	return ret
}

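// StringMapToJobDataWithZeroValues is the inverse of JobDataToStringMap: missing keys fall back to their
// zero values, and values that fail to parse return a wrapped error.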
func StringMapToJobDataWithZeroValues(sm map[string]string) (JobData, error) {
	var jd JobData
	var err error

	jd.ExportType = sm[JobDataExportType]
	jd.ExportDir = sm[JobDataExportDir]

	batchStartTime, ok := sm[JobDataBatchStartTime]
	if !ok {
		batchStartTime = "0"
	}
	if jd.BatchStartTime, err = strconv.ParseInt(batchStartTime, 10, 64); err != nil {
		return jd, errors.Wrap(err, "error converting JobDataBatchStartTime")
	}

	jd.BatchStartId = sm[JobDataBatchStartId]

	jobStartTime, ok := sm[JobDataJobStartTime]
	if !ok {
		jobStartTime = "0"
	}
	if jd.JobStartTime, err = strconv.ParseInt(jobStartTime, 10, 64); err != nil {
		return jd, errors.Wrap(err, "error converting JobDataJobStartTime")
	}

	jobEndTime, ok := sm[JobDataJobEndTime]
	if !ok {
		jobEndTime = "0"
	}
	if jd.JobEndTime, err = strconv.ParseInt(jobEndTime, 10, 64); err != nil {
		return jd, errors.Wrap(err, "error converting JobDataJobEndTime")
	}

	jd.JobStartId = sm[JobDataJobStartId]

	jobBatchSize, ok := sm[JobDataBatchSize]
	if !ok {
		jobBatchSize = "0"
	}
	if jd.BatchSize, err = strconv.Atoi(jobBatchSize); err != nil {
		return jd, errors.Wrap(err, "error converting JobDataBatchSize")
	}

	channelBatchSize, ok := sm[JobDataChannelBatchSize]
	if !ok {
		channelBatchSize = "0"
	}
	if jd.ChannelBatchSize, err = strconv.Atoi(channelBatchSize); err != nil {
		return jd, errors.Wrap(err, "error converting JobDataChannelBatchSize")
	}

	channelHistoryBatchSize, ok := sm[JobDataChannelHistoryBatchSize]
	if !ok {
		channelHistoryBatchSize = "0"
	}
	if jd.ChannelHistoryBatchSize, err = strconv.Atoi(channelHistoryBatchSize); err != nil {
		return jd, errors.Wrap(err, "error converting JobDataChannelHistoryBatchSize")
	}

	batchNumber, ok := sm[JobDataBatchNumber]
	if !ok {
		batchNumber = "0"
	}
	if jd.BatchNumber, err = strconv.Atoi(batchNumber); err != nil {
		return jd, errors.Wrap(err, "error converting JobDataBatchNumber")
	}

	totalPostsExpected, ok := sm[JobDataTotalPostsExpected]
	if !ok {
		totalPostsExpected = "0"
	}
	if jd.TotalPostsExpected, err = strconv.Atoi(totalPostsExpected); err != nil {
		return jd, errors.Wrap(err, "error converting JobDataTotalPostsExpected")
	}

	messagesExported, ok := sm[JobDataMessagesExported]
	if !ok {
		messagesExported = "0"
	}
	if jd.MessagesExported, err = strconv.Atoi(messagesExported); err != nil {
		return jd, errors.Wrap(err, "error converting JobDataMessagesExported")
	}

	warningCount, ok := sm[JobDataWarningCount]
	if !ok {
		warningCount = "0"
	}
	if jd.WarningCount, err = strconv.Atoi(warningCount); err != nil {
		return jd, errors.Wrap(err, "error converting JobDataWarningCount")
	}

	isDownloadable, ok := sm[JobDataIsDownloadable]
	if !ok {
		isDownloadable = "0"
	}
	if jd.IsDownloadable, err = strconv.ParseBool(isDownloadable); err != nil {
		return jd, errors.Wrap(err, "error converting JobDataIsDownloadable")
	}

	return jd, nil
}

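// BackendParams bundles the config, store, file backends, and HTML templates used by the compliance export job.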
type BackendParams struct {
	Config *model.Config
	Store MessageExportStore
	FileAttachmentBackend filestore.FileBackend
	ExportBackend filestore.FileBackend
	HtmlTemplates *templates.Container
}

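// ExportParams carries everything needed to export a single batch: the posts, channel metadata and
// membership histories, the batch boundaries, and the backends to read attachments from and write the export to.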
type ExportParams struct {
	ExportType string
	ChannelMetadata map[string]*MetadataChannel
	Posts []*model.MessageExport
	ChannelMemberHistories map[string][]*model.ChannelMemberHistoryResult
	JobStartTime int64
	BatchPath string
	BatchStartTime int64
	BatchEndTime int64
	Config *model.Config
	Db MessageExportStore
	FileAttachmentBackend filestore.FileBackend
	ExportBackend filestore.FileBackend
	Templates *templates.Container
}

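// WriteExportResult reports timings and the number of warnings from writing one export batch.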
type WriteExportResult struct {
	TransferringFilesMs int64
	ProcessingXmlMs int64
	TransferringZipMs int64
	NumWarnings int
}

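// RunExportResults accumulates per-batch counts (posts by update type, files, channels, joins and leaves)
// along with the timings carried in the embedded WriteExportResult.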
type RunExportResults struct {
	CreatedPosts int
	EditedOrigMsgPosts int
	EditedNewMsgPosts int
	UpdatedPosts int
	DeletedPosts int
	UploadedFiles int
	DeletedFiles int
	NumChannels int
	Joins int
	Leaves int
	ProcessingPostsMs int64
	WriteExportResult
}

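// ChannelMemberJoin records a user joining a channel at Datetime. ChannelMemberLeave and ChannelMember
// (below) are the corresponding leave and post-author records.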
type ChannelMemberJoin struct {
	UserId string
	IsBot bool
	Email string
	Username string
	Datetime int64
}

type ChannelMemberLeave struct {
	UserId string
	IsBot bool
	Email string
	Username string
	Datetime int64
}

type ChannelMember struct {
	UserId string
	IsBot bool
	Email string
	Username string
}

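// MetadataChannel summarizes one channel in the export: team and channel identifiers, the export window,
// and running message/attachment counts.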
type MetadataChannel struct {
	TeamId *string
	TeamName *string
	TeamDisplayName *string
	ChannelId string
	ChannelName string
	ChannelDisplayName string
	ChannelType model.ChannelType
	RoomId string
	StartTime int64
	EndTime int64
	MessagesCount int
	AttachmentsCount int
}

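// Metadata aggregates the per-channel summaries and the overall counts for an export.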
type Metadata struct {
	Channels map[string]*MetadataChannel
	MessagesCount int
	AttachmentsCount int
	StartTime int64
	EndTime int64
}

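// UpdateCounts adds numMessages and numAttachments to both the channel's counts and the export totals.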
func (metadata *Metadata) UpdateCounts(channelId string, numMessages int, numAttachments int) error {
	_, ok := metadata.Channels[channelId]
	if !ok {
		return fmt.Errorf("could not find channelId for post in metadata.Channels")
	}

	metadata.Channels[channelId].AttachmentsCount += numAttachments
	metadata.AttachmentsCount += numAttachments
	metadata.Channels[channelId].MessagesCount += numMessages
	metadata.MessagesCount += numMessages

	return nil
}

// GetInitialExportPeriodData calculates and caches the channel memberships, channel metadata, and the TotalPostsExpected.
func GetInitialExportPeriodData(rctx request.CTX, store MessageExportStore, data JobData, reportProgress func(string)) (JobData, error) {
	// Counting all posts may fail or timeout when the posts table is large. If this happens, log a warning, but carry
	// on with the job anyway. The only issue is that the progress % reporting will be inaccurate.
	count, err := store.Post().AnalyticsPostCount(&model.PostCountOptions{ExcludeSystemPosts: true, SincePostID: data.JobStartId, SinceUpdateAt: data.ExportPeriodStartTime, UntilUpdateAt: data.JobEndTime})
	if err != nil {
		rctx.Logger().Warn("Worker: Failed to fetch total post count for job. An estimated value will be used for progress reporting.", mlog.Err(err))
		data.TotalPostsExpected = EstimatedPostCount
	} else {
		data.TotalPostsExpected = int(count)
	}

	rctx.Logger().Info("Expecting to export total posts", mlog.Int("total_posts", data.TotalPostsExpected))

	// Every time we claim the job, we need to gather the membership data that every batch will use.
	// If we're here, then either this is the start of the job, or the job was stopped (e.g., the worker stopped)
	// and we've claimed it again. Either way, we need to recalculate channel and member history data.
	data.ChannelMetadata, data.ChannelMemberHistories, err = CalculateChannelExports(rctx,
		ChannelExportsParams{
			Store: store,
			ExportPeriodStartTime: data.ExportPeriodStartTime,
			ExportPeriodEndTime: data.JobEndTime,
			ChannelBatchSize: data.ChannelBatchSize,
			ChannelHistoryBatchSize: data.ChannelHistoryBatchSize,
			ReportProgressMessage: reportProgress,
		})
	if err != nil {
		return data, err
	}

	data.Cursor = model.MessageExportCursor{
		LastPostUpdateAt: data.BatchStartTime,
		LastPostId: data.BatchStartId,
		UntilUpdateAt: data.JobEndTime,
	}

	return data, nil
}

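// ChannelExportsParams are the inputs to CalculateChannelExports: the store, the export period, the batch
// sizes used when querying channels and membership histories, and a callback for progress messages.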
type ChannelExportsParams struct {
	Store MessageExportStore
	ExportPeriodStartTime int64
	ExportPeriodEndTime int64
	ChannelBatchSize int
	ChannelHistoryBatchSize int
	ReportProgressMessage func(message string)
}

// CalculateChannelExports returns the channel info ( map[channelId]*MetadataChannel ) and the channel user
// joins/leaves ( map[channelId][]*model.ChannelMemberHistoryResult ) for any channel that has had activity
// (posts or user join/leaves) between ExportPeriodStartTime and ExportPeriodEndTime.
func CalculateChannelExports(rctx request.CTX, opt ChannelExportsParams) (map[string]*MetadataChannel, map[string][]*model.ChannelMemberHistoryResult, error) {
	// Which channels had user activity in the export period?
	activeChannelIds, err := opt.Store.ChannelMemberHistory().GetChannelsWithActivityDuring(opt.ExportPeriodStartTime, opt.ExportPeriodEndTime)
	if err != nil {
		return nil, nil, err
	}

	if len(activeChannelIds) == 0 {
		return nil, nil, nil
	}

	rctx.Logger().Debug("Started CalculateChannelExports", mlog.Int("export_period_start_time", opt.ExportPeriodStartTime), mlog.Int("export_period_end_time", opt.ExportPeriodEndTime), mlog.Int("num_active_channel_ids", len(activeChannelIds)))
	message := rctx.T("ent.message_export.actiance_export.calculate_channel_exports.channel_message", model.StringMap{"NumChannels": strconv.Itoa(len(activeChannelIds))})
	opt.ReportProgressMessage(message)

	// For each channel, get its metadata.
	channelMetadata := make(map[string]*MetadataChannel, len(activeChannelIds))

	// Use batches to reduce db load and network waste.
	for pos := 0; pos < len(activeChannelIds); pos += opt.ChannelBatchSize {
		upTo := min(pos+opt.ChannelBatchSize, len(activeChannelIds))
		batch := activeChannelIds[pos:upTo]
		channels, err := opt.Store.Channel().GetMany(batch, true)
		if err != nil {
			return nil, nil, err
		}

		for _, channel := range channels {
			channelMetadata[channel.Id] = &MetadataChannel{
				TeamId: model.NewPointer(channel.TeamId),
				ChannelId: channel.Id,
				ChannelName: channel.Name,
				ChannelDisplayName: channel.DisplayName,
				ChannelType: channel.Type,
				RoomId: fmt.Sprintf("%v - %v", ChannelTypeDisplayName(channel.Type), channel.Id),
				StartTime: opt.ExportPeriodStartTime,
				EndTime: opt.ExportPeriodEndTime,
			}
		}
	}

	historiesByChannelId := make(map[string][]*model.ChannelMemberHistoryResult, len(activeChannelIds))

	var batchTimes []int64

	// Now that we have metadata, get channelMemberHistories for each channel.
	// Use batches to reduce total db load and network waste.
	for pos := 0; pos < len(activeChannelIds); pos += opt.ChannelHistoryBatchSize {
		// This may take a while, so update the system console UI.
		message := rctx.T("ent.message_export.actiance_export.calculate_channel_exports.activity_message", model.StringMap{
			"NumChannels": strconv.Itoa(len(activeChannelIds)),
			"NumCompleted": strconv.Itoa(pos),
		})
		opt.ReportProgressMessage(message)

		start := time.Now()

		upTo := min(pos+opt.ChannelHistoryBatchSize, len(activeChannelIds))
		batch := activeChannelIds[pos:upTo]
		channelMemberHistories, err := opt.Store.ChannelMemberHistory().GetUsersInChannelDuring(opt.ExportPeriodStartTime, opt.ExportPeriodEndTime, batch)
		if err != nil {
			return nil, nil, err
		}

		batchTimes = append(batchTimes, time.Since(start).Milliseconds())

		// collect the channelMemberHistories by channelId
		for _, entry := range channelMemberHistories {
			historiesByChannelId[entry.ChannelId] = append(historiesByChannelId[entry.ChannelId], entry)
		}
	}

	rctx.Logger().Info("GetUsersInChannelDuring batch times", mlog.Array("batch_times", batchTimes))

	return channelMetadata, historiesByChannelId, nil
}

// ChannelHasActivity returns true if the channel (represented by the []*model.ChannelMemberHistoryResult slice)
// had user activity between startTime and endTime.
func ChannelHasActivity(cmhs []*model.ChannelMemberHistoryResult, startTime int64, endTime int64) bool {
	for _, cmh := range cmhs {
		if (cmh.JoinTime >= startTime && cmh.JoinTime <= endTime) ||
			(cmh.LeaveTime != nil && *cmh.LeaveTime >= startTime && *cmh.LeaveTime <= endTime) {
			return true
		}
	}
	return false
}

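// GetJoinsAndLeavesForChannel converts the channel's membership history into join and leave records that
// overlap [startTime, endTime]. Post authors that never appear in the history are added as joins at
// startTime so that every author is accounted for.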
func GetJoinsAndLeavesForChannel(startTime int64, endTime int64, channelMembersHistory []*model.ChannelMemberHistoryResult,
	postAuthors map[string]ChannelMember) ([]ChannelMemberJoin, []ChannelMemberLeave) {
	var joins []ChannelMemberJoin
	var leaves []ChannelMemberLeave

	alreadyJoined := make(map[string]bool)
	for _, cmh := range channelMembersHistory {
		if cmh.UserDeleteAt > 0 && cmh.UserDeleteAt < startTime {
			continue
		}

		if cmh.JoinTime > endTime {
			continue
		}

		if cmh.LeaveTime != nil && *cmh.LeaveTime < startTime {
			continue
		}

		if cmh.JoinTime <= endTime {
			joins = append(joins, ChannelMemberJoin{
				UserId: cmh.UserId,
				IsBot: cmh.IsBot,
				Email: cmh.UserEmail,
				Username: cmh.Username,
				Datetime: cmh.JoinTime,
			})
			alreadyJoined[cmh.UserId] = true
		}

		if cmh.LeaveTime != nil && *cmh.LeaveTime <= endTime {
			leaves = append(leaves, ChannelMemberLeave{
				UserId: cmh.UserId,
				IsBot: cmh.IsBot,
				Email: cmh.UserEmail,
				Username: cmh.Username,
				Datetime: *cmh.LeaveTime,
			})
		}
	}

	for _, member := range postAuthors {
		if alreadyJoined[member.UserId] {
			continue
		}

		joins = append(joins, ChannelMemberJoin{
			UserId: member.UserId,
			IsBot: member.IsBot,
			Email: member.Email,
			Username: member.Username,
			Datetime: startTime,
		})
	}
	return joins, leaves
}

// GetPostAttachments returns the post's file attachments. If the post included any files, we need to add
// special elements for them to the export.
func GetPostAttachments(db MessageExportStore, post *model.MessageExport) ([]*model.FileInfo, error) {
	if len(post.PostFileIds) == 0 {
		return []*model.FileInfo{}, nil
	}

	attachments, err := db.FileInfo().GetForPost(*post.PostId, true, true, false)
	if err != nil {
		return nil, fmt.Errorf("failed to get file info for a post: %w", err)
	}
	return attachments, nil
}

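// ChannelTypeDisplayName maps a channel type to the readable name ("public", "private", "direct", "group")
// used in MetadataChannel.RoomId.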
func ChannelTypeDisplayName(channelType model.ChannelType) string {
	return map[model.ChannelType]string{
		model.ChannelTypeOpen: "public",
		model.ChannelTypePrivate: "private",
		model.ChannelTypeDirect: "direct",
		model.ChannelTypeGroup: "group",
	}[channelType]
}

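// GetBatchPath returns the zip path for a batch, e.g. batch003-1000-2000.zip for batch 3 covering post
// update times 1000 to 2000. When exportDir is empty, a dated directory under model.ComplianceExportPath
// is used instead.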
func GetBatchPath(exportDir string, prevPostUpdateAt int64, lastPostUpdateAt int64, batchNumber int) string {
	if exportDir == "" {
		exportDir = path.Join(model.ComplianceExportPath, time.Now().Format(model.ComplianceExportDirectoryFormat))
	}
	return path.Join(exportDir,
		fmt.Sprintf("batch%03d-%d-%d.zip", batchNumber, prevPostUpdateAt, lastPostUpdateAt))
}

// GetExportBackend returns the file backend where the export will be created.
func GetExportBackend(rctx request.CTX, config *model.Config) (filestore.FileBackend, error) {
	insecure := config.ServiceSettings.EnableInsecureOutgoingConnections
	skipVerify := insecure != nil && *insecure

	if config.FileSettings.DedicatedExportStore != nil && *config.FileSettings.DedicatedExportStore {
		rctx.Logger().Debug("Worker: using dedicated export filestore", mlog.String("driver_name", *config.FileSettings.ExportDriverName))
		backend, errFileBack := filestore.NewExportFileBackend(filestore.NewExportFileBackendSettingsFromConfig(&config.FileSettings, true, skipVerify))
		if errFileBack != nil {
			return nil, errFileBack
		}

		return backend, nil
	}

	backend, err := filestore.NewFileBackend(filestore.NewFileBackendSettingsFromConfig(&config.FileSettings, true, skipVerify))
	if err != nil {
		return nil, err
	}
	return backend, nil
}

// GetFileAttachmentBackend returns the file backend where file attachments are
// located for messages that will be exported. This may be the same backend
// where the export will be created.
func GetFileAttachmentBackend(rctx request.CTX, config *model.Config) (filestore.FileBackend, error) {
	insecure := config.ServiceSettings.EnableInsecureOutgoingConnections

	backend, err := filestore.NewFileBackend(filestore.NewFileBackendSettingsFromConfig(&config.FileSettings, true, insecure != nil && *insecure))
	if err != nil {
		return nil, err
	}
	return backend, nil
}

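// IsDeletedMsg reports whether the post was explicitly deleted: its DeleteAt is set and its props contain
// the PostPropsDeleteBy marker.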
func IsDeletedMsg(post *model.MessageExport) bool {
	if model.SafeDereference(post.PostDeleteAt) > 0 && post.PostProps != nil {
		props := map[string]any{}
		err := json.Unmarshal([]byte(*post.PostProps), &props)
		if err != nil {
			return false
		}

		if _, ok := props[model.PostPropsDeleteBy]; ok {
			return true
		}
	}
	return false
}