mattermost/server/enterprise/message_export/shared/shared.go

// Copyright (c) 2015-present Mattermost, Inc. All Rights Reserved.
// See LICENSE.enterprise for license information.

package shared

import (
	"encoding/json"
	"fmt"
	"path"
	"strconv"
	"time"

	"github.com/pkg/errors"

	"github.com/mattermost/mattermost/server/v8/platform/shared/filestore"
	"github.com/mattermost/mattermost/server/v8/platform/shared/templates"

	"github.com/mattermost/mattermost/server/public/model"
	"github.com/mattermost/mattermost/server/public/shared/mlog"
	"github.com/mattermost/mattermost/server/public/shared/request"
)

const (
	MissingFileMessageDuringBackendRead = "File backend read: File missing for post; cannot copy file to archive"
	MissingFileMessageDuringCopy        = "Copy buffer: File missing for post; cannot copy file to archive"

	EstimatedPostCount = 10_000_000

	// JobDataBatchStartTime is the posts.updateat value from the previous batch. Posts are selected using
	// keyset pagination sorted by (posts.updateat, posts.id).
	JobDataBatchStartTime = "batch_start_time"
	// JobDataJobStartTime is the start of the job (doesn't change across batches).
	JobDataJobStartTime = "job_start_time"
	// JobDataBatchStartId is the posts.id value from the previous batch.
	JobDataBatchStartId = "batch_start_id"
	// JobDataJobEndTime is the point up to which this job is exporting. It is the time the job was started,
	// i.e., we export everything from the end of the previous batch to the moment this job started.
	JobDataJobEndTime = "job_end_time"
	JobDataJobStartId = "job_start_id"
	JobDataExportType = "export_type"
	JobDataInitiatedBy = "initiated_by"
	JobDataBatchSize               = "batch_size"
	JobDataChannelBatchSize        = "channel_batch_size"
	JobDataChannelHistoryBatchSize = "channel_history_batch_size"
	JobDataMessagesExported        = "messages_exported"
	JobDataWarningCount            = "warning_count"
	JobDataIsDownloadable          = "is_downloadable"
	JobDataExportDir               = "export_dir"
	JobDataBatchNumber             = "job_batch_number"
	JobDataTotalPostsExpected      = "total_posts_expected"
)

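// PostUpdatedType describes how an exported post was changed: edited (the original or the new revision),
// updated without a message change, deleted, or had a file deleted.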
type PostUpdatedType string

const (
	EditedOriginalMsg  PostUpdatedType = "EditedOriginalMsg"
	EditedNewMsg       PostUpdatedType = "EditedNewMsg"
	UpdatedNoMsgChange PostUpdatedType = "UpdatedNoMsgChange"
	Deleted            PostUpdatedType = "Deleted"
	FileDeleted        PostUpdatedType = "FileDeleted"
)

// JobData keeps the current state of the job.
// When used by a worker, all fields in JobDataExported are exported to the job's job.Data prop bag.
type JobData struct {
	JobDataExported
	ExportPeriodStartTime int64

	// This section is the current state of the export
	ChannelMetadata        map[string]*MetadataChannel
	ChannelMemberHistories map[string][]*model.ChannelMemberHistoryResult
	Cursor                 model.MessageExportCursor
	PostsToExport          []*model.MessageExport
	BatchEndTime           int64
	BatchPath              string
	MessageExportMs        []int64
	ProcessingPostsMs      []int64
	ProcessingXmlMs        []int64
	TransferringFilesMs    []int64
	TransferringZipMs      []int64
	TotalBatchMs           []int64
	Finished               bool
}

type JobDataExported struct {
	ExportType              string
	ExportDir               string
	BatchStartTime          int64
	BatchStartId            string
	JobStartTime            int64
	JobEndTime              int64
	JobStartId              string
	BatchSize               int
	ChannelBatchSize        int
	ChannelHistoryBatchSize int
	BatchNumber             int
	TotalPostsExpected      int
	MessagesExported        int
	WarningCount            int
	IsDownloadable          bool
}

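// JobDataToStringMap converts the JobDataExported fields into the string map stored in the job's Data prop bag,
// keyed by the JobData* constants defined above.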
func JobDataToStringMap(jd JobData) map[string]string {
	ret := make(map[string]string)
	ret[JobDataExportType] = jd.ExportType
	ret[JobDataExportDir] = jd.ExportDir
	ret[JobDataBatchStartTime] = strconv.FormatInt(jd.BatchStartTime, 10)
	ret[JobDataBatchStartId] = jd.BatchStartId
	ret[JobDataJobStartTime] = strconv.FormatInt(jd.JobStartTime, 10)
	ret[JobDataJobEndTime] = strconv.FormatInt(jd.JobEndTime, 10)
	ret[JobDataJobStartId] = jd.JobStartId
	ret[JobDataBatchSize] = strconv.Itoa(jd.BatchSize)
	ret[JobDataChannelBatchSize] = strconv.Itoa(jd.ChannelBatchSize)
	ret[JobDataChannelHistoryBatchSize] = strconv.Itoa(jd.ChannelHistoryBatchSize)
	ret[JobDataBatchNumber] = strconv.Itoa(jd.BatchNumber)
	ret[JobDataTotalPostsExpected] = strconv.Itoa(jd.TotalPostsExpected)
	ret[JobDataMessagesExported] = strconv.Itoa(jd.MessagesExported)
	ret[JobDataWarningCount] = strconv.Itoa(jd.WarningCount)
	ret[JobDataIsDownloadable] = strconv.FormatBool(jd.IsDownloadable)
	return ret
}

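// StringMapToJobDataWithZeroValues is the inverse of JobDataToStringMap: it parses the job's Data prop bag back
// into a JobData, treating missing keys as zero values rather than errors.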
func StringMapToJobDataWithZeroValues(sm map[string]string) (JobData, error) {
	var jd JobData
	var err error
	jd.ExportType = sm[JobDataExportType]
	jd.ExportDir = sm[JobDataExportDir]

	batchStartTime, ok := sm[JobDataBatchStartTime]
	if !ok {
		batchStartTime = "0"
	}
	if jd.BatchStartTime, err = strconv.ParseInt(batchStartTime, 10, 64); err != nil {
		return jd, errors.Wrap(err, "error converting JobDataBatchStartTime")
	}

	jd.BatchStartId = sm[JobDataBatchStartId]

	jobStartTime, ok := sm[JobDataJobStartTime]
	if !ok {
		jobStartTime = "0"
	}
	if jd.JobStartTime, err = strconv.ParseInt(jobStartTime, 10, 64); err != nil {
		return jd, errors.Wrap(err, "error converting JobDataJobStartTime")
	}

	jobEndTime, ok := sm[JobDataJobEndTime]
	if !ok {
		jobEndTime = "0"
	}
	if jd.JobEndTime, err = strconv.ParseInt(jobEndTime, 10, 64); err != nil {
		return jd, errors.Wrap(err, "error converting JobDataJobEndTime")
	}

	jd.JobStartId = sm[JobDataJobStartId]

	jobBatchSize, ok := sm[JobDataBatchSize]
	if !ok {
		jobBatchSize = "0"
	}
	if jd.BatchSize, err = strconv.Atoi(jobBatchSize); err != nil {
		return jd, errors.Wrap(err, "error converting JobDataBatchSize")
	}

	channelBatchSize, ok := sm[JobDataChannelBatchSize]
	if !ok {
		channelBatchSize = "0"
	}
	if jd.ChannelBatchSize, err = strconv.Atoi(channelBatchSize); err != nil {
		return jd, errors.Wrap(err, "error converting JobDataChannelBatchSize")
	}

	channelHistoryBatchSize, ok := sm[JobDataChannelHistoryBatchSize]
	if !ok {
		channelHistoryBatchSize = "0"
	}
	if jd.ChannelHistoryBatchSize, err = strconv.Atoi(channelHistoryBatchSize); err != nil {
		return jd, errors.Wrap(err, "error converting JobDataChannelHistoryBatchSize")
	}

	batchNumber, ok := sm[JobDataBatchNumber]
	if !ok {
		batchNumber = "0"
	}
	if jd.BatchNumber, err = strconv.Atoi(batchNumber); err != nil {
		return jd, errors.Wrap(err, "error converting JobDataBatchNumber")
	}

	totalPostsExpected, ok := sm[JobDataTotalPostsExpected]
	if !ok {
		totalPostsExpected = "0"
	}
	if jd.TotalPostsExpected, err = strconv.Atoi(totalPostsExpected); err != nil {
		return jd, errors.Wrap(err, "error converting JobDataTotalPostsExpected")
	}

	messagesExported, ok := sm[JobDataMessagesExported]
	if !ok {
		messagesExported = "0"
	}
	if jd.MessagesExported, err = strconv.Atoi(messagesExported); err != nil {
		return jd, errors.Wrap(err, "error converting JobDataMessagesExported")
	}

	warningCount, ok := sm[JobDataWarningCount]
	if !ok {
		warningCount = "0"
	}
	if jd.WarningCount, err = strconv.Atoi(warningCount); err != nil {
		return jd, errors.Wrap(err, "error converting JobDataWarningCount")
	}

	isDownloadable, ok := sm[JobDataIsDownloadable]
	if !ok {
		isDownloadable = "0"
	}
	if jd.IsDownloadable, err = strconv.ParseBool(isDownloadable); err != nil {
		return jd, errors.Wrap(err, "error converting JobDataIsDownloadable")
	}

	return jd, nil
}

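// BackendParams bundles the config, store, filestore backends, and HTML templates that an export backend needs.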
type BackendParams struct {
	Config                *model.Config
	Store                 MessageExportStore
	FileAttachmentBackend filestore.FileBackend
	ExportBackend         filestore.FileBackend
	HtmlTemplates         *templates.Container
}

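// ExportParams carries everything a single batch export needs: the export type, the batch's posts, the
// pre-calculated channel metadata and member histories, the batch's time boundaries, and the backends to use.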
type ExportParams struct {
	ExportType             string
	ChannelMetadata        map[string]*MetadataChannel
	Posts                  []*model.MessageExport
	ChannelMemberHistories map[string][]*model.ChannelMemberHistoryResult
	JobStartTime           int64
	BatchPath              string
	BatchStartTime         int64
	BatchEndTime           int64
	Config                 *model.Config
	Db                     MessageExportStore
	FileAttachmentBackend  filestore.FileBackend
	ExportBackend          filestore.FileBackend
	Templates              *templates.Container
}

type WriteExportResult struct {
	TransferringFilesMs int64
	ProcessingXmlMs     int64
	TransferringZipMs   int64
	NumWarnings         int
}

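// RunExportResults accumulates the counts for a batch export (posts by update type, files, channels, joins and
// leaves) along with post-processing time, and embeds the WriteExportResult timings.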
type RunExportResults struct {
	CreatedPosts       int
	EditedOrigMsgPosts int
	EditedNewMsgPosts  int
	UpdatedPosts       int
	DeletedPosts       int
	UploadedFiles      int
	DeletedFiles       int
	NumChannels        int
	Joins              int
	Leaves             int
	ProcessingPostsMs  int64

	WriteExportResult
}

type ChannelMemberJoin struct {
	UserId   string
	IsBot    bool
	Email    string
	Username string
	Datetime int64
}

type ChannelMemberLeave struct {
	UserId   string
	IsBot    bool
	Email    string
	Username string
	Datetime int64
}

type ChannelMember struct {
	UserId   string
	IsBot    bool
	Email    string
	Username string
}

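// MetadataChannel is the per-channel summary used by the export: team and channel identifiers, the export
// window, and running message/attachment counts.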
type MetadataChannel struct {
	TeamId             *string
	TeamName           *string
	TeamDisplayName    *string
	ChannelId          string
	ChannelName        string
	ChannelDisplayName string
	ChannelType        model.ChannelType
	RoomId             string
	StartTime          int64
	EndTime            int64
	MessagesCount      int
	AttachmentsCount   int
}

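// Metadata aggregates the MetadataChannel entries for a batch along with batch-wide totals and time bounds.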
type Metadata struct {
	Channels         map[string]*MetadataChannel
	MessagesCount    int
	AttachmentsCount int
	StartTime        int64
	EndTime          int64
}

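// UpdateCounts adds numMessages and numAttachments to both the given channel's counts and the batch totals.
// It returns an error if the channel isn't present in metadata.Channels.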
func (metadata *Metadata) UpdateCounts(channelId string, numMessages int, numAttachments int) error {
	_, ok := metadata.Channels[channelId]
	if !ok {
		return fmt.Errorf("could not find channelId for post in metadata.Channels")
	}

	metadata.Channels[channelId].AttachmentsCount += numAttachments
	metadata.AttachmentsCount += numAttachments
	metadata.Channels[channelId].MessagesCount += numMessages
	metadata.MessagesCount += numMessages
	return nil
}

// GetInitialExportPeriodData calculates and caches the channel memberships, channel metadata, and the TotalPostsExpected.
func GetInitialExportPeriodData(rctx request.CTX, store MessageExportStore, data JobData, reportProgress func(string)) (JobData, error) {
	// Counting all posts may fail or timeout when the posts table is large. If this happens, log a warning, but carry
	// on with the job anyway. The only issue is that the progress % reporting will be inaccurate.
	count, err := store.Post().AnalyticsPostCount(&model.PostCountOptions{ExcludeSystemPosts: true, SincePostID: data.JobStartId, SinceUpdateAt: data.ExportPeriodStartTime, UntilUpdateAt: data.JobEndTime})
	if err != nil {
		rctx.Logger().Warn("Worker: Failed to fetch total post count for job. An estimated value will be used for progress reporting.", mlog.Err(err))
		data.TotalPostsExpected = EstimatedPostCount
	} else {
		data.TotalPostsExpected = int(count)
	}

	rctx.Logger().Info("Expecting to export total posts", mlog.Int("total_posts", data.TotalPostsExpected))

	// Every time we claim the job, we need to gather the membership data that every batch will use.
	// If we're here, then either this is the start of the job, or the job was stopped (e.g., the worker stopped)
	// and we've claimed it again. Either way, we need to recalculate channel and member history data.
	data.ChannelMetadata, data.ChannelMemberHistories, err = CalculateChannelExports(rctx,
		ChannelExportsParams{
			Store:                   store,
			ExportPeriodStartTime:   data.ExportPeriodStartTime,
			ExportPeriodEndTime:     data.JobEndTime,
			ChannelBatchSize:        data.ChannelBatchSize,
			ChannelHistoryBatchSize: data.ChannelHistoryBatchSize,
			ReportProgressMessage:   reportProgress,
		})
	if err != nil {
		return data, err
	}

	data.Cursor = model.MessageExportCursor{
		LastPostUpdateAt: data.BatchStartTime,
		LastPostId:       data.BatchStartId,
		UntilUpdateAt:    data.JobEndTime,
	}
	return data, nil
}

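// ChannelExportsParams are the inputs to CalculateChannelExports.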
type ChannelExportsParams struct {
	Store                   MessageExportStore
	ExportPeriodStartTime   int64
	ExportPeriodEndTime     int64
	ChannelBatchSize        int
	ChannelHistoryBatchSize int
	ReportProgressMessage   func(message string)
}

// CalculateChannelExports returns the channel info ( map[channelId]*MetadataChannel ) and the channel user
// joins/leaves ( map[channelId][]*model.ChannelMemberHistoryResult ) for any channel that has had activity
// (posts or user join/leaves) between ExportPeriodStartTime and ExportPeriodEndTime.
func CalculateChannelExports(rctx request.CTX, opt ChannelExportsParams) (map[string]*MetadataChannel, map[string][]*model.ChannelMemberHistoryResult, error) {
	// Which channels had user activity in the export period?
	activeChannelIds, err := opt.Store.ChannelMemberHistory().GetChannelsWithActivityDuring(opt.ExportPeriodStartTime, opt.ExportPeriodEndTime)
	if err != nil {
		return nil, nil, err
	}
	if len(activeChannelIds) == 0 {
		return nil, nil, nil
	}

	rctx.Logger().Debug("Started CalculateChannelExports", mlog.Int("export_period_start_time", opt.ExportPeriodStartTime), mlog.Int("export_period_end_time", opt.ExportPeriodEndTime), mlog.Int("num_active_channel_ids", len(activeChannelIds)))

	message := rctx.T("ent.message_export.actiance_export.calculate_channel_exports.channel_message", model.StringMap{"NumChannels": strconv.Itoa(len(activeChannelIds))})
	opt.ReportProgressMessage(message)

	// For each channel, get its metadata.
	channelMetadata := make(map[string]*MetadataChannel, len(activeChannelIds))

	// Use batches to reduce db load and network waste.
	for pos := 0; pos < len(activeChannelIds); pos += opt.ChannelBatchSize {
		upTo := min(pos+opt.ChannelBatchSize, len(activeChannelIds))
		batch := activeChannelIds[pos:upTo]
		channels, err := opt.Store.Channel().GetMany(batch, true)
		if err != nil {
			return nil, nil, err
		}

		for _, channel := range channels {
			channelMetadata[channel.Id] = &MetadataChannel{
				TeamId:             model.NewPointer(channel.TeamId),
				ChannelId:          channel.Id,
				ChannelName:        channel.Name,
				ChannelDisplayName: channel.DisplayName,
				ChannelType:        channel.Type,
				RoomId:             fmt.Sprintf("%v - %v", ChannelTypeDisplayName(channel.Type), channel.Id),
				StartTime:          opt.ExportPeriodStartTime,
				EndTime:            opt.ExportPeriodEndTime,
			}
		}
	}

	historiesByChannelId := make(map[string][]*model.ChannelMemberHistoryResult, len(activeChannelIds))
	var batchTimes []int64

	// Now that we have metadata, get channelMemberHistories for each channel.
	// Use batches to reduce total db load and network waste.
	for pos := 0; pos < len(activeChannelIds); pos += opt.ChannelHistoryBatchSize {
		// This may take a while, so update the system console UI.
		message := rctx.T("ent.message_export.actiance_export.calculate_channel_exports.activity_message", model.StringMap{
			"NumChannels":  strconv.Itoa(len(activeChannelIds)),
			"NumCompleted": strconv.Itoa(pos),
		})
		opt.ReportProgressMessage(message)

		start := time.Now()
		upTo := min(pos+opt.ChannelHistoryBatchSize, len(activeChannelIds))
		batch := activeChannelIds[pos:upTo]
		channelMemberHistories, err := opt.Store.ChannelMemberHistory().GetUsersInChannelDuring(opt.ExportPeriodStartTime, opt.ExportPeriodEndTime, batch)
		if err != nil {
			return nil, nil, err
		}
		batchTimes = append(batchTimes, time.Since(start).Milliseconds())

		// collect the channelMemberHistories by channelId
		for _, entry := range channelMemberHistories {
			historiesByChannelId[entry.ChannelId] = append(historiesByChannelId[entry.ChannelId], entry)
		}
	}

	rctx.Logger().Info("GetUsersInChannelDuring batch times", mlog.Array("batch_times", batchTimes))

	return channelMetadata, historiesByChannelId, nil
}

// ChannelHasActivity returns true if the channel (represented by the []*model.ChannelMemberHistoryResult slice)
// had user activity between startTime and endTime.
func ChannelHasActivity(cmhs []*model.ChannelMemberHistoryResult, startTime int64, endTime int64) bool {
	for _, cmh := range cmhs {
		if (cmh.JoinTime >= startTime && cmh.JoinTime <= endTime) ||
			(cmh.LeaveTime != nil && *cmh.LeaveTime >= startTime && *cmh.LeaveTime <= endTime) {
			return true
		}
	}
	return false
}

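// GetJoinsAndLeavesForChannel converts a channel's member history into join and leave records that overlap the
// [startTime, endTime] window, skipping users deleted before the window and memberships that ended before it.
// Post authors who do not already appear as joins are added as joins at startTime.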
func GetJoinsAndLeavesForChannel(startTime int64, endTime int64, channelMembersHistory []*model.ChannelMemberHistoryResult,
	postAuthors map[string]ChannelMember) ([]ChannelMemberJoin, []ChannelMemberLeave) {
	var joins []ChannelMemberJoin
	var leaves []ChannelMemberLeave

	alreadyJoined := make(map[string]bool)
	for _, cmh := range channelMembersHistory {
		if cmh.UserDeleteAt > 0 && cmh.UserDeleteAt < startTime {
			continue
		}
		if cmh.JoinTime > endTime {
			continue
		}
		if cmh.LeaveTime != nil && *cmh.LeaveTime < startTime {
			continue
		}

		if cmh.JoinTime <= endTime {
			joins = append(joins, ChannelMemberJoin{
				UserId:   cmh.UserId,
				IsBot:    cmh.IsBot,
				Email:    cmh.UserEmail,
				Username: cmh.Username,
				Datetime: cmh.JoinTime,
			})
			alreadyJoined[cmh.UserId] = true
		}

		if cmh.LeaveTime != nil && *cmh.LeaveTime <= endTime {
			leaves = append(leaves, ChannelMemberLeave{
				UserId:   cmh.UserId,
				IsBot:    cmh.IsBot,
				Email:    cmh.UserEmail,
				Username: cmh.Username,
				Datetime: *cmh.LeaveTime,
			})
		}
	}

	for _, member := range postAuthors {
		if alreadyJoined[member.UserId] {
			continue
		}
		joins = append(joins, ChannelMemberJoin{
			UserId:   member.UserId,
			IsBot:    member.IsBot,
			Email:    member.Email,
			Username: member.Username,
			Datetime: startTime,
		})
	}
	return joins, leaves
}

// GetPostAttachments returns the post's file attachments: if the post included any files, the export needs to
// add special elements for them.
func GetPostAttachments(db MessageExportStore, post *model.MessageExport) ([]*model.FileInfo, error) {
	if len(post.PostFileIds) == 0 {
		return []*model.FileInfo{}, nil
	}

	attachments, err := db.FileInfo().GetForPost(*post.PostId, true, true, false)
	if err != nil {
		return nil, fmt.Errorf("failed to get file info for a post: %w", err)
	}
	return attachments, nil
}

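// ChannelTypeDisplayName maps a model.ChannelType to the lowercase display name used in exports
// ("public", "private", "direct", or "group").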
func ChannelTypeDisplayName(channelType model.ChannelType) string {
	return map[model.ChannelType]string{
		model.ChannelTypeOpen:    "public",
		model.ChannelTypePrivate: "private",
		model.ChannelTypeDirect:  "direct",
		model.ChannelTypeGroup:   "group",
	}[channelType]
}

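// GetBatchPath builds the zip path for a batch, e.g. "batch001-<prevPostUpdateAt>-<lastPostUpdateAt>.zip" under
// exportDir. If exportDir is empty, a dated directory under model.ComplianceExportPath is used instead.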
func GetBatchPath(exportDir string, prevPostUpdateAt int64, lastPostUpdateAt int64, batchNumber int) string {
	if exportDir == "" {
		exportDir = path.Join(model.ComplianceExportPath, time.Now().Format(model.ComplianceExportDirectoryFormat))
	}
	return path.Join(exportDir,
		fmt.Sprintf("batch%03d-%d-%d.zip", batchNumber, prevPostUpdateAt, lastPostUpdateAt))
}

// GetExportBackend returns the file backend where the export will be created.
func GetExportBackend(rctx request.CTX, config *model.Config) (filestore.FileBackend, error) {
	insecure := config.ServiceSettings.EnableInsecureOutgoingConnections
	skipVerify := insecure != nil && *insecure

	if config.FileSettings.DedicatedExportStore != nil && *config.FileSettings.DedicatedExportStore {
		rctx.Logger().Debug("Worker: using dedicated export filestore", mlog.String("driver_name", *config.FileSettings.ExportDriverName))
		backend, errFileBack := filestore.NewExportFileBackend(filestore.NewExportFileBackendSettingsFromConfig(&config.FileSettings, true, skipVerify))
		if errFileBack != nil {
			return nil, errFileBack
		}
		return backend, nil
	}

	backend, err := filestore.NewFileBackend(filestore.NewFileBackendSettingsFromConfig(&config.FileSettings, true, skipVerify))
	if err != nil {
		return nil, err
	}
	return backend, nil
}

// GetFileAttachmentBackend returns the file backend where file attachments are
// located for messages that will be exported. This may be the same backend
// where the export will be created.
func GetFileAttachmentBackend(rctx request.CTX, config *model.Config) (filestore.FileBackend, error) {
	insecure := config.ServiceSettings.EnableInsecureOutgoingConnections
	backend, err := filestore.NewFileBackend(filestore.NewFileBackendSettingsFromConfig(&config.FileSettings, true, insecure != nil && *insecure))
	if err != nil {
		return nil, err
	}
	return backend, nil
}

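// IsDeletedMsg reports whether the exported post was deleted by a user: the post must have a PostDeleteAt and
// its props must contain the model.PostPropsDeleteBy key.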
func IsDeletedMsg(post *model.MessageExport) bool {
	if model.SafeDereference(post.PostDeleteAt) > 0 && post.PostProps != nil {
		props := map[string]any{}
		err := json.Unmarshal([]byte(*post.PostProps), &props)
		if err != nil {
			return false
		}

		if _, ok := props[model.PostPropsDeleteBy]; ok {
			return true
		}
	}
	return false
}