diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..5c0f323
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,5 @@
+__pycache__/
+*/__pycache__/
+**/__pycache__/
+*.pyc
+conversion_tools/logs/
diff --git a/README.md b/README.md
index ad819d0..88c71d1 100644
--- a/README.md
+++ b/README.md
@@ -48,6 +48,8 @@ This dataset is a collection of anonymized customer sessions containing products
 - Yelp-full: This is a combination dataset including four versions of yelp datasets mentioned above, where the duplicates are dropped and the number of total reviews is 28,908,240.
 - [Tmall](https://tianchi.aliyun.com/dataset/dataDetail?dataId=53): This dataset is provided by Ant Financial Services, using in the IJCAI16 contest.
+- [Tmall2014](https://tianchi.aliyun.com/dataset/140281):
+  This is a large-scale e-commerce dataset from Tmall.com containing user behavior logs from 2013. The dataset includes multiple types of user-item interactions: clicks, add-to-cart, favorites (collect), and purchases (alipay).
 - [DIGINETICA](https://competitions.codalab.org/competitions/11161): The dataset includes user sessions extracted from an e-commerce search engine logs, with anonymized user ids, hashed queries, hashed query terms, hashed product descriptions and meta-data, log-scaled prices, clicks, and purchases.
@@ -204,23 +206,24 @@ These datasets contain measurements of clothing fit from [RentTheRunway](https:/
 | 17 | [Ta Feng](https://github.com/RUCAIBox/RecommenderSystems-Datasets/tree/master/dataset_info/TaFeng) | 32,266 | 23,812 | 817,741 | 99\.89% | Click | √ | √ | √ | √ |
 | 18 | [Foursquare](https://github.com/RUCAIBox/RecommenderSystems-Datasets/tree/master/dataset_info/Foursquare) | \- | \- | \- | \- | Check-in | √ | | √ | |
 | 19 | [Tmall](https://github.com/RUCAIBox/RecommenderSystems-Datasets/tree/master/dataset_info/Tmall) | 963,923 | 2,353,207 | 44,528,127 | 99.99% | Click/Buy | √ | | | √ |
-| 20 | [YOOCHOOSE](https://github.com/RUCAIBox/RecommenderSystems-Datasets/tree/master/dataset_info/YOOCHOOSE) | 9,249,729 | 52,739 | 34,154,697 | 99.99% | Click/Buy | √ | | | √ |
-| 21 | [Retailrocket](https://github.com/RUCAIBox/RecommenderSystems-Datasets/tree/master/dataset_info/Retailrocket) | 1,407,580 | 247,085 | 2,756,101 | 99.99% | View/Addtocart/Transaction | √ | | | |
-| 22 | [LFM-1b](https://github.com/RUCAIBox/RecommenderSystems-Datasets/tree/master/dataset_info/LFM-1b) | 120,322 | 3,123,496 | 1,088,161,692 | 99\.71% | Click | √ | √ | √ | √ |
-| 23 | [MIND](https://github.com/RUCAIBox/RecommenderSystems-Datasets/tree/master/dataset_info/MIND) | - | - | - | - | Click | √ | | | |
-| 24 | BeerAdvocate | 33,388 | 66,055 | 1,586,614 | 99\.9281% | Rating<br>\[0,5\] | √ | | √ | |
-| 25 | Behance | 63,497 | 178,788 | 1,000,000 | 99\.9912% | Likes | √ | | √ | |
-| 26 | DianPing | 542,706 | 243,247 | 4,422,473 | 99\.9967% | Rating<br>\[0,5\] | √ | | √ | √ |
-| 27 | EndoMondo | 1,104 | 253,020 | 253,020 | 99\.9094% | Workout Logs | √ | √ | | √ |
-| 28 | Food | 226,570 | 231,637 | 1,132,367 | 99\.9978% | Rating<br>\[0,5\] | √ | | √ | |
-| 29 | GoodReads | 876,145 | 2,360,650 | 228,648,342 | 99\.9889% | Rating<br>\[0,5\] | √ | | √ | |
-| 30 | [KGRec](https://github.com/RUCAIBox/RecommenderSystems-Datasets/tree/master/dataset_info/KGRec) | - | - | - | - | Click | | | √ | |
-| 31 | ModCloth | 47,958 | 1,378 | 82,790 | 99\.8747% | Rating<br>\[0,5\] | | √ | √ | √ |
-| 32 | RateBeer | 29,265 | 110,369 | 2,924,163 | 99\.9095% | Overall Rating<br>\[0,20\] | √ | | √ | √ |
-| 33 | RentTheRunway | 105,571 | 5,850 | 192,544 | 99\.9688% | Rating<br>\[0,10\] | √ | √ | √ | √ |
-| 34 | [Twitch](https://github.com/RUCAIBox/RecommenderSystems-Datasets/tree/master/dataset_info/Twitch) | 15,524,309 | 6,161,666 | 474,676,929 | 99\.9995% | Click | | | | √ |
-| 35 | Amazon_M2 | 3,606,349 | 1,410,675 | 15,306,183 | \- | Click | | | √ | √ |
-| 36 | Music4All-Onion | 119,140 | 109,269 | 252,984,396 | \- | Click | √ | | √ | √ |
+| 20 | [Tmall2014](https://github.com/RUCAIBox/RecommenderSystems-Datasets/tree/master/dataset_info/Tmall2014) | ~1,500,000 | ~8,000,000 | ~22,400,000 (click) | 99.99% | Click/Cart/Collect/Alipay | √ | | | |
+| 21 | [YOOCHOOSE](https://github.com/RUCAIBox/RecommenderSystems-Datasets/tree/master/dataset_info/YOOCHOOSE) | 9,249,729 | 52,739 | 34,154,697 | 99.99% | Click/Buy | √ | | | √ |
+| 22 | [Retailrocket](https://github.com/RUCAIBox/RecommenderSystems-Datasets/tree/master/dataset_info/Retailrocket) | 1,407,580 | 247,085 | 2,756,101 | 99.99% | View/Addtocart/Transaction | √ | | | |
+| 23 | [LFM-1b](https://github.com/RUCAIBox/RecommenderSystems-Datasets/tree/master/dataset_info/LFM-1b) | 120,322 | 3,123,496 | 1,088,161,692 | 99\.71% | Click | √ | √ | √ | √ |
+| 24 | [MIND](https://github.com/RUCAIBox/RecommenderSystems-Datasets/tree/master/dataset_info/MIND) | - | - | - | - | Click | √ | | | |
+| 25 | BeerAdvocate | 33,388 | 66,055 | 1,586,614 | 99\.9281% | Rating<br>\[0,5\] | √ | | √ | |
+| 26 | Behance | 63,497 | 178,788 | 1,000,000 | 99\.9912% | Likes | √ | | √ | |
+| 27 | DianPing | 542,706 | 243,247 | 4,422,473 | 99\.9967% | Rating<br>\[0,5\] | √ | | √ | √ |
+| 28 | EndoMondo | 1,104 | 253,020 | 253,020 | 99\.9094% | Workout Logs | √ | √ | | √ |
+| 29 | Food | 226,570 | 231,637 | 1,132,367 | 99\.9978% | Rating<br>\[0,5\] | √ | | √ | |
+| 30 | GoodReads | 876,145 | 2,360,650 | 228,648,342 | 99\.9889% | Rating<br>\[0,5\] | √ | | √ | |
+| 31 | [KGRec](https://github.com/RUCAIBox/RecommenderSystems-Datasets/tree/master/dataset_info/KGRec) | - | - | - | - | Click | | | √ | |
+| 32 | ModCloth | 47,958 | 1,378 | 82,790 | 99\.8747% | Rating<br>\[0,5\] | | √ | √ | √ |
+| 33 | RateBeer | 29,265 | 110,369 | 2,924,163 | 99\.9095% | Overall Rating<br>\[0,20\] | √ | | √ | √ |
+| 34 | RentTheRunway | 105,571 | 5,850 | 192,544 | 99\.9688% | Rating<br>\[0,10\] | √ | √ | √ | √ |
+| 35 | [Twitch](https://github.com/RUCAIBox/RecommenderSystems-Datasets/tree/master/dataset_info/Twitch) | 15,524,309 | 6,161,666 | 474,676,929 | 99\.9995% | Click | | | | √ |
+| 36 | Amazon_M2 | 3,606,349 | 1,410,675 | 15,306,183 | \- | Click | | | √ | √ |
+| 37 | Music4All-Onion | 119,140 | 109,269 | 252,984,396 | \- | Click | √ | | √ | √ |
 
 ### CTR Datasets
diff --git a/conversion_tools/README.md b/conversion_tools/README.md
index fa9de21..d2d9ef9 100644
--- a/conversion_tools/README.md
+++ b/conversion_tools/README.md
@@ -22,11 +22,12 @@
 | 17 | Ta Feng |[Link](https://github.com/RUCAIBox/RecDatasets/blob/master/conversion_tools/usage/TaFeng.md)|
 | 18 | Foursquare |[Link](https://github.com/RUCAIBox/RecDatasets/blob/master/conversion_tools/usage/Foursquare.md)|
 | 19 | Tmall |[Link](https://github.com/RUCAIBox/RecDatasets/blob/master/conversion_tools/usage/Tmall.md)|
-| 20 | YOOCHOOSE |[Link](https://github.com/RUCAIBox/RecDatasets/blob/master/conversion_tools/usage/YOOCHOOSE.md)|
-| 21 | Retailrocket |[Link](https://github.com/RUCAIBox/RecDatasets/blob/master/conversion_tools/usage/Retailrocket.md)|
-| 22 | LFM\-1b |[Link](https://github.com/RUCAIBox/RecDatasets/blob/master/conversion_tools/usage/LFM-1b.md)|
-| 23 | MIND |[Link](https://github.com/RUCAIBox/RecDatasets/blob/master/conversion_tools/usage/MIND.md)|
-| 24 | Music4All_Onion |[Link](https://github.com/RUCAIBox/RecSysDatasets/blob/master/conversion_tools/usage/Onion.md)|
+| 20 | Tmall2014 |[Link](https://github.com/RUCAIBox/RecDatasets/blob/master/conversion_tools/usage/Tmall2014.md)|
+| 21 | YOOCHOOSE |[Link](https://github.com/RUCAIBox/RecDatasets/blob/master/conversion_tools/usage/YOOCHOOSE.md)|
+| 22 | Retailrocket |[Link](https://github.com/RUCAIBox/RecDatasets/blob/master/conversion_tools/usage/Retailrocket.md)|
+| 23 | LFM\-1b |[Link](https://github.com/RUCAIBox/RecDatasets/blob/master/conversion_tools/usage/LFM-1b.md)|
+| 24 | MIND |[Link](https://github.com/RUCAIBox/RecDatasets/blob/master/conversion_tools/usage/MIND.md)|
+| 25 | Music4All_Onion |[Link](https://github.com/RUCAIBox/RecSysDatasets/blob/master/conversion_tools/usage/Onion.md)|
 
 ### CTR Datasets
diff --git a/conversion_tools/run.py b/conversion_tools/run.py
index 3d0c40b..cc36750 100644
--- a/conversion_tools/run.py
+++ b/conversion_tools/run.py
@@ -5,8 +5,11 @@
 import argparse
 import importlib
+import time
+from datetime import datetime
 
 from src.utils import dataset2class, click_dataset, multiple_dataset, multiple_item_features
+from src.logger import logger, format_to_str_box
 
 
 if __name__ == '__main__':
@@ -29,21 +32,89 @@
     assert args.input_path is not None, 'input_path can not be None, please specify the input_path'
     assert args.output_path is not None, 'output_path can not be None, please specify the output_path'
 
+    # Collect the run configuration for the startup banner
+    config_info = {
+        "Dataset": args.dataset,
+        "Input path": args.input_path,
+        "Output path": args.output_path,
+    }
+
+    if args.interaction_type:
+        config_info["Interaction type"] = args.interaction_type
+
+    config_info["Duplicate removal"] = "enabled" if args.duplicate_removal else "disabled"
+    config_info["Start time"] = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
+
+    # Emit the configuration through the logger
+    logger.info("=" * 80)
+    logger.info("📊 Dataset conversion tool started")
+    logger.info(format_to_str_box(config_info))
+    logger.info("=" * 80)
+
+    start_time = time.time()
+
     input_args = [args.input_path, args.output_path]
     dataset_class_name = dataset2class[args.dataset.lower()]
     dataset_class = getattr(importlib.import_module('src.extended_dataset'), dataset_class_name)
     if dataset_class_name in multiple_dataset:
-        input_args.append(args.interaction_type)
+        # Append interaction_type only when it is given; otherwise pass 'all'
+        # so that every behavior type is processed
+        if args.interaction_type is not None:
+            input_args.append(args.interaction_type)
+        else:
+            input_args.append('all')
     if dataset_class_name in click_dataset:
         input_args.append(args.duplicate_removal)
     if dataset_class_name in multiple_item_features:
         input_args.append(args.item_feature_name)
+
+    logger.info(f"🔧 Initializing dataset class: {dataset_class_name}")
     datasets = dataset_class(*input_args)
+    logger.info("✅ Dataset class initialized")
 
     if args.convert_inter:
+        logger.info("")
+        logger.info("=" * 80)
+        logger.info("🚀 Converting interaction data (inter file)")
+        logger.info("=" * 80)
         datasets.convert_inter()
+        logger.info("=" * 80)
+        logger.info("✅ Interaction data conversion finished")
+        logger.info("=" * 80)
+
     if args.convert_item:
+        logger.info("")
+        logger.info("=" * 80)
+        logger.info("🚀 Converting item features")
+        logger.info("=" * 80)
         datasets.convert_item()
+        logger.info("=" * 80)
+        logger.info("✅ Item feature conversion finished")
+        logger.info("=" * 80)
+
     if args.convert_user:
+        logger.info("")
+        logger.info("=" * 80)
+        logger.info("🚀 Converting user features")
+        logger.info("=" * 80)
         datasets.convert_user()
+        logger.info("=" * 80)
+        logger.info("✅ User feature conversion finished")
+        logger.info("=" * 80)
+
+    # Compute the total elapsed time
+    end_time = time.time()
+    elapsed_time = end_time - start_time
+
+    # Collect the completion summary
+    completion_info = {
+        "Status": "all tasks completed",
+        "Total time": f"{elapsed_time:.2f} s ({elapsed_time/60:.2f} min)",
+        "End time": datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
+        "Output directory": args.output_path
+    }
+
+    logger.info("")
+    logger.info("=" * 80)
+    logger.info("🎉 All conversion tasks finished")
+    logger.info(format_to_str_box(completion_info))
+    logger.info("=" * 80)
diff --git a/conversion_tools/src/extended_dataset.py b/conversion_tools/src/extended_dataset.py
index 2fdabd5..4965fac 100644
--- a/conversion_tools/src/extended_dataset.py
+++ b/conversion_tools/src/extended_dataset.py
@@ -526,6 +526,176 @@ def merge_duplicate(self, inter_table):
         return inter_dict
 
+class TMALL2014Dataset(BaseDataset):
+    def __init__(self, input_path, output_path, interaction_type, duplicate_removal):
+        super(TMALL2014Dataset, self).__init__(input_path, output_path)
+        self.dataset_name = 'tmall2014'
+        self.interaction_type = interaction_type
+        self.duplicate_removal = duplicate_removal
+
+        # output file path (align with TMALLDataset style)
+        if self.interaction_type == 'all':
+            # merged mode: all behavior types go into one file
+            self.dataset_name = self.dataset_name + '-merged'
+        else:
+            # single behavior type
+            self.dataset_name = self.dataset_name + '-' + self.interaction_type
+
+        self.output_path = os.path.join(self.output_path, self.dataset_name)
+        self.check_output_path()
+        self.output_inter_file = os.path.join(self.output_path, self.dataset_name + '.inter')
+
+        # input file: use the given path as-is (absolute or relative)
+        self.inter_file = self.input_path
+
+        self.sep = ','
+
+        # selected feature fields - depend on whether all behavior types are merged
+        if self.interaction_type == 'all':
+            # merged mode: include the action-type field
+            if self.duplicate_removal:
+                self.inter_fields = {
+                    0: 'user_id:token',
+                    1: 'item_id:token',
+                    2: 'timestamp:float',
+                    3: 'action_type:token',
+                    4: 'interactions:float'
+                }
+            else:
+                self.inter_fields = {
+                    0: 'user_id:token',
+                    1: 'item_id:token',
+                    2: 'timestamp:float',
+                    3: 'action_type:token'
+                }
+        else:
+            # single-type mode: no action-type field
+            if self.duplicate_removal:
+                self.inter_fields = {
+                    0: 'user_id:token',
+                    1: 'item_id:token',
+                    2: 'timestamp:float',
+                    3: 'interactions:float'
+                }
+            else:
+                self.inter_fields = {
+                    0: 'user_id:token',
+                    1: 'item_id:token',
+                    2: 'timestamp:float'
+                }
+
+    def load_inter_data_streaming(self):
+        """Stream the raw file and yield records one at a time to keep memory usage low.
+
+        Raw format (\x01-separated):
+            item_id\x01user_id\x01action\x01timestamp
+        Example: 3903192\x01u6276408\x01click\x012013-08-26 10:41:11
+        """
+        import os
+        from datetime import datetime
+
+        with open(self.inter_file, 'r') as fin:
+            file_size = os.path.getsize(self.inter_file)
+
+            # update the progress bar per batch of lines rather than per byte
+            processed_bytes = 0
+            update_interval = 10000  # refresh the progress bar every 10000 lines
+            line_count = 0
+
+            with tqdm(total=file_size, unit='B', unit_scale=True) as pbar:
+                for line in fin:
+                    line_count += 1
+                    line_bytes = len(line)
+                    processed_bytes += line_bytes
+
+                    # throttle progress-bar updates
+                    if line_count % update_interval == 0:
+                        pbar.update(processed_bytes)
+                        processed_bytes = 0
+
+                    line = line.strip()
+                    if not line:
+                        continue
+
+                    try:
+                        # \x01 is the field separator
+                        fields = line.split('\x01')
+                        if len(fields) != 4:
+                            continue
+
+                        item_id, user_id, action, vtime = fields
+
+                        # filter by the configured interaction type
+                        if self.interaction_type == 'all':
+                            # merged mode: keep all four behavior types
+                            if action in ['click', 'cart', 'collect', 'alipay']:
+                                dt = datetime.strptime(vtime, '%Y-%m-%d %H:%M:%S')
+                                ts = int(dt.timestamp())
+                                yield [user_id, item_id, str(ts), action]
+                        else:
+                            # single-type mode: keep only the requested type
+                            if action == self.interaction_type:
+                                dt = datetime.strptime(vtime, '%Y-%m-%d %H:%M:%S')
+                                ts = int(dt.timestamp())
+                                yield [user_id, item_id, str(ts)]
+                    except Exception:
+                        continue
+
+                # flush the remaining progress
+                if processed_bytes > 0:
+                    pbar.update(processed_bytes)
+
+    def convert_inter(self):
+        try:
+            with open(self.output_inter_file, 'w', buffering=1024*1024) as fp:  # 1MB write buffer
+                fp.write('\t'.join([self.inter_fields[i] for i in range(len(self.inter_fields))]) + '\n')
+
+                if self.duplicate_removal:
+                    # group on every field except the last one; keep the last value
+                    # of that field plus an occurrence count
+                    inter_dict = {}
+                    for line in self.load_inter_data_streaming():
+                        key = tuple(line[:-1])
+                        t = line[-1]
+                        if key in inter_dict:
+                            inter_dict[key][0] = t
+                            inter_dict[key][1] += 1
+                        else:
+                            inter_dict[key] = [t, 1]
+
+                    for k, v in tqdm(inter_dict.items()):
+                        fp.write('\t'.join([str(item) for item in list(k) + v]) + '\n')
+                else:
+                    # batched writes for speed
+                    buffer = []
+                    buffer_size = 10000
+
+                    for line in self.load_inter_data_streaming():
+                        buffer.append('\t'.join(line))
+                        if len(buffer) >= buffer_size:
+                            fp.write('\n'.join(buffer) + '\n')
+                            buffer.clear()
+
+                    # write any remaining buffered rows
+                    if buffer:
+                        fp.write('\n'.join(buffer) + '\n')
+
+        except NotImplementedError:
+            print('This dataset can\'t be converted to inter file\n')
+        except Exception as e:
+            print(f'TMALL2014Dataset convert_inter error: {e}')
+
+    def merge_duplicate(self, inter_table):
+        inter_dict = {}
+        for line in inter_table:
+            key = tuple(line[:-1])
+            t = line[-1]
+            if key in inter_dict:
+                inter_dict[key][0] = t
+                inter_dict[key][1] += 1
+            else:
+                inter_dict[key] = [t, 1]
+        return inter_dict
+
 class NETFLIXDataset(BaseDataset):
     def __init__(self, input_path, output_path):
         super(NETFLIXDataset, self).__init__(input_path, output_path)
@@ -5314,3 +5484,162 @@ def convert_inter(self):
             fout.write('\t'.join([current_list[0], item, rating, timestamp]) + '\n')
         fin.close()
         fout.close()
+
+
+class TaobaoDataset(BaseDataset):
+    def __init__(self, input_path, output_path, interaction_type, duplicate_removal):
+        super(TaobaoDataset, self).__init__(input_path, output_path)
+        self.dataset_name = 'taobao'
+        self.interaction_type = interaction_type
+        self.duplicate_removal = duplicate_removal
+
+        # validate the interaction type
+        valid_types = ['pv', 'cart', 'fav', 'buy', 'all']
+        assert self.interaction_type in valid_types, f'interaction_type must be in {valid_types}'
+
+        # output file path - mirror the Rec_Tmall directory structure
+        if self.interaction_type == 'all':
+            # merged mode: all behavior types go into one file
+            self.dataset_name = self.dataset_name + '-merged'
+        else:
+            # single behavior type
+            self.dataset_name = self.dataset_name + '-' + self.interaction_type
+
+        # create the Rec_Taobao/processed/taobao-{type}/ layout
+        self.output_path = os.path.join(self.output_path, 'Rec_Taobao', 'processed', self.dataset_name)
+        self.check_output_path()
+        self.output_inter_file = os.path.join(self.output_path, self.dataset_name + '.inter')
+
+        # input file
+        self.inter_file = self.input_path
+        self.sep = ','
+
+        # selected feature fields - depend on whether all behavior types are merged
+        if self.interaction_type == 'all':
+            # merged mode: include the action-type field
+            if self.duplicate_removal:
+                self.inter_fields = {
+                    0: 'user_id:token',
+                    1: 'item_id:token',
+                    2: 'timestamp:float',
+                    3: 'action_type:token',
+                    4: 'interactions:float'
+                }
+            else:
+                self.inter_fields = {
+                    0: 'user_id:token',
+                    1: 'item_id:token',
+                    2: 'timestamp:float',
+                    3: 'action_type:token'
+                }
+        else:
+            # single-type mode: no action-type field
+            if self.duplicate_removal:
+                self.inter_fields = {
+                    0: 'user_id:token',
+                    1: 'item_id:token',
+                    2: 'timestamp:float',
+                    3: 'interactions:float'
+                }
+            else:
+                self.inter_fields = {
+                    0: 'user_id:token',
+                    1: 'item_id:token',
+                    2: 'timestamp:float'
+                }
+
+    def load_inter_data_streaming(self):
+        """Stream the raw file and yield records one at a time to keep memory usage low.
+
+        Raw format (comma-separated CSV):
+            user_id,item_id,category_id,behavior_type,timestamp
+        Example: 1,2268318,2520377,pv,1511544070
+        """
+        import os
+
+        with open(self.inter_file, 'r') as fin:
+            file_size = os.path.getsize(self.inter_file)
+
+            # skip the header row
+            next(fin)
+
+            # update the progress bar per batch of lines rather than per byte
+            processed_bytes = 0
+            update_interval = 10000  # refresh the progress bar every 10000 lines
+            line_count = 0
+
+            with tqdm(total=file_size, unit='B', unit_scale=True) as pbar:
+                for line in fin:
+                    line_count += 1
+                    line_bytes = len(line)
+                    processed_bytes += line_bytes
+
+                    # throttle progress-bar updates
+                    if line_count % update_interval == 0:
+                        pbar.update(processed_bytes)
+                        processed_bytes = 0
+
+                    line = line.strip()
+                    if not line:
+                        continue
+
+                    try:
+                        # comma is the field separator
+                        fields = line.split(',')
+                        if len(fields) != 5:
+                            continue
+
+                        user_id, item_id, category_id, behavior_type, timestamp = fields
+
+                        # filter by the configured interaction type
+                        if self.interaction_type == 'all':
+                            # merged mode: keep all four behavior types
+                            if behavior_type in ['pv', 'cart', 'fav', 'buy']:
+                                yield [user_id, item_id, timestamp, behavior_type]
+                        else:
+                            # single-type mode: keep only the requested type
+                            if behavior_type == self.interaction_type:
+                                yield [user_id, item_id, timestamp]
+                    except Exception:
+                        continue
+
+                # flush the remaining progress
+                if processed_bytes > 0:
+                    pbar.update(processed_bytes)
+
+    def convert_inter(self):
+        try:
+            with open(self.output_inter_file, 'w', buffering=1024*1024) as fp:  # 1MB write buffer
+                fp.write('\t'.join([self.inter_fields[i] for i in range(len(self.inter_fields))]) + '\n')
+
+                if self.duplicate_removal:
+                    # group on every field except the last one; keep the last value
+                    # of that field plus an occurrence count
+                    inter_dict = {}
+                    for line in self.load_inter_data_streaming():
+                        key = tuple(line[:-1])
+                        t = line[-1]
+                        if key in inter_dict:
+                            inter_dict[key][0] = t
+                            inter_dict[key][1] += 1
+                        else:
+                            inter_dict[key] = [t, 1]
+
+                    for k, v in tqdm(inter_dict.items()):
+                        fp.write('\t'.join([str(item) for item in list(k) + v]) + '\n')
+                else:
+                    # batched writes for speed
+                    buffer = []
+                    buffer_size = 10000
+
+                    for line in self.load_inter_data_streaming():
+                        buffer.append('\t'.join(line))
+                        if len(buffer) >= buffer_size:
+                            fp.write('\n'.join(buffer) + '\n')
+                            buffer.clear()
+
+                    # write any remaining buffered rows
+                    if buffer:
+                        fp.write('\n'.join(buffer) + '\n')
+
+        except NotImplementedError:
+            print('This dataset can\'t be converted to inter file\n')
diff --git a/conversion_tools/src/logger.py b/conversion_tools/src/logger.py
new file mode 100644
index 0000000..9c487be
--- /dev/null
+++ b/conversion_tools/src/logger.py
@@ -0,0 +1,273 @@
+"""
+A logging module that provides colored console output and rotating file logs.
+
+It implements a singleton logger that writes to both the console and a file,
+rotating the log file automatically once it reaches 10MB.
+"""
+
+import logging
+import sys
+from logging.handlers import RotatingFileHandler
+from pathlib import Path
+from typing import Optional, Union
+
+from colorama import Fore, Style, init  # type: ignore
+
+# log line format
+LOG_FORMAT = "%(asctime)s [%(levelname)s] [%(module)s.%(funcName)s] - %(message)s"
+# variant with the full file path and line number:
+# LOG_FORMAT = "%(asctime)s [%(levelname)s] [%(pathname)s:%(lineno)d] - %(message)s"
+# variant with the file name and line number:
+# LOG_FORMAT = "%(asctime)s [%(levelname)s] [%(filename)s:%(lineno)d] - %(message)s"
+
+BERTOPIC_LOG_FORMAT = "%(asctime)s [%(levelname)s] [BERTopic] - %(message)s"
+DATE_FORMAT = "%Y-%m-%d %H:%M:%S"
+
+# initialize colorama
+init(autoreset=True)
+
+# color for each log level
+LOG_COLORS = {
+    "DEBUG": Fore.CYAN,
+    "INFO": Fore.GREEN,
+    "WARNING": Fore.YELLOW,
+    "ERROR": Fore.RED,
+    "CRITICAL": Fore.RED + Style.BRIGHT,
+}
+
+# log file configuration
+MAX_LOG_SIZE = 10 * 1024 * 1024  # 10MB
+BACKUP_COUNT = 5  # keep 5 backup files
+
+
+class ColoredFormatter(logging.Formatter):
+    """Formatter that colorizes the log level name."""
+
+    def format(self, record):
+        """Format the record, coloring only the level keyword."""
+        # render the plain message first
+        message = super().format(record)
+
+        # colorize the level name
+        level_color = LOG_COLORS.get(record.levelname, "")
+        if level_color:
+            # color only the level keyword itself
+            level_name = record.levelname
+            colored_level = f"{level_color}{level_name}{Style.RESET_ALL}"
+            message = message.replace(level_name, colored_level)
+
+        return message
+
+
+def setup_bertopic_logger(log_dir: Path):
+    """
+    Configure the dedicated BERTopic logger.
+
+    Args:
+        log_dir (Path): directory that holds the log files.
+    """
+    # BERTopic-specific formatter
+    bertopic_formatter = ColoredFormatter(BERTOPIC_LOG_FORMAT, datefmt=DATE_FORMAT)
+
+    # configure the BERTopic logger
+    bertopic_logger = logging.getLogger("BERTopic")
+    bertopic_logger.setLevel(logging.INFO)
+    bertopic_logger.propagate = False  # do not propagate to the root logger
+
+    # remove any existing handlers
+    for handler in bertopic_logger.handlers[:]:
+        bertopic_logger.removeHandler(handler)
+
+    # dedicated console handler for BERTopic
+    bertopic_console_handler = logging.StreamHandler(sys.stdout)
+    bertopic_console_handler.setFormatter(bertopic_formatter)
+    bertopic_logger.addHandler(bertopic_console_handler)
+
+    # rotating file handler
+    bertopic_file_handler = RotatingFileHandler(
+        log_dir / "pipeline.log",
+        maxBytes=MAX_LOG_SIZE,
+        backupCount=BACKUP_COUNT,
+        encoding="utf-8",
+    )
+    bertopic_file_handler.setFormatter(bertopic_formatter)
+    bertopic_logger.addHandler(bertopic_file_handler)
+
+
+class Logger:
+    """Singleton logger wrapper."""
+
+    _instance: Optional["Logger"] = None
+    _initialized: bool = False
+
+    def __new__(cls):
+        """Create the singleton instance."""
+        if cls._instance is None:
+            cls._instance = super().__new__(cls)
+        return cls._instance
+
+    def __init__(self):
+        """Initialize the logger exactly once."""
+        if self._initialized:
+            return
+
+        self._initialized = True
+
+        # configure the root logger
+        root_logger = logging.getLogger()
+        root_logger.setLevel(logging.INFO)
+
+        # remove all existing handlers
+        for handler in root_logger.handlers[:]:
+            root_logger.removeHandler(handler)
+
+        # create the log directory
+        log_dir = Path("logs")
+        log_dir.mkdir(exist_ok=True)
+
+        # formatters
+        colored_formatter = ColoredFormatter(LOG_FORMAT, datefmt=DATE_FORMAT)
+        plain_formatter = logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT)
+
+        # console handler (colored)
+        console_handler = logging.StreamHandler(sys.stdout)
+        console_handler.setFormatter(colored_formatter)
+        root_logger.addHandler(console_handler)
+
+        # file handler (plain text, rotating)
+        file_handler = RotatingFileHandler(
+            log_dir / "pipeline.log",
+            maxBytes=MAX_LOG_SIZE,
+            backupCount=BACKUP_COUNT,
+            encoding="utf-8",
+        )
+        file_handler.setFormatter(plain_formatter)
+        root_logger.addHandler(file_handler)
+
+        # configure the BERTopic logger
+        setup_bertopic_logger(log_dir)
+
+        # project-specific logger
+        self.logger = logging.getLogger("TextMiningPipeline")
+        self.logger.setLevel(logging.INFO)
+
+        self.logger.propagate = True
+
+    def get_logger(self) -> logging.Logger:
+        """Return the logger instance."""
+        return self.logger
+
+    def set_level(self, level: int):
+        """Set the log level."""
+        self.logger.setLevel(level)
+        logging.getLogger().setLevel(level)
+
+
+# Global logger instance
+logger = Logger().get_logger()
+
+
+def format_to_str_box(data: Union[dict[str, str], str], max_width: int = 80) -> str:
+    """
+    Format a dict or string as a boxed string, wrapping long lines automatically.
+
+    A CJK character counts as two ASCII characters of display width.
+
+    Args:
+        data: either a dict or a string
+            - dict: rendered line by line as "key: value"
+            - string: split on newlines and rendered inside the box
+        max_width: maximum display width per line (excluding the border), default 80.
+
+    Returns:
+        The formatted box as a string.
+    """
+    # display width of a string
+    def get_display_width(text):
+        return sum(2 if "\u4e00" <= char <= "\u9fff" else 1 for char in text)
+
+    def wrap_text(text: str, available_width: int) -> list[str]:
+        """Wrap long text to the given display width."""
+        if get_display_width(text) <= available_width:
+            return [text]
+
+        words = text.split()
+        lines = []
+        current_line: list[str] = []
+        current_width = 0
+
+        for word in words:
+            word_width = get_display_width(word)
+            if (
+                current_width + word_width + (1 if current_line else 0)
+                <= available_width
+            ):
+                if current_line:
+                    current_width += 1  # width of the joining space
+                current_line.append(word)
+                current_width += word_width
+            else:
+                if current_line:
+                    lines.append(" ".join(current_line))
+                current_line = [word]
+                current_width = word_width
+
+        if current_line:
+            lines.append(" ".join(current_line))
+        return lines
+
+    result = ""
+    border_length = max_width + 4  # add left and right margins
+
+    if isinstance(data, str):
+        lines = []
+        for line in data.split("\n"):
+            lines.extend(wrap_text(line, max_width))
+
+        # top border
+        result = "+" + "-" * (border_length - 2) + "+\n"
+
+        # content lines
+        for line in lines:
+            display_width = get_display_width(line)
+            padding = border_length - 4 - display_width  # -4 accounts for "| " and " |"
+            result += f"| {line}" + " " * padding + " |\n"
+
+        # bottom border
+        result += "+" + "-" * (border_length - 2) + "+"
+
+    elif isinstance(data, dict):
+        # top border
+        result = "+" + "-" * (border_length - 2) + "+\n"
+
+        # content lines
+        for key, value in data.items():
+            prefix = f"{key}: "
+            prefix_width = get_display_width(prefix)
+            available_width = max_width - prefix_width
+
+            # wrap the value if needed
+            value_lines = wrap_text(str(value), available_width)
+
+            # first line carries the key
+            first_line = prefix + value_lines[0]
+            display_width = get_display_width(first_line)
+            padding = border_length - 4 - display_width
+            result += f"| {first_line}" + " " * padding + " |\n"
+
+            # continuation lines, if any
+            for line in value_lines[1:]:
+                display_width = get_display_width(line)
+                # align with the value on the previous line
+                indent = " " * prefix_width
+                padding = border_length - 4 - display_width - prefix_width
+                result += f"| {indent}{line}" + " " * padding + " |\n"
+
+        # bottom border
+        result += "+" + "-" * (border_length - 2) + "+"
+
+    else:
+        raise TypeError("data must be a string or a dict")
+
+    return "\n" + result
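+
+
+if __name__ == "__main__":
+    # Minimal usage sketch (an illustrative addition, not required by run.py):
+    # renders a small config dict as the ASCII box that run.py prints in its
+    # startup banner. The keys below are arbitrary example values.
+    demo_config = {
+        "Dataset": "tmall_2014",
+        "Status": "dry run",
+    }
+    logger.info(format_to_str_box(demo_config))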
diff --git a/conversion_tools/src/utils.py b/conversion_tools/src/utils.py
index ea4882c..de46008 100644
--- a/conversion_tools/src/utils.py
+++ b/conversion_tools/src/utils.py
@@ -12,6 +12,7 @@
     'avazu': 'AVAZUDataset',
     'adult': 'ADULTDataset',
     'tmall': 'TMALLDataset',
+    'tmall_2014': 'TMALL2014Dataset',
     'netflix': 'NETFLIXDataset',
     'criteo': 'CRITEODataset',
     'foursquare': 'FOURSQUAREDataset',
@@ -63,20 +64,23 @@
     'mind_large_dev': 'MINDLargeDevDataset',
     'mind_small_train': 'MINDSmallTrainDataset',
     'mind_small_dev': 'MINDSmallDevDataset',
-    'cosmetics': 'CosmeticsDataset'
+    'cosmetics': 'CosmeticsDataset',
+    'taobao': 'TaobaoDataset'
 }
 
 click_dataset = {
     'YOOCHOOSEDataset',
     'RETAILROCKETDataset',
     'TMALLDataset',
+    'TMALL2014Dataset',
     'IPINYOUDataset',
     'TAFENGDataset',
     'LFM1bDataset',
     'GOWALLADataset',
     'DIGINETICADataset',
     'FOURSQUAREDataset',
-    'STEAMDataset'
+    'STEAMDataset',
+    'TaobaoDataset'
 }
 
 multiple_dataset = {
@@ -85,8 +89,10 @@
     'RETAILROCKETDataset',
     'TAFENGDataset',
     'TMALLDataset',
+    'TMALL2014Dataset',
     'IPINYOUDataset',
-    'LFM1bDataset'
+    'LFM1bDataset',
+    'TaobaoDataset'
 }
 
 multiple_item_features = {
diff --git a/conversion_tools/usage/Taobao.md b/conversion_tools/usage/Taobao.md
new file mode 100644
index 0000000..f8649c5
--- /dev/null
+++ b/conversion_tools/usage/Taobao.md
@@ -0,0 +1,117 @@
+# Taobao Dataset
+
+## Dataset Information
+
+**For detailed dataset information, please visit:** [Taobao User Behavior Dataset](https://tianchi.aliyun.com/dataset/dataDetail?dataId=649)
+
+## Prerequisites
+
+```bash
+git clone https://github.com/RUCAIBox/RecDatasets
+cd RecDatasets/conversion_tools
+pip install -r requirements.txt
+```
+
+## Data Conversion
+
+### Basic Usage
+
+```bash
+python run.py --dataset taobao \
+    --input_path /path/to/Taobao.csv \
+    --output_path output_data/taobao \
+    --interaction_type pv \
+    --convert_inter
+```
+
+### Parameters
+
+- `--dataset`: `taobao` (required)
+- `--input_path`: Path to the input data file (required)
+- `--output_path`: Directory to store converted files (required)
+- `--interaction_type`: `pv`, `cart`, `fav`, `buy`, or omit to merge all types (optional)
+- `--convert_inter`: Enable conversion (required)
+- `--duplicate_removal`: Enable deduplication (optional)
+
+**Note**: When `--interaction_type` is omitted, all four interaction types (pv, cart, fav, buy) will be merged into a single file with an additional `action_type` column.
+
+### Convert All Interaction Types
+
+#### Method 1: Convert Separately
+```bash
+for type in pv cart fav buy; do
+    python run.py --dataset taobao \
+        --input_path /path/to/Taobao.csv \
+        --output_path output_data/taobao \
+        --interaction_type $type \
+        --convert_inter
+done
+```
+
+#### Method 2: Convert All Types in One File (Recommended)
+```bash
+python run.py --dataset taobao \
+    --input_path /path/to/Taobao.csv \
+    --output_path output_data/taobao \
+    --convert_inter
+```
+
+## Output Format
+
+### Single Interaction Type
+Output file: `output_data/taobao/Rec_Taobao/processed/taobao-{interaction_type}/taobao-{interaction_type}.inter` (the converter nests its output under `Rec_Taobao/processed/`)
+
+```
+user_id:token    item_id:token    timestamp:float
+1                2268318          1511544070
+```
+
+### All Interaction Types (Merged)
+Output file: `output_data/taobao/Rec_Taobao/processed/taobao-merged/taobao-merged.inter`
+
+```
+user_id:token    item_id:token    timestamp:float    action_type:token
+1                2268318          1511544070         pv
+1                2268318          1511544071         cart
+1                2268318          1511544072         fav
+1                2268318          1511544073         buy
+```
+
+### With `--duplicate_removal`
+
+#### Single Type:
+```
+user_id:token    item_id:token    timestamp:float    interactions:float
+1                2268318          1511544070         3
+```
+
+#### Merged Types:
+```
+user_id:token    item_id:token    timestamp:float    action_type:token    interactions:float
+1                2268318          1511544070         pv                   1
+1                2268318          1511544071         cart                 2
+```
+
+## Dataset Statistics
+
+- **Total interactions**: ~100 million
+- **Behavior types**: pv (page view), cart (add to cart), fav (favorite), buy (purchase)
+- **Time period**: 2017-11-25 to 2017-12-03
+- **Users**: ~1 million
+- **Items**: ~4 million
+
+## Input Format
+
+The input CSV file should have the following format:
+```
+user_id,item_id,category_id,behavior_type,timestamp
+1,2268318,2520377,pv,1511544070
+1,2333346,2520771,pv,1511561733
+```
+
+Where:
+- `user_id`: User identifier
+- `item_id`: Item identifier
+- `category_id`: Category identifier
+- `behavior_type`: One of `pv`, `cart`, `fav`, `buy`
+- `timestamp`: Unix timestamp
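+
+## Reading the Converted File
+
+The converted `.inter` files are plain tab-separated text with a single header row, so they can be inspected with standard tools. The following is a minimal sketch (assuming pandas is installed and the merged output path from the examples above; it is an illustration, not part of the converter):
+
+```python
+import pandas as pd
+
+# Load the merged atomic file (tab-separated, one header row).
+df = pd.read_csv('output_data/taobao/Rec_Taobao/processed/taobao-merged/taobao-merged.inter', sep='\t')
+
+# Header fields follow RecBole's `name:type` convention; strip the type suffix.
+df.columns = [col.split(':')[0] for col in df.columns]
+
+# Keep only page views, mirroring `--interaction_type pv`.
+pv = df[df['action_type'] == 'pv']
+print(pv.head())
+```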
diff --git a/conversion_tools/usage/Tmall2014.md b/conversion_tools/usage/Tmall2014.md
new file mode 100644
index 0000000..ee6aae7
--- /dev/null
+++ b/conversion_tools/usage/Tmall2014.md
@@ -0,0 +1,94 @@
+# Tmall2014
+
+## Dataset Information
+
+**For detailed dataset information, please visit:** [Tianchi Tmall Recommendation Dataset](https://tianchi.aliyun.com/dataset/140281)
+
+## Prerequisites
+
+```bash
+git clone https://github.com/RUCAIBox/RecDatasets
+cd RecDatasets/conversion_tools
+pip install -r requirements.txt
+```
+
+## Data Conversion
+
+### Basic Usage
+
+```bash
+python run.py --dataset tmall_2014 \
+    --input_path /path/to/tianchi_2014002_rec_tmall_log_partc.txt \
+    --output_path output_data/tmall2014 \
+    --interaction_type click \
+    --convert_inter
+```
+
+### Parameters
+
+- `--dataset`: `tmall_2014` (required)
+- `--input_path`: Path to the input data file (required)
+- `--output_path`: Directory to store converted files (required)
+- `--interaction_type`: `click`, `cart`, `collect`, or `alipay` (optional, omit to merge all types)
+- `--convert_inter`: Enable conversion (required)
+- `--duplicate_removal`: Enable deduplication (optional)
+
+**Note**: When `--interaction_type` is omitted, all four interaction types (click, cart, collect, alipay) will be merged into a single file with an additional `action_type` column.
+
+### Convert All Interaction Types
+
+#### Method 1: Convert Separately
+```bash
+for type in click cart collect alipay; do
+    python run.py --dataset tmall_2014 \
+        --input_path /path/to/data.txt \
+        --output_path output_data/tmall2014 \
+        --interaction_type $type \
+        --convert_inter
+done
+```
+
+#### Method 2: Convert All Types in One File (Recommended)
+```bash
+python run.py --dataset tmall_2014 \
+    --input_path /path/to/data.txt \
+    --output_path output_data/tmall2014 \
+    --convert_inter
+```
+
+## Output Format
+
+### Single Interaction Type
+Output file: `output_data/tmall2014/tmall2014-{interaction_type}/tmall2014-{interaction_type}.inter`
+
+```
+user_id:token    item_id:token    timestamp:float
+u6276408         3903192          1377496871
+```
+
+### All Interaction Types (Merged)
+Output file: `output_data/tmall2014/tmall2014-merged/tmall2014-merged.inter`
+
+```
+user_id:token    item_id:token    timestamp:float    action_type:token
+u6276408         3903192          1377496871         click
+u6276408         3903192          1377496872         cart
+u6276408         3903192          1377496873         collect
+u6276408         3903192          1377496874         alipay
+```
+
+### With `--duplicate_removal`
+
+#### Single Type:
+```
+user_id:token    item_id:token    timestamp:float    interactions:float
+u6276408         3903192          1377496871         3
+```
+
+#### Merged Types:
+```
+user_id:token    item_id:token    timestamp:float    action_type:token    interactions:float
+u6276408         3903192          1377496871         click                1
+u6276408         3903192          1377496872         cart                 2
+```
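+
+## Timestamp Note
+
+The converter turns the raw `YYYY-MM-DD HH:MM:SS` strings into Unix epoch seconds via `datetime.timestamp()`. A minimal sketch of the same conversion (the resulting value depends on the timezone of the machine running the conversion):
+
+```python
+from datetime import datetime
+
+# A raw event time as it appears in the \x01-separated source file.
+raw_time = '2013-08-26 10:41:11'
+
+# Same conversion applied by the TMALL2014Dataset streaming loader.
+ts = int(datetime.strptime(raw_time, '%Y-%m-%d %H:%M:%S').timestamp())
+print(ts)  # epoch seconds; the exact value depends on the local timezone
+```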
diff --git a/dataset_info/Tmall2014/README.md b/dataset_info/Tmall2014/README.md
new file mode 100644
index 0000000..7f327ea
--- /dev/null
+++ b/dataset_info/Tmall2014/README.md
@@ -0,0 +1,20 @@
+# Tmall2014
+
+## Dataset Overview
+
+Tmall2014 is a large-scale e-commerce dataset collected from Tmall.com (formerly Taobao Mall), containing user behavior logs from 2013. The dataset includes multiple types of user-item interactions: clicks, add-to-cart, favorites (collect), and purchases (alipay).
+
+**For detailed dataset information, please visit:** [Tianchi Tmall Recommendation Dataset](https://tianchi.aliyun.com/dataset/140281)
+
+## Data Format
+
+The original data file uses `\x01` (an ASCII control character) as the field separator. After conversion, the data is in RecBole atomic file format (tab-separated):
+
+```
+user_id:token    item_id:token    timestamp:float
+u6276408         3903192          1377496871
+```
+
+## Usage
+
+Please refer to the [conversion tool documentation](../../conversion_tools/usage/Tmall2014.md) for instructions on how to convert this dataset to RecBole format.
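+
+For reference, the `\x01`-separated raw format described under Data Format can be split in plain Python like this (a minimal sketch; the sample line mirrors the example used in the conversion tool):
+
+```python
+# Raw Tmall2014 log lines use the \x01 control character as the field separator.
+line = '3903192\x01u6276408\x01click\x012013-08-26 10:41:11'
+
+item_id, user_id, action, event_time = line.split('\x01')
+print(user_id, item_id, action, event_time)
+# -> u6276408 3903192 click 2013-08-26 10:41:11
+```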