AWS: Uploading Files, Deleting Files, and Image Recognition

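Contents

  • Uploading and deleting S3 files and recognizing text in images with AWS
    • Preparation
      • Installing the AWS CLI
      • Initial configuration of the AWS CLI
      • Enabling the S3 bucket
      • Enabling image text recognition
      • The AWS SDK
    • Uploading files
      • Method 1
      • Method 2
    • Deleting files
    • Recognizing text in images
      • Recognizing key-value documents such as invoices and bills
      • Plain text recognition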
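Uploading and deleting S3 files and recognizing text in images with AWS

Preparation

Installing the AWS CLI

Download the installer for your operating system and run it. The installation is straightforward, so it is not covered here.
Once the installation has finished, run the following two commands to verify that the AWS CLI was installed successfully. The example below uses the Terminal application on macOS; on Windows, open cmd instead.
  • where aws (Windows) or which aws (macOS/Linux): show the AWS CLI installation path
  • aws --version: show the AWS CLI version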

    zonghan@MacBook-Pro ~ % aws --version
    aws-cli/2.0.30 Python/3.7.4 Darwin/21.6.0 botocore/2.0.0dev34
    zonghan@MacBook-Pro ~ % which aws
    /usr/local/bin/aws

Initial configuration of the AWS CLI

Before using the AWS CLI, run the aws configure command to complete the initial configuration.

    zonghan@MacBook-Pro ~ % aws configure
    AWS Access Key ID [None]: AKIA3GRZL6WIQEXAMPLE
    AWS Secret Access Key [None]: k+ci5r+hAcM3x61w1example
    Default region name [None]: ap-east-1
    Default output format [None]: json

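  • The AWS Access Key ID and AWS Secret Access Key can be obtained from the AWS Management Console. The AWS CLI uses them like a username and password to connect to AWS services.
    Click your username in the upper-right corner of the AWS Management Console --> select Security Credentials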

  • Click Create New Access Key to create an Access Key ID and Secret Access Key pair, and save it (the secret can only be saved at creation time).

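  • Default region name specifies the code of the AWS Region to connect to. The code for each AWS Region can be looked up via this link.
  • Default output format specifies the format of the command-line output. JSON is the default for all output. Any of the following formats can be used instead:
    • JSON (JavaScript Object Notation)
    • YAML: available only in AWS CLI v2
    • Text
    • Table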
For more detailed configuration options, see this article.
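
To make the result of aws configure more concrete: the command stores the four answers in two INI-style files, ~/.aws/credentials (the key pair) and ~/.aws/config (region and output format). The following is a minimal sketch that parses text in that layout with the standard library. The function name read_profile is made up for illustration, and the values are the placeholders from the session above, not real credentials.

```python
import configparser

# Example contents in the layout that `aws configure` writes.
# The values are the placeholders from the session above.
CREDENTIALS_TEXT = """
[default]
aws_access_key_id = AKIA3GRZL6WIQEXAMPLE
aws_secret_access_key = k+ci5r+hAcM3x61w1example
"""

CONFIG_TEXT = """
[default]
region = ap-east-1
output = json
"""


def read_profile(credentials_text, config_text, profile="default"):
    """Merge the settings of one profile from both files into a dict."""
    # interpolation=None keeps special characters in secrets untouched.
    credentials = configparser.ConfigParser(interpolation=None)
    credentials.read_string(credentials_text)
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(config_text)

    settings = {}
    if credentials.has_section(profile):
        settings.update(credentials[profile])
    if config.has_section(profile):
        settings.update(config[profile])
    return settings


profile = read_profile(CREDENTIALS_TEXT, CONFIG_TEXT)
print(profile["region"])  # ap-east-1
print(profile["output"])  # json
```

To inspect a real installation, the same function could be fed the contents of the two files under ~/.aws instead of the inline strings.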
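
Enabling the S3 bucket

The user whose credentials are configured on this computer must have permission to access an S3 bucket. An administrator usually grants this access.

Enabling image text recognition

The user whose credentials are configured on this computer must also have permissions for Amazon Textract. An administrator usually grants these as well.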
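
As a rough idea of what the administrator would grant, the sketch below builds an IAM policy document covering the S3 and Textract calls used in this article. It is an assumption about a reasonable minimal policy, not the author's actual setup; the function name build_policy and the bucket name my-bucket are hypothetical.

```python
import json


def build_policy(bucket):
    """Build an IAM policy document for the S3 and Textract calls in this article."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Upload and delete objects in one bucket; GetObject is included
                # because Textract has to read the document from the bucket.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # Expense analysis and plain text detection.
                "Effect": "Allow",
                "Action": ["textract:AnalyzeExpense", "textract:DetectDocumentText"],
                "Resource": "*",
            },
        ],
    }


policy = build_policy("my-bucket")  # hypothetical bucket name
print(json.dumps(policy, indent=2))
```

The printed JSON is the kind of document an administrator could attach to the user or to a group the user belongs to.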

The AWS SDK

    import boto3
    from botocore.exceptions import ClientError, BotoCoreError

Install the boto3 module used above; installing it normally installs the botocore module as well.
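
Before running the examples that follow, it can help to confirm that both modules are importable. A minimal sketch using only the standard library; the function name missing_modules is made up for illustration.

```python
import importlib.util


def missing_modules(names=("boto3", "botocore")):
    """Return the module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("boto3 and botocore are available")
```

If anything is reported as missing, pip install boto3 is the usual way to install both.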

Uploading files

Method 1

Use the upload_file method to upload a file.

    import logging
    import os

    import boto3
    from boto3.exceptions import S3UploadFailedError
    from botocore.exceptions import ClientError


    def upload_file(file_path, bucket, object_name=None):
        """Upload a file to an S3 bucket

        :param file_path: File to upload
        :param bucket: Bucket to upload to
        :param object_name: S3 object name. If not specified then the base name of file_path is used
        :return: True if file was uploaded, else False
        """
        # If S3 object_name was not specified, use the base name of file_path
        if object_name is None:
            object_name = os.path.basename(file_path)

        # Upload the file
        s3_client = boto3.client('s3')
        # s3 = boto3.resource('s3')
        try:
            s3_client.upload_file(file_path, bucket, object_name)
            # s3.Bucket(bucket).upload_file(file_path, object_name)
        except (ClientError, S3UploadFailedError) as e:
            # upload_file reports failed transfers as S3UploadFailedError
            logging.error(e)
            return False
        return True

Method 2

Use PutObject to upload a file.

    import logging
    import os

    import boto3
    from botocore.exceptions import BotoCoreError, ClientError

    logger = logging.getLogger(__name__)


    def upload_file_to_aws(file_path, bucket, file_name=None):
        """Upload a file to an S3 bucket

        :param file_path: File to upload
        :param bucket: Bucket to upload to
        :param file_name: S3 object name. If not specified then the base name of file_path is used
        :return: True if file was uploaded, else False
        """
        # If the S3 object name was not specified, use the base name of file_path
        if file_name is None:
            file_name = os.path.basename(file_path)

        # Upload the file
        s3 = boto3.resource('s3')
        try:
            with open(file_path, 'rb') as f:
                data = f.read()
            obj = s3.Object(bucket, file_name)
            obj.put(Body=data)
        except (BotoCoreError, ClientError) as e:
            # Service-side failures (for example, access denied) arrive as ClientError
            logger.error(e)
            return False
        return True

Deleting files

    def delete_aws_file(file_name, bucket):
        try:
            s3_client = boto3.client("s3")
            s3_client.delete_object(Bucket=bucket, Key=file_name)
        except Exception as e:
            logger.error(e)

Recognizing text in images

Recognizing key-value documents such as invoices and bills

    def get_labels_and_values(result, field):
        # Only fields that carry a label can be turned into a key-value pair
        if "LabelDetection" in field:
            key = field.get("LabelDetection", {}).get("Text")
            value = field.get("ValueDetection", {}).get("Text")
            if key and value:
                # Drop a trailing colon such as "Total:"
                if key.endswith(":"):
                    key = key[:-1]
                result.append({key: value})


    def process_text_detection(bucket, document):
        try:
            client = boto3.client("textract", region_name="ap-south-1")
            response = client.analyze_expense(
                Document={"S3Object": {"Bucket": bucket, "Name": document}}
            )
        except Exception as e:
            logger.error(e)
            # Re-raise as a proper exception so callers can handle the failure
            raise RuntimeError("An unknown error occurred on the aws service") from e

        # A list of single-pair dicts, because labels can repeat across line items
        result = []
        for expense_doc in response["ExpenseDocuments"]:
            for line_item_group in expense_doc["LineItemGroups"]:
                for line_items in line_item_group["LineItems"]:
                    for expense_fields in line_items["LineItemExpenseFields"]:
                        get_labels_and_values(result, expense_fields)
            for summary_field in expense_doc["SummaryFields"]:
                get_labels_and_values(result, summary_field)
        return result


    def get_extract_info(bucket, document):
        return process_text_detection(bucket, document)
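
For plain text recognition, the last item in the contents, the following is a minimal sketch. It assumes that Textract's detect_document_text operation is the intended call; the function names extract_lines and detect_plain_text, the FakeTextractClient class, and the bucket and file names are all made up for illustration. The client is passed in as a parameter so the parsing can be tried without AWS access.

```python
def extract_lines(response):
    """Return the text of every LINE block in a detect_document_text response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def detect_plain_text(client, bucket, document):
    """Run Textract text detection on an S3 object and return its lines.

    `client` is expected to be boto3.client("textract", region_name=...).
    """
    response = client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": document}}
    )
    return extract_lines(response)


class FakeTextractClient:
    """Stand-in for the real client so the sketch runs without AWS access."""

    def detect_document_text(self, Document):
        # A trimmed-down response: real responses carry many more fields.
        return {
            "Blocks": [
                {"BlockType": "PAGE"},
                {"BlockType": "LINE", "Text": "Invoice No: 42"},
                {"BlockType": "WORD", "Text": "Invoice"},
                {"BlockType": "LINE", "Text": "Total: 10.00"},
            ]
        }


lines = detect_plain_text(FakeTextractClient(), "my-bucket", "invoice.png")
print(lines)  # ['Invoice No: 42', 'Total: 10.00']
```

With real credentials, passing boto3.client("textract", region_name="ap-south-1") in place of the fake client would send the request to the service. Only LINE blocks are kept here because WORD blocks repeat the same text word by word.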
